Re: "Method json([class java.util.HashMap]) does not exist" when reading JSON on PySpark

2015-10-05 Thread Fernando Paladini
Thank you for the replies and sorry about the delay; my e-mail client sent this conversation to Spam (??). I'll take a look at your tips and come back later to post my questions / progress. Again, thank you so much! 2015-09-30 18:37 GMT-03:00 Michael Armbrust : > I think

Re: "Method json([class java.util.HashMap]) does not exist" when reading JSON on PySpark

2015-10-05 Thread Fernando Paladini
Update: I've updated my code and now I have the following JSON: https://gist.github.com/paladini/27bb5636d91dec79bd56 In the same link you can check the output from "spark-submit myPythonScript.py", where I call "myDataframe.show()". The following is printed by Spark (among other useless debug

Re: "Method json([class java.util.HashMap]) does not exist" when reading JSON on PySpark

2015-10-05 Thread Michael Armbrust
Looks correct to me. Try for example:

    from pyspark.sql.functions import *
    df.withColumn("value", explode(df['values'])).show()

On Mon, Oct 5, 2015 at 2:15 PM, Fernando Paladini wrote: > Update: > > I've updated my code and now I have the following JSON: >
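For readers unfamiliar with explode, here is a rough plain-Python sketch of what the suggestion above does to each row: a row holding an array becomes one row per array element, with the original columns kept. The helper name explode_rows and the sample rows are made up for illustration; this only mimics the behaviour and is not how Spark implements it.

```python
def explode_rows(rows, array_key, new_key):
    """Return one output row per element of row[array_key].

    Mimics df.withColumn(new_key, explode(df[array_key])) on plain dicts:
    the original columns are kept and new_key holds a single element.
    Rows whose array is empty produce no output rows.
    """
    exploded = []
    for row in rows:
        for element in row[array_key]:
            new_row = dict(row)          # copy so the input row is untouched
            new_row[new_key] = element
            exploded.append(new_row)
    return exploded


# Hypothetical sample data, shaped like rows with a 'values' array column.
rows = [
    {"id": 1, "values": [10, 20]},
    {"id": 2, "values": [30]},
    {"id": 3, "values": []},
]

result = explode_rows(rows, "values", "value")
for row in result:
    print(row)
```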

Re: "Method json([class java.util.HashMap]) does not exist" when reading JSON on PySpark

2015-09-30 Thread Akhil Das
Each JSON document should be on a single line, I guess. http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets Note that the file that is offered as *a json file* is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a consequence, a
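As a small illustration of the one-object-per-line layout described above, the following sketch turns a pretty-printed JSON array into that form using only the standard library. The sample document and the helper name to_json_lines are assumptions for the example, not taken from the thread.

```python
import json


def to_json_lines(records):
    """Serialize each record as compact JSON on its own line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


# Hypothetical multi-line JSON, e.g. what an API might return.
pretty = """
[
  {"id": 1, "values": [10, 20]},
  {"id": 2, "values": [30]}
]
"""

records = json.loads(pretty)
json_lines = to_json_lines(records)
print(json_lines)
# {"id": 1, "values": [10, 20]}
# {"id": 2, "values": [30]}
```

Writing json_lines to a file gives a file in which every line is a separate, self-contained JSON object.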

Re: "Method json([class java.util.HashMap]) does not exist" when reading JSON on PySpark

2015-09-30 Thread Michael Armbrust
I think the problem here is that you are passing in parsed JSON that is stored as a dictionary (which is converted to a HashMap when going into the JVM). You should instead be passing in the path to the JSON file (formatted as Akhil suggests) so that Spark can do the parsing in parallel. The other
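A minimal sketch of the difference, assuming a SQLContext-style object that exposes read.json (as PySpark's DataFrameReader does). The wrapper load_dataframe and its type check are hypothetical additions for illustration; the point is only that the argument should be a path string, not an already-parsed dict.

```python
def load_dataframe(sql_context, path):
    """Ask Spark to read and parse the JSON file at `path`.

    Passing a parsed dict instead of a path is what leads to the
    'Method json([class java.util.HashMap]) does not exist' error,
    so this hypothetical wrapper rejects anything that is not a string.
    """
    if not isinstance(path, str):
        raise TypeError(
            "expected a file path, got %s" % type(path).__name__
        )
    return sql_context.read.json(path)


# Intended usage (requires a running Spark installation; file name is made up):
#
#   from pyspark import SparkContext
#   from pyspark.sql import SQLContext
#   sc = SparkContext()
#   sqlContext = SQLContext(sc)
#   df = load_dataframe(sqlContext, "response.json")   # one JSON object per line
#   df.show()
```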

Fwd: "Method json([class java.util.HashMap]) does not exist" when reading JSON on PySpark

2015-09-28 Thread Fernando Paladini
Hello guys, I'm very new to Spark and I'm having some trouble reading JSON into a DataFrame on PySpark. I'm getting a JSON object from an API response and I would like to store it in Spark as a DataFrame (I've read that DataFrame is better than RDD; is that accurate?). From what I've read