Thank you for the replies and sorry about the delay, my e-mail client sent
this conversation to Spam (??).
I'll take a look at your tips and come back later to post my questions /
progress. Again, thank you so much!
2015-09-30 18:37 GMT-03:00 Michael Armbrust :
> I think
Update:
I've updated my code and now I have the following JSON:
https://gist.github.com/paladini/27bb5636d91dec79bd56
In the same link you can check the output from "spark-submit
myPythonScript.py", where I call "myDataframe.show()". The following is
printed by Spark (among other useless debug
Looks correct to me. Try for example:
from pyspark.sql.functions import *
df.withColumn("value", explode(df['values'])).show()
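For anyone following along: `explode` generates one output row per element of the array column, repeating the other columns. A plain-Python analogue of what that snippet does (the column names are taken from the snippet; the data is invented for illustration):

```python
# Each input row carries a 'values' array; explode emits one
# output row per array element, keeping the other columns.
rows = [
    {"key": "a", "values": [1, 2]},
    {"key": "b", "values": [3]},
]

exploded = [
    {"key": row["key"], "value": v}
    for row in rows
    for v in row["values"]
]
# exploded now holds three rows: (a, 1), (a, 2), (b, 3)
```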
On Mon, Oct 5, 2015 at 2:15 PM, Fernando Paladini
wrote:
> Update:
>
> I've updated my code and now I have the following JSON:
>
Each JSON document should be on a single line, I guess.
http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
Note that the file that is offered as *a json file* is not a typical JSON
file. Each line must contain a separate, self-contained valid JSON object.
As a consequence, a
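To make that concrete, here is a minimal sketch of the expected layout using only the standard library (the file name and record fields are invented for illustration):

```python
import json

# Spark's JSON datasource does not accept a pretty-printed file
# spanning multiple lines: each line must be a separate,
# self-contained JSON object ("JSON Lines" layout).
records = [
    {"key": "measure-1", "values": [1.0, 2.0]},
    {"key": "measure-2", "values": [3.0]},
]

with open("data.json", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Sanity check: every line parses on its own.
with open("data.json") as f:
    lines = f.readlines()
assert all(isinstance(json.loads(line), dict) for line in lines)
```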
I think the problem here is that you are passing in parsed JSON that is
stored as a dictionary (which is converted to a hashmap when going into the
JVM). You should instead be passing in the path to the JSON file (formatted
as Akhil suggests) so that Spark can do the parsing in parallel. The other
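A sketch of that workflow, assuming a hypothetical API payload (the PySpark call at the end is commented out, since it needs a running SparkContext):

```python
import json
import tempfile

# Hypothetical API payload: a list of already-parsed record dicts.
api_response = [{"id": 1, "value": 10}, {"id": 2, "value": 20}]

# Rather than handing the parsed objects to Spark, write them out
# as one JSON object per line and give Spark the *path*, so the
# JSON parsing happens in parallel on the workers.
path = tempfile.mkstemp(suffix=".json")[1]
with open(path, "w") as f:
    for record in api_response:
        f.write(json.dumps(record) + "\n")

# Then, in PySpark (Spark 1.x API):
# df = sqlContext.read.json(path)
# df.show()
```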
Hello guys,
I'm very new to Spark and I'm having some trouble reading a JSON document
into a DataFrame on PySpark.
I'm getting a JSON object from an API response and I would like to store it
in Spark as a DataFrame (I've read that a DataFrame is better than an RDD,
is that accurate?). From what I've read