Hi,
I am new to Apache Spark. I am trying to parse nested JSON using PySpark.
Here is the code I am using to parse the JSON.
I am using Apache Spark 1.2.0 from Cloudera CDH 5.3.2.
import json

lines = sc.textFile(inputFile)

def func(x):
    json_str = json.loads(x)
    if json_str['label']:
Could you try SQLContext.read.json()?
On Mon, Jul 20, 2015 at 9:06 AM, Davies Liu dav...@databricks.com wrote:
Before reading the JSON file as a text file, can you make sure that each
JSON record fits on a single line? textFile() splits the file on '\n',
so a record that spans several lines reaches json.loads() in pieces.
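To illustrate the point about line splitting, here is a minimal sketch in plain Python (no Spark needed) of what happens when each line is parsed on its own. The helper name parse_record and the sample records are made up for illustration; only the 'label' field comes from the original post.

```python
import json

def parse_record(line):
    """Parse one line of text as JSON; return None if the line is not valid JSON."""
    try:
        return json.loads(line)
    except ValueError:
        return None

# Two records, one per line: each line is a complete JSON document.
one_per_line = '{"label": 1, "user": {"name": "a"}}\n{"label": 0, "user": {"name": "b"}}'

# The first record again, but pretty-printed across several lines.
pretty = '{\n  "label": 1,\n  "user": {"name": "a"}\n}'

# Splitting on '\n' mimics how textFile() hands lines to the map function.
good = [parse_record(line) for line in one_per_line.split("\n")]
bad = [parse_record(line) for line in pretty.split("\n")]

print(good[0]["user"]["name"])  # nested field is reachable
print(bad)                      # every fragment fails to parse
```

In PySpark the same helper could be applied as sc.textFile(inputFile).map(parse_record).filter(lambda r: r is not None), assuming the input really has one record per line.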
On Mon, Jul 20, 2015 at 3:26 AM, Ajay ajay0...@gmail.com wrote:
Hi,
I am new to Apache Spark. I am trying to parse nested json using pyspark.
I had a similar issue with Spark 1.3.
After migrating to Spark 1.4 and using sqlContext.read.json, it worked well.
I think you can look at the DataFrame select and explode options to read the
nested JSON elements, arrays, etc.
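A rough sketch of the select and explode approach, assuming Spark 1.4 or later and an existing SQLContext. The field names 'items' and 'name' are hypothetical, since the thread does not show the actual schema; only 'label' appears in the original code. The function name explode_items is also made up for this example.

```python
def explode_items(sqlContext, path):
    """Read line-delimited JSON and return one row per element of a nested array.

    Assumes (hypothetically) that each record looks like
    {"label": ..., "items": [{"name": ...}, ...]}.
    """
    # Imported here so this file still loads on a machine without PySpark.
    from pyspark.sql.functions import col, explode

    # read.json infers the schema, including nested structs and arrays.
    df = sqlContext.read.json(path)

    # explode() emits one output row per element of the 'items' array.
    exploded = df.select(col("label"), explode(col("items")).alias("item"))

    # Fields of the resulting struct column are addressed with dot notation.
    return exploded.select("label", "item.name")
```

It would be called as explode_items(sqlContext, inputFile).show(), with inputFile pointing at a file that has one JSON record per line.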
Thanks.
On Mon, Jul 20, 2015 at 11:07 AM, Davies Liu dav...@databricks.com