PySpark Nested Json Parsing

2015-07-20 Thread Ajay
Hi,

I am new to Apache Spark. I am trying to parse nested JSON using PySpark. Here is the code I am using to parse the JSON. I am using Apache Spark 1.2.0 from Cloudera CDH 5.3.2.

    lines = sc.textFile(inputFile)

    import json
    def func(x):
        json_str = json.loads(x)
        if json_str['label']:

Re: PySpark Nested Json Parsing

2015-07-20 Thread Davies Liu
Could you try SQLContext.read.json()?

On Mon, Jul 20, 2015 at 9:06 AM, Davies Liu dav...@databricks.com wrote:
Before using the JSON file as a text file, can you make sure that each JSON string fits on one line? textFile() will split the file by '\n'.

On Mon, Jul 20, 2015 at 3:26 AM,

Re: PySpark Nested Json Parsing

2015-07-20 Thread Davies Liu
Before using the JSON file as a text file, can you make sure that each JSON string fits on one line? textFile() will split the file by '\n'.

On Mon, Jul 20, 2015 at 3:26 AM, Ajay ajay0...@gmail.com wrote:
Hi, I am new to Apache Spark. I am trying to parse nested json using pyspark.
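
The point above about textFile() splitting on '\n' can be illustrated without a cluster. The following is a minimal pure-Python sketch, not Spark code: parse_lines is a hypothetical helper that mimics "split on newline, then json.loads each piece", and every field in the sample record other than 'label' is an assumption made for illustration.

```python
import json

# Hypothetical record; only the 'label' key comes from the original post.
record = {"label": 1, "items": [{"name": "a"}, {"name": "b"}]}


def parse_lines(text):
    """Mimic the textFile() approach: split on '\n', parse each line on its own."""
    parsed, failed = [], []
    for line in text.split("\n"):
        if not line.strip():
            continue  # skip blank lines
        try:
            parsed.append(json.loads(line))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            failed.append(line)
    return parsed, failed


# One complete JSON object per line: every line parses.
one_per_line = "\n".join(json.dumps(record) for _ in range(2))
ok, bad = parse_lines(one_per_line)

# The same object pretty-printed across several lines: no single line is valid JSON.
pretty = json.dumps(record, indent=2)
ok_pretty, bad_pretty = parse_lines(pretty)

print(len(ok), len(bad))
print(len(ok_pretty), len(bad_pretty))
```

If the input file is pretty-printed like the second case, a per-line json.loads inside a map over textFile() output would be expected to fail in the same way, which is why the one-object-per-line check is worth doing first.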

Re: PySpark Nested Json Parsing

2015-07-20 Thread Naveen Madhire
I had a similar issue with Spark 1.3. After migrating to Spark 1.4 and using sqlContext.read.json, it worked well. I think you can look at the DataFrame select and explode options to read the nested JSON elements, arrays, etc.

Thanks.

On Mon, Jul 20, 2015 at 11:07 AM, Davies Liu dav...@databricks.com
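
The read.json plus select/explode suggestion above might look roughly like the sketch below. It assumes Spark 1.4 or later and an existing SQLContext passed in as sql_context. The sample data, the 'items' and 'name' fields, and the helper names write_sample and explode_items are all hypothetical; only the 'label' key appears in the original question.

```python
import json

# Hypothetical sample data: only the 'label' key appears in the original post.
SAMPLE_RECORDS = [
    {"label": "x", "items": [{"name": "a"}, {"name": "b"}]},
    {"label": "y", "items": [{"name": "c"}]},
]


def write_sample(path):
    """Write one JSON object per line, the layout read.json expects."""
    with open(path, "w") as f:
        for rec in SAMPLE_RECORDS:
            f.write(json.dumps(rec) + "\n")


def explode_items(sql_context, path):
    """Return a DataFrame with one row per (label, item name) pair."""
    # Imported here so write_sample() can be used without Spark installed.
    from pyspark.sql.functions import explode

    # read.json infers the schema, including the nested array of structs.
    df = sql_context.read.json(path)
    # explode() emits one output row per element of the 'items' array.
    exploded = df.select(df["label"], explode(df["items"]).alias("item"))
    # Dot notation reaches into the struct produced by explode().
    return exploded.select("label", "item.name")
```

With a running SQLContext, calling write_sample on a path Spark can read and then explode_items(sqlContext, path).show() should list one row per array element. On a cluster the path would need to be visible to all executors (for example on HDFS), which this sketch does not handle.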