Re: Parse Json in Spark

2016-05-09 Thread KhajaAsmath Mohammed
Thanks Ewan. I did it the same way you explained. Thanks again for your response. On Mon, May 9, 2016 at 4:21 PM, Ewan Leith wrote: > The simplest way is probably to use the sc.binaryFiles or > sc.wholeTextFiles API to create an RDD containing the JSON files (maybe

RE: Parse Json in Spark

2016-05-09 Thread Ewan Leith
The simplest way is probably to use the sc.binaryFiles or sc.wholeTextFiles API to create an RDD containing the JSON files (you may need a sc.wholeTextFiles(…).map(x => x._2) to drop the filename column), then do a sqlContext.read.json(rddName). That way, you don’t need to worry about
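
A minimal sketch of the approach Ewan describes, assuming Spark 1.x; the input path "data/json/" and the app name are illustrative, not from the thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("ParseJson"))
val sqlContext = new SQLContext(sc)

// wholeTextFiles yields (filename, contents) pairs; keep only the contents
val jsonText = sc.wholeTextFiles("data/json/").map(_._2)

// read.json accepts an RDD[String] with one complete JSON document per element
val df = sqlContext.read.json(jsonText)
df.printSchema()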

Re: Parse Json in Spark

2016-05-08 Thread Ashish Dubey
This limit is due to the underlying InputFormat implementation. You can always write your own InputFormat and then use Spark's newAPIHadoopFile API, passing your InputFormat class. You will have to place the jar file in the /lib location on all the nodes. Ashish On Sun, May 8, 2016 at 4:02 PM,
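
A hedged sketch of what such a custom InputFormat could look like. WholeFileInputFormat and WholeFileRecordReader are illustrative names written for this example (they are not part of Spark or Hadoop), and the sketch assumes each file fits comfortably in memory:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IOUtils, NullWritable, Text}
import org.apache.hadoop.mapreduce.{InputSplit, JobContext, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, FileSplit}

// Emits each file as a single (NullWritable, Text) record, so a multi-line
// JSON document is never split across records.
class WholeFileInputFormat extends FileInputFormat[NullWritable, Text] {
  // Never split a file, so each JSON document stays whole
  override protected def isSplitable(context: JobContext, file: Path): Boolean = false

  override def createRecordReader(split: InputSplit,
      context: TaskAttemptContext): RecordReader[NullWritable, Text] =
    new WholeFileRecordReader
}

class WholeFileRecordReader extends RecordReader[NullWritable, Text] {
  private var split: FileSplit = _
  private var conf: Configuration = _
  private val value = new Text()
  private var processed = false

  override def initialize(inputSplit: InputSplit, context: TaskAttemptContext): Unit = {
    split = inputSplit.asInstanceOf[FileSplit]
    conf = context.getConfiguration
  }

  override def nextKeyValue(): Boolean = {
    if (processed) return false
    // Read the entire file into one Text value (assumes it fits in memory)
    val path = split.getPath
    val in = path.getFileSystem(conf).open(path)
    try {
      val bytes = new Array[Byte](split.getLength.toInt)
      IOUtils.readFully(in, bytes, 0, bytes.length)
      value.set(bytes, 0, bytes.length)
    } finally {
      in.close()
    }
    processed = true
    true
  }

  override def getCurrentKey: NullWritable = NullWritable.get()
  override def getCurrentValue: Text = value
  override def getProgress: Float = if (processed) 1.0f else 0.0f
  override def close(): Unit = ()
}

Usage would then look roughly like this (path again hypothetical):

// Wire the custom InputFormat into Spark, then hand the file contents
// to the JSON reader
val records = sc.newAPIHadoopFile(
  "data/json/",
  classOf[WholeFileInputFormat],
  classOf[NullWritable],
  classOf[Text]
).map(_._2.toString)

val df = sqlContext.read.json(records)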

Re: Parse Json in Spark

2016-05-08 Thread Hyukjin Kwon
I remember this JIRA: https://issues.apache.org/jira/browse/SPARK-7366. Parsing multi-line JSON is not supported by the JSON data source. Instead, this can be done with sc.wholeTextFiles(). I found some examples here: http://searchdatascience.com/spark-adventures-1-processing-multi-line-json-files
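
For context, a small sketch of the limitation itself: the JSON data source parses one document per element (or line), so a pretty-printed document fails while the same document on a single line succeeds. The data here is made up for illustration:

// Each RDD element is parsed as one complete JSON document
val multiLine = Seq("{", "  \"name\": \"a\",", "  \"value\": 1", "}")
val singleLine = Seq("""{"name": "a", "value": 1}""")

// Fragments like "{" are invalid on their own, so every row lands in
// the _corrupt_record column
sqlContext.read.json(sc.parallelize(multiLine)).show()

// A whole document on one line parses into columns name and value
sqlContext.read.json(sc.parallelize(singleLine)).show()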