Re: Parse Json in Spark
Thanks Ewan. I did it the same way you explained. Thanks for your response once again.
RE: Parse Json in Spark
The simplest way is probably to use the sc.binaryFiles or sc.wholeTextFiles API to create an RDD containing the JSON files (you may need a sc.wholeTextFiles(…).map(x => x._2) to drop the filename column), then do a sqlContext.read.json(rddName). That way, you don’t need to worry about combining lines.

Ewan

From: KhajaAsmath Mohammed [mailto:mdkhajaasm...@gmail.com]
Sent: 08 May 2016 23:20
To: user @spark <user@spark.apache.org>
Subject: Parse Json in Spark
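Ewan's suggestion can be sketched roughly as below. This is a minimal Spark 1.x example, and the HDFS path is a made-up placeholder:

```scala
// wholeTextFiles yields (filename, content) pairs, so each multi-line
// JSON document stays intact as a single record.
// The path below is a hypothetical example.
val files = sc.wholeTextFiles("hdfs:///data/json/")

// Drop the filename and keep only the file content.
val jsonDocs = files.map(x => x._2)

// read.json accepts an RDD[String] where each element is one JSON document.
val df = sqlContext.read.json(jsonDocs)
df.printSchema()
```

Note that wholeTextFiles loads each file into memory as one record, so this suits many small-to-medium files rather than one huge file.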
Re: Parse Json in Spark
This limit is due to the underlying InputFormat implementation. You can always write your own InputFormat and then use Spark's newAPIHadoopFile API, passing your InputFormat class. You will have to place the jar file in the /lib location on all the nodes.

Ashish
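That approach might look like the sketch below, assuming you have implemented your own InputFormat; MultiLineJsonInputFormat is a hypothetical placeholder for that class, not a real one, and the path is made up:

```scala
import org.apache.hadoop.io.{LongWritable, Text}

// MultiLineJsonInputFormat stands in for your own InputFormat, which
// would split the input into one record per JSON document.
val records = sc.newAPIHadoopFile(
  "hdfs:///data/json/",               // hypothetical path
  classOf[MultiLineJsonInputFormat],  // your custom InputFormat class
  classOf[LongWritable],              // key type emitted by the format
  classOf[Text]                       // value type: one JSON document
)

// Keep just the JSON strings for downstream parsing.
val jsonStrings = records.map(_._2.toString)
```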
Re: Parse Json in Spark
I remember this Jira, https://issues.apache.org/jira/browse/SPARK-7366. Parsing multi-line records is not supported in the JSON data source.

Instead, this can be done with sc.wholeTextFiles(). I found some examples here: http://searchdatascience.com/spark-adventures-1-processing-multi-line-json-files

Although this reads a file as a whole record, it should work.

Thanks!
Parse Json in Spark
Hi,

I am working on parsing JSON in Spark, but most of the information available online states that I need to have the entire JSON in a single line.

In my case, the JSON file is delivered in a complex structure and not in a single line. Does anyone know how to process this in Spark?

I used the Jackson jar to process JSON and was able to do it when it is present in a single line. Any ideas?

Thanks,
Asmath
Fast way to parse JSON in Spark
Hi,

I had a Java parser using GSON and packaged it as a Java lib (e.g. messageparserLib.jar). I use this lib in Spark Streaming to parse the incoming JSON messages. This is very slow, and there is a lot of lag in parsing/inserting messages into Cassandra.

What is a fast way to parse JSON messages in Spark on the fly? My JSON messages are complex; I want to extract over 30 fields, wrap them in a case class, and then store them in Cassandra in structured form. Some candidate solutions that come to mind:

(1) Use Spark SQL to register a temp table and then select the fields I want to wrap in the case class.
(2) Use the Scala standard library, like "scala.util.parsing.json.JSON.parseFull", to browse, parse, and extract the fields to map to the case class.
(3) Use third-party libraries, like play-json or lift-json, to browse, parse, and then extract the fields to map to the case class.

The JSON messages come from a Kafka consumer at over 1,500 messages per second, so the message processing (parsing and writing to Cassandra) also needs to keep up with that rate (1,500/second).

Thanks in advance. I'd appreciate any help and advice.

Jerry

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Fast-way-to-parse-JSON-in-Spark-tp26306.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
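Option (1) can be sketched along these lines, assuming a Spark 1.x SQLContext; the Event case class and its field names are made-up stand-ins for the real 30-field record:

```scala
// Hypothetical case class standing in for the ~30-field record.
case class Event(userId: String, eventType: String, ts: Long)

// jsonRdd is an RDD[String] with one JSON message per element
// (e.g. the values pulled from the Kafka stream).
val df = sqlContext.read.json(jsonRdd)
df.registerTempTable("events")

// Select only the fields needed and map each row into the case class.
val events = sqlContext
  .sql("SELECT userId, eventType, ts FROM events")
  .map(r => Event(r.getString(0), r.getString(1), r.getLong(2)))
```

One caveat: read.json infers the schema by scanning the data, so for a fixed message format it may be worth supplying an explicit schema rather than re-inferring it on every streaming batch.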
Re: What's the best practice to parse JSON using spark
I've been using spray-json <https://github.com/spray/spray-json> for general JSON ser/deser in Scala (Spark app), mostly for config files and data exchange. I haven't used it in conjunction with jobs that process large JSON data sources, so I can't speak for those use cases.

-adrian

From: Petr Novak <oss.mli...@gmail.com>
Sent: Monday, September 21, 2015 12:11 PM
To: Cui Lin; user
Subject: Re: What's the best practice to parse JSON using spark
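For reference, a minimal spray-json sketch looks like the following; the Config case class and the JSON string are made-up examples, and the spray-json artifact must be on the classpath:

```scala
import spray.json._

// A hypothetical config shape.
case class Config(host: String, port: Int)

// Derive a JsonFormat for the case class.
object ConfigProtocol extends DefaultJsonProtocol {
  implicit val configFormat = jsonFormat2(Config)
}
import ConfigProtocol._

// String -> AST -> case class, and back again.
val cfg = """{"host": "localhost", "port": 8080}""".parseJson.convertTo[Config]
val json = cfg.toJson.compactPrint
```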
Re: What's the best practice to parse JSON using spark
Internally, Spark is using json4s and the Jackson parser, v3.2.10 AFAIK, so if you are using Scala they should be available without adding dependencies. There is v3.2.11 already available, but adding it to my app caused a NoSuchMethod exception, so I would have to shade it. I'm simply staying on v3.2.10 for now.

Regards,
Petr
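Since that version ships with Spark, a small json4s sketch needs no extra dependency; the JSON string and the User case class below are made-up examples:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

// Needed for extract[...].
implicit val formats = DefaultFormats

case class User(name: String, age: Int)

val ast = parse("""{"name": "Asmath", "age": 30}""")

// Extract the whole object into a case class...
val user = ast.extract[User]

// ...or pull out a single field with the \ operator.
val age = (ast \ "age").extract[Int]
```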
Re: What's the best practice to parse JSON using spark
Surprisingly, I had the same issue when including the json4s dependency at the same version, v3.2.10; I had to remove the json4s deps from my code. I'm using Scala 2.11, so there might be some issue with mixing 2.10/2.11, and it could be just my environment. I haven't investigated much, as depending on the Spark-provided version is fine for us for now.

Regards,
Petr
What's the best practice to parse JSON using spark
Hello, all,

Parsing JSON's nested structure is easy when using the Java or Python API. Where can I find a similar way to parse a JSON file using Spark?

Another question: using Spark SQL, how can I easily save the results into a NoSQL DB? Any examples? Thanks a lot!

--
Best regards!

Lin, Cui
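For nested JSON, Spark SQL's built-in reader plus dot notation on nested fields covers many cases. A small sketch, with a hypothetical file and field names:

```scala
// Each line of people.json is assumed to be one JSON object, e.g.
// {"name": "Lin", "address": {"city": "SF", "zip": "94107"}}
val df = sqlContext.read.json("people.json")
df.printSchema()

// Nested fields are addressed with dot notation.
df.select("name", "address.city").show()

// The same works through SQL on a registered temp table.
df.registerTempTable("people")
sqlContext.sql("SELECT name, address.city FROM people").show()
```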
Re: What's the best practice to parse JSON using spark
For #2, please see:

examples/src/main/scala//org/apache/spark/examples/HBaseTest.scala
examples/src/main/scala//org/apache/spark/examples/pythonconverters/HBaseConverters.scala

In HBase, there is an hbase-spark module which is being polished; it should be available in the HBase 1.3.0 release.

Cheers
Re: What's the best practice to parse JSON using spark
For #1, see this thread: http://search-hadoop.com/m/q3RTti0Thneenne2

For #2, also see:

examples//src/main/python/hbase_inputformat.py
examples//src/main/python/hbase_outputformat.py

Cheers