Re: Parse Json in Spark

2016-05-09 Thread KhajaAsmath Mohammed
Thanks, Ewan. I did it the same way you explained. Thanks again for your
response.

On Mon, May 9, 2016 at 4:21 PM, Ewan Leith <ewan.le...@realitymine.com>
wrote:

> The simplest way is probably to use the sc.binaryFiles or
> sc.wholeTextFiles API to create an RDD containing the JSON files (maybe
> need a sc.wholeTextFiles(…).map(x => x._2) to drop off the filename column)
> then do a sqlContext.read.json(rddName)
>
>
>
> That way, you don’t need to worry about combining lines.
>
>
>
> Ewan
>
>
>
> *From:* KhajaAsmath Mohammed [mailto:mdkhajaasm...@gmail.com]
> *Sent:* 08 May 2016 23:20
> *To:* user @spark <user@spark.apache.org>
> *Subject:* Parse Json in Spark
>
>
>
> Hi,
>
>
>
> I am working on parsing JSON in Spark, but most of the information
> available online states that I need to have the entire JSON in a single line.
>
>
>
> In my case, the JSON file is delivered in a complex structure, not in a
> single line. Does anyone know how to process this in Spark?
>
>
>
> I used the Jackson jar to process JSON and was able to do it when it is
> present in a single line. Any ideas?
>
>
>
> Thanks,
>
> Asmath
>


RE: Parse Json in Spark

2016-05-09 Thread Ewan Leith
The simplest way is probably to use the sc.binaryFiles or sc.wholeTextFiles API
to create an RDD containing the JSON files (you may need a
sc.wholeTextFiles(…).map(x => x._2) to drop the filename column), then do a
sqlContext.read.json(rddName)

That way, you don’t need to worry about combining lines.
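Stripped of Spark specifics, the suggestion above is: treat each file as one record and hand the whole string to the JSON parser. A plain-Python sketch of the same pipeline (file contents inlined for illustration; this is not Spark code):

```python
import json

# Stand-ins for what sc.wholeTextFiles would return: (filename, whole-file
# contents) pairs, where each file holds one multi-line JSON document.
files = [
    ("a.json", '{\n  "id": 1,\n  "user": {"name": "alice"}\n}'),
    ("b.json", '{\n  "id": 2,\n  "user": {"name": "bob"}\n}'),
]

# The .map(x => x._2) step: drop the filename, keep the file body.
bodies = [body for _, body in files]

# The read.json step: each whole body parses fine despite the newlines.
records = [json.loads(body) for body in bodies]
print([r["user"]["name"] for r in records])  # -> ['alice', 'bob']
```

Because each file is consumed as a single record, the line breaks inside the JSON never matter.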

Ewan

From: KhajaAsmath Mohammed [mailto:mdkhajaasm...@gmail.com]
Sent: 08 May 2016 23:20
To: user @spark <user@spark.apache.org>
Subject: Parse Json in Spark

Hi,

I am working on parsing JSON in Spark, but most of the information available
online states that I need to have the entire JSON in a single line.

In my case, the JSON file is delivered in a complex structure, not in a single
line. Does anyone know how to process this in Spark?

I used the Jackson jar to process JSON and was able to do it when it is present
in a single line. Any ideas?

Thanks,
Asmath


Re: Parse Json in Spark

2016-05-08 Thread Ashish Dubey
This limit comes from the underlying InputFormat implementation. You can always
write your own InputFormat and then use Spark's newAPIHadoopFile API, passing
your InputFormat class. You will have to place the jar file in the /lib
location on all the nodes.
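The record-boundary detection such a custom InputFormat would implement can be sketched outside Hadoop. Below is a toy Python version using the standard library's json.JSONDecoder.raw_decode, which reports where each top-level JSON value ends; a real InputFormat would additionally have to handle records straddling HDFS split boundaries:

```python
import json

def split_json_records(stream: str):
    """Yield one Python object per top-level JSON value in the stream,
    even when values span multiple lines or share a line."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(stream):
        # Skip whitespace between records.
        while idx < len(stream) and stream[idx].isspace():
            idx += 1
        if idx >= len(stream):
            break
        # raw_decode parses one value and returns where it ended.
        obj, end = decoder.raw_decode(stream, idx)
        yield obj
        idx = end

data = '{"a": 1}\n{\n "a": 2\n}{"a": 3}'
print([r["a"] for r in split_json_records(data)])  # -> [1, 2, 3]
```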

Ashish

On Sun, May 8, 2016 at 4:02 PM, Hyukjin Kwon  wrote:

>
> I remember this JIRA: https://issues.apache.org/jira/browse/SPARK-7366.
> Parsing multi-line records is not supported by the JSON data source.
>
> Instead this can be done by sc.wholeTextFiles(). I found some examples
> here,
> http://searchdatascience.com/spark-adventures-1-processing-multi-line-json-files
>
> Although this reads each file as one whole record, it should work.
>
> Thanks!
> On 9 May 2016 7:20 a.m., "KhajaAsmath Mohammed" 
> wrote:
>
>> Hi,
>>
>> I am working on parsing JSON in Spark, but most of the information
>> available online states that I need to have the entire JSON in a single line.
>>
>> In my case, the JSON file is delivered in a complex structure, not in a
>> single line. Does anyone know how to process this in Spark?
>>
>> I used the Jackson jar to process JSON and was able to do it when it is
>> present in a single line. Any ideas?
>>
>> Thanks,
>> Asmath
>>
>


Re: Parse Json in Spark

2016-05-08 Thread Hyukjin Kwon
I remember this JIRA: https://issues.apache.org/jira/browse/SPARK-7366.
Parsing multi-line records is not supported by the JSON data source.

Instead this can be done by sc.wholeTextFiles(). I found some examples
here,
http://searchdatascience.com/spark-adventures-1-processing-multi-line-json-files

Although this reads each file as one whole record, it should work.

Thanks!
On 9 May 2016 7:20 a.m., "KhajaAsmath Mohammed" 
wrote:

> Hi,
>
> I am working on parsing JSON in Spark, but most of the information
> available online states that I need to have the entire JSON in a single line.
>
> In my case, the JSON file is delivered in a complex structure, not in a
> single line. Does anyone know how to process this in Spark?
>
> I used the Jackson jar to process JSON and was able to do it when it is
> present in a single line. Any ideas?
>
> Thanks,
> Asmath
>


Parse Json in Spark

2016-05-08 Thread KhajaAsmath Mohammed
Hi,

I am working on parsing JSON in Spark, but most of the information
available online states that I need to have the entire JSON in a single line.

In my case, the JSON file is delivered in a complex structure, not in a single
line. Does anyone know how to process this in Spark?

I used the Jackson jar to process JSON and was able to do it when it is present
in a single line. Any ideas?

Thanks,
Asmath


Fast way to parse JSON in Spark

2016-02-23 Thread Jerry
Hi, 
I had a Java parser using GSON and packaged it as a Java lib (e.g.
messageparserLib.jar). I use this lib in Spark Streaming to parse the incoming
JSON messages. This is very slow, with a lot of lag in parsing and inserting
messages into Cassandra.
What is a fast way to parse JSON messages in Spark on the fly? My JSON
messages are complex: I want to extract over 30 fields, wrap them in a case
class, and store them in Cassandra in structured form.
Some candidate solutions come to mind:
(1) Use Spark SQL to register a temp table and then select the fields I want
to wrap in the case class.
(2) Use the Scala standard library, e.g.
"scala.util.parsing.json.JSON.parseFull", to parse and extract the fields and
map them to the case class.
(3) Use a third-party library (play-json, lift-json) to parse and extract the
fields and map them to the case class.
The JSON messages come from a Kafka consumer at over 1,500 messages per
second, so the message processing (parsing and writing to Cassandra) also
needs to keep up at that rate (1,500/second).
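Options (2) and (3) share one shape: parse the message, pull out the fields, and build a typed record (the case class). A rough Python sketch of that shape, with a dataclass standing in for the case class; the field names and JSON layout here are made up for illustration:

```python
import json
from dataclasses import dataclass

@dataclass
class Event:  # stands in for the Scala case class
    device_id: str
    temperature: float
    city: str

def parse_event(message: str) -> Event:
    d = json.loads(message)
    # Walk the nested paths once, up front, instead of repeatedly
    # querying the raw JSON tree later in the pipeline.
    return Event(
        device_id=d["device"]["id"],
        temperature=float(d["reading"]["temp"]),
        city=d["device"]["location"]["city"],
    )

msg = ('{"device": {"id": "d-42", "location": {"city": "Oslo"}},'
       ' "reading": {"temp": "21.5"}}')
event = parse_event(msg)
print(event)  # Event(device_id='d-42', temperature=21.5, city='Oslo')
```

At 1,500 messages/second, the main wins usually come from parsing each message exactly once and avoiding reflection-heavy generic mappers, whichever library does the parsing.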

Thanks in advance.
Jerry

I'd appreciate any help and advice.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Fast-way-to-parse-JSON-in-Spark-tp26306.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: What's the best practice to parse JSON using spark

2015-09-21 Thread Adrian Tanase
I've been using spray-json (https://github.com/spray/spray-json) for general
JSON ser/deser in Scala (a Spark app), mostly for config files and data
exchange. I haven't used it with jobs that process large JSON data sources, so
I can't speak to those use cases.


-adrian



From: Petr Novak <oss.mli...@gmail.com>
Sent: Monday, September 21, 2015 12:11 PM
To: Cui Lin; user
Subject: Re: What's the best practice to parse JSON using spark

Surprisingly, I had the same issue when including the json4s dependency at the
same version, v3.2.10. I had to remove the json4s deps from my code. I'm using
Scala 2.11; there might be some issue with mixing 2.10/2.11, and it could be
just my environment. I haven't investigated much, as depending on the
Spark-provided version is fine for us for now.

Regards,
Petr

On Mon, Sep 21, 2015 at 11:06 AM, Petr Novak <oss.mli...@gmail.com> wrote:
Internally, Spark uses json4s and the Jackson parser at v3.2.10, AFAIK. So if
you are using Scala, they should be available without adding dependencies.
v3.2.11 is already available, but adding it to my app caused a NoSuchMethod
error, so I would have to shade it. I'm simply staying on v3.2.10 for now.

Regards,
Petr

On Sat, Sep 19, 2015 at 2:45 AM, Ted Yu <yuzhih...@gmail.com> wrote:
For #1, see this thread: http://search-hadoop.com/m/q3RTti0Thneenne2

For #2, also see:
examples/src/main/python/hbase_inputformat.py
examples/src/main/python/hbase_outputformat.py

Cheers

On Fri, Sep 18, 2015 at 5:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:
For #2, please see:

examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala
examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala

In HBase, there is an hbase-spark module which is being polished. It should be
available in the HBase 1.3.0 release.

Cheers

On Fri, Sep 18, 2015 at 5:09 PM, Cui Lin <icecreamlc...@gmail.com> wrote:
Hello, all,

Parsing JSON's nested structure is easy using the Java or Python API. Where
can I find a similar way to parse JSON files using Spark?

Another question: using Spark SQL, how can I easily save the results into a
NoSQL DB? Any examples? Thanks a lot!



--
Best regards!

Lin,Cui






Re: What's the best practice to parse JSON using spark

2015-09-21 Thread Petr Novak
Internally, Spark uses json4s and the Jackson parser at v3.2.10, AFAIK. So if
you are using Scala, they should be available without adding dependencies.
v3.2.11 is already available, but adding it to my app caused a NoSuchMethod
error, so I would have to shade it. I'm simply staying on v3.2.10 for now.

Regards,
Petr

On Sat, Sep 19, 2015 at 2:45 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> For #1, see this thread: http://search-hadoop.com/m/q3RTti0Thneenne2
>
> For #2, also see:
> examples/src/main/python/hbase_inputformat.py
> examples/src/main/python/hbase_outputformat.py
>
> Cheers
>
> On Fri, Sep 18, 2015 at 5:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> For #2, please see:
>>
>> examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala
>>
>> examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala
>>
>> In HBase, there is an hbase-spark module which is being polished. It should
>> be available in the HBase 1.3.0 release.
>>
>> Cheers
>>
>> On Fri, Sep 18, 2015 at 5:09 PM, Cui Lin <icecreamlc...@gmail.com> wrote:
>>
>>> Hello, all,
>>>
>>> Parsing JSON's nested structure is easy using the Java or Python API.
>>> Where can I find a similar way to parse JSON files using Spark?
>>>
>>> Another question: using Spark SQL, how can I easily save the results
>>> into a NoSQL DB? Any examples? Thanks a lot!
>>>
>>>
>>>
>>> --
>>> Best regards!
>>>
>>> Lin,Cui
>>>
>>
>>
>


Re: What's the best practice to parse JSON using spark

2015-09-21 Thread Petr Novak
Surprisingly, I had the same issue when including the json4s dependency at the
same version, v3.2.10. I had to remove the json4s deps from my code. I'm using
Scala 2.11; there might be some issue with mixing 2.10/2.11, and it could be
just my environment. I haven't investigated much, as depending on the
Spark-provided version is fine for us for now.

Regards,
Petr

On Mon, Sep 21, 2015 at 11:06 AM, Petr Novak <oss.mli...@gmail.com> wrote:

> Internally, Spark uses json4s and the Jackson parser at v3.2.10, AFAIK. So if
> you are using Scala, they should be available without adding dependencies.
> v3.2.11 is already available, but adding it to my app caused a NoSuchMethod
> error, so I would have to shade it. I'm simply staying on v3.2.10 for now.
>
> Regards,
> Petr
>
> On Sat, Sep 19, 2015 at 2:45 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> For #1, see this thread: http://search-hadoop.com/m/q3RTti0Thneenne2
>>
>> For #2, also see:
>> examples/src/main/python/hbase_inputformat.py
>> examples/src/main/python/hbase_outputformat.py
>>
>> Cheers
>>
>> On Fri, Sep 18, 2015 at 5:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> For #2, please see:
>>>
>>> examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala
>>>
>>> examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala
>>>
>>> In HBase, there is an hbase-spark module which is being polished. It
>>> should be available in the HBase 1.3.0 release.
>>>
>>> Cheers
>>>
>>> On Fri, Sep 18, 2015 at 5:09 PM, Cui Lin <icecreamlc...@gmail.com>
>>> wrote:
>>>
>>>> Hello, all,
>>>>
>>>> Parsing JSON's nested structure is easy using the Java or Python API.
>>>> Where can I find a similar way to parse JSON files using Spark?
>>>>
>>>> Another question: using Spark SQL, how can I easily save the
>>>> results into a NoSQL DB? Any examples? Thanks a lot!
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards!
>>>>
>>>> Lin,Cui
>>>>
>>>
>>>
>>
>


What's the best practice to parse JSON using spark

2015-09-18 Thread Cui Lin
Hello, all,

Parsing JSON's nested structure is easy using the Java or Python API. Where
can I find a similar way to parse JSON files using Spark?

Another question: using Spark SQL, how can I easily save the results into a
NoSQL DB? Any examples? Thanks a lot!
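For what it's worth, Spark SQL's read.json exposes nested JSON fields with dot notation in queries (e.g. user.address.city). The flattening idea behind that notation can be sketched in plain Python (this is not Spark code; the sample document is made up):

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys, mimicking how Spark SQL
    lets you address nested JSON fields as a.b.c."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into nested objects
        else:
            flat[path] = value
    return flat

doc = json.loads('{"user": {"name": "lin", "address": {"city": "SF"}}, "n": 7}')
print(flatten(doc))
# -> {'user.name': 'lin', 'user.address.city': 'SF', 'n': 7}
```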



-- 
Best regards!

Lin,Cui


Re: What's the best practice to parse JSON using spark

2015-09-18 Thread Ted Yu
For #2, please see:

examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala
examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala

In HBase, there is an hbase-spark module which is being polished. It should be
available in the HBase 1.3.0 release.

Cheers

On Fri, Sep 18, 2015 at 5:09 PM, Cui Lin <icecreamlc...@gmail.com> wrote:

> Hello, all,
>
> Parsing JSON's nested structure is easy using the Java or Python API. Where
> can I find a similar way to parse JSON files using Spark?
>
> Another question: using Spark SQL, how can I easily save the results into a
> NoSQL DB? Any examples? Thanks a lot!
>
>
>
> --
> Best regards!
>
> Lin,Cui
>


Re: What's the best practice to parse JSON using spark

2015-09-18 Thread Ted Yu
For #1, see this thread: http://search-hadoop.com/m/q3RTti0Thneenne2

For #2, also see:
examples/src/main/python/hbase_inputformat.py
examples/src/main/python/hbase_outputformat.py

Cheers

On Fri, Sep 18, 2015 at 5:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> For #2, please see:
>
> examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala
>
> examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala
>
> In HBase, there is an hbase-spark module which is being polished. It should
> be available in the HBase 1.3.0 release.
>
> Cheers
>
> On Fri, Sep 18, 2015 at 5:09 PM, Cui Lin <icecreamlc...@gmail.com> wrote:
>
>> Hello, all,
>>
>> Parsing JSON's nested structure is easy using the Java or Python API.
>> Where can I find a similar way to parse JSON files using Spark?
>>
>> Another question: using Spark SQL, how can I easily save the results
>> into a NoSQL DB? Any examples? Thanks a lot!
>>
>>
>>
>> --
>> Best regards!
>>
>> Lin,Cui
>>
>
>