Re: JSON Arrays and Spark

2016-10-12 Thread sujeet jog
I generally use the Play Framework APIs for complex JSON structures.

https://www.playframework.com/documentation/2.5.x/ScalaJson#Json

On Wed, Oct 12, 2016 at 11:34 AM, Kappaganthu, Sivaram (ES) <
sivaram.kappagan...@adp.com> wrote:

> Hi,
>
> Does this mean that Spark is not a good fit for handling JSON with a schema
> like the one below? I have a requirement to parse the JSON below, which
> spans multiple lines. What is the best way to parse JSON of this kind?
> Please suggest.

Re: JSON Arrays and Spark

2016-10-12 Thread Hyukjin Kwon
No, I meant that each JSON document must be on a single line, but an array is
also supported as the root wrapper of JSON objects.

If you need to parse JSON that spans multiple lines, here is a reference:

http://searchdatascience.com/spark-adventures-1-processing-multi-line-json-files/
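
To make the single-line requirement concrete, here is a minimal sketch (plain Python, standard library only, not Spark code) of one way to preprocess such input: load the multi-line document and rewrite it with one compact JSON object per line. The helper name `to_json_lines` and the sample values are illustrative assumptions, and the sketch assumes the whole document fits in memory.

```python
import json


def to_json_lines(text):
    """Rewrite a (possibly multi-line) JSON document as one object per line.

    A top-level array yields one line per element; any other top-level
    value yields a single line.
    """
    doc = json.loads(text)
    records = doc if isinstance(doc, list) else [doc]
    # Compact separators guarantee that no newline appears inside a record.
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)


# Hypothetical sample loosely modeled on the schema in this thread.
pretty = """
{
  "maindate": {"mainidnId": "m-1"},
  "Entity": [
    {"Identifier": "e-1"},
    {"Identifier": "e-2"}
  ]
}
"""

print(to_json_lines(pretty))
```

The output can then be written to a file and read with the usual line-oriented JSON reader.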

2016-10-12 15:04 GMT+09:00 Kappaganthu, Sivaram (ES) <
sivaram.kappagan...@adp.com>:

> Hi,
>
> Does this mean that Spark is not a good fit for handling JSON with a schema
> like the one below? I have a requirement to parse the JSON below, which
> spans multiple lines. What is the best way to parse JSON of this kind?
> Please suggest.

RE: JSON Arrays and Spark

2016-10-12 Thread Kappaganthu, Sivaram (ES)
Hi,

Does this mean that Spark is not a good fit for handling JSON with a schema
like the one below? I have a requirement to parse the JSON below, which spans
multiple lines. What is the best way to parse JSON of this kind? Please
suggest.

root
 |-- maindate: struct (nullable = true)
 |    |-- mainidnId: string (nullable = true)
 |-- Entity: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Profile: struct (nullable = true)
 |    |    |    |-- Kind: string (nullable = true)
 |    |    |-- Identifier: string (nullable = true)
 |    |    |-- Group: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- Period: struct (nullable = true)
 |    |    |    |    |    |-- pid: string (nullable = true)
 |    |    |    |    |    |-- pDate: string (nullable = true)
 |    |    |    |    |    |-- quarter: long (nullable = true)
 |    |    |    |    |    |-- labour: array (nullable = true)
 |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |-- category: string (nullable = true)
 |    |    |    |    |    |    |    |-- id: string (nullable = true)
 |    |    |    |    |    |    |    |-- person: struct (nullable = true)
 |    |    |    |    |    |    |    |    |-- address: array (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |    |    |    |-- city: string (nullable = true)
 |    |    |    |    |    |    |    |    |    |    |-- line1: string (nullable = true)
 |    |    |    |    |    |    |    |    |    |    |-- line2: string (nullable = true)
 |    |    |    |    |    |    |    |    |    |    |-- postalCode: string (nullable = true)
 |    |    |    |    |    |    |    |    |    |    |-- state: string (nullable = true)
 |    |    |    |    |    |    |    |    |    |    |-- type: string (nullable = true)
 |    |    |    |    |    |    |    |    |-- familyName: string (nullable = true)
 |    |    |    |    |    |    |    |-- tax: array (nullable = true)
 |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |    |    |-- code: string (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- qwage: double (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- qvalue: double (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- qSubjectvalue: double (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- qfinalvalue: double (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- ywage: double (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- yalue: double (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- ySubjectvalue: double (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- yfinalvalue: double (nullable = true)
 |    |    |    |    |    |    |    |-- tProfile: array (nullable = true)
 |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |    |    |-- isExempt: boolean (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- jurisdiction: struct (nullable = true)
 |    |    |    |    |    |    |    |    |    |    |-- code: string (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- maritalStatus: string (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- numberOfDeductions: long (nullable = true)
 |    |    |    |    |    |    |    |-- wDate: struct (nullable = true)
 |    |    |    |    |    |    |    |    |-- originalHireDate: string (nullable = true)
 |    |    |    |    |    |-- year: long (nullable = true)



Re: JSON Arrays and Spark

2016-10-10 Thread Hyukjin Kwon
FYI, it supports

[{...}, {...} ...]

Or

{...}

format as input.
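
As a rough illustration of the two shapes listed above, the sketch below (plain Python, standard library only) checks whether a single line is either a JSON object or an array of JSON objects. It only mirrors the rule as stated in this message, not Spark's actual parser; the name `is_record_shaped` is made up for the example, and treating an empty array as not record-shaped is an arbitrary choice.

```python
import json


def is_record_shaped(line):
    """Return True if `line` is a JSON object or a non-empty array of objects."""
    try:
        value = json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    if isinstance(value, dict):
        return True
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(item, dict) for item in value)
    )


# The two cases from the original question, plus an array of objects.
for candidate in (
    '{"vals":[100,500,600,700,800,200,900,300]}',
    '[100,500,600,700,800,200,900,300]',
    '[{"red":"#f00"},{"red":"#f01"}]',
):
    print(is_record_shaped(candidate), candidate)
```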

On 11 Oct 2016 3:19 a.m., "Jean Georges Perrin"  wrote:

> Thanks Luciano - I think this is my issue :(


Re: JSON Arrays and Spark

2016-10-10 Thread Jean Georges Perrin
Thanks Luciano - I think this is my issue :(

> On Oct 10, 2016, at 2:08 PM, Luciano Resende  wrote:
> 
> Please take a look at 
> http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets 
> 
> 
> Particularly the note at the required format :
> 
> Note that the file that is offered as a json file is not a typical JSON file. 
> Each line must contain a separate, self-contained valid JSON object. As a 
> consequence, a regular multi-line JSON file will most often fail.


Re: JSON Arrays and Spark

2016-10-10 Thread Jean Georges Perrin
Thanks!

I am ok with strict rules (despite being French), but even:
[{
"red": "#f00", 
"green": "#0f0"
},{
"red": "#f01", 
"green": "#0f1"
}]

is not going through…

Is there a way to see what the parser does not like?

The JSON parser has been pretty good to me until recently.


> On Oct 10, 2016, at 12:59 PM, Sudhanshu Janghel wrote:
> 
> In my experience, Spark can parse only certain types of JSON correctly, not
> all of them, and it has strict parsing rules, unlike Python.



Re: JSON Arrays and Spark

2016-10-10 Thread Luciano Resende
Please take a look at
http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets

Particularly the note about the required format:

Note that the file that is offered as *a json file* is not a typical JSON
file. Each line must contain a separate, self-contained valid JSON object.
As a consequence, a regular multi-line JSON file will most often fail.
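
For the bare-array case in the question quoted below, one possible workaround (a sketch only, in plain Python with the standard library) is to wrap the top-level array in an object before handing the line to the reader. The function name `wrap_top_level_array` and the default key `vals` are assumptions chosen to match the first example in the question.

```python
import json


def wrap_top_level_array(line, key="vals"):
    """If `line` holds a top-level JSON array, wrap it as {key: array}; otherwise keep the value."""
    value = json.loads(line)
    if isinstance(value, list):
        value = {key: value}
    # Compact output keeps the record on a single line.
    return json.dumps(value, separators=(",", ":"))


print(wrap_top_level_array("[100,500,600,700,800,200,900,300]"))
```

Note that this wraps every top-level array, including an array of objects, so it is only appropriate when the array itself is the record.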



On Mon, Oct 10, 2016 at 9:57 AM, Jean Georges Perrin  wrote:

> Hi folks,
>
> I am trying to parse JSON arrays and it’s getting a little crazy (for me
> at least)…
>
> 1)
> If my JSON is:
> {"vals":[100,500,600,700,800,200,900,300]}
>
> I get:
> +--------------------+
> |                vals|
> +--------------------+
> |[100, 500, 600, 7...|
> +--------------------+
>
> root
>  |-- vals: array (nullable = true)
>  |    |-- element: long (containsNull = true)
>
> and I am :)
>
> 2)
> If my JSON is:
> [100,500,600,700,800,200,900,300]
>
> I get:
> +--------------------+
> |     _corrupt_record|
> +--------------------+
> |[100,500,600,700,...|
> +--------------------+
>
> root
>  |-- _corrupt_record: string (nullable = true)
>
> Both are legit JSON structures… Do you think that #2 is a bug?
>
> jg


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/