I've been wondering about this for a while.

We wanted to do something similar for generically saving thousands of
individual homogeneous events into well-formed Parquet.

Ultimately I couldn't find an approach I wanted to own, and I pushed back
on the requirements.

It seems the canonical answer is that you need to 'own' the schema of the
JSON and parse it out manually into your DataFrame.  There's nothing
challenging about it, just verbose code.  If your 'info' column has a
consistent schema then you'll be fine.  For us it was 12 wildly diverging
schemas, and I didn't want to own the transforms.
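
For instance, a minimal sketch of the 'own the schema' approach (field
names here are invented; assumes a DataFrame df with the raw JSON in a
string column called 'info'):

import org.apache.spark.sql.functions.get_json_object

// Verbose but dependency-free, and it works on Spark 2.0.x where
// from_json doesn't exist yet: pull out each field you own by JSONPath.
val parsed = df.select(
  df("title"),
  get_json_object(df("info"), "$.expdCnt").as("expdCnt"),
  get_json_object(df("info"), "$.mfgAcctNum").as("mfgAcctNum"))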

I also recommend persisting anything that isn't part of your schema in an
'extras' field.  So when you parse out your JSON, if you've got anything
left over, drop it in there for later analysis.
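
A sketch of that idea, assuming json4s is on the classpath (Spark bundles
a version of it, though watch for version clashes) and the same df/'info'
column as above:

import org.apache.spark.sql.functions.udf
import org.json4s._
import org.json4s.jackson.JsonMethods.{compact, parse, render}

// Keys your owned schema covers; everything else survives in 'extras'.
val known = Set("expdCnt", "mfgAcctNum")

val extras = udf { raw: String =>
  parse(raw) match {
    case JObject(fields) =>
      compact(render(JObject(fields.filterNot { case (k, _) => known(k) })))
    case _ => raw // not a JSON object: keep the whole thing for later
  }
}

val withExtras = df.withColumn("extras", extras(df("info")))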

Some rough sample code is below, though it's pretty straightforward and
you can Google plenty of variations.
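
Against the example JSON quoted below, and assuming a 'books' DataFrame
with the table's columns (names are my guesses) plus a Spark version
where from_json exists (2.1+; the array-root schema used here needs 2.2+,
I believe - on 2.0.x you're back to get_json_object or a UDF), the shape
of it would be roughly:

import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

// explode fails on 'info' as-is because it's a plain string column;
// parse it into an array of structs first, then explode works.
val certSchema = StructType(Seq(
  StructField("authSbmtr", StringType),
  StructField("certUUID", StringType),
  StructField("effDt", StringType),
  StructField("fileFmt", StringType),
  StructField("status", StringType)))

val infoSchema = ArrayType(StructType(Seq(
  StructField("cert", ArrayType(certSchema)),
  StructField("expdCnt", StringType),
  StructField("mfgAcctNum", StringType),
  StructField("oUUID", StringType),
  StructField("pgmRole", ArrayType(StringType)),
  StructField("pgmUUID", StringType),
  StructField("regUUID", StringType),
  StructField("rtlrsSbmtd", ArrayType(StringType)))))

val flat = books
  .withColumn("info", explode(from_json(col("info"), infoSchema)))
  .withColumn("cert", explode(col("info.cert")))
  .select(col("srNo"), col("title"), col("isbn"),
    col("info.expdCnt"), col("info.mfgAcctNum"), col("info.oUUID"),
    col("info.pgmRole"), col("info.pgmUUID"), col("info.regUUID"),
    col("info.rtlrsSbmtd"),
    col("cert.authSbmtr"), col("cert.certUUID"), col("cert.effDt"),
    col("cert.fileFmt"), col("cert.status"))

One row in, one row out here, since both arrays carry a single element;
multi-element arrays would fan out into multiple rows.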

What you can't seem to do efficiently is dynamically generate a DataFrame
from arbitrary JSON.
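
You can get one, but only by letting Spark infer the schema, e.g.
(assuming a SparkSession in scope as 'spark'; the Dataset[String]
overload of read.json is newer Spark, older versions take an
RDD[String]):

import spark.implicits._

// Schema inference costs a full extra pass over the data, and the
// inferred schema can drift from batch to batch - fine for exploration,
// painful as a recurring production step.
val inferred = spark.read.json(df.select(df("info")).as[String])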


On 18 July 2017 at 01:57, Chetan Khatri <chetan.opensou...@gmail.com> wrote:

> Implicits tried - didn't work!
>
> from_json isn't supported on Spark 2.0.1 - any alternate solution would be
> welcome, please.
>
>
> On Tue, Jul 18, 2017 at 12:18 PM, Georg Heiler <georg.kf.hei...@gmail.com>
> wrote:
>
>> You need to have the Spark implicits in scope.
>> Richard Xin <richardxin...@yahoo.com.invalid> schrieb am Di. 18. Juli
>> 2017 um 08:45:
>>
>>> I believe you could use JOLT (bazaarvoice/jolt
>>> <https://github.com/bazaarvoice/jolt>) to flatten it to a JSON string
>>> and then go to a DataFrame or Dataset.
>>>
>>> On Monday, July 17, 2017, 11:18:24 PM PDT, Chetan Khatri <
>>> chetan.opensou...@gmail.com> wrote:
>>>
>>>
>>> explode is not working in this scenario; it errors with: a string
>>> cannot be used in explode, only an array or a map, in Spark.
>>> On Tue, Jul 18, 2017 at 11:39 AM, 刘虓 <ipf...@gmail.com> wrote:
>>>
>>> Hi,
>>> have you tried to use explode?
>>>
>>> Chetan Khatri <chetan.opensou...@gmail.com> wrote on Tuesday, 18 July 2017 at 2:06 PM:
>>>
>>> Hello Spark Devs,
>>>
>>> Can you please guide me, how to flatten JSON to multiple columns in
>>> Spark.
>>>
>>> *Example:*
>>>
>>> Sr No | Title           | ISBN       | Info
>>> 1     | Calculus Theory | 1234567890 | (JSON below)
>>>
>>> [{"cert":[{
>>> "authSbmtr":"009415da-c8cd- 418d-869e-0a19601d79fa",
>>> 009415da-c8cd-418d-869e- 0a19601d79fa
>>> "certUUID":"03ea5a1a-5530- 4fa3-8871-9d1ebac627c4",
>>>
>>> "effDt":"2016-05-06T15:04:56. 279Z",
>>>
>>>
>>> "fileFmt":"rjrCsv","status":" live"}],
>>>
>>> "expdCnt":"15",
>>> "mfgAcctNum":"531093",
>>>
>>> "oUUID":"23d07397-4fbe-4897- 8a18-b79c9f64726c",
>>>
>>>
>>> "pgmRole":["RETAILER"],
>>> "pgmUUID":"1cb5dd63-817a-45bc- a15c-5660e4accd63",
>>> "regUUID":"cc1bd898-657d-40dc- af5d-4bf1569a1cc4",
>>> "rtlrsSbmtd":["009415da-c8cd- 418d-869e-0a19601d79fa"]}]
>>>
>>> I want to get a single row with 11 columns.
>>>
>>> Thanks.
>>>
>>>
>
