As I am a beginner, if someone can give pseudocode it would be highly appreciated.
On Tue, Jul 18, 2017 at 11:43 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:

> That's a great link Michael, thanks!
>
> For us it was around attempting to provide for dynamic schemas, which is a
> bit of an anti-pattern.
>
> Ultimately it just comes down to owning your transforms; all the basic
> tools are there.
>
> On 18 July 2017 at 11:03, Michael Armbrust <mich...@databricks.com> wrote:
>
>> Here is an overview of how to work with complex JSON in Spark:
>> https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html
>> (works in streaming and batch)
>>
>> On Tue, Jul 18, 2017 at 10:29 AM, Riccardo Ferrari <ferra...@gmail.com> wrote:
>>
>>> What's against:
>>>
>>> df.rdd.map(...)
>>>
>>> or
>>>
>>> dataset.foreach()
>>>
>>> https://spark.apache.org/docs/2.0.1/api/scala/index.html#org.apache.spark.sql.Dataset@foreach(f:T=>Unit):Unit
>>>
>>> Best,
>>>
>>> On Tue, Jul 18, 2017 at 6:46 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:
>>>
>>>> I've been wondering about this for a while.
>>>>
>>>> We wanted to do something similar for generically saving thousands of
>>>> individual homogeneous events into well-formed Parquet.
>>>>
>>>> Ultimately I couldn't find something I wanted to own and pushed back on
>>>> the requirements.
>>>>
>>>> It seems the canonical answer is that you need to 'own' the schema of
>>>> the JSON and parse it out manually into your dataframe. There's
>>>> nothing challenging about it, just verbose code. If your 'info' is a
>>>> consistent schema then you'll be fine. For us it was 12 wildly diverging
>>>> schemas and I didn't want to own the transforms.
>>>>
>>>> I also recommend persisting anything that isn't part of your schema in
>>>> an 'extras' field, so when you parse out your JSON, if you've got
>>>> anything left over, drop it in there for later analysis.
>>>>
>>>> I can provide some sample code, but I think it's pretty straightforward
>>>> / you can google it.
>>>>
>>>> What you can't seem to do efficiently is dynamically generate a
>>>> dataframe from random JSON.
>>>>
>>>> On 18 July 2017 at 01:57, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>>
>>>>> Implicits tried - didn't work!
>>>>>
>>>>> from_json isn't supported in Spark 2.0.1; any alternate solution
>>>>> would be welcome, please.
>>>>>
>>>>> On Tue, Jul 18, 2017 at 12:18 PM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:
>>>>>
>>>>>> You need to have the Spark implicits in scope.
>>>>>>
>>>>>> Richard Xin <richardxin...@yahoo.com.invalid> wrote on Tue, Jul 18,
>>>>>> 2017 at 8:45 AM:
>>>>>>
>>>>>>> I believe you could use JOLT (bazaarvoice/jolt
>>>>>>> <https://github.com/bazaarvoice/jolt>) to flatten it to a JSON
>>>>>>> string and then to a dataframe or dataset.
>>>>>>>
>>>>>>> bazaarvoice/jolt: a JSON to JSON transformation library written in
>>>>>>> Java. <https://github.com/bazaarvoice/jolt>
>>>>>>>
>>>>>>> On Monday, July 17, 2017, 11:18:24 PM PDT, Chetan Khatri <
>>>>>>> chetan.opensou...@gmail.com> wrote:
>>>>>>>
>>>>>>> Explode is not working in this scenario; it fails with an error that
>>>>>>> a string cannot be used in explode, only an array or map.
>>>>>>>
>>>>>>> On Tue, Jul 18, 2017 at 11:39 AM, 刘虓 <ipf...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>> have you tried to use explode?
>>>>>>>
>>>>>>> Chetan Khatri <chetan.opensou...@gmail.com> wrote on Tuesday, July
>>>>>>> 18, 2017 at 2:06 PM:
>>>>>>>
>>>>>>> Hello Spark Devs,
>>>>>>>
>>>>>>> Can you please guide me on how to flatten JSON to multiple columns
>>>>>>> in Spark?
>>>>>>>
>>>>>>> *Example:*
>>>>>>>
>>>>>>> Sr No | Title           | ISBN       | Info
>>>>>>> 1     | Calculus Theory | 1234567890 | (JSON below)
>>>>>>>
>>>>>>> [{"cert":[{
>>>>>>>   "authSbmtr":"009415da-c8cd-418d-869e-0a19601d79fa",
>>>>>>>   "certUUID":"03ea5a1a-5530-4fa3-8871-9d1ebac627c4",
>>>>>>>   "effDt":"2016-05-06T15:04:56.279Z",
>>>>>>>   "fileFmt":"rjrCsv",
>>>>>>>   "status":"live"}],
>>>>>>> "expdCnt":"15",
>>>>>>> "mfgAcctNum":"531093",
>>>>>>> "oUUID":"23d07397-4fbe-4897-8a18-b79c9f64726c",
>>>>>>> "pgmRole":["RETAILER"],
>>>>>>> "pgmUUID":"1cb5dd63-817a-45bc-a15c-5660e4accd63",
>>>>>>> "regUUID":"cc1bd898-657d-40dc-af5d-4bf1569a1cc4",
>>>>>>> "rtlrsSbmtd":["009415da-c8cd-418d-869e-0a19601d79fa"]}]
>>>>>>>
>>>>>>> I want to get a single row with 11 columns.
>>>>>>>
>>>>>>> Thanks.
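For what it's worth, the "own your transforms" approach recommended in the thread can be sketched in plain Python with the stdlib `json` module, independent of Spark; the dotted column names and the unwrapping of single-element arrays below are choices made for this sketch (not anything prescribed in the thread), and with this scheme the example JSON yields 12 leaf columns since the five fields nested under `cert` each become their own column:

```python
import json

# The 'Info' JSON from the example above (whitespace that the mail
# archive inserted inside the UUIDs has been removed).
info = """[{"cert":[{"authSbmtr":"009415da-c8cd-418d-869e-0a19601d79fa",
"certUUID":"03ea5a1a-5530-4fa3-8871-9d1ebac627c4",
"effDt":"2016-05-06T15:04:56.279Z",
"fileFmt":"rjrCsv","status":"live"}],
"expdCnt":"15",
"mfgAcctNum":"531093",
"oUUID":"23d07397-4fbe-4897-8a18-b79c9f64726c",
"pgmRole":["RETAILER"],
"pgmUUID":"1cb5dd63-817a-45bc-a15c-5660e4accd63",
"regUUID":"cc1bd898-657d-40dc-af5d-4bf1569a1cc4",
"rtlrsSbmtd":["009415da-c8cd-418d-869e-0a19601d79fa"]}]"""


def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into one flat dict of columns.

    Keys are dotted paths (e.g. 'cert.certUUID').  Single-element
    arrays are unwrapped in place; arrays with more elements would
    need explode-like handling, and anything outside the expected
    schema could instead be collected into an 'extras' field as
    suggested earlier in the thread.
    """
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list) and len(obj) == 1:
        flat.update(flatten(obj[0], prefix))
    else:
        # Leaf value: strip the trailing dot from the accumulated path.
        flat[prefix.rstrip(".")] = obj
    return flat


row = flatten(json.loads(info))
```

Once a flat dict like `row` exists per record, building a dataframe row from it is the verbose-but-straightforward part the thread describes; the flattening policy itself (dotted names, array unwrapping) is what you have to "own".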