As I am a beginner, if someone can give pseudocode it would be highly appreciated.
On Tue, Jul 18, 2017 at 11:43 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:

> That's a great link Michael, thanks!
>
> For us it was around attempting to provide for dynamic schemas, which is a
> bit of an anti-pattern.
>
> Ultimately it just comes down to owning your transforms; all the basic
> tools are there.
>
> On 18 July 2017 at 11:03, Michael Armbrust <mich...@databricks.com> wrote:
>
>> Here is an overview of how to work with complex JSON in Spark:
>> https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html
>> (works in streaming and batch)
>>
>> On Tue, Jul 18, 2017 at 10:29 AM, Riccardo Ferrari <ferra...@gmail.com> wrote:
>>
>>> What's against:
>>>
>>> df.rdd.map(...)
>>>
>>> or
>>>
>>> dataset.foreach()
>>>
>>> https://spark.apache.org/docs/2.0.1/api/scala/index.html#org.apache.spark.sql.Dataset@foreach(f:T=>Unit):Unit
>>>
>>> Best,
>>>
>>> On Tue, Jul 18, 2017 at 6:46 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:
>>>
>>>> I've been wondering about this for a while.
>>>>
>>>> We wanted to do something similar for generically saving thousands of
>>>> individual homogeneous events into well-formed Parquet.
>>>>
>>>> Ultimately I couldn't find something I wanted to own and pushed back on
>>>> the requirements.
>>>>
>>>> It seems the canonical answer is that you need to 'own' the schema of
>>>> the JSON and parse it out manually into your dataframe. There's
>>>> nothing challenging about it, just verbose code. If your 'info' is a
>>>> consistent schema then you'll be fine. For us it was 12 wildly diverging
>>>> schemas and I didn't want to own the transforms.
>>>>
>>>> I also recommend persisting anything that isn't part of your schema in
>>>> an 'extras' field, so when you parse out your JSON, if you've got
>>>> anything left over, drop it in there for later analysis.
>>>>
>>>> I can provide some sample code, but I think it's pretty straightforward
>>>> / you can google it.
>>>>
>>>> What you can't seem to do efficiently is dynamically generate a
>>>> dataframe from random JSON.
>>>>
>>>> On 18 July 2017 at 01:57, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>>
>>>>> Implicits tried - didn't work!
>>>>>
>>>>> from_json isn't supported in Spark 2.0.1; any alternate solution
>>>>> would be welcome, please.
>>>>>
>>>>> On Tue, Jul 18, 2017 at 12:18 PM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:
>>>>>
>>>>>> You need to have the Spark implicits in scope.
>>>>>>
>>>>>> Richard Xin <richardxin...@yahoo.com.invalid> wrote on Tue, Jul 18,
>>>>>> 2017 at 8:45 AM:
>>>>>>
>>>>>>> I believe you could use JOLT (bazaarvoice/jolt
>>>>>>> <https://github.com/bazaarvoice/jolt>) to flatten it to a JSON
>>>>>>> string and then to a dataframe or dataset.
>>>>>>>
>>>>>>> bazaarvoice/jolt: a JSON to JSON transformation library written in
>>>>>>> Java. <https://github.com/bazaarvoice/jolt>
>>>>>>>
>>>>>>> On Monday, July 17, 2017, 11:18:24 PM PDT, Chetan Khatri <
>>>>>>> chetan.opensou...@gmail.com> wrote:
>>>>>>>
>>>>>>> Explode is not working in this scenario; it fails with an error that
>>>>>>> a string cannot be used in explode, only an array or map.
>>>>>>>
>>>>>>> On Tue, Jul 18, 2017 at 11:39 AM, 刘虓 <ipf...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>> have you tried to use explode?
>>>>>>>
>>>>>>> Chetan Khatri <chetan.opensou...@gmail.com> wrote on Tuesday, July
>>>>>>> 18, 2017 at 2:06 PM:
>>>>>>>
>>>>>>> Hello Spark Devs,
>>>>>>>
>>>>>>> Can you please guide me on how to flatten JSON to multiple columns
>>>>>>> in Spark?
>>>>>>>
>>>>>>> *Example:*
>>>>>>>
>>>>>>> Sr No | Title           | ISBN       | Info
>>>>>>> 1     | Calculus Theory | 1234567890 | (JSON below)
>>>>>>>
>>>>>>> [{"cert":[{
>>>>>>>   "authSbmtr":"009415da-c8cd-418d-869e-0a19601d79fa",
>>>>>>>   "certUUID":"03ea5a1a-5530-4fa3-8871-9d1ebac627c4",
>>>>>>>   "effDt":"2016-05-06T15:04:56.279Z",
>>>>>>>   "fileFmt":"rjrCsv",
>>>>>>>   "status":"live"}],
>>>>>>> "expdCnt":"15",
>>>>>>> "mfgAcctNum":"531093",
>>>>>>> "oUUID":"23d07397-4fbe-4897-8a18-b79c9f64726c",
>>>>>>> "pgmRole":["RETAILER"],
>>>>>>> "pgmUUID":"1cb5dd63-817a-45bc-a15c-5660e4accd63",
>>>>>>> "regUUID":"cc1bd898-657d-40dc-af5d-4bf1569a1cc4",
>>>>>>> "rtlrsSbmtd":["009415da-c8cd-418d-869e-0a19601d79fa"]}]
>>>>>>>
>>>>>>> I want to get a single row with 11 columns.
>>>>>>>
>>>>>>> Thanks.
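For what it's worth, the "own your transforms" approach recommended in the thread can be sketched in plain Python with the stdlib `json` module, independent of Spark; the dotted column names and the unwrapping of single-element arrays below are choices made for this sketch (not anything prescribed in the thread), and with this scheme the example JSON yields 12 leaf columns since the five fields nested under `cert` each become their own column:

```python
import json

# The 'Info' JSON from the example above (whitespace that the mail
# archive inserted inside the UUIDs has been removed).
info = """[{"cert":[{"authSbmtr":"009415da-c8cd-418d-869e-0a19601d79fa",
"certUUID":"03ea5a1a-5530-4fa3-8871-9d1ebac627c4",
"effDt":"2016-05-06T15:04:56.279Z",
"fileFmt":"rjrCsv","status":"live"}],
"expdCnt":"15",
"mfgAcctNum":"531093",
"oUUID":"23d07397-4fbe-4897-8a18-b79c9f64726c",
"pgmRole":["RETAILER"],
"pgmUUID":"1cb5dd63-817a-45bc-a15c-5660e4accd63",
"regUUID":"cc1bd898-657d-40dc-af5d-4bf1569a1cc4",
"rtlrsSbmtd":["009415da-c8cd-418d-869e-0a19601d79fa"]}]"""


def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into one flat dict of columns.

    Keys are dotted paths (e.g. 'cert.certUUID').  Single-element
    arrays are unwrapped in place; arrays with more elements would
    need explode-like handling, and anything outside the expected
    schema could instead be collected into an 'extras' field as
    suggested earlier in the thread.
    """
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list) and len(obj) == 1:
        flat.update(flatten(obj[0], prefix))
    else:
        # Leaf value: strip the trailing dot from the accumulated path.
        flat[prefix.rstrip(".")] = obj
    return flat


row = flatten(json.loads(info))
```

Once a flat dict like `row` exists per record, building a dataframe row from it is the verbose-but-straightforward part the thread describes; the flattening policy itself (dotted names, array unwrapping) is what you have to "own".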