Re: Flatten JSON to multiple columns in Spark

2017-07-18 Thread Chetan Khatri
Tried the implicits - it didn't work! from_json isn't supported in Spark 2.0.1. Any alternate solution would be welcome, please. On Tue, Jul 18, 2017 at 12:18 PM, Georg Heiler wrote: > You need to have spark implicits in scope > Richard Xin
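For Spark 2.0.x, where from_json is not yet available (it arrived in 2.1), one possible workaround is get_json_object, which has been in org.apache.spark.sql.functions since 1.6. A minimal sketch, assuming df is the DataFrame from the original post and the Info column holds a JSON string; the JsonPath uses illustrative field names from the example:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.get_json_object

val spark = SparkSession.builder().appName("flatten-json").getOrCreate()
import spark.implicits._

// Pull a single field out of the JSON string column via a JsonPath
// expression; repeat withColumn for each field you want to flatten.
val flattened = df.withColumn(
  "authSbmtr", get_json_object($"Info", "$[0].cert[0].authSbmtr"))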

Spark history server running on Mongo

2017-07-18 Thread Ivan Sadikov
Hello everyone! I have been working on a Spark history server that uses MongoDB as a datastore for processed events, iterating on the idea that the Spree project uses for the Spark UI. The project was originally designed to improve on the standalone history server with a reduced memory footprint. The project lives here:

Re: Flatten JSON to multiple columns in Spark

2017-07-18 Thread Chetan Khatri
Explode is not working in this scenario; the error is that a string cannot be used in explode, only an array or map in Spark. On Tue, Jul 18, 2017 at 11:39 AM, 刘虓 wrote: > Hi, > have you tried to use explode? > > Chetan Khatri wrote on Tue, Jul 18, 2017 at 2:06 PM: > >>
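explode does indeed require an ArrayType or MapType column, so a JSON string has to be parsed into a nested structure first. A hedged sketch for Spark 2.0.x that re-reads the string column so Spark infers the nested schema (column and field names are illustrative, taken from the example in this thread, and it assumes each Info value is a single JSON object; a top-level JSON array may need extra handling):

import org.apache.spark.sql.functions.explode
import spark.implicits._  // needed for the $"..." syntax and .as[String]

// Re-parse the JSON string column into a DataFrame with a real schema,
// then explode the array of certs into one row per element.
val parsed = spark.read.json(df.select($"Info").as[String].rdd)
parsed.select(explode($"cert").as("cert"))
      .select($"cert.authSbmtr")
      .show()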

Re: Flatten JSON to multiple columns in Spark

2017-07-18 Thread Georg Heiler
You need to have spark implicits in scope Richard Xin wrote on Tue, 18 July 2017 at 08:45: > I believe you could use JOLT (bazaarvoice/jolt) to flatten it to a JSON string and > then to a DataFrame or Dataset. > >
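Concretely, that means something like this minimal sketch:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._  // brings the $"colName" syntax and Dataset encoders into scope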

Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Riccardo Ferrari
Hi, can you share more details? Do you have any exceptions from the driver or the executors? Best, On Jul 18, 2017 02:49, "saatvikshah1994" wrote: > Hi, > > I have a PySpark app which, when provided a huge amount of data as input, > throws the error explained here

Flatten JSON to multiple columns in Spark

2017-07-18 Thread Chetan Khatri
Hello Spark devs, Can you please guide me on how to flatten JSON to multiple columns in Spark? *Example:* Sr No | Title | ISBN | Info 1 | Calculus Theory | 1234567890 | [{"cert":[{ "authSbmtr":"009415da-c8cd-418d-869e-0a19601d79fa",

Re: Flatten JSON to multiple columns in Spark

2017-07-18 Thread Chetan Khatri
Georg, Thank you for the reply; it throws an error because the field is coming in as a string. On Tue, Jul 18, 2017 at 11:38 AM, Georg Heiler wrote: > df.select($"Info.*") should help > Chetan Khatri wrote on Tue, 18 July 2017 > at 08:06: > >> Hello

Re: Flatten JSON to multiple columns in Spark

2017-07-18 Thread Georg Heiler
df.select ($"Info.*") should help Chetan Khatri schrieb am Di. 18. Juli 2017 um 08:06: > Hello Spark Dev's, > > Can you please guide me, how to flatten JSON to multiple columns in Spark. > > *Example:* > > Sr No Title ISBN Info > 1 Calculus Theory 1234567890

Re: Flatten JSON to multiple columns in Spark

2017-07-18 Thread 刘虓
Hi, have you tried to use explode? Chetan Khatri wrote on Tue, Jul 18, 2017 at 2:06 PM: > Hello Spark devs, > > Can you please guide me on how to flatten JSON to multiple columns in Spark? > > *Example:* > > Sr No | Title | ISBN | Info > 1 | Calculus Theory | 1234567890 | [{"cert":[{ >

Re: Flatten JSON to multiple columns in Spark

2017-07-18 Thread Richard Xin
I believe you could use JOLT (bazaarvoice/jolt) to flatten it to a JSON string and then to a DataFrame or Dataset. bazaarvoice/jolt: JSON to JSON transformation library written in Java. On Monday, July 17, 2017, 11:18:24 PM PDT, Chetan

Re: Solutions.Hamburg conference

2017-07-18 Thread Myrle Krantz
Thanks for volunteering, Jacek! Anyone else interested? I'm using a Doodle poll to keep track of which slots we have possible speakers for. I've entered Jacek for the two Spark slots: https://doodle.com/poll/a2xquraqu27nekd3 People are also quite welcome to enter themselves. Best Regards, Myrle

Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Riccardo Ferrari
The reason you get "connection refused" when connecting to the application UI (port 4040) is that your app has stopped, so the application UI stops as well. To inspect your executors' logs after the fact, you might find the Spark History Server useful

Solutions.Hamburg conference

2017-07-18 Thread Myrle Krantz
Hello Apache Spark Community, Solutions.Hamburg is offering Apache conference speaking slots, and some time to advertise for the ASF. We were wondering if anyone in the Spark community would like to give a technical talk on Apache Spark, between September 6-8 in Hamburg. This is an opportunity

Re: Solutions.Hamburg conference

2017-07-18 Thread Jacek Laskowski
Hi Myrle, If no one steps forward, I could step in and take a short trip to Hamburg to give a talk (or a few) about Spark. I live in Warsaw, Poland, so it's pretty close. P.S. It's been a while since I spoke German, so that could be a nice opportunity to relive the good ol' days :) P.S. Posting publicly

Re: Solutions.Hamburg conference

2017-07-18 Thread Jacek Laskowski
Hi Myrle, You're welcome. Pleasure's all mine. Could you please replace Spark Streaming (technically a dead end) with the modern Structured Streaming? That's what I'd be shooting for. Thanks. Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2

Re: Reading Hive tables Parallel in Spark

2017-07-18 Thread Matteo Cossu
The context you use for calling Spark SQL can be used only in the driver. Moreover, collect() works because it brings the RDD into local memory, but it should be used only for debugging (95% of the time); if all your data fits into a single machine's memory, you shouldn't be using Spark at all but

Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Saatvik Shah
Hi Riccardo, Yes, thanks for suggesting I do that. [Stage 1:==> (12750 + 40) / 15000]17/07/18 13:22:28 ERROR org.apache.spark.scheduler.LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of
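If only the listener bus queue is overflowing while the job itself succeeds, a commonly suggested mitigation is to enlarge the event queue. A sketch; the property name below is the Spark 2.x one, and I believe it was renamed to spark.scheduler.listenerbus.eventqueue.capacity in later releases, so treat it as an assumption to verify against your version:

import org.apache.spark.SparkConf

// The default queue holds 10000 events in Spark 2.x; raising it trades
// some driver memory for fewer dropped UI events.
val conf = new SparkConf()
  .set("spark.scheduler.listenerbus.eventqueue.size", "100000")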

Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Saatvik Shah
Hi Riccardo, Thanks for your suggestions. The thing is that my Spark UI is the one thing that is crashing, not the app. In fact, the app does end up completing successfully, which is why I'm a bit confused by this issue. I'll still try out some of your suggestions. Thanks and Regards, Saatvik

Re: Spark history server running on Mongo

2017-07-18 Thread Ivan Sadikov
Thanks for the JIRA ticket reference! Frankly, I was aware of this work, but didn't know that there was an API for the storage implementation. I'll try exploring that as well, thanks! On Wed, 19 Jul 2017 at 4:18 AM, Marcelo Vanzin wrote: > See SPARK-18085. That has much of the same

Re: Flatten JSON to multiple columns in Spark

2017-07-18 Thread lucas.g...@gmail.com
I've been wondering about this for a while. We wanted to do something similar for generically saving thousands of individual homogeneous events into well-formed Parquet. Ultimately I couldn't find something I wanted to own and pushed back on the requirements. It seems the canonical answer is that

Re: Spark history server running on Mongo

2017-07-18 Thread Marcelo Vanzin
See SPARK-18085. That has much of the same goals re: SHS resource usage, and also provides a (currently non-public) API where you could just create a MongoDB implementation if you want. On Tue, Jul 18, 2017 at 12:56 AM, Ivan Sadikov wrote: > Hello everyone! > > I have

Requesting feedback on Fluo+Spark

2017-07-18 Thread Christopher
Hi Spark users and developers, We on the Apache Fluo (incubating) team recently saw this article[1] and were wondering if there is anybody using Fluo with Spark who could share their story. If so, we'd love to hear your feedback. Thanks! [1]:

Re: Flatten JSON to multiple columns in Spark

2017-07-18 Thread Riccardo Ferrari
What speaks against df.rdd.map(...) or dataset.foreach()? https://spark.apache.org/docs/2.0.1/api/scala/index.html#org.apache.spark.sql.Dataset@foreach(f:T=>Unit):Unit Best, On Tue, Jul 18, 2017 at 6:46 PM, lucas.g...@gmail.com wrote: > I've been wondering about this for

Re: Flatten JSON to multiple columns in Spark

2017-07-18 Thread Michael Armbrust
Here is an overview of how to work with complex JSON in Spark: https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html (works in streaming and batch) On Tue, Jul 18, 2017 at 10:29 AM, Riccardo Ferrari wrote: > What's
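For anyone on 2.1+, the from_json approach from that post looks roughly like this sketch (the schema mirrors the illustrative example from this thread and assumes Info holds a single JSON object; a top-level JSON array would need an ArrayType schema, supported in newer releases):

import org.apache.spark.sql.functions.{explode, from_json}
import org.apache.spark.sql.types._

// Declare the nested schema explicitly, parse the string column with
// from_json, then explode the array into one row per cert.
val schema = new StructType()
  .add("cert", ArrayType(new StructType().add("authSbmtr", StringType)))

df.select(from_json($"Info", schema).as("info"))
  .select(explode($"info.cert").as("cert"))
  .select($"cert.authSbmtr")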

Re: [Spark Core] unhashable type: 'dict' during shuffle step

2017-07-18 Thread Josh Holbrook
Hi all, Just an update: I ran a variation of the job with the new latest_time code and it failed again, but I think I was misreading the history dashboard. This time, it shows 2 attempts, the second of which failed during the max call as before, but the *first* of which appears to be failing during

Re: Flatten JSON to multiple columns in Spark

2017-07-18 Thread lucas.g...@gmail.com
That's a great link, Michael, thanks! For us it was around attempting to provide for dynamic schemas, which is a bit of an anti-pattern. Ultimately it just comes down to owning your transforms; all the basic tools are there. On 18 July 2017 at 11:03, Michael Armbrust

[Spark Core] unhashable type: 'dict' during shuffle step

2017-07-18 Thread Josh Holbrook
Hello! I'm running into a very strange issue with pretty much no hits on the internet, and I'm hoping someone here can give me some pro tips! At this point, I'm at a loss. This is a little long-winded, but hopefully you'll indulge me. Background: I'm currently trying to port some existing Spark

[Spark Streaming] How to make this code work?

2017-07-18 Thread Noppanit Charassinvichai
I'm super new to Spark and I'm writing this job to parse nginx logs into the ORC file format so they can be read from Presto. We wrote LogLine2Json, which parses a line of nginx log to JSON, and that works fine. val sqs = streamContext.receiverStream(new SQSReceiver("elb") //.credentials("key",
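For the write side, the usual pattern is to turn each micro-batch into a DataFrame and append it as ORC. A hedged sketch (SQSReceiver and LogLine2Json are the poster's custom classes; this assumes the stream carries JSON strings, and the output path is illustrative):

import org.apache.spark.sql.SparkSession

sqs.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    val spark = SparkSession.builder().getOrCreate()
    // Infer a schema from the JSON lines and append the batch as ORC,
    // which Presto can then read.
    spark.read.json(rdd).write.mode("append").orc("s3://bucket/nginx-orc/")
  }
}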

Re: Spark | Window Function |

2017-07-18 Thread Radhwane Chebaane
Hi Julien, Could you give more details about the problems you faced? Here is a working example with Spark dataframe and Spark SQL: https://gist.github.com/radcheb/d16042d8bb3815d3dd42030ecedc43cf Cheers, Radhwane Chebaane 2017-07-18 18:21 GMT+02:00 Julien CHAMP : > Hi
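For reference, a generic window-function sketch in the DataFrame API (column names are illustrative, not from Julien's dataset):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Number rows within each key, newest first, and keep the latest per key.
val w = Window.partitionBy($"id").orderBy($"timestamp".desc)
df.withColumn("rn", row_number().over(w))
  .filter($"rn" === 1)
  .drop("rn")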

Azure key vault

2017-07-18 Thread ayan guha
Hi, does anyone here have any experience integrating Spark with Azure Key Vault? -- Best Regards, Ayan Guha

Re: Spark history server running on Mongo

2017-07-18 Thread Ivan Sadikov
Hi Marcelo, Thanks for the reference, again. I looked at your code - really great work! I had to replace the Spark distribution to use it, though; I could not figure out how to build it separately. The repository that I linked to does not require rebuilding Spark and could be used with the current