Re: How to read json data from kafka and store to hdfs with spark structured streaming?

2018-07-27 Thread Arbab Khalil
Please try adding another option for the starting offset. I have done the same thing many times with different versions of Spark that support structured streaming. The other possibility I am seeing is that the problem could be at write time. Can you please confirm it by calling the printSchema function after
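In PySpark, that suggestion might look roughly like the sketch below. The broker and topic names are reused from later messages in this thread; the startingOffsets value is only an illustration, and it assumes the spark-sql-kafka package is on the classpath and that `spark` is the predefined session of the pyspark shell.

df = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "kafka_broker") \
    .option("subscribe", "test_hdfs3") \
    .option("startingOffsets", "earliest") \
    .load()

# Confirm the shape of the stream before writing it anywhere.
df.printSchema()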

Re: Iterative rdd union + reduceByKey operations on a small dataset lead to "No space left on device" error on account of a lot of shuffle spill.

2018-07-27 Thread Dinesh Dharme
Yeah, you are right. I ran the experiments locally, not on YARN. On Fri, Jul 27, 2018 at 11:54 PM, Vadim Semenov wrote: > `spark.worker.cleanup.enabled=true` doesn't work for YARN. > On Fri, Jul 27, 2018 at 8:52 AM dineshdharme wrote: > I am trying to do a few (union + reduceByKey)

Re: Iterative rdd union + reduceByKey operations on a small dataset lead to "No space left on device" error on account of a lot of shuffle spill.

2018-07-27 Thread Vadim Semenov
`spark.worker.cleanup.enabled=true` doesn't work for YARN. On Fri, Jul 27, 2018 at 8:52 AM dineshdharme wrote: > I am trying to do a few (union + reduceByKey) operations on a hierarchical dataset in an iterative fashion in RDDs. The first few loops run fine, but on the subsequent loops, the
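For reference, that flag is a Spark standalone worker-daemon setting; a minimal sketch of where it would live is below (the interval value is only an example, and none of this is taken from the thread).

# conf/spark-env.sh on each standalone worker -- has no effect when running on YARN
SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800"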

Re: Question of spark streaming

2018-07-27 Thread Arun Mahadevan
“activityQuery.awaitTermination()” is a blocking call. You can just skip this line and run other commands in the same shell to query the stream. Running the query from a different shell won’t help since the memory sink where the results are stored is not shared between the two shells.
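A PySpark sketch of that point; the names activityCounts and activity_counts are placeholders (they follow the book example referenced later in this digest, not this message):

activityQuery = activityCounts.writeStream \
    .queryName("activity_counts") \
    .format("memory") \
    .outputMode("complete") \
    .start()

# Skip activityQuery.awaitTermination() here -- it blocks the shell.
# Query the in-memory table from the same shell instead:
spark.sql("SELECT * FROM activity_counts").show()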

Iterative rdd union + reduceByKey operations on a small dataset lead to "No space left on device" error on account of a lot of shuffle spill.

2018-07-27 Thread dineshdharme
I am trying to do a few (union + reduceByKey) operations on a hierarchical dataset in an iterative fashion in RDDs. The first few loops run fine, but on the subsequent loops the operations end up using the whole scratch space provided to them. I have set the Spark scratch directory, i.e.
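To make the pattern concrete, here is a toy PySpark sketch of that kind of loop. The data, the paths, and the checkpoint-every-iteration workaround are all assumptions on my part, not something stated in the thread.

from operator import add
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("iterative-union-reduce-sketch")
        .set("spark.local.dir", "/path/to/scratch"))   # hypothetical scratch directory
sc = SparkContext(conf=conf)
sc.setCheckpointDir("/path/to/checkpoints")            # hypothetical checkpoint directory

# Toy stand-in for the hierarchical, per-level data described above.
levels = [sc.parallelize([("a", 1), ("b", 1)]) for _ in range(10)]

acc = levels[0]
for level in levels[1:]:
    acc = acc.union(level).reduceByKey(add)
    # Checkpointing truncates the lineage, so shuffle files from earlier
    # iterations are not kept alive for the whole job.
    acc.cache()
    acc.checkpoint()
    acc.count()                                        # force materialization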

Question of spark streaming

2018-07-27 Thread utkarsh rathor
I am following the book *Spark: The Definitive Guide*. The following code is *executed locally using spark-shell*. Procedure: started the spark-shell without any other options. val static = spark.read.json("/part-00079-tid-730451297822678341-1dda7027-2071-4d73-a0e2-7fb6a91e1d1f-0-c000.json") val
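The book's pattern, roughly, translated to PySpark for consistency with the other sketches here (only the single JSON path is from this message; the streaming directory and the maxFilesPerTrigger setting are assumptions):

# Run in the pyspark shell, where `spark` is predefined.
static = spark.read.json(
    "/part-00079-tid-730451297822678341-1dda7027-2071-4d73-a0e2-7fb6a91e1d1f-0-c000.json")
dataSchema = static.schema          # streaming file sources need an explicit schema

streaming = spark.readStream \
    .schema(dataSchema) \
    .option("maxFilesPerTrigger", 1) \
    .json("/activity-data/")        # hypothetical directory holding such files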

Re: How to read json data from kafka and store to hdfs with spark structured streaming?

2018-07-27 Thread dddaaa
This is a mistake in the code snippet I posted. The right code that is actually running and producing the error is:

df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "kafka_broker") \
    .option("subscribe", "test_hdfs3") \
    .load()
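A possible continuation of that snippet (not from the thread) that parses the JSON payload and writes it to HDFS; the schema, column name, and output paths below are placeholders.

from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

json_schema = StructType([StructField("field1", StringType(), True)])   # placeholder schema

parsed = df.selectExpr("CAST(value AS STRING) AS value") \
    .select(from_json(col("value"), json_schema).alias("data")) \
    .select("data.*")

query = parsed.writeStream \
    .format("json") \
    .option("path", "hdfs:///tmp/test_hdfs3_out") \
    .option("checkpointLocation", "hdfs:///tmp/test_hdfs3_chk") \
    .start()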

Re: How to read json data from kafka and store to hdfs with spark structured streaming?

2018-07-27 Thread Arbab Khalil
Why are you reading a batch from Kafka and writing it as a stream? On Fri, Jul 27, 2018, 1:40 PM dddaaa wrote: > No, I just made sure I'm not doing it. I changed the path in .start() to another path and the same still occurs.
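For what it's worth, the distinction being asked about looks roughly like this in PySpark (broker and topic reused from the thread; everything else is assumed):

# Batch: a one-shot read of whatever is currently in the topic; pair with .write.
batch_df = spark.read \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "kafka_broker") \
    .option("subscribe", "test_hdfs3") \
    .load()

# Streaming: a continuous read; pair with .writeStream (as in the sketch above).
stream_df = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "kafka_broker") \
    .option("subscribe", "test_hdfs3") \
    .load()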

Re: How to read json data from kafka and store to hdfs with spark structured streaming?

2018-07-27 Thread dddaaa
No, I just made sure I'm not doing it. I changed the path in .start() to another path and the same still occurs.