Re: Spark and Oozie

2019-08-05 Thread Dennis Suhari
Hi William, because it is the only job running, I don't think it is resource contention. We have configured the Capacity Scheduler, which means we are using YARN queues. As it is the only job, I can't see why it would be waiting in the queue. Br, Dennis Sent from my iPhone > On

Re: Incremental (online) machine learning algorithms on ML

2019-08-05 Thread Stephen Boesch
There are several high bars to getting a new algorithm adopted. * It needs to be deemed by the MLlib committers/shepherds as widely useful to the community. Algorithms offered by larger companies after having demonstrated usefulness at scale for use cases likely to be encountered by many

Incremental (online) machine learning algorithms on ML

2019-08-05 Thread chagas
Hi, After searching the machine learning library for streaming algorithms, I found two that fit the criteria: Streaming Linear Regression (https://spark.apache.org/docs/latest/mllib-linear-methods.html#streaming-linear-regression) and Streaming K-Means
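
For readers unfamiliar with what "incremental" means here, the following is a minimal sketch in plain Python, not Spark's implementation. It assumes a linear model without an intercept whose weights are nudged by one averaged gradient step per arriving mini-batch, which is the general idea behind MLlib's `StreamingLinearRegressionWithSGD`. The class name `OnlineLinearRegression` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

# A mini-batch is a sequence of (features, label) pairs.
Batch = Sequence[Tuple[Sequence[float], float]]


class OnlineLinearRegression:
    """Hypothetical illustration of an incrementally trained linear model."""

    def __init__(self, num_features: int, step_size: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * num_features
        self.step_size = step_size

    def predict(self, features: Sequence[float]) -> float:
        # Dot product of the current weights and the feature vector.
        return sum(w * x for w, x in zip(self.weights, features))

    def update(self, batch: Batch) -> None:
        """Apply one gradient step using only the newly arrived batch."""
        if not batch:
            return
        grad = [0.0] * len(self.weights)
        for features, label in batch:
            error = self.predict(features) - label
            for i, x in enumerate(features):
                grad[i] += error * x
        n = len(batch)
        # Average the squared-error gradient over the batch, then step.
        self.weights = [
            w - self.step_size * g / n for w, g in zip(self.weights, grad)
        ]
```

Each call to `update` sees only the latest batch, so the model never needs the full history in memory; earlier data influences the result only through the current weights.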

spark job getting hang

2019-08-05 Thread Amit Sharma
I am running a Spark job. It sometimes runs successfully, but most of the time I get: ERROR Dropping event from queue appStatus. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.

How to programmatically pause and resume Spark/Kafka structured streaming?

2019-08-05 Thread kant kodali
Hi All, I am trying to see if there is a way to pause a Spark stream that processes data from Kafka, so that my application can take some actions while the stream is paused and resume it when the application completes those actions. Thanks!
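
One commonly suggested approach, sketched here under assumptions, is to stop the running query and later start it again with the same checkpoint location, so that it picks up from the Kafka offsets recorded in the checkpoint. The wrapper below is a hypothetical helper, not a Spark API: it only assumes that `start_query` returns an object with a `stop()` method, as PySpark's `StreamingQuery` does.

```python
from typing import Any, Callable, Optional


class PausableStream:
    """Hypothetical wrapper that 'pauses' a stream by stopping its query."""

    def __init__(self, start_query: Callable[[], Any]) -> None:
        # start_query is assumed to start the stream and return a handle
        # exposing stop(), e.g. a PySpark StreamingQuery.
        self._start_query = start_query
        self._query: Optional[Any] = None

    @property
    def running(self) -> bool:
        return self._query is not None

    def resume(self) -> None:
        # Start a fresh query only if none is currently active.
        if self._query is None:
            self._query = self._start_query()

    def pause(self) -> None:
        # Stop the active query; progress is expected to live in the
        # checkpoint, so a later resume() can continue from there.
        if self._query is not None:
            self._query.stop()
            self._query = None
```

With PySpark, `start_query` might look like `lambda: df.writeStream.format("console").option("checkpointLocation", "/tmp/ckpt").start()`, where `df` and the checkpoint path are placeholders. The application would call `pause()`, perform its actions, then call `resume()`.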

Re: How to programmatically pause and resume Spark/Kafka structured streaming?

2019-08-05 Thread Gourav Sengupta
Hi, that is exactly my question; I was also looking for ways to gracefully exit Spark Structured Streaming. Regards, Gourav On Tue, Aug 6, 2019 at 3:43 AM kant kodali wrote: > Hi All, > > I am trying to see if there is a way to pause a spark stream that process > data from Kafka such that my