retry strategy in numerous places around Kafka-related routines).
Not that I'm complaining or competing; at the end of the day, having
a Spark app that keeps working overnight gives a developer a good
night's sleep :)
On Thu, Aug 25, 2016 at 3:23 AM, Jeoffrey Lim <jeoffr...@gmail.com> wrote:
> I can't achieve sub-millisecond
> end-to-end stream processing, so my guess is you need to be more
> specific about your terms there.
>
> I promise I'm not trying to start a pissing contest :) just wanted to
> check if you were aware of the current state of the other consumers.
> C
for Spark 1.3 and Kafka 0.8.2.1 (and of
course with the latest Kafka 0.10 as well)
On Wed, Aug 24, 2016 at 9:49 AM, Cody Koeninger <c...@koeninger.org> wrote:
> Were you aware that the spark 2.0 / kafka 0.10 integration also reuses
> kafka consumer instances on the executors?
>
> On Tu
has been running stably in a production environment and has
proven resilient to numerous production issues.
Please check out the project's page on GitHub:
https://github.com/jeoffreylim/maelstrom
Contributors welcome!
Cheers!
Jeoffrey Lim
P.S. I am also looking for a job opportunity
What we did for gracefully shutting down the Spark Streaming context is
extend a Spark Web UI tab and call
SparkContext.SparkUI.attachTab(custom web ui). However, the custom Scala
Web UI extensions need to be under the package org.apache.spark.ui to get
around the package-private access restrictions.
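A minimal sketch of that approach is below. It assumes Spark 1.x internals (the package-private SparkUITab and the SparkContext.ui accessor); the class names and the "stop" tab are illustrative, not the poster's actual code:

```scala
// Must live under org.apache.spark.ui so the package-private
// SparkUITab constructor and SparkContext.ui are accessible.
package org.apache.spark.ui

import org.apache.spark.SparkContext
import org.apache.spark.streaming.StreamingContext

// A custom tab intended to expose a "stop" action for graceful shutdown.
class StopTab(parent: SparkUI, ssc: StreamingContext)
  extends SparkUITab(parent, "stop") {
  // A real implementation would attach a WebUIPage here whose render
  // handler triggers:
  //   ssc.stop(stopSparkContext = false, stopGracefully = true)
}

object StopTabInstaller {
  // Attach the custom tab to the running driver's web UI, if present.
  def attach(sc: SparkContext, ssc: StreamingContext): Unit =
    sc.ui.foreach(ui => ui.attachTab(new StopTab(ui, ssc)))
}
```

Placing the file under org.apache.spark.ui is the whole trick: Spark's UI classes are private[spark], so user code in any other package cannot extend or attach to them.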
Our issue could be related to the problem described in:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-in-1-hour-batch-duration-RDD-files-gets-lost-td14027.html
in which the DStream is processed with a 1 hour batch duration.
I have implemented IO throttling in the
Hi,
Our Spark Streaming app is configured to pull data from Kafka in a 1 hour
batch duration; it performs aggregation of data by specific keys and
stores the related RDDs to HDFS in the transform phase. We have tried a
checkpoint of 7 days on the Kafka DStream to ensure that the generated
stream