heartbeat.interval.ms default
group.max.session.timeout.ms default
session.timeout.ms 6
Default values as of Kafka 0.10.
complete Kafka params:
val kafkaParams = Map[String, String](
"bootstrap.servers" -> kafkaBrokers,
"auto.offset.reset" -> "latest",
"enable.auto.commit" -> "false",
For you or anyone else having issues with consumer rebalance, what are
your settings for
heartbeat.interval.ms
session.timeout.ms
group.max.session.timeout.ms
relative to your batch time?
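For reference, a common way to size these in the direct stream's kafkaParams is to make session.timeout.ms comfortably larger than the batch interval, with heartbeat.interval.ms at roughly a third of the session timeout. The sketch below is illustrative only; the broker name, group id, and concrete values are made up, not the thread's actual settings, and session.timeout.ms must also stay within the broker's group.min/max.session.timeout.ms bounds:

```scala
// Illustrative sizing for a 30s batch interval (values are assumptions,
// not the poster's configuration).
val batchIntervalMs = 30000
val kafkaParams = Map[String, String](
  "bootstrap.servers"     -> "broker1:9092",                   // hypothetical broker
  "group.id"              -> "my-stream-group",                // hypothetical group id
  "heartbeat.interval.ms" -> (batchIntervalMs / 3).toString,   // heartbeat well below session timeout
  "session.timeout.ms"    -> (batchIntervalMs * 2).toString,   // survives a slow batch
  "enable.auto.commit"    -> "false"
)
```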
On Tue, Oct 11, 2016 at 10:19 AM, static-max wrote:
> Hi,
>
> I run into the
The new underlying kafka consumer prefetches data and is generally heavier
weight, so it is cached on executors. Group id is part of the cache key. I
assumed kafka users would use different group ids for consumers they wanted
to be distinct, since otherwise would cause problems even with the
good point, I changed the group id to be unique for the separate streams
and now it works. Thanks!
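For anyone hitting the same problem: with spark-streaming-kafka-0-10 this amounts to giving each createDirectStream call its own group.id. A minimal sketch, with topic names, group ids, and the broker address made up for illustration:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val ssc = new StreamingContext(new SparkConf().setAppName("demo"), Seconds(30))

val baseParams = Map[String, Object](
  "bootstrap.servers"  -> "broker1:9092",          // hypothetical broker
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// A distinct group.id per stream keeps the executor-side consumer cache
// (which includes the group id in its key) from mixing the streams up.
def withGroup(id: String) = baseParams + ("group.id" -> id)

val streamA = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("topicA"), withGroup("job-topicA")))
val streamB = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("topicB"), withGroup("job-topicB")))

streamA.union(streamB).map(_.value).print()
ssc.start()
```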
Although changing this is OK for us, I am interested in the why :-) With
the old connector this was not a problem, nor is it, AFAIK, with the pure
Kafka consumer API.
2016-10-11 14:30 GMT+02:00 Cody
Just out of curiosity, have you tried using separate group ids for the
separate streams?
On Oct 11, 2016 4:46 AM, "Matthias Niehoff"
wrote:
> I stripped down the job to just consume the stream and print it, without
> avro deserialization. When I only consume one
I re-ran the job with DEBUG Log Level for org.apache.spark, kafka.consumer
and org.apache.kafka. Please find the output here:
http://pastebin.com/VgtRUQcB
most of the delay is introduced by *16/10/11 13:20:12 DEBUG RecurringTimer:
Callback for JobGenerator called at time x*, which repeats
I stripped down the job to just consume the stream and print it, without
avro deserialization. When I only consume one topic, everything is fine. As
soon as I add a second stream, it gets stuck after about 5 minutes. So it
basically boils down to:
val kafkaParams = Map[String, String](
This Job will fail after about 5 minutes:
object SparkJobMinimal {
//read Avro schemas
var stream = getClass.getResourceAsStream("/avro/AdRequestLog.avsc")
val avroSchemaAdRequest =
scala.io.Source.fromInputStream(stream).getLines.mkString
stream.close
stream =
Yes, without committing the data the consumer rebalances.
The job consumes 3 streams and processes them. When consuming only one stream it
runs fine. But when consuming three streams, even without joining them,
just deserializing the payload and triggering an output action, it fails.
I will prepare code
Well, I'd start at the first thing suggested by the error, namely that
the group has rebalanced.
Is that stream using a unique group id?
On Wed, Sep 28, 2016 at 5:17 AM, Matthias Niehoff
wrote:
> Hi,
>
> the stacktrace:
>
>
Hi,
the stacktrace:
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
completed since the group has already rebalanced and assigned the
partitions to another member. This means that the time between subsequent
calls to poll() was longer than the configured
What's the actual stacktrace / exception you're getting related to
commit failure?
On Tue, Sep 27, 2016 at 9:37 AM, Matthias Niehoff
wrote:
> Hi everybody,
>
> I am using the new Kafka Receiver for Spark Streaming for my job. When
> running with old consumer it
Hi everybody,
I am using the new Kafka Receiver for Spark Streaming for my job. When
running with the old consumer it runs fine.
The job consumes 3 topics, saves the data to Cassandra, cogroups the topics,
calls mapWithState and stores the results in Cassandra. After that I
manually commit the Kafka
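The manual commit described here is typically done through the direct stream's CanCommitOffsets handle. A sketch, assuming `stream` is the InputDStream returned by KafkaUtils.createDirectStream (the Cassandra write is elided):

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

// Read each batch's offset ranges on the driver, do the work, and only
// then commit the offsets back to Kafka, so a failed batch is re-read
// rather than silently skipped.
def processAndCommit(stream: InputDStream[ConsumerRecord[String, String]]): Unit =
  stream.foreachRDD { rdd =>
    val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    // ... save to Cassandra, etc. ...
    stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
  }
```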