Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-17 Thread Matthias Niehoff
heartbeat.interval.ms: default; group.max.session.timeout.ms: default; session.timeout.ms: 6; default values as of Kafka 0.10. Complete Kafka params: val kafkaParams = Map[String, String]( "bootstrap.servers" -> kafkaBrokers, "auto.offset.reset" -> "latest", "enable.auto.commit" -> "false",
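For context, a sketch of how such a parameter map typically continues for the 0.10 consumer; the group id and broker address below are illustrative assumptions, not settings from this thread. The two timeout values shown are the 0.10.0 client defaults (heartbeat.interval.ms=3000, session.timeout.ms=30000).

    import org.apache.kafka.common.serialization.StringDeserializer

    // Sketch only; values marked "assumed" are not from the thread.
    val kafkaBrokers = "broker:9092"                                      // assumed
    val kafkaParams = Map[String, String](
      "bootstrap.servers"     -> kafkaBrokers,
      "auto.offset.reset"     -> "latest",
      "enable.auto.commit"    -> "false",
      "group.id"              -> "my-streaming-job",                      // assumed
      "heartbeat.interval.ms" -> "3000",                                  // 0.10 client default
      "session.timeout.ms"    -> "30000",                                 // 0.10 client default
      "key.deserializer"      -> classOf[StringDeserializer].getName,
      "value.deserializer"    -> classOf[StringDeserializer].getName
    )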

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-14 Thread Cody Koeninger
For you or anyone else having issues with consumer rebalance, what are your settings for heartbeat.interval.ms, session.timeout.ms, and group.max.session.timeout.ms relative to your batch time? On Tue, Oct 11, 2016 at 10:19 AM, static-max wrote: > Hi, > > I run into the
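The constraint behind this question, sketched with illustrative numbers (the thread never states the actual values): with the direct stream, poll() is driven by the batch cadence, so session.timeout.ms needs headroom over the batch time, and it must stay within the broker-side group.max.session.timeout.ms.

    import org.apache.spark.streaming.Seconds

    // Illustrative only: relate the three settings to the batch time.
    val batchInterval    = Seconds(30)           // Spark batch time (assumed)
    val sessionTimeoutMs = 60000                 // session.timeout.ms: well above the batch time
    val heartbeatMs      = sessionTimeoutMs / 3  // heartbeat.interval.ms: a fraction of the timeout
    // session.timeout.ms must not exceed the broker-side
    // group.max.session.timeout.ms, or the consumer cannot join the group.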

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Cody Koeninger
The new underlying Kafka consumer prefetches data and is generally heavier weight, so it is cached on executors. The group id is part of the cache key. I assumed Kafka users would use different group ids for consumers they wanted to be distinct, since otherwise it would cause problems even with the
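A minimal sketch of the implication: because group.id is part of the executor-side consumer cache key, two streams created in one application should use distinct group ids. Broker address, topic names, and group names here are made up for illustration.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val ssc = new StreamingContext(new SparkConf().setAppName("two-streams"), Seconds(5))

    def params(groupId: String): Map[String, Object] = Map[String, Object](
      "bootstrap.servers"  -> "broker:9092",                 // assumed
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> groupId,
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // One group id per logically distinct consumer, so the cached
    // consumers on the executors do not collide.
    val streamA = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("topicA"), params("job-topicA")))
    val streamB = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("topicB"), params("job-topicB")))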

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
Good point, I changed the group id to be unique for the separate streams and now it works. Thanks! Although changing this is OK for us, I am interested in the why :-) With the old connector this was not a problem, nor is it, AFAIK, with the pure Kafka consumer API. 2016-10-11 14:30 GMT+02:00 Cody

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Cody Koeninger
Just out of curiosity, have you tried using separate group ids for the separate streams? On Oct 11, 2016 4:46 AM, "Matthias Niehoff" wrote: > I stripped down the job to just consume the stream and print it, without > avro deserialization. When I only consume one

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
I re-ran the job with DEBUG Log Level for org.apache.spark, kafka.consumer and org.apache.kafka. Please find the output here: http://pastebin.com/VgtRUQcB most of the delay is introduced by *16/10/11 13:20:12 DEBUG RecurringTimer: Callback for JobGenerator called at time x*, which repeats
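For anyone reproducing this, one way to get the same DEBUG output (a sketch; a log4j.properties entry works equally well) is to raise the levels programmatically before starting the streaming context:

    import org.apache.log4j.{Level, Logger}

    // Same packages as in the pastebin output; DEBUG is very verbose.
    Logger.getLogger("org.apache.spark").setLevel(Level.DEBUG)
    Logger.getLogger("kafka.consumer").setLevel(Level.DEBUG)
    Logger.getLogger("org.apache.kafka").setLevel(Level.DEBUG)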

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
I stripped down the job to just consume the stream and print it, without Avro deserialization. When I only consume one topic, everything is fine. As soon as I add a second stream it gets stuck after about 5 minutes. So it basically boils down to: val kafkaParams = Map[String, String](

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
This job will fail after about 5 minutes: object SparkJobMinimal { //read Avro schemas var stream = getClass.getResourceAsStream("/avro/AdRequestLog.avsc") val avroSchemaAdRequest = scala.io.Source.fromInputStream(stream).getLines.mkString stream.close stream =
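The quoted code is cut off by the archive. Below is a hedged reconstruction of the shape of such a stripped-down job, with the Avro handling dropped as described in the follow-up; broker address, topic names, and batch interval are assumptions. Note that both streams deliberately share one group id here, which is what the rest of the thread identifies as the trigger.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object SparkJobMinimal {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("SparkJobMinimal"), Seconds(5)) // batch time assumed

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker:9092",                     // assumed
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "spark-job-minimal",               // shared by both streams
          "auto.offset.reset"  -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean))

        // Two streams, no Avro, no joins; just print. With a shared
        // group id this is the setup that reportedly stalls after ~5 minutes.
        val s1 = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("topic1"), kafkaParams))
        val s2 = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("topic2"), kafkaParams))
        s1.map(_.value).print()
        s2.map(_.value).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }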

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-10 Thread Matthias Niehoff
Yes, without committing the data the consumer rebalances. The job consumes 3 streams and processes them. When consuming only one stream it runs fine. But when consuming three streams, even without joining them, just deserializing the payload and triggering an output action, it fails. I will prepare code

Re: Problems with new experimental Kafka Consumer for 0.10

2016-09-28 Thread Cody Koeninger
Well, I'd start at the first thing suggested by the error, namely that the group has rebalanced. Is that stream using a unique group id? On Wed, Sep 28, 2016 at 5:17 AM, Matthias Niehoff wrote: > Hi, > > the stacktrace: > >

Re: Problems with new experimental Kafka Consumer for 0.10

2016-09-28 Thread Matthias Niehoff
Hi, the stacktrace: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured
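The usual remediations for this exception with the 0.10 consumer, sketched with illustrative values (the right numbers depend on actual processing time): make each poll/process cycle fit inside the session timeout, either by raising the timeout or by shrinking the amount of data returned per poll().

    // Illustrative consumer settings; not the poster's values.
    val tunedParams = Map[String, String](
      "session.timeout.ms"    -> "60000", // must stay within the broker's group.max.session.timeout.ms
      "heartbeat.interval.ms" -> "6000",  // conventionally well below the session timeout
      "max.poll.records"      -> "500"    // fewer records per poll(), shorter cycles
    )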

Re: Problems with new experimental Kafka Consumer for 0.10

2016-09-27 Thread Cody Koeninger
What's the actual stacktrace / exception you're getting related to commit failure? On Tue, Sep 27, 2016 at 9:37 AM, Matthias Niehoff wrote: > Hi everybody, > > i am using the new Kafka Receiver for Spark Streaming for my Job. When > running with old consumer it

Problems with new experimental Kafka Consumer for 0.10

2016-09-27 Thread Matthias Niehoff
Hi everybody, I am using the new Kafka Receiver for Spark Streaming for my job. When running with the old consumer it runs fine. The job consumes 3 topics, saves the data to Cassandra, cogroups the topics, calls mapWithState and stores the results in Cassandra. After that I manually commit the Kafka
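For reference, the manual commit pattern in the 0-10 integration looks like the sketch below, where stream is the input DStream returned by KafkaUtils.createDirectStream (assumed in scope): offsets are read from each RDD and committed back only after the output actions succeed, giving at-least-once semantics.

    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

    // stream: the InputDStream[ConsumerRecord[K, V]] created earlier (assumed in scope)
    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... save to Cassandra, cogroup, mapWithState, write results ...
      // Commit only after the work above succeeded; commitAsync is not
      // transactional with the output, so processing is at-least-once.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }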