Re: Rebalancing when adding kafka partitions

2016-08-16 Thread Cody Koeninger
The underlying kafka consumer …

Re: Rebalancing when adding kafka partitions

2016-08-16 Thread Srikanth
Yes, SubscribePattern detects new partitions. Also, it has a comment saying:

> Subscribe to all topics matching specified pattern to get dynamically
> assigned partitions.
> The pattern matching will be done periodically against topics existing
> at the time of check.
> @param pattern …
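
For reference, a minimal sketch of the SubscribePattern strategy being discussed, assuming an existing StreamingContext ssc; the broker address, group id, and topic regex are illustrative assumptions, not taken from this thread:

    import java.util.regex.Pattern
    import org.apache.kafka.common.serialization.ByteArrayDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.SubscribePattern

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",               // assumed broker
      "key.deserializer" -> classOf[ByteArrayDeserializer],
      "value.deserializer" -> classOf[ByteArrayDeserializer],
      "group.id" -> "pattern-group"                          // assumed group id
    )

    // The pattern is re-checked periodically, so topics matching the regex
    // (and, per this thread, partitions added to them) are picked up while
    // the job runs.
    val stream = KafkaUtils.createDirectStream[Array[Byte], Array[Byte]](
      ssc,
      PreferConsistent,
      SubscribePattern[Array[Byte], Array[Byte]](Pattern.compile("mytopic.*"), kafkaParams)
    )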

Re: Rebalancing when adding kafka partitions

2016-08-12 Thread Cody Koeninger
Hrrm, that's interesting. Did you try with subscribe pattern, out of curiosity? I haven't tested repartitioning on the underlying new Kafka consumer, so it's possible I misunderstood something.

Re: Rebalancing when adding kafka partitions

2016-08-12 Thread Srikanth
I did try a test with Spark 2.0 + spark-streaming-kafka-0-10-assembly. The partition count was increased using "bin/kafka-topics.sh --alter" after the Spark job was started. I don't see messages from the new partitions in the DStream.

    KafkaUtils.createDirectStream[Array[Byte], Array[Byte]](
      ssc, …
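
A sketch of how such a call is typically completed with the 0-10 integration's Subscribe strategy, assuming an existing StreamingContext ssc; the topic name and kafkaParams are hypothetical, not recovered from the truncated message:

    import org.apache.kafka.common.serialization.ByteArrayDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",               // assumed broker
      "key.deserializer" -> classOf[ByteArrayDeserializer],
      "value.deserializer" -> classOf[ByteArrayDeserializer],
      "group.id" -> "test-group"                             // assumed group id
    )

    // Subscribe is given an explicit topic list when the stream is defined;
    // this is the strategy under test here, where new partitions were not
    // seen in the DStream.
    val stream = KafkaUtils.createDirectStream[Array[Byte], Array[Byte]](
      ssc,
      PreferConsistent,
      Subscribe[Array[Byte], Array[Byte]](Seq("mytopic"), kafkaParams)
    )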

Re: Rebalancing when adding kafka partitions

2016-07-22 Thread Cody Koeninger
Scaladoc is already in the code, just not in the HTML docs.

Re: Rebalancing when adding kafka partitions

2016-07-22 Thread Srikanth
Yeah, that's what I thought. We need to redefine, not just restart. Thanks for the info! I do see the usage of subscribe[K,V] in your DStreams example. Looks simple, but it's not very obvious how it works :-) I'll watch out for the docs and ScalaDoc. Srikanth

Re: Rebalancing when adding kafka partitions

2016-07-22 Thread Cody Koeninger
No, restarting from a checkpoint won't do it; you need to re-define the stream. Here's the JIRA for the 0.10 integration: https://issues.apache.org/jira/browse/SPARK-12177
I haven't gotten docs completed yet, but there are examples at …
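
To make the distinction concrete, a minimal sketch of why checkpoint recovery keeps the old stream definition; the checkpoint path, app name, and batch interval are hypothetical:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(new SparkConf().setAppName("example"), Seconds(5))
      // ... define the Kafka direct stream and output operations here ...
      ssc.checkpoint("/tmp/checkpoint")   // hypothetical path
      ssc
    }

    // getOrCreate deserializes the DStream graph stored in the checkpoint,
    // so a recovered direct stream keeps the partition set it was created
    // with; only a freshly created context re-defines the stream and can
    // pick up partitions added since.
    val ssc = StreamingContext.getOrCreate("/tmp/checkpoint", createContext _)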

Re: Rebalancing when adding kafka partitions

2016-07-22 Thread Srikanth
In Spark 1.x, if we restart from a checkpoint, will it read from new partitions? If you can, please point us to some doc/link that talks about the Kafka 0.10 integration in Spark 2.0.

Re: Rebalancing when adding kafka partitions

2016-07-22 Thread Cody Koeninger
For the kafka 0.8 integration, you are literally starting a streaming job against a fixed set of topicpartitions; it will not change throughout the job, so you'll need to restart the Spark job if you change Kafka partitions. For the kafka 0.10 / Spark 2.0 integration, if you use …
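
As an illustration of the fixed-set behavior (the topic, decoders, and broker list are assumptions, and an existing StreamingContext ssc is assumed), the 0.8-style direct stream resolves its topicpartitions once, when the stream is defined:

    import kafka.serializer.DefaultDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // kafka 0.8 direct stream: the topic set given here is resolved to a
    // fixed list of topicpartitions at creation time and never re-queried,
    // so partitions added later require restarting with a new definition.
    val kafkaParams = Map[String, String]("metadata.broker.list" -> "localhost:9092")
    val stream = KafkaUtils.createDirectStream[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder](
      ssc, kafkaParams, Set("mytopic")
    )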

Rebalancing when adding kafka partitions

2016-07-22 Thread Srikanth
Hello, I'd like to understand how Spark Streaming (direct) would handle Kafka partition addition. Will a running job be aware of new partitions and read from them? I ask because it uses Kafka APIs to query offsets, and offsets are handled internally. Srikanth