Re: Mac vs cluster Re: kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic

2016-11-19 Thread Hster Geguri
Hi Cody, Our test producer has been vetted for producing evenly into each partition. We use kafka-manager to track this. $ kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list ' > 10.102.22.11:9092' --topic simple_logtest --time -2 > simple_logtest:2:0 > simple_logtest:4:0 >

Re: Mac vs cluster Re: kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic

2016-11-19 Thread Cody Koeninger
This is running locally on my mac, but it's still a standalone spark master with multiple separate executor jvms (i.e. using --master not --local[2]), so it should be the same code paths. I can't speak to yarn one way or the other, but you said you tried it with the standalone scheduler. At the

Mac vs cluster Re: kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic

2016-11-19 Thread Hster Geguri
Hi Cody, Thank you for testing this on a Saturday morning! I failed to mention that when our data engineer runs our drivers(even complex ones) locally on his Mac, the drivers work fine. However when we launch it into the cluster (4 machines either for a YARN cluster or spark standalone) we get

Re: kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic

2016-11-19 Thread Cody Koeninger
I ran your example using the versions of kafka and spark you are using, against a standalone cluster. This is what I observed: (in kafka working directory) bash-3.2$ ./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 'localhost:9092' --topic simple_logtest --time -2

kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic

2016-11-17 Thread Hster Geguri
Our team is trying to upgrade to Spark 2.0.2/Kafka 0.10.1.0 and we have been struggling with this show stopper problem. When we run our drivers with auto.offset.reset=latest ingesting from a single kafka topic with 10 partitions, the driver reads correctly from all 10 partitions. However when we