Hi Cody,
Our test producer has been vetted for producing evenly into each
partition. We use kafka-manager to track this.
$ kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list '
> 10.102.22.11:9092' --topic simple_logtest --time -2
> simple_logtest:2:0
> simple_logtest:4:0
>
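For anyone comparing GetOffsetShell output against the partition count they expect, here is a minimal Python sketch. It assumes each output line has the `topic:partition:offset` shape shown above. The helper names `parse_offsets` and `missing_partitions` are hypothetical, not part of Kafka, and the expected count of 10 is only an assumption about this topic.

```python
def parse_offsets(output):
    """Parse lines shaped like topic:partition:offset into a dict."""
    offsets = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.rsplit(":", 2)
        if len(parts) != 3:
            continue  # skip anything that is not an offset line
        topic, partition, offset = parts
        offsets[(topic, int(partition))] = int(offset)
    return offsets


def missing_partitions(offsets, topic, expected_count):
    """Return the partition ids in [0, expected_count) with no offset line."""
    seen = {p for (t, p) in offsets if t == topic}
    return [p for p in range(expected_count) if p not in seen]


# The two lines visible in the output above; the real output may contain more.
sample = "simple_logtest:2:0\nsimple_logtest:4:0\n"
offsets = parse_offsets(sample)
print(missing_partitions(offsets, "simple_logtest", 10))
```

Run against the complete tool output, an empty list would mean every expected partition reported an offset.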
This is running locally on my Mac, but it's still a standalone Spark
master with multiple separate executor JVMs (i.e. using --master, not
--local[2]), so it should be the same code paths. I can't speak to
YARN one way or the other, but you said you tried it with the
standalone scheduler.
At the
Hi Cody,
Thank you for testing this on a Saturday morning! I failed to mention that
when our data engineer runs our drivers (even complex ones) locally on his
Mac, the drivers work fine. However, when we launch them into the cluster (4
machines, either a YARN cluster or Spark standalone) we get
I ran your example using the versions of Kafka and Spark you are
using, against a standalone cluster. This is what I observed:
(in kafka working directory)
bash-3.2$ ./bin/kafka-run-class.sh kafka.tools.GetOffsetShell
--broker-list 'localhost:9092' --topic simple_logtest --time -2
Our team is trying to upgrade to Spark 2.0.2/Kafka 0.10.1.0 and we have
been struggling with this showstopper problem.
When we run our drivers with auto.offset.reset=latest, ingesting from a
single Kafka topic with 10 partitions, the driver reads correctly from all
10 partitions.
However when we