Re: Mac vs cluster Re: kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic

2016-11-19 Thread Hster Geguri
Hi Cody, Our test producer has been vetted for producing evenly into each partition. We use kafka-manager to track this. $ kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list ' > 10.102.22.11:9092' --topic simple_logtest --time -2 > simple_logtest:2:0 > simple_logtest:4:0 >

Re: Mac vs cluster Re: kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic

2016-11-19 Thread Cody Koeninger
This is running locally on my mac, but it's still a standalone spark master with multiple separate executor jvms (i.e. using --master not --local[2]), so it should be the same code paths. I can't speak to yarn one way or the other, but you said you tried it with the standalone scheduler. At the

Mac vs cluster Re: kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic

2016-11-19 Thread Hster Geguri
Hi Cody, Thank you for testing this on a Saturday morning! I failed to mention that when our data engineer runs our drivers(even complex ones) locally on his Mac, the drivers work fine. However when we launch it into the cluster (4 machines either for a YARN cluster or spark standalone) we get