Re: Mac vs cluster Re: kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic

2016-11-19 Thread Hster Geguri
…those results to what you're seeing from Spark. The results you posted from Spark didn't show any incoming messages at all. On Sat, Nov 19, 2016 at 11:12 AM, Hster Geguri <hster.investiga...@gmail.com> wrote: Hi Cody, Thank you for te…

Mac vs cluster Re: kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic

2016-11-19 Thread Hster Geguri
…k is indeed seeing offsets for each partition. The results you posted look to me like there aren't any messages going into the other partitions, which looks like a misbehaving producer. On Thu, Nov 17, 2016 at 5:58 PM, Hster Geguri <hster.investiga...@gmail.com>…
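The reply above suspects the producer is writing every message to one partition. A quick way to check is to dump the latest offset of each partition with the stock Kafka CLI and see whether only one of them is advancing. A sketch of that invocation — the broker address and topic name are placeholders, and this requires a running broker:

```shell
# Print the latest offset per partition (--time -1 = latest, -2 = earliest).
# If only one line's offset grows while producing, the producer is not
# distributing messages across partitions.
kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list localhost:9092 --topic mytopic --time -1
```

Each output line has the form `topic:partition:offset`, one per partition.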

kafka 0.10 with Spark 2.02 auto.offset.reset=earliest will only read from a single partition on a multi partition topic

2016-11-17 Thread Hster Geguri
Our team is trying to upgrade to Spark 2.0.2/Kafka 0.10.1.0 and we have been struggling with this showstopper problem. When we run our drivers with auto.offset.reset=latest ingesting from a single Kafka topic with 10 partitions, the driver reads correctly from all 10 partitions. However when we…
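For readers unfamiliar with the setting being debated: `auto.offset.reset` only applies when a partition has no committed offset for the consumer group, and it must be resolved independently for every partition. A minimal Python sketch of that resolution logic (this is an illustration of the semantics, not Spark or Kafka client code; all names are hypothetical):

```python
def starting_offsets(partition_ranges, committed, reset):
    """Resolve a starting offset for every partition.

    partition_ranges: {partition: (earliest, latest)} offsets on the broker
    committed: {partition: offset} previously committed for this group.id
    reset: "earliest" or "latest" (the auto.offset.reset policy)
    """
    starts = {}
    for p, (earliest, latest) in partition_ranges.items():
        if p in committed:
            starts[p] = committed[p]   # a committed offset always wins
        elif reset == "earliest":
            starts[p] = earliest       # replay the partition from the start
        else:
            starts[p] = latest         # only consume new messages
    return starts

# Ten partitions, no committed offsets: "earliest" should start
# every one of them at its beginning, not just partition 0.
ranges = {p: (0, 100 + p) for p in range(10)}
print(starting_offsets(ranges, {}, "earliest"))
```

If a consumer honoring these semantics still reads from a single partition, the problem is more likely assignment or the producer's partitioning than the reset policy itself.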

hang correlated to number of shards Re: Checkpointing with Kinesis hangs with socket timeouts when driver is relaunched while transforming on a 0 event batch

2015-11-13 Thread Hster Geguri
…, Hster Geguri <hster.investiga...@gmail.com> wrote: Hello everyone, We are testing checkpointing against YARN 2.7.1 with Spark 1.5. We are trying to make sure checkpointing works with orderly shutdowns (i.e. yarn application --kill) and unexpected shutdowns which we simu…

Kinesis connection timeout setting on Spark Streaming Kinesis ASL

2015-11-05 Thread Hster Geguri
Is there any way to set the underlying AWS client connection socket timeout for the Kinesis requests made in spark-streaming-kinesis-asl? Currently we get socket timeouts, which appear to default to about 120 seconds on driver restarts, causing all kinds of backup. We'd like to shorten it to 10…
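The complaint here is that a long default socket timeout turns each dead connection into a two-minute stall. Whether the Kinesis ASL of that era exposed this knob is not settled in the thread, but the effect of a shorter read timeout is easy to demonstrate with a plain socket: a server that accepts but never replies would block a read indefinitely, and the timeout bounds the stall. A generic sketch (not AWS SDK or ASL code; names are hypothetical):

```python
import socket
import time

def fetch_with_timeout(addr, timeout):
    """Attempt one read from addr, giving up after `timeout` seconds."""
    s = socket.create_connection(addr, timeout=timeout)
    s.settimeout(timeout)
    try:
        return s.recv(1024)        # blocks until data or timeout
    except socket.timeout:
        return None                # bounded stall instead of a long hang
    finally:
        s.close()

# A local server that accepts connections but never sends anything,
# standing in for an unresponsive endpoint.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

t0 = time.monotonic()
result = fetch_with_timeout(srv.getsockname(), timeout=0.2)
elapsed = time.monotonic() - t0
srv.close()
print(result, elapsed)  # returns None after roughly 0.2s, not minutes
```

With a 120-second default, every such read costs two minutes before the caller can retry; shrinking it to 10 seconds caps the per-attempt damage during a driver restart.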

Re: kinesis batches hang after YARN automatic driver restart

2015-11-03 Thread Hster Geguri
…s. In your case, it could be happening that, because of your killing and restarting, the restarted KCL may be taking a while to get a new lease and start getting data again. On Mon, Nov 2, 2015 at 11:26 AM, Hster Geguri <hster.investiga...@gmail.com> wrote:…

kinesis batches hang after YARN automatic driver restart

2015-11-02 Thread Hster Geguri
Hello Wonderful Sparks Peoples, We are testing AWS Kinesis/Spark Streaming (1.5) failover behavior with Hadoop/YARN 2.6 and 2.7.1 and want to understand the expected behavior. When I manually kill a YARN application master/driver with a Linux kill -9, YARN will automatically relaunch another master…

expected Kinesis checkpoint behavior when driver restarts

2015-10-27 Thread Hster Geguri
We are using Kinesis with Spark Streaming 1.5 on a YARN cluster. When we enable checkpointing in Spark, where in the Kinesis stream should a restarted driver continue? I ran a simple experiment as follows: 1. In the first driver run, the Spark driver processes 1 million records starting from…
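The expected answer to the question in this thread is that a restarted driver resumes from the last checkpointed position, not from the head or the start of the stream. A toy Python simulation of that contract (an illustration only, not Spark or KCL code; all names are hypothetical):

```python
class CheckpointedConsumer:
    """Toy consumer that checkpoints after every batch; a relaunched
    instance resumes from the last checkpoint, not from the stream head."""

    def __init__(self, store):
        self.store = store                     # shared durable checkpoint store
        self.pos = store.get("ckpt", 0)        # resume point, 0 on first run

    def process(self, records):
        batch = records[self.pos:self.pos + 3] # consume one small batch
        self.pos += len(batch)
        self.store["ckpt"] = self.pos          # checkpoint after the batch
        return batch

store = {}
records = list(range(10))

c1 = CheckpointedConsumer(store)
c1.process(records)                 # first driver consumes [0, 1, 2]

# Simulate a driver crash and relaunch sharing the same checkpoint store.
c2 = CheckpointedConsumer(store)
print(c2.process(records))          # resumes at [3, 4, 5], nothing replayed
```

Note the window for duplicates: if the crash lands between processing a batch and writing the checkpoint, the relaunched driver replays that batch, which is why checkpoint-based recovery is at-least-once rather than exactly-once.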