"1 userid data" is ambiguous though (user-input data? stream? shard?), since a kinesis worker fetch data from shards that the worker has an ownership of, IIUC user-input data in a shard are transferred into an assigned worker as long as you get no failure.
// maropu On Mon, Nov 21, 2016 at 1:59 PM, Shushant Arora <shushantaror...@gmail.com> wrote: > Hi > > Thanks. > Have a doubt on spark streaming kinesis consumer. Say I have a batch time > of 500 ms and kiensis stream is partitioned on userid(uniformly > distributed).But since IdleTimeBetweenReadsInMillis is set to 1000ms so > Spark receiver nodes will fetch the data at interval of 1 second and store > in InputDstream. > > 1. When worker executors will fetch the data from receiver at after every > 500 ms does its gurantee that 1 userid data will go to one partition and > that to one worker only always ? > 2.If not - can I repartition stream data before processing? If yes how- > since JavaDStream has only one method repartition which takes number of > partitions and not the partitioner function ?So it will randomly > repartition the Dstream data. > > Thanks > > > > > > > > > On Tue, Nov 15, 2016 at 8:23 AM, Takeshi Yamamuro <linguin....@gmail.com> > wrote: > >> Seems it it not a good design to frequently restart workers in a minute >> because >> their initialization and shutdown take much time as you said >> (e.g., interconnection overheads with dynamodb and graceful shutdown). >> >> Anyway, since this is a kind of questions about the aws kinesis library, >> so >> you'd better to ask aws guys in their forum or something. >> >> // maropu >> >> >> On Mon, Nov 14, 2016 at 11:20 PM, Shushant Arora < >> shushantaror...@gmail.com> wrote: >> >>> 1.No, I want to implement low level consumer on kinesis stream. >>> so need to stop the worker once it read the latest sequence number sent >>> by driver. >>> >>> 2.What is the cost of frequent register and deregister of worker node. >>> Is that when worker's shutdown is called it will terminate run method but >>> leasecoordinator will wait for 2seconds before releasing the lease. So I >>> cannot deregister a worker in less than 2 seconds ? >>> >>> Thanks! >>> >>> >>> >>> On Mon, Nov 14, 2016 at 7:36 PM, Takeshi Yamamuro <linguin....@gmail.com >>> > wrote: >>> >>>> Is "aws kinesis get-shard-iterator --shard-iterator-type LATEST" not >>>> enough for your usecase? >>>> >>>> On Mon, Nov 14, 2016 at 10:23 PM, Shushant Arora < >>>> shushantaror...@gmail.com> wrote: >>>> >>>>> Thanks! >>>>> Is there a way to get the latest sequence number of all shards of a >>>>> kinesis stream? >>>>> >>>>> >>>>> >>>>> On Mon, Nov 14, 2016 at 5:43 PM, Takeshi Yamamuro < >>>>> linguin....@gmail.com> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> The time interval can be controlled by `IdleTimeBetweenReadsInMillis` >>>>>> in KinesisClientLibConfiguration though, >>>>>> it is not configurable in the current implementation. >>>>>> >>>>>> The detail can be found in; >>>>>> https://github.com/apache/spark/blob/master/external/kinesis >>>>>> -asl/src/main/scala/org/apache/spark/streaming/kinesis/Kines >>>>>> isReceiver.scala#L152 >>>>>> >>>>>> // maropu >>>>>> >>>>>> >>>>>> On Sun, Nov 13, 2016 at 12:08 AM, Shushant Arora < >>>>>> shushantaror...@gmail.com> wrote: >>>>>> >>>>>>> *Hi * >>>>>>> >>>>>>> *is **spark.streaming.blockInterval* for kinesis input stream is >>>>>>> hardcoded to 1 sec or is it configurable ? Time interval at which >>>>>>> receiver >>>>>>> fetched data from kinesis . >>>>>>> >>>>>>> Means stream batch interval cannot be less than >>>>>>> *spark.streaming.blockInterval >>>>>>> and this should be configrable , Also is there any minimum value for >>>>>>> streaming batch interval ?* >>>>>>> >>>>>>> *Thanks* >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> --- >>>>>> Takeshi Yamamuro >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> --- >>>> Takeshi Yamamuro >>>> >>> >>> >> >> >> -- >> --- >> Takeshi Yamamuro >> > > -- --- Takeshi Yamamuro