"1 userid data" is ambiguous though (user-input data? stream? shard?),
since a Kinesis worker fetches data only from the shards it owns, IIUC the
user-input data in a shard is transferred to the worker assigned to that
shard, as long as no failure occurs.
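
To make the shard-to-worker point concrete: Kinesis assigns each record to a
shard by taking the MD5 hash of its partition key and finding the shard whose
hash key range contains that value. The sketch below only illustrates that
mapping in plain Python. The helper names (even_shard_ranges, shard_for_key)
are hypothetical, and it assumes the shards split the hash key space evenly,
which may not match a real stream.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # Kinesis hash keys are 128-bit integers


def even_shard_ranges(num_shards):
    """Split the 128-bit hash key space into contiguous, equal-sized ranges.

    Assumption: shards are evenly split; a real stream may differ.
    """
    step = (MAX_HASH_KEY + 1) // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = MAX_HASH_KEY if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the range that contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key is outside all shard ranges")
```

With a fixed set of shards, the same userid always maps to the same shard, so
its records reach whichever worker currently owns that shard. Resharding, or a
lease moving to another worker after a failure, changes that.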

// maropu

On Mon, Nov 21, 2016 at 1:59 PM, Shushant Arora <shushantaror...@gmail.com>
wrote:

> Hi
>
> Thanks.
> I have a question about the Spark Streaming Kinesis consumer. Say I have a
> batch time of 500 ms and the Kinesis stream is partitioned on userid
> (uniformly distributed). Since IdleTimeBetweenReadsInMillis is set to
> 1000 ms, the Spark receiver nodes will fetch the data at an interval of
> 1 second and store it in the InputDStream.
>
> 1. When the worker executors fetch the data from the receiver every
> 500 ms, is it guaranteed that the data for one userid will always go to
> one partition, and to only one worker?
> 2. If not, can I repartition the stream data before processing? If yes,
> how? JavaDStream has only one repartition method, which takes the number
> of partitions and not a partitioner function, so it will repartition the
> DStream data randomly.
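
On question 2, one possible approach (a sketch under assumptions, not tested
against this job): key each record by userid and partition by key instead of
calling repartition. In the Java API I believe this is mapToPair followed by
transformToPair(rdd -> rdd.partitionBy(new HashPartitioner(n))). The
plain-Python snippet below only simulates what a hash partitioner does. The
names partition_for and partition_by_userid are hypothetical, and crc32 stands
in for Spark's own hashing.

```python
import zlib
from collections import defaultdict


def partition_for(userid, num_partitions):
    """Pick a partition index from the key alone.

    crc32 is used because it is stable across processes, unlike the
    built-in hash() for strings. It is a stand-in, not Spark's hashing.
    """
    return zlib.crc32(userid.encode("utf-8")) % num_partitions


def partition_by_userid(records, num_partitions):
    """Group (userid, payload) records so that each userid lands in one partition."""
    partitions = defaultdict(list)
    for userid, payload in records:
        partitions[partition_for(userid, num_partitions)].append((userid, payload))
    return dict(partitions)
```

Because the partition index depends only on the key, every record for a given
userid ends up in the same partition within a batch, at the cost of a shuffle.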
>
> Thanks
>
> On Tue, Nov 15, 2016 at 8:23 AM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> It does not seem like a good design to frequently restart workers within
>> a minute, because their initialization and shutdown take a long time, as
>> you said (e.g., the overhead of connecting to DynamoDB and the graceful
>> shutdown).
>>
>> Anyway, since this is a question about the AWS Kinesis library, you'd be
>> better off asking the AWS folks in their forum or somewhere similar.
>>
>> // maropu
>>
>>
>> On Mon, Nov 14, 2016 at 11:20 PM, Shushant Arora <
>> shushantaror...@gmail.com> wrote:
>>
>>> 1. No, I want to implement a low-level consumer on the Kinesis stream,
>>> so I need to stop the worker once it has read the latest sequence number
>>> sent by the driver.
>>>
>>> 2. What is the cost of frequently registering and deregistering a worker
>>> node? Is it the case that when the worker's shutdown is called, it
>>> terminates the run method but the lease coordinator waits for 2 seconds
>>> before releasing the lease? So I cannot deregister a worker in less than
>>> 2 seconds?
>>>
>>> Thanks!
>>>
>>>
>>>
>>> On Mon, Nov 14, 2016 at 7:36 PM, Takeshi Yamamuro <linguin....@gmail.com
>>> > wrote:
>>>
>>>> Is "aws kinesis get-shard-iterator --shard-iterator-type LATEST" not
>>>> enough for your use case?
>>>>
>>>> On Mon, Nov 14, 2016 at 10:23 PM, Shushant Arora <
>>>> shushantaror...@gmail.com> wrote:
>>>>
>>>>> Thanks!
>>>>> Is there a way to get the latest sequence number of all shards of a
>>>>> kinesis stream?
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Nov 14, 2016 at 5:43 PM, Takeshi Yamamuro <
>>>>> linguin....@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The time interval is controlled by `IdleTimeBetweenReadsInMillis` in
>>>>>> KinesisClientLibConfiguration, but it is not configurable in the
>>>>>> current implementation.
>>>>>>
>>>>>> The details can be found in:
>>>>>> https://github.com/apache/spark/blob/master/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala#L152
>>>>>>
>>>>>> // maropu
>>>>>>
>>>>>>
>>>>>> On Sun, Nov 13, 2016 at 12:08 AM, Shushant Arora <
>>>>>> shushantaror...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Is spark.streaming.blockInterval for the Kinesis input stream
>>>>>>> hardcoded to 1 sec, or is it configurable? I mean the time interval
>>>>>>> at which the receiver fetches data from Kinesis.
>>>>>>>
>>>>>>> That would mean the streaming batch interval cannot be less than
>>>>>>> spark.streaming.blockInterval, so this should be configurable. Also,
>>>>>>> is there any minimum value for the streaming batch interval?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ---
>>>>>> Takeshi Yamamuro
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> ---
>>>> Takeshi Yamamuro
>>>>
>>>
>>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
>


-- 
---
Takeshi Yamamuro
