I'm not familiar with the kafka implementation though, a kinesis receiver
runs in a thread of executors.
You can set any value in the interval, but frequent checkpoints cause
excess loads in dynamodb.
See:
http://spark.apache.org/docs/latest/streaming-kinesis-integration.html#kinesis-checkpointing

// maropu

On Mon, Nov 7, 2016 at 1:36 PM, Shushant Arora <shushantaror...@gmail.com>
wrote:

> Hi
>
> By receicer I meant spark streaming receiver architecture- means worker
> nodes are different than receiver nodes. There is no direct consumer/low
> level consumer like of  Kafka in kinesis spark streaming?
>
> Is there any limitation on interval checkpoint - minimum of 1second in
> spark streaming with kinesis. But as such there is no limit on checkpoint
> interval in KCL side ?
>
> Thanks
>
> On Tue, Oct 25, 2016 at 8:36 AM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> I'm not exactly sure about the receiver you pointed though,
>> if you point the "KinesisReceiver" implementation, yes.
>>
>> Also, we currently cannot disable the interval checkpoints.
>>
>> On Tue, Oct 25, 2016 at 11:53 AM, Shushant Arora <
>> shushantaror...@gmail.com> wrote:
>>
>>> Thanks!
>>>
>>> Is kinesis streams are receiver based only? Is there non receiver based
>>> consumer for Kinesis ?
>>>
>>> And Instead of having fixed checkpoint interval,Can I disable auto
>>> checkpoint and say  when my worker has processed the data after last record
>>> of mapPartition now checkpoint the sequence no using some api.
>>>
>>>
>>>
>>> On Tue, Oct 25, 2016 at 7:07 AM, Takeshi Yamamuro <linguin....@gmail.com
>>> > wrote:
>>>
>>>> Hi,
>>>>
>>>> The only thing you can do for Kinesis checkpoints is tune the interval
>>>> of them.
>>>> https://github.com/apache/spark/blob/master/external/kinesis
>>>> -asl/src/main/scala/org/apache/spark/streaming/kinesis/Kines
>>>> isUtils.scala#L68
>>>>
>>>> Whether the dataloss occurs or not depends on the storage level you set;
>>>> if you set StorageLevel.MEMORY_AND_DISK_2, Spark may continue
>>>> processing
>>>> in case of the dataloss because the stream data Spark receives are
>>>> replicated across executors.
>>>> However,  all the executors that have the replicated data crash,
>>>> IIUC the dataloss occurs.
>>>>
>>>> // maropu
>>>>
>>>> On Mon, Oct 24, 2016 at 4:43 PM, Shushant Arora <
>>>> shushantaror...@gmail.com> wrote:
>>>>
>>>>> Does spark streaming consumer for kinesis uses Kinesis Client Library
>>>>>  and mandates to checkpoint the sequence number of shards in dynamo db.
>>>>>
>>>>> Will it lead to dataloss if consumed datarecords are not yet processed
>>>>> and kinesis checkpointed the consumed sequenece numbers in dynamo db and
>>>>> spark worker crashes - then spark launched the worker on another node but
>>>>> start consuming from dynamo db's checkpointed sequence number which is
>>>>> ahead of processed sequenece number .
>>>>>
>>>>> is there a way to checkpoint the sequenece numbers ourselves in
>>>>> Kinesis as it is in Kafka low level consumer ?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> ---
>>>> Takeshi Yamamuro
>>>>
>>>
>>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
>


-- 
---
Takeshi Yamamuro

Reply via email to