Re: Consuming data from dynamoDB streams to flink

2019-08-08 Thread Vinay Patil
Hello,

For anyone looking to set up alerts for a Flink application, here is a
good blog post from the Flink team at Ververica:
https://www.ververica.com/blog/monitoring-apache-flink-applications-101
So, for DynamoDB streams we can set an alert on millisBehindLatest.
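To make the idea concrete, here is a minimal sketch (Python, with hypothetical names; in practice this would be a Prometheus alerting rule) of the usual pattern: fire only when millisBehindLatest stays above a threshold for several consecutive scrapes, so short spikes do not page anyone.

```python
# Illustrative sketch only: the function name, threshold, and window are
# assumptions, not part of Flink or Prometheus. It mimics a "for: 3m"
# style alerting rule evaluated over successive metric scrapes.

def should_alert(millis_behind_samples, threshold_ms=60_000, min_consecutive=3):
    """Return True once the lag exceeds threshold_ms for
    min_consecutive scrapes in a row."""
    consecutive = 0
    for sample in millis_behind_samples:
        if sample > threshold_ms:
            consecutive += 1
            if consecutive >= min_consecutive:
                return True
        else:
            consecutive = 0
    return False

# A brief spike does not alert; sustained lag does.
print(should_alert([70_000, 1_000, 80_000, 2_000]))    # False
print(should_alert([70_000, 80_000, 90_000, 95_000]))  # True
```

The equivalent Prometheus rule would pair a threshold expression on the scraped millisBehindLatest gauge with a `for:` duration.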

Regards,
Vinay Patil


On Wed, Aug 7, 2019 at 2:24 PM Vinay Patil  wrote:

> Hi Andrey,
>
> Thank you for your reply. I understand that the checkpoints are gone when
> the job is cancelled or killed; maybe configuring externalized checkpoints
> will help here so that we can resume from there.
>
> My point was: if the job is terminated and the stream position is set to
> TRIM_HORIZON, the consumer will start processing from the start. I was
> curious to know whether there is a configuration like Kafka's group id
> that we can set for DynamoDB streams.
>
> Also, could you please let me know which metrics I should alert on in the
> case of DynamoDB Streams (I am sending the metrics to Prometheus)? I see
> these metrics:
> https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/metrics/ShardMetricsReporter.java
>
>
> For example, in the case of Kafka we generate an alert when the consumer
> is lagging behind.
>
>
> Regards,
> Vinay Patil
>
>
> On Fri, Jul 19, 2019 at 10:40 PM Andrey Zagrebin 
> wrote:
>
>> Hi Vinay,
>>
>> 1. I would assume it works similarly to the Kinesis connector (people who
>> actually developed it, please correct me if I am wrong).
>> 2. If you have activated only checkpointing, the checkpoints are gone if
>> you externally kill the job. You might be interested in savepoints [1].
>> 3. See the paragraph in [2] about Kinesis consumer parallelism.
>>
>> Best,
>> Andrey
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/state/savepoints.html
>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/connectors/kinesis.html#kinesis-consumer
>>
>> On Fri, Jul 19, 2019 at 1:45 PM Vinay Patil 
>> wrote:
>>
>>> Hi,
>>>
>>> I am using this consumer for processing records from DynamoDB Streams; a
>>> few questions on this:
>>>
>>> 1. How does checkpointing work with DynamoDB streams? Since this class
>>> extends FlinkKinesisConsumer, I am assuming it will start from the last
>>> successful checkpoint in case of failure, right?
>>> 2. Currently, when I kill the pipeline and start it again, it reads all
>>> the data from the start of the stream. Is there any configuration to
>>> avoid this (apart from ConsumerConfigConstants.InitialPosition.LATEST),
>>> similar to group-id in Kafka?
>>> 3. As these DynamoDB Streams are separated into shards, what is the
>>> recommended parallelism for the source? Should it be a one-to-one
>>> mapping; for example, if there are 3 shards, should parallelism be 3?
>>>
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>>
>>> On Wed, Aug 1, 2018 at 3:42 PM Ying Xu [via Apache Flink Mailing List
>>> archive.]  wrote:
>>>
 Thank you so much Fabian!

 Will update status in the JIRA.

 -
 Ying

 On Tue, Jul 31, 2018 at 1:37 AM, Fabian Hueske <[hidden email]
 > wrote:

 > Done!
 >
 > Thank you :-)
 >
 > 2018-07-31 6:41 GMT+02:00 Ying Xu <[hidden email]
 >:
 >
 > > Thanks Fabian and Thomas.
 > >
 > > Please assign FLINK-4582 to the following username:
 > > *yxu-apache
 > > <
 https://issues.apache.org/jira/secure/ViewProfile.jspa?name=yxu-apache
 > >*
 > >
 > > If needed, I can get an ICLA or CCLA, whichever is proper.
 > >
 > > *Ying Xu*
 > > Software Engineer
 > > 510.368.1252 <+15103681252>
 > > [image: Lyft] 
 > >
 > > On Mon, Jul 30, 2018 at 8:31 PM, Thomas Weise <[hidden email]
 > wrote:
 > >
 > > > The user is yxu-lyft, Ying had commented on that JIRA as well.
 > > >
 > > > https://issues.apache.org/jira/browse/FLINK-4582
 > > >
 > > >
 > > > On Mon, Jul 30, 2018 at 1:25 AM Fabian Hueske <[hidden email]
 >
 > wrote:
 > > >
 > > > > Hi Ying,
 > > > >
 > > > > Thanks for considering contributing the connector!
 > > > >
 > > > > In general, you don't need special permissions to contribute to
 > Flink.
 > > > > Anybody can open Jiras and PRs.
 > > > > You only need to be assigned to the Contributor role in Jira to
 be
 > able
 > > > to
 > > > > assign an issue to you.
 > > > > I can give you these permissions if you tell me your Jira user.
 > > > >
 > > > > It would also be good if you could submit a CLA [1] if you plan
 to
 > > > > contribute a larger feature.
 > > > >
 > > > > Thanks, Fabian
 > > > >
 > > > > [1] htt

Re: Consuming data from dynamoDB streams to flink

2019-07-19 Thread Vinay Patil
Hi,

I am using this consumer for processing records from DynamoDB Streams; a few
questions on this:

1. How does checkpointing work with DynamoDB streams? Since this class
extends FlinkKinesisConsumer, I am assuming it will start from the last
successful checkpoint in case of failure, right?
2. Currently, when I kill the pipeline and start it again, it reads all the
data from the start of the stream. Is there any configuration to avoid this
(apart from ConsumerConfigConstants.InitialPosition.LATEST), similar to
group-id in Kafka?
3. As these DynamoDB Streams are separated into shards, what is the
recommended parallelism for the source? Should it be a one-to-one mapping;
for example, if there are 3 shards, should parallelism be 3?
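For concreteness on question 3, here is a toy sketch of how shards could be
spread across source subtasks. The round-robin assignment here is an
illustrative assumption; the actual distribution logic lives inside Flink's
Kinesis connector and may differ.

```python
# Hypothetical sketch, not Flink code: map each shard to a source
# subtask index round-robin, to show why parallelism beyond the shard
# count adds nothing (extra subtasks simply get no shards).

def assign_shards(shard_ids, parallelism):
    """Map each shard id to a subtask index in [0, parallelism)."""
    assignment = {}
    for i, shard in enumerate(shard_ids):
        assignment[shard] = i % parallelism
    return assignment

shards = ["shardId-000", "shardId-001", "shardId-002"]

# With parallelism == shard count, every subtask reads exactly one shard.
print(assign_shards(shards, 3))
# {'shardId-000': 0, 'shardId-001': 1, 'shardId-002': 2}

# With parallelism 5, subtasks 3 and 4 receive no shards and stay idle.
print(assign_shards(shards, 5))
```

Under this model, a one-to-one mapping (parallelism 3 for 3 shards) is the
point past which adding parallelism stops helping, which matches the
guidance in the Kinesis consumer docs linked earlier in the thread.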


Regards,
Vinay Patil


On Wed, Aug 1, 2018 at 3:42 PM Ying Xu [via Apache Flink Mailing List
archive.]  wrote:

> Thank you so much Fabian!
>
> Will update status in the JIRA.
>
> -
> Ying
>
> On Tue, Jul 31, 2018 at 1:37 AM, Fabian Hueske <[hidden email]
> > wrote:
>
> > Done!
> >
> > Thank you :-)
> >
> > 2018-07-31 6:41 GMT+02:00 Ying Xu <[hidden email]
> >:
> >
> > > Thanks Fabian and Thomas.
> > >
> > > Please assign FLINK-4582 to the following username:
> > > *yxu-apache
> > > <
> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=yxu-apache
> > >*
> > >
> > > If needed, I can get an ICLA or CCLA, whichever is proper.
> > >
> > > *Ying Xu*
> > > Software Engineer
> > > 510.368.1252 <+15103681252>
> > > [image: Lyft] 
> > >
> > > On Mon, Jul 30, 2018 at 8:31 PM, Thomas Weise <[hidden email]
> > wrote:
> > >
> > > > The user is yxu-lyft, Ying had commented on that JIRA as well.
> > > >
> > > > https://issues.apache.org/jira/browse/FLINK-4582
> > > >
> > > >
> > > > On Mon, Jul 30, 2018 at 1:25 AM Fabian Hueske <[hidden email]
> >
> > wrote:
> > > >
> > > > > Hi Ying,
> > > > >
> > > > > Thanks for considering contributing the connector!
> > > > >
> > > > > In general, you don't need special permissions to contribute to
> > Flink.
> > > > > Anybody can open Jiras and PRs.
> > > > > You only need to be assigned to the Contributor role in Jira to be
> > able
> > > > to
> > > > > assign an issue to you.
> > > > > I can give you these permissions if you tell me your Jira user.
> > > > >
> > > > > It would also be good if you could submit a CLA [1] if you plan to
> > > > > contribute a larger feature.
> > > > >
> > > > > Thanks, Fabian
> > > > >
> > > > > [1] https://www.apache.org/licenses/#clas
> > > > >
> > > > >
> > > > > 2018-07-30 10:07 GMT+02:00 Ying Xu <[hidden email]
> >:
> > > > >
> > > > > > Hello Flink dev:
> > > > > >
> > > > > > We have implemented the prototype design and the initial PoC
> worked
> > > > > pretty
> > > > > > well.  Currently, we plan to move ahead with this design in our
> > > > internal
> > > > > > production system.
> > > > > >
> > > > > > We are thinking of contributing this connector back to the flink
> > > > > community
> > > > > > sometime soon.  May I request to be granted with a contributor
> > role?
> > > > > >
> > > > > > Many thanks in advance.
> > > > > >
> > > > > > *Ying Xu*
> > > > > > Software Engineer
> > > > > > 510.368.1252 <+15103681252>
> > > > > > [image: Lyft] 
> > > > > >
> > > > > > On Fri, Jul 6, 2018 at 6:23 PM, Ying Xu <[hidden email]
> > wrote:
> > > > > >
> > > > > > > Hi Gordon:
> > > > > > >
> > > > > > > Cool. Thanks for the thumb-up!
> > > > > > >
> > > > > > > We will include some test cases around the behavior of
> > re-sharding.
> > > > If
> > > > > > > needed we can double check the behavior with AWS, and see if
> > > > additional
> > > > > > > changes are needed.  Will keep you posted.
> > > > > > >
> > > > > > > -
> > > > > > > Ying
> > > > > > >
> > > > > > > On Wed, Jul 4, 2018 at 7:22 PM, Tzu-Li (Gordon) Tai <
> > > > > [hidden email]
> 
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Ying,
> > > > > > >>
> > > > > > >> Sorry for the late reply here.
> > > > > > >>
> > > > > > >> From the looks of the AmazonDynamoDBStreamsClient, yes it
> seems
> > > like
> > > > > > this
> > > > > > >> should simply work.
> > > > > > >>
> > > > > > >> Regarding the resharding behaviour I mentioned in the JIRA:
> > > > > > >> I'm not sure if this is really a difference in behaviour.
> > > > Internally,
> > > > > if
> > > > > > >> DynamoDB streams is actually just working on Kinesis Streams,
> > then
> > > > the
> > > > > > >> resharding primitives should be similar.
> > > > > > >> The shard discovery logic of the Flink Kinesis Consumer
> assumes
> > > that
> > > > > > >> splittin