Re: Last batch of stream data is not written to the sink when data arrives very slowly

2018-11-17 Thread 徐涛
Hi Gordon,
I have since looked at the implementation of the Elasticsearch sink provided by
Flink and found that it uses the same mechanism: it flushes the buffered data
when a checkpoint happens. I applied that method and the problem is now solved;
it is exactly the approach you suggested. Thanks a lot for your help.
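
For anyone who finds this thread later, the pattern looks roughly like the
minimal sketch below. This is my own illustration, not the actual Elasticsearch
sink code; the class name and batch size are made up.

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Hypothetical buffering sink illustrating the flush-on-checkpoint pattern.
public class BufferingSink extends RichSinkFunction<String>
        implements CheckpointedFunction {

    private static final int BATCH_SIZE = 50;   // illustrative size
    private final List<String> buffer = new ArrayList<>();

    @Override
    public void invoke(String value, Context context) throws Exception {
        buffer.add(value);
        if (buffer.size() >= BATCH_SIZE) {
            flush();   // size-based flush
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Flushing before the checkpoint completes guarantees that every
        // record received so far is written out: at-least-once delivery.
        flush();
    }

    @Override
    public void initializeState(FunctionInitializationContext context) {
        // The buffer is always empty after a successful checkpoint,
        // so there is no Flink-managed state to restore.
    }

    private void flush() {
        // Write the buffered records to the external system here.
        buffer.clear();
    }
}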

Best
Henry




Re: Last batch of stream data is not written to the sink when data arrives very slowly

2018-11-14 Thread Tzu-Li (Gordon) Tai
Hi Henry,

Flushing of buffered data in sinks should occur on two occasions: (1) when a
buffer size limit is reached or a fixed flush interval fires, and (2) on
checkpoints.

Flushing any pending data before completing a checkpoint gives the sink
at-least-once guarantees, so that should answer your question about data loss.
For the data delay caused by buffering, my only suggestion is a
time-interval-based flushing configuration.
That is what currently happens, for example, in the Kafka / Kinesis producer
sinks: records are buffered and flushed at fixed intervals or when the buffer
is full, and they are also flushed on every checkpoint.
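
A rough sketch of what I mean, using a plain ScheduledExecutorService for the
timer. The class, the one-second interval, and the batch size are all
illustrative; this is not one of the Flink-provided sinks.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Hypothetical sink that flushes on all three triggers: buffer full,
// fixed time interval, and checkpoint.
public class IntervalFlushingSink extends RichSinkFunction<String>
        implements CheckpointedFunction {

    private static final int BATCH_SIZE = 50;           // illustrative
    private static final long FLUSH_INTERVAL_MS = 1000; // illustrative

    private final List<String> buffer = new ArrayList<>();
    private transient ScheduledExecutorService scheduler;

    @Override
    public void open(Configuration parameters) {
        // Periodic flush, so a slow stream cannot leave records buffered
        // for longer than the configured interval.
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                this::flush, FLUSH_INTERVAL_MS, FLUSH_INTERVAL_MS,
                TimeUnit.MILLISECONDS);
    }

    @Override
    public void invoke(String value, Context context) {
        synchronized (buffer) {
            buffer.add(value);
            if (buffer.size() >= BATCH_SIZE) {
                flush();   // size-based flush
            }
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) {
        flush();   // checkpoint-based flush: gives at-least-once delivery
    }

    @Override
    public void initializeState(FunctionInitializationContext context) {
        // Buffer is empty after every successful checkpoint; nothing to restore.
    }

    @Override
    public void close() throws Exception {
        if (scheduler != null) {
            scheduler.shutdown();
        }
        flush();
        super.close();
    }

    private void flush() {
        synchronized (buffer) {
            // Write the buffered records to the external system here.
            buffer.clear();
        }
    }
}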

Cheers,
Gordon


Last batch of stream data is not written to the sink when data arrives very slowly

2018-11-13 Thread 徐涛
Hi Experts,
        When we implement a sink, we usually batch the writes, flushing
according to a record count or when a time interval is reached. However, this
can leave the data of the last batch unwritten, because the flush is only
triggered by an incoming record.
        I also tested the JDBCOutputFormat provided by Flink and found that it
has the same problem: if the batch size is 50 and 49 items arrive but the last
one comes an hour later, those 49 items are not written to the sink during
that hour. This may cause data delay or data loss.
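
        For reference, this is roughly how I configured it in my test; the
driver, URL, and query below are placeholders.

import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;

public class JdbcBatchExample {
    public static JDBCOutputFormat buildFormat() {
        return JDBCOutputFormat.buildJDBCOutputFormat()
                .setDrivername("org.postgresql.Driver")        // placeholder
                .setDBUrl("jdbc:postgresql://host:5432/mydb")  // placeholder
                .setQuery("INSERT INTO events (payload) VALUES (?)")
                // flush happens only after 50 rows have been buffered,
                // so 49 rows can sit unwritten indefinitely
                .setBatchInterval(50)
                .finish();
    }
}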
        Could anyone suggest a solution to this problem?
Thanks a lot.

Best
Henry