Hi everyone,

I have implemented a job that uses a RollingSink to write messages consumed from a Kafka topic, and I have observed a significant mismatch between the number of messages consumed and the number written to the file system.

Namely, the consumed Kafka topic contains 1,000,000 messages in total. The topology does not perform any data transformation whatsoever; data from the source is pushed straight to the RollingSink.
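
Roughly, the job looks like the following (a minimal sketch, assuming Flink 1.x with the flink-connector-kafka-0.8 and flink-connector-filesystem connectors; the topic name, broker/ZooKeeper addresses, group id, checkpoint interval, and output path are placeholders, not the exact values I use):

    import java.util.Properties;

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.fs.RollingSink;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    public class KafkaToRollingSinkJob {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            // Checkpointing so the RollingSink can commit finished part files
            // (the interval here is illustrative).
            env.enableCheckpointing(10000);

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka-broker:9092"); // placeholder
            props.setProperty("zookeeper.connect", "zookeeper:2181");    // placeholder
            props.setProperty("group.id", "rolling-sink-test");          // placeholder

            // Source: consume the topic as plain strings.
            DataStream<String> messages = env.addSource(
                    new FlinkKafkaConsumer08<>("my-topic", new SimpleStringSchema(), props));

            // Sink: no transformation, messages go straight to the file system.
            messages.addSink(new RollingSink<String>("/tmp/rolling-sink-output"));

            env.execute("Kafka to RollingSink");
        }
    }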

After counting the messages in the output files, I observed that the total number written is greater than 7,000,000 - a difference of more than 6,000,000 records compared to what was consumed/available.
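
For reference, this is roughly how I counted (a sketch, assuming one message per line under the sink's base directory; note that it counts every regular file it finds, including any in-progress or pending part files):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class CountSinkRecords {
        public static void main(String[] args) throws IOException {
            long total = 0;
            // Walk the sink's base directory and count one record per line.
            try (Stream<Path> walk = Files.walk(Paths.get("/tmp/rolling-sink-output"))) {
                List<Path> parts = walk.filter(Files::isRegularFile)
                        .collect(Collectors.toList());
                for (Path part : parts) {
                    try (Stream<String> lines = Files.lines(part)) {
                        total += lines.count();
                    }
                }
            }
            System.out.println("Total records: " + total);
        }
    }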

What is the cause of this behaviour? 

Regards,
Dominik   
