[ 
https://issues.apache.org/jira/browse/FLINK-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed FLINK-3515.
-----------------------------------
    Resolution: Duplicate

> Make the "file monitoring source" exactly-once
> ----------------------------------------------
>
>                 Key: FLINK-3515
>                 URL: https://issues.apache.org/jira/browse/FLINK-3515
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.10.2
>            Reporter: Stephan Ewen
>
> The stream source that watches directories for changes currently does not 
> provide "exactly-once" guarantees.
> To make it exactly-once, the source (which emits the files to be read) and the 
> flatMap (which reads the files) need to record how far they had progressed at 
> the point of each checkpoint.
> Assuming that files do not change after creation (HDFS / S3 style), we can 
> do this as follows:
>   - The source tracks the files it has already emitted downstream via their 
> creation/modification timestamps, assuming that new files always get newer 
> timestamps.
>   - The flatMappers always store the path of their current file 
> fragment, plus their byte offset within that file split.
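
The two pieces of state described in the issue can be sketched in plain Java as below. This is only an illustration of the idea, not Flink code: the class and method names (ExactlyOnceFileMonitorSketch, MonitorState, ReaderState, selectNewFiles, nextRecord) are hypothetical, the directory listing is modelled as an in-memory map from path to modification timestamp, and a file split is modelled as a byte array of newline-terminated records.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names are Flink APIs. */
public class ExactlyOnceFileMonitorSketch {

    /** Source-side state: the newest modification timestamp already emitted. */
    static final class MonitorState {
        private long maxEmittedTimestamp = Long.MIN_VALUE;

        /**
         * Returns the paths that are newer than anything emitted so far,
         * oldest first, and advances the stored timestamp. Relies on the
         * issue's assumption that new files always get newer timestamps.
         */
        List<String> selectNewFiles(Map<String, Long> listing) {
            List<Map.Entry<String, Long>> fresh = new ArrayList<>();
            for (Map.Entry<String, Long> e : listing.entrySet()) {
                if (e.getValue() > maxEmittedTimestamp) {
                    fresh.add(e);
                }
            }
            fresh.sort(Map.Entry.comparingByValue());
            List<String> paths = new ArrayList<>();
            for (Map.Entry<String, Long> e : fresh) {
                paths.add(e.getKey());
                maxEmittedTimestamp = e.getValue();
            }
            return paths;
        }

        /** The value that would be written into a checkpoint. */
        long snapshot() {
            return maxEmittedTimestamp;
        }

        /** Rolls the source back to a checkpointed timestamp. */
        void restore(long checkpointed) {
            maxEmittedTimestamp = checkpointed;
        }
    }

    /** Reader-side state: current split path plus byte offset within it. */
    static final class ReaderState {
        private String currentPath;
        private long byteOffset;

        void open(String path) {
            currentPath = path;
            byteOffset = 0;
        }

        /** Reads the next newline-terminated record, or null at end of split. */
        String nextRecord(byte[] splitBytes) {
            if (byteOffset >= splitBytes.length) {
                return null;
            }
            int start = (int) byteOffset;
            int end = start;
            while (end < splitBytes.length && splitBytes[end] != '\n') {
                end++;
            }
            // Skip past the newline so the offset always sits on a record boundary.
            byteOffset = Math.min(end + 1, splitBytes.length);
            return new String(splitBytes, start, end - start, StandardCharsets.UTF_8);
        }

        String currentPath() {
            return currentPath;
        }

        long byteOffset() {
            return byteOffset;
        }

        /** Resumes reading at a checkpointed (path, offset) pair. */
        void restore(String path, long offset) {
            currentPath = path;
            byteOffset = offset;
        }
    }
}
```

After a failure, restoring both states to the same checkpoint means the source re-emits exactly the files found after that checkpoint and each reader resumes at the record boundary it had reached, so no record is skipped or counted twice in the restored state.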



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
