GitHub user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1084#issuecomment-137393554
  
    I think using truncate for exactly-once is the way to go. To support users 
on older HDFS versions, how about this:
    
    1. We consider valid only what was successfully written at a checkpoint 
(hflush/hsync). When we roll over to a new file on restart, we write a 
`.length` file for the previous file, indicating how many bytes of it are 
valid. This basically simulates truncate by adding a metadata file (see the 
first sketch below).
    
    2. Optionally, the user can activate a merge on roll-over that takes all 
the files from the attempts, together with their metadata files, and merges 
them into one file. This merge can be written such that it works 
incrementally and retries on failures (see the second sketch below).
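
    A minimal sketch of what the `.length` sidecar could look like, using the 
Hadoop `FileSystem` API. The class and method names (`LengthFileSketch`, 
`writeValidLength`, `getValidLength`) and the single-long file format are 
assumptions for illustration, not an actual implementation:

    ```java
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    /** Hypothetical sketch: simulate truncate on HDFS versions that lack it
     *  by writing a ".length" sidecar file next to the data file. */
    public class LengthFileSketch {

        /** On restart, record how many bytes of 'file' were valid at the last
         *  successful checkpoint (hflush/hsync), instead of truncating. */
        public static void writeValidLength(FileSystem fs, Path file, long validLength)
                throws IOException {
            Path lengthFile = new Path(file.getParent(), file.getName() + ".length");
            try (DataOutputStream out = fs.create(lengthFile, true)) {
                out.writeLong(validLength);
            }
        }

        /** Readers consult the sidecar: if present, only that many leading
         *  bytes of the data file are valid; the rest is leftover from the
         *  failed attempt. */
        public static long getValidLength(FileSystem fs, Path file) throws IOException {
            Path lengthFile = new Path(file.getParent(), file.getName() + ".length");
            if (fs.exists(lengthFile)) {
                try (DataInputStream in = fs.open(lengthFile)) {
                    return in.readLong();
                }
            }
            // No sidecar: the file was closed cleanly, so all of it is valid.
            return fs.getFileStatus(file).getLen();
        }
    }
    ```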
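
    And a sketch of the optional merge on roll-over, reusing the 
`getValidLength` helper from the first sketch. Writing into a temporary file 
and renaming at the end is one possible way (an assumption here, not a fixed 
design) to make the merge safe to re-run after a failure:

    ```java
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.List;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    /** Hypothetical sketch of merge-on-roll-over: concatenate the valid
     *  prefixes of all attempt files into one final file. */
    public class MergeOnRollOverSketch {

        public static void merge(FileSystem fs, List<Path> attemptFiles, Path target)
                throws IOException {
            Path tmp = new Path(target.getParent(), target.getName() + ".inprogress");
            try (OutputStream out = fs.create(tmp, true)) {
                byte[] buffer = new byte[64 * 1024];
                for (Path attempt : attemptFiles) {
                    // Copy only the bytes the ".length" sidecar declares valid.
                    long remaining = LengthFileSketch.getValidLength(fs, attempt);
                    try (InputStream in = fs.open(attempt)) {
                        while (remaining > 0) {
                            int read = in.read(buffer, 0,
                                    (int) Math.min(buffer.length, remaining));
                            if (read < 0) {
                                break; // file shorter than the recorded length
                            }
                            out.write(buffer, 0, read);
                            remaining -= read;
                        }
                    }
                }
            }
            // Publish in one step: if an earlier merge attempt died, only the
            // ".inprogress" file exists, so the merge can simply be re-run.
            fs.rename(tmp, target);
        }
    }
    ```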


