[ 
https://issues.apache.org/jira/browse/FLINK-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143244#comment-16143244
 ] 

ASF GitHub Bot commented on FLINK-6306:
---------------------------------------

GitHub user sjwiesman opened a pull request:

    https://github.com/apache/flink/pull/4607

    [FLINK-6306][connectors] Sink for eventually consistent file systems

    ## What is the purpose of the change
    
    This pull request implements a sink that writes to an eventually consistent file system, such as Amazon S3, with exactly-once semantics.
    
    
    ## Brief change log
      - The sink stages files on a consistent file system (local, HDFS, etc.).
      - Once per checkpoint, the staged files are copied to the eventually consistent file system.
      - When a checkpoint completion notification arrives, the copied files are marked consistent. Files whose checkpoint never completes are left in place, unmarked, because delete is not a consistent operation.
      - Consumers choose their semantics: at-least-once by reading all files, or exactly-once by reading only files marked consistent.
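
The steps in the change log can be sketched in memory as follows. This is an illustrative sketch, not the code in this pull request: the class name, the `.consistent` marker suffix, the `part-<checkpointId>` naming, and the rule that a completed checkpoint also confirms files from earlier checkpoints are all assumptions made for the example. The "remote" map stands in for a store where only PUT is treated as consistent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the protocol; not the classes from this pull request.
class EventuallyConsistentSinkSketch {
    // ASSUMPTION: a file is "marked consistent" by PUTting an empty marker object next to it.
    static final String MARKER = ".consistent";

    // Stand-in for the eventually consistent store; only put() is used, never rename or delete.
    private final TreeMap<String, String> remote = new TreeMap<>();
    // Stand-in for the consistent staging file system.
    private final List<String> staged = new ArrayList<>();
    // Files copied at a checkpoint whose completion has not been confirmed yet.
    private final TreeMap<Long, String> pending = new TreeMap<>();

    // Step 1: records are staged on the consistent file system.
    void write(String record) {
        staged.add(record);
    }

    // Step 2: once per checkpoint, staged data is copied to the remote store.
    void snapshot(long checkpointId) {
        if (staged.isEmpty()) {
            return;
        }
        String name = "part-" + checkpointId; // ASSUMPTION: names are never reused
        remote.put(name, String.join("\n", staged));
        staged.clear();
        pending.put(checkpointId, name);
    }

    // Step 3: on completion, mark the copied files consistent with another PUT.
    // ASSUMPTION: completing checkpoint n also confirms files from checkpoints <= n.
    void notifyCheckpointComplete(long checkpointId) {
        NavigableMap<Long, String> confirmed = pending.headMap(checkpointId, true);
        for (String name : confirmed.values()) {
            remote.put(name + MARKER, "");
        }
        confirmed.clear(); // clears the confirmed entries from 'pending' as well
    }

    // A failure loses in-flight bookkeeping; unmarked remote files are left in place.
    void simulateFailure() {
        staged.clear();
        pending.clear();
    }

    // At-least-once consumer: read every data file.
    List<String> readAtLeastOnce() {
        List<String> files = new ArrayList<>();
        for (String name : remote.keySet()) {
            if (!name.endsWith(MARKER)) {
                files.add(name);
            }
        }
        return files;
    }

    // Exactly-once consumer: read only data files that have a marker.
    List<String> readExactlyOnce() {
        List<String> files = new ArrayList<>();
        for (String name : readAtLeastOnce()) {
            if (remote.containsKey(name + MARKER)) {
                files.add(name);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        EventuallyConsistentSinkSketch sink = new EventuallyConsistentSinkSketch();
        sink.write("a");
        sink.snapshot(1L);
        sink.notifyCheckpointComplete(1L);
        sink.write("b");
        sink.snapshot(2L);
        sink.simulateFailure(); // checkpoint 2 never completes
        System.out.println(sink.readAtLeastOnce()); // [part-1, part-2]
        System.out.println(sink.readExactlyOnce()); // [part-1]
    }
}
```

In this sketch the file from the unconfirmed checkpoint stays in the store without a marker, so an at-least-once reader may see its records again after replay, while an exactly-once reader skips it.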
    
    
    ## Verifying this change
    This change adds tests and can be verified as follows:
    
      - Added tests based on the existing BucketingSink test suite.
      - Added tests that verify semantics under different checkpointing combinations (successful, concurrent, timed out, and failed).
      - Added an integration test that verifies that exactly-once semantics hold under failure.
      - Manually verified by running in production for several months.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): no
      - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
      - The serializers: no
      - The runtime per-record code paths (performance sensitive): no
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
    
    ## Documentation
    
      - Does this pull request introduce a new feature? yes
      - If yes, how is the feature documented? JavaDocs


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sjwiesman/flink FLINK-6306

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4607.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4607
    
----
commit 347ea767195d74efc39964c02ace1bbe10d8aa0a
Author: Seth Wiesman <swies...@mediamath.com>
Date:   2017-08-27T21:36:04Z

    [FLINK-6306][connectors] Sink for eventually consistent file systems

----


> Sink for eventually consistent file systems
> -------------------------------------------
>
>                 Key: FLINK-6306
>                 URL: https://issues.apache.org/jira/browse/FLINK-6306
>             Project: Flink
>          Issue Type: New Feature
>          Components: filesystem-connector
>            Reporter: Seth Wiesman
>            Assignee: Seth Wiesman
>         Attachments: eventually-consistent-sink
>
>
> Currently, Flink provides the BucketingSink as an exactly-once method for 
> writing to a file system. It provides these guarantees by moving files 
> through several stages and deleting or truncating files that get into a bad 
> state. While this is a powerful abstraction, it causes issues with eventually 
> consistent file systems such as Amazon S3, where most operations (i.e., rename, 
> delete, truncate) are not guaranteed to become consistent within a reasonable 
> amount of time. Flink should provide a sink that offers exactly-once writes 
> to a file system where only PUT operations are considered consistent. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
