[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596545#comment-16596545
 ] 

Reuven Lax commented on BEAM-5036:
----------------------------------

As for ignoring errors: the one thing we must ensure is that the operation 
is idempotent. A bundle might fail at any point and be retried, and the 
retry should succeed if possible.

For filesystems that use copy/delete, this means that we should ignore 
file-already-exists errors. Otherwise a retried bundle will hit the same 
error on every attempt, turning a transient failure into a permanent one 
that eventually fails the job (depending on the runner).
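
To illustrate the copy/delete case, here is a minimal sketch using only 
java.nio.file rather than Beam's FileSystems API. The class and method names 
are hypothetical, and it assumes the copy is effectively all-or-nothing, so a 
destination that already exists is taken to be complete:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper, not part of Beam: a retry-safe move built from copy + delete.
final class IdempotentCopyDelete {
    static void move(Path src, Path dst) throws IOException {
        try {
            // Without REPLACE_EXISTING, this throws if dst is already present.
            Files.copy(src, dst);
        } catch (FileAlreadyExistsException e) {
            // An earlier attempt already copied the file; treat it as success.
        } catch (NoSuchFileException e) {
            // src is gone. That is acceptable only if an earlier attempt
            // finished the whole move, i.e. dst exists (assumption of this sketch).
            if (!Files.exists(dst)) {
                throw e;
            }
        }
        // Deleting an already-deleted source is not an error on retry.
        Files.deleteIfExists(src);
    }
}
```

Calling move(src, dst) a second time after a full or partial success is then 
a no-op instead of an error, which is the property the retry needs.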

For filesystems such as HDFS (or local) for which atomic rename exists, this 
means we have to ignore failures where the _source_ file doesn't exist, since 
an earlier attempt may already have renamed it (we also have to do this with 
GCS/S3). I believe the code already attempts to do this with 
IGNORE_MISSING_FILES, though there are slight race conditions in that check 
today.
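
For the rename case, a comparable sketch with java.nio.file (again 
hypothetical names, not Beam's implementation) attempts the rename first and 
interprets the failure afterwards, rather than checking for the source up 
front. The extra check that the destination exists is an assumption added 
here as a safeguard; it is not something IGNORE_MISSING_FILES is stated to do:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Beam: a retry-safe atomic rename.
final class IdempotentRename {
    static void rename(Path src, Path dst) throws IOException {
        try {
            // Try the rename directly instead of testing for src beforehand,
            // which would leave a window between the check and the rename.
            Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
        } catch (NoSuchFileException e) {
            // Missing source: assume an earlier attempt already renamed it,
            // but only if the destination is actually there.
            if (!Files.exists(dst)) {
                throw e;
            }
        }
    }
}
```

A retried bundle that calls rename(src, dst) again after a successful rename 
then returns normally, while a rename whose source and destination are both 
missing still surfaces as an error.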

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> ------------------------------------------------------
>
>                 Key: BEAM-5036
>                 URL: https://issues.apache.org/jira/browse/BEAM-5036
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-files
>    Affects Versions: 2.5.0
>            Reporter: Jozef Vilcek
>            Assignee: Tim Robertson
>            Priority: Major
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to 
> this for the case of the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
