[ 
https://issues.apache.org/jira/browse/HADOOP-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298954#comment-14298954
 ] 

Chris Nauroth commented on HADOOP-11525:
----------------------------------------

[~eddyxu], thank you for posting this patch.

As an alternative, have you considered that perhaps {{FileSystem}} needs a new 
high-level "transactional put" operation that file system subclasses can 
implement in whatever way suits their own storage semantics?  IOW, should we 
consider moving {{copyStreamToTarget}} into {{FileSystem}}, using the current 
write+rename implementation as the default in the base class, and then S3 could 
override it to do just a plain write?

The {{Characteristics}} approach puts the burden on every application using 
{{FileSystem}} to check the property and dispatch to different logic.  The 
subclassing approach keeps the burden on the file system implementor and 
potentially prevents callers from needing to change code to get the benefits.
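
To make the subclassing idea concrete, here is a rough, self-contained sketch.
It does not use the real Hadoop classes: {{SketchFileSystem}},
{{RenameBasedFileSystem}}, {{ObjectStoreFileSystem}} and the
{{transactionalPut}} method name are all hypothetical stand-ins backed by an
in-memory map, and the temp-file suffix is only illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FileSystem, backed by an in-memory map.
abstract class SketchFileSystem {
    private final Map<String, byte[]> files = new HashMap<>();
    int renameCount = 0; // lets the demo observe how many renames happened

    // Reads the whole stream, then stores it under the given path.
    void write(String path, InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        files.put(path, buf.toByteArray());
    }

    boolean rename(String src, String dst) {
        byte[] data = files.remove(src);
        if (data == null) {
            return false;
        }
        files.put(dst, data);
        renameCount++;
        return true;
    }

    boolean delete(String path) { return files.remove(path) != null; }

    boolean exists(String path) { return files.containsKey(path); }

    // Hypothetical "transactional put": the default is write-to-temp, then rename.
    void transactionalPut(InputStream in, String target) throws IOException {
        String tmp = target + "._COPYING_"; // illustrative temp name
        try {
            write(tmp, in);
            if (!rename(tmp, target)) {
                throw new IOException("rename failed: " + tmp + " -> " + target);
            }
        } finally {
            delete(tmp); // no-op after a successful rename
        }
    }
}

// A file system with cheap renames simply inherits the default.
class RenameBasedFileSystem extends SketchFileSystem { }

// An object store, where a failed PUT leaves nothing behind, writes directly.
class ObjectStoreFileSystem extends SketchFileSystem {
    @Override
    void transactionalPut(InputStream in, String target) throws IOException {
        write(target, in);
    }
}

class TransactionalPutDemo {
    static InputStream data(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        SketchFileSystem[] all = {
            new RenameBasedFileSystem(), new ObjectStoreFileSystem()
        };
        for (SketchFileSystem fs : all) {
            // The caller's code is identical for both implementations.
            fs.transactionalPut(data("hello"), "/dst/file");
            System.out.println(fs.getClass().getSimpleName()
                + ": exists=" + fs.exists("/dst/file")
                + ", renames=" + fs.renameCount);
        }
    }
}
```

The caller never branches on the file system type: the rename-based variant
performs one rename, the object-store variant performs none, and both leave the
target in place with no temp file behind.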

Unfortunately, I don't think the approach I described is applicable to the 
{{OutputCommitter}} problem mentioned by Thomas.  That's an area where exposing 
file system characteristics might be more helpful.

> FileSystem should expose some performance characteristics for caller (e.g., 
> FsShell) to choose the right algorithm.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11525
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11525
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 2.6.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HADOOP-11525.000.patch
>
>
> When running {{hadoop fs -put}}, {{FsShell}} creates a {{._COPYING_.}} file 
> in the target directory and then renames it to the target file when the 
> write is done. However, for some target systems, such as S3, Azure and Swift, 
> a partially failed write request (i.e., {{PUT}}) has no side effects, while 
> the {{rename}} operation is expensive. 
> {{FileSystem}} should expose some characteristics so that operations such 
> as {{CommandWithDestination#copyStreamToTarget()}} can detect them and choose 
> the right approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
