[ https://issues.apache.org/jira/browse/HADOOP-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo (Nicholas), SZE updated HADOOP-1292:
-------------------------------------------
Attachment: (was: HADOOP-1292_20070620.patch)
> dfs -copyToLocal should guarantee file is complete
> --------------------------------------------------
>
> Key: HADOOP-1292
> URL: https://issues.apache.org/jira/browse/HADOOP-1292
> Project: Hadoop
> Issue Type: Improvement
> Components: dfs
> Reporter: eric baldeschwieler
>
> We should copy to a temporary file, maybe _tmp.<realname>, and then rename
> the file when the copy is complete. Restarting a copy should reuse the _tmp
> file after checksumming it. Then ^Cing a copy will do the right thing: no
> partial file is ever left under the real name.
> Original suggestion:
> On Apr 23, 2007, at 2:38 AM, Richard Kasperski wrote:
> I'd like to have a guarantee that a file copy has completed and that the
> file is whole. In the past I've done this by copying the file to a temporary
> name, tmp.<realname>, and then moving it to <realname> once the file copy
> is complete. This has the following very nice properties: if <realname>
> exists, then the file copy is complete and I'm not looking at a partial
> copy of the file. I believe that the copy to the cluster has both of these
> properties, in that the file doesn't appear in a DFS directory until the
> whole file has been copied. The copy from the cluster to a local file system
> does not have these guarantees, and it would be very nice if it did. There
> are two scenarios in which I wish to use this. First, if I ctrl-c the
> 'hadoop dfs -copyToLocal', I know which parts are complete and which parts
> aren't. Second, I can run a background compressor to compress the files as
> they are copied.
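
A minimal sketch of the copy-then-rename scheme described above, written in
Python rather than Hadoop's Java and using plain local files in place of a DFS
source. The function name copy_to_local, the SHA-256 prefix comparison used to
decide whether a leftover _tmp file can be reused, and the chunk size are all
illustrative assumptions, not Hadoop's implementation.

```python
import hashlib
import os


def _sha256_prefix(path, length, chunk_size=1 << 20):
    """Hash the first `length` bytes of the file at `path`."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def copy_to_local(src, dst, chunk_size=1 << 20):
    """Copy src to _tmp.<realname> next to dst, then rename it to dst.

    Hypothetical helper: dst only ever appears once the copy is complete.
    """
    dst = os.path.abspath(dst)
    tmp = os.path.join(os.path.dirname(dst), "_tmp." + os.path.basename(dst))

    # Reuse a leftover temporary file only if it matches the start of src.
    offset = 0
    if os.path.exists(tmp):
        done = os.path.getsize(tmp)
        if done <= os.path.getsize(src) and (
            _sha256_prefix(tmp, done) == _sha256_prefix(src, done)
        ):
            offset = done

    # Append when resuming; otherwise truncate and start over.
    mode = "ab" if offset else "wb"
    with open(src, "rb") as fin, open(tmp, mode) as fout:
        fin.seek(offset)
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
        fout.flush()
        os.fsync(fout.fileno())

    # The rename is the commit point; tmp and dst share a directory, so on
    # POSIX file systems this is a single atomic step.
    os.replace(tmp, dst)
    return dst
```

With this layout, an interrupted copy leaves only a _tmp.<realname> file
behind, so a background compressor could safely skip any name starting with
"_tmp." and process everything else.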
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.