Thanks, Brian. I will go with the copy-to-tmp, then flip-with-rename model.
-B
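
For reference, a minimal sketch of the copy-to-tmp-then-rename pattern. It is
a local-filesystem analogue written against java.nio.file, not the Hadoop
FileSystem API; the class name AtomicPublish, the method publish, and the
tmpDir/liveDir parameters are hypothetical names chosen for illustration. On
HDFS the corresponding steps would presumably be
FileSystem.copyFromLocalFile into a temporary directory followed by
FileSystem.rename, and this thread does not confirm that rename is atomic
there.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class AtomicPublish {
    /**
     * Copies source into tmpDir, then renames the copy into liveDir.
     * Assumption: tmpDir and liveDir are on the same file system; otherwise
     * the ATOMIC_MOVE below throws AtomicMoveNotSupportedException.
     */
    static Path publish(Path source, Path tmpDir, Path liveDir) throws IOException {
        Files.createDirectories(tmpDir);
        Files.createDirectories(liveDir);

        Path staged = tmpDir.resolve(source.getFileName());
        Path target = liveDir.resolve(source.getFileName());

        // Slow step: readers of liveDir never see the file while it is being written.
        Files.copy(source, staged, StandardCopyOption.REPLACE_EXISTING);

        // Fast step: the complete file appears in liveDir in a single rename.
        return Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The job should only ever list liveDir, so a half-copied file sitting in
tmpDir is never picked up.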

On Thu, Feb 26, 2009 at 3:49 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:

>
> On Feb 26, 2009, at 4:14 PM, Brian Long wrote:
>
>> What kind of atomicity/visibility claims are made regarding the various
>> operations on a FileSystem?
>> I have multiple processes that write into local sequence files, then upload
>> them into a remote directory in HDFS. A map/reduce job runs which operates
>> on whatever is in the directory. The processes are not synchronized with the
>> job, so it is entirely possible that the job might start as a file is being
>> uploaded. Thus, my concern is that the job may include a partially uploaded
>> file if "FileSystem.copyFromLocalFile" is not atomic (in the sense that the
>> file will not appear until all bytes are written).
>>
>
> Hey Brian,
>
> I can't speak for the whole file system, but I do know that, as you'd
> expect in Unix, files that are open and being written to are visible.
>
>
>>
>> Are any of the FileSystem APIs atomic in this sense? What about, at the
>> very least, rename (e.g. first write to a temp HDFS location, then use
>> rename to atomically flip the file into the live directory)?
>>
>>
> I'm not sure on this one; I suspect you're safe here.
>
> Brian
>
