Thanks, Brian. I'll go with the model of copying to a tmp directory and then flipping the file into place with rename. -B
On Thu, Feb 26, 2009 at 3:49 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:

> On Feb 26, 2009, at 4:14 PM, Brian Long wrote:
>
>> What kind of atomicity/visibility claims are made regarding the various
>> operations on a FileSystem?
>> I have multiple processes that write into local sequence files, then
>> upload them into a remote directory in HDFS. A map/reduce job runs which
>> operates on whatever is in the directory. The processes are not
>> synchronized with the job, so it is entirely possible that the job might
>> start while a file is being uploaded. Thus, my concern is that the job
>> may include a partially uploaded file if "FileSystem.copyFromLocalFile"
>> is not atomic (in the sense that the file will not appear until all
>> bytes are written).
>
> Hey Brian,
>
> I can't speak for the whole file system, but I do know that, as you'd
> expect in Unix, open files that are being written to are visible.
>
>> Are any of the FileSystem APIs atomic in this sense? What about, at the
>> very least, rename (e.g. first write to a temp HDFS location, then use
>> rename to atomically flip the file into the live directory)?
>
> I'm not sure on this one; I suspect you're safe here.
>
> Brian
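The copy-to-tmp-then-rename pattern discussed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not HDFS code: it uses the local filesystem through java.nio.file so that it runs without a Hadoop cluster, and the class name AtomicPublish and the "tmp"/"live" directory names are hypothetical. With HDFS, the analogous steps would be FileSystem.copyFromLocalFile into a temporary directory followed by FileSystem.rename into the live directory. Whether that rename is atomic is exactly what the thread leaves open, so treat this sketch as showing the shape of the pattern only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; the local filesystem stands in for HDFS here (assumption).
public class AtomicPublish {

    /**
     * Stages a copy of {@code source} in {@code tmpDir}, then renames it into
     * {@code liveDir}. A reader that only lists liveDir should never see a
     * half-written file, provided both directories are on the same filesystem.
     */
    public static Path publish(Path source, Path tmpDir, Path liveDir) throws IOException {
        Files.createDirectories(tmpDir);
        Files.createDirectories(liveDir);
        Path fileName = source.getFileName();

        // Step 1: the slow, non-atomic copy happens outside the live directory.
        Path staged = tmpDir.resolve(fileName);
        Files.copy(source, staged, StandardCopyOption.REPLACE_EXISTING);

        // Step 2: flip the finished file into place with a single rename.
        // ATOMIC_MOVE makes this fail rather than fall back to a copy.
        Path live = liveDir.resolve(fileName);
        return Files.move(staged, live, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("publish-demo");
        Path local = root.resolve("part-00000.seq");
        Files.write(local, new byte[] {1, 2, 3});
        Path published = publish(local, root.resolve("tmp"), root.resolve("live"));
        System.out.println("published " + published.getFileName());
    }
}
```

Passing StandardCopyOption.ATOMIC_MOVE means Files.move throws AtomicMoveNotSupportedException when the rename cannot be done as a single atomic operation (for example, when tmp and live sit on different file stores), instead of silently degrading to a copy that readers could observe half-finished.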