The problem you describe occurs with NFS as well. Basically, single-site semantics are very hard to achieve on a networked file system.
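A minimal sketch of the writer-side pattern this thread converges on: data is only guaranteed visible to concurrent readers after a flush call or close(). The class name and path are hypothetical; on the 0.20 append branch the flush call is FSDataOutputStream.sync(), which HADOOP-6313 later split into hflush() and hsync():

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class VisibilityDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // hypothetical path, for illustration only
            FSDataOutputStream out = fs.create(new Path("/tmp/visibility-demo"));

            out.writeBytes("record 1\n");
            // Up to this point a concurrent reader may or may not see
            // "record 1"; the behavior is undefined.
            out.sync();   // hflush()/hsync() in post-HADOOP-6313 releases
            // From here on, "record 1" is guaranteed visible to readers.

            out.writeBytes("record 2\n");
            out.close();  // close() also makes everything written visible
        }
    }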
On Mon, Feb 14, 2011 at 8:21 PM, Gokulakannan M <gok...@huawei.com> wrote:

> I agree that HDFS doesn't strongly follow POSIX semantics. But it would
> have been better if this issue were fixed.
>
> _____
>
> From: Ted Dunning [mailto:tdunn...@maprtech.com]
> Sent: Monday, February 14, 2011 10:18 PM
> To: gok...@huawei.com
> Cc: common-user@hadoop.apache.org; hdfs-u...@hadoop.apache.org; dhr...@gmail.com
> Subject: Re: hadoop 0.20 append - some clarifications
>
> HDFS definitely doesn't follow anything like POSIX file semantics.
>
> They may be a vague inspiration for what HDFS does, but generally the
> behavior of HDFS is not tightly specified. Even the unit tests have some
> really surprising behavior.
>
> On Mon, Feb 14, 2011 at 7:21 AM, Gokulakannan M <gok...@huawei.com> wrote:
>
>> I think that in general, the behavior of any program reading data from an
>> HDFS file before hsync or close is called is pretty much undefined.
>
> In Unix, users can read a file in parallel while another user is writing
> it, and I suppose the sync feature design is based on that. So at any
> point during the file write, parallel users should be able to read the
> file.
>
> https://issues.apache.org/jira/browse/HDFS-142?focusedCommentId=12663958&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12663958
>
> _____
>
> From: Ted Dunning [mailto:tdunn...@maprtech.com]
> Sent: Friday, February 11, 2011 2:14 PM
> To: common-user@hadoop.apache.org; gok...@huawei.com
> Cc: hdfs-u...@hadoop.apache.org; dhr...@gmail.com
> Subject: Re: hadoop 0.20 append - some clarifications
>
> I think that in general, the behavior of any program reading data from an
> HDFS file before hsync or close is called is pretty much undefined.
>
> If you don't wait until some point where part of the file is defined, you
> can't expect any particular behavior.
>
> On Fri, Feb 11, 2011 at 12:31 AM, Gokulakannan M <gok...@huawei.com> wrote:
>
> I am not concerned about the sync behavior.
>
> The thing is the reader reading non-flushed (non-synced) data from HDFS,
> as you have explained in a previous post (in the hadoop 0.20 append
> branch). I identified one specific scenario where the above statement
> does not hold true.
>
> Following is how you can reproduce the problem.
>
> 1. add a debug point at the createBlockOutputStream() method in DFSClient
>    and run your HDFS write client in debug mode
>
> 2. allow the client to write 1 block to HDFS
>
> 3. for the 2nd block, the flow will come to the debug point mentioned in
>    step 1 (do not execute the createBlockOutputStream() method). hold here.
>
> 4. in parallel, try to read the file from another client
>
> Now you will get an error saying that the file cannot be read.
>
> _____
>
> From: Ted Dunning [mailto:tdunn...@maprtech.com]
> Sent: Friday, February 11, 2011 11:04 AM
> To: gok...@huawei.com
> Cc: common-user@hadoop.apache.org; hdfs-u...@hadoop.apache.org; c...@boudnik.org
> Subject: Re: hadoop 0.20 append - some clarifications
>
> It is a bit confusing.
>
> SequenceFile.Writer#sync isn't really sync.
>
> There is SequenceFile.Writer#syncFs which is more what you might expect to
> be sync.
>
> Then there is HADOOP-6313 which specifies hflush and hsync.
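A minimal sketch of the naming trap Ted describes, assuming the 0.20-append-era SequenceFile API where Writer#syncFs is available (the class name and path are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SyncVsSyncFs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, new Path("/tmp/seqfile-demo"),  // hypothetical path
                    LongWritable.class, Text.class);

            writer.append(new LongWritable(1L), new Text("value"));

            // NOT a durability call: writes a sync *marker* into the stream,
            // used for resynchronizing readers and splitting the file.
            writer.sync();

            // This is the call that flushes buffered data to the datanodes
            // so that concurrent readers are guaranteed to see it.
            writer.syncFs();

            writer.close();
        }
    }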
> Generally, if you want portable code, you have to reflect a bit to figure
> out what can be done.
>
> On Thu, Feb 10, 2011 at 8:38 PM, Gokulakannan M <gok...@huawei.com> wrote:
>
> Thanks Ted for clarifying.
>
> So the sync is to just flush the current buffers to the datanode and
> persist the block info in the namenode once per block, isn't it?
>
> Regarding the reader being able to see the unflushed data, I faced an
> issue in the following scenario:
>
> 1. a writer is writing a 10MB file (block size 2 MB)
>
> 2. the writer wrote the file up to 4MB (2 finalized blocks in current and
>    nothing in the blocksBeingWritten directory on the DN). So 2 blocks
>    are written
>
> 3. the client calls addBlock for the 3rd block on the namenode and has
>    not yet created an output stream to the DN (or written anything to the
>    DN). At this point, the namenode knows about the 3rd block but the
>    datanode doesn't.
>
> 4. at point 3, a reader trying to read the file gets an exception and is
>    not able to read the file, as the datanode's getBlockInfo returns null
>    to the client (of course the DN doesn't know about the 3rd block yet)
>
> In this situation the reader cannot see the file. But when the block
> writing is in progress, the read is successful.
>
> Is this a bug that needs to be handled in the append branch?
>
>> -----Original Message-----
>> From: Konstantin Boudnik [mailto:c...@boudnik.org]
>> Sent: Friday, February 11, 2011 4:09 AM
>> To: common-user@hadoop.apache.org
>> Subject: Re: hadoop 0.20 append - some clarifications
>>
>> You might also want to check the append design doc published at HDFS-265
>
> I was asking about the hadoop 0.20 append branch. I suppose HDFS-265's
> design doc won't apply to it.
>
> _____
>
> From: Ted Dunning [mailto:tdunn...@maprtech.com]
> Sent: Thursday, February 10, 2011 9:29 PM
> To: common-user@hadoop.apache.org; gok...@huawei.com
> Cc: hdfs-u...@hadoop.apache.org
> Subject: Re: hadoop 0.20 append - some clarifications
>
> Correct is a strong word here.
>
> There is actually an HDFS unit test that checks to see if partially
> written and unflushed data is visible. The basic rule of thumb is that
> you need to synchronize readers and writers outside of HDFS. There is no
> guarantee that data is visible or invisible after writing, but there is a
> guarantee that it will become visible after sync or close.
>
> On Thu, Feb 10, 2011 at 7:11 AM, Gokulakannan M <gok...@huawei.com> wrote:
>
> Is this the correct behavior, or is my understanding wrong?
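A minimal sketch of the "reflect a bit" portability approach Ted mentions: prefer the HADOOP-6313 hflush() where it exists, and fall back to the older sync() that 0.20-era releases expose on FSDataOutputStream. The helper class itself is hypothetical:

    import java.lang.reflect.Method;
    import org.apache.hadoop.fs.FSDataOutputStream;

    public final class PortableFlush {
        // Look up hflush() at runtime; on releases that predate
        // HADOOP-6313, fall back to the equivalent sync() method.
        public static void flush(FSDataOutputStream out) throws Exception {
            Method m;
            try {
                m = out.getClass().getMethod("hflush");
            } catch (NoSuchMethodException e) {
                m = out.getClass().getMethod("sync");
            }
            m.invoke(out);  // after this, data is visible to concurrent readers
        }
    }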