It is a bit confusing. SequenceFile.Writer#sync isn't really sync.
There is SequenceFile.Writer#syncFs, which is closer to what you might expect sync to be. Then there is HADOOP-6313, which specifies hflush and hsync. Generally, if you want portable code, you have to use a bit of reflection to figure out what can be done.

On Thu, Feb 10, 2011 at 8:38 PM, Gokulakannan M <gok...@huawei.com> wrote:

> Thanks Ted for clarifying.
>
> So the *sync* is just to flush the current buffers to the datanode and
> persist the block info in the namenode once per block, isn't it?
>
> Regarding a reader being able to see unflushed data, I faced an issue in
> the following scenario:
>
> 1. a writer is writing a *10MB* file (block size 2 MB)
>
> 2. the writer wrote the file up to 4MB (2 finalized blocks in *current*
> and nothing in the *blocksBeingWritten* directory in the DN), so 2 blocks
> are written
>
> 3. the client calls addBlock for the 3rd block on the namenode and has
> not yet created an output stream to the DN (or written anything to the
> DN). At this point the namenode knows about the 3rd block but the
> datanode does not.
>
> 4. at point 3, a reader trying to read the file gets an exception and
> cannot read the file, because the datanode's getBlockInfo returns null to
> the client (of course the DN doesn't know about the 3rd block yet)
>
> In this situation the reader cannot see the file. But once the block
> write is in progress, the read succeeds.
>
> *Is this a bug that needs to be handled in the append branch?*
>
> >> -----Original Message-----
> >> From: Konstantin Boudnik [mailto:c...@boudnik.org]
> >> Sent: Friday, February 11, 2011 4:09 AM
> >> To: common-user@hadoop.apache.org
> >> Subject: Re: hadoop 0.20 append - some clarifications
>
> >> You might also want to check the append design doc published at
> >> HDFS-265
>
> I was asking about the hadoop 0.20 append branch. I suppose HDFS-265's
> design doc won't apply to it.
> ------------------------------
> *From:* Ted Dunning [mailto:tdunn...@maprtech.com]
> *Sent:* Thursday, February 10, 2011 9:29 PM
> *To:* common-user@hadoop.apache.org; gok...@huawei.com
> *Cc:* hdfs-u...@hadoop.apache.org
> *Subject:* Re: hadoop 0.20 append - some clarifications
>
> Correct is a strong word here.
>
> There is actually an HDFS unit test that checks whether partially written
> and unflushed data is visible. The basic rule of thumb is that you need
> to synchronize readers and writers outside of HDFS. There is no guarantee
> that data is visible or invisible after writing, but there is a guarantee
> that it will become visible after sync or close.
>
> On Thu, Feb 10, 2011 at 7:11 AM, Gokulakannan M <gok...@huawei.com> wrote:
>
> > Is this the correct behavior, or is my understanding wrong?
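The "reflect a bit" approach Ted mentions can be sketched roughly as below. This is a minimal illustration, not the actual Hadoop code: FakeWriter is a hypothetical stand-in for SequenceFile.Writer or FSDataOutputStream, and the sketch simply probes for hflush (post-HADOOP-6313), then syncFs (0.20-append era), then sync, invoking the first one found.

```java
import java.lang.reflect.Method;

public class PortableSync {

    // Hypothetical stand-in for a Hadoop writer. Which flush-like method
    // exists (hflush, syncFs, or sync) depends on the Hadoop version, so
    // portable callers discover it at runtime via reflection.
    static class FakeWriter {
        boolean flushed = false;
        public void syncFs() { flushed = true; }
    }

    /**
     * Try hflush() first, then syncFs(), then sync(), and invoke the
     * first public no-arg method that exists. Returns the name of the
     * method used, so callers can see which API was available.
     */
    static String portableFlush(Object writer) throws Exception {
        for (String name : new String[] {"hflush", "syncFs", "sync"}) {
            try {
                Method m = writer.getClass().getMethod(name);
                m.invoke(writer);
                return name;
            } catch (NoSuchMethodException e) {
                // Not available in this version; try the next candidate.
            }
        }
        throw new UnsupportedOperationException("no flush method found");
    }

    public static void main(String[] args) throws Exception {
        FakeWriter w = new FakeWriter();
        // FakeWriter only has syncFs, so the hflush probe falls through.
        System.out.println(portableFlush(w)); // prints "syncFs"
        System.out.println(w.flushed);        // prints "true"
    }
}
```

In real code you would cache the resolved Method rather than probing on every flush, and remember that even after a successful call the visibility guarantee is only the one Ted states: data is guaranteed visible after sync or close, not before.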