Thanks, Harsh, for the detailed info; it clears things up. The only concerning thing from those pages is what happens when the client crashes: it says you could lose up to a block's worth of information. Is this still true, given that the NN would auto-close the file?
Is it good practice to reduce the NN default value so that it auto-closes sooner than 1 hour? Regarding the OS cache, I think it should be OK, since the chances of losing all replica nodes at the same time are low.

On Sat, Jun 9, 2012 at 5:13 AM, Harsh J <ha...@cloudera.com> wrote:
> Hi Mohit,
>
> > In this scenario is data also replicated as defined by the replication
> > factor to other nodes as well? I am wondering if at this point if crash
> > occurs do I have data in other nodes?
>
> What kind of crash are you talking about here? A client crash or a
> cluster crash? If a cluster, is the loss you're thinking of one DN or
> all the replicating DNs?
>
> If a client fails to close a file due to a crash, it is auto-closed
> later (default is one hour) by the NameNode, and whatever the client
> successfully wrote (i.e. into its last block) is then made available
> to readers at that point. If the client synced, then its last sync
> point is always available to readers, and whatever it didn't sync is
> made available when the file is closed later by the NN. For DN
> failures, read on.
>
> Replication in 1.x/0.20.x is done via pipelines. It's done regardless
> of sync() calls. All write packets are indeed sent to and acknowledged
> by each DN in the constructed pipeline as the write progresses. For a
> good diagram of the sequence here, see Figure 3.3 | Page 66 | Chapter
> 3: The Hadoop Distributed Filesystem, in Tom's "Hadoop: The Definitive
> Guide" (2nd ed. page nos. Gotta get 3rd ed. soon :))
>
> The sync behavior is further explained under the 'Coherency Model'
> title at Page 68 | Chapter 3: The Hadoop Distributed Filesystem of the
> same book. Think of sync() more as a checkpoint done over the write
> pipeline, such that new readers can read the length of synced bytes
> immediately, and those bytes are guaranteed to be outside of the DN
> application (JVM) buffers (i.e. flushed).
>
> Some further notes, for general info: in 0.20.x/1.x releases, there's
> no hard guarantee that the write-buffer flushing done via sync ensures
> the data went to the *disk*. It may remain in the OS buffers (a
> feature of OSes, for performance). This is because we do not do an
> fsync() (i.e. calling force on the FileChannel for the block and
> metadata outputs), but rather just an output stream flush. In the
> future, from the 2.0.1-alpha release (soon to come at this point)
> onwards, the specific call hsync() will ensure that this is not the
> case.
>
> However, if you are OK with the OS-buffers feature/caveat and
> primarily need syncing not for reliability but for readers, you may
> use the call hflush() and save on performance. One place where hsync()
> is to be preferred over hflush() is where you use WALs (for data
> reliability), and HBase is one such application. With hsync(), HBase
> can survive potential failures caused by major power-failure cases
> (among others).
>
> Let us know if this clears it up for you!
>
> On Sat, Jun 9, 2012 at 4:58 AM, Mohit Anchlia <mohitanch...@gmail.com>
> wrote:
> > I am wondering about the role of sync in the replication of data to
> > other nodes. Say a client writes a line to a file in Hadoop; at this
> > point the file handle is open and sync has not been called. In this
> > scenario, is data also replicated, as defined by the replication
> > factor, to other nodes as well? I am wondering if at this point a
> > crash occurs, do I have data in other nodes?
>
> --
> Harsh J
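P.S. For anyone following the flush-vs-fsync distinction Harsh describes, here is a small, self-contained sketch using plain java.io rather than the HDFS client (which needs a running cluster). In this local analogy, flush() plays the role of hflush() (bytes leave the JVM buffer and become visible to new readers, but may still sit in the OS page cache), and FileChannel.force() plays the role of hsync() (the OS is asked to push them to the device). The class and file names are illustrative, not from any Hadoop API.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class SyncDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sync-demo", ".txt");
        f.deleteOnExit();

        FileOutputStream fos = new FileOutputStream(f);
        BufferedOutputStream out = new BufferedOutputStream(fos, 8192);
        out.write("hello, readers\n".getBytes("UTF-8")); // 15 bytes

        // Before flush(): the bytes sit in the JVM-side buffer, so a new
        // reader of the file sees nothing (analogous to unsynced HDFS data).
        System.out.println("before flush: " + f.length());

        out.flush();
        // After flush(): the bytes have left the JVM and are visible to new
        // readers, but may still be in the OS page cache (like hflush()).
        System.out.println("after flush: " + f.length());

        // force(false) asks the OS to write the data through to the device,
        // analogous to what hsync() guarantees in 2.0.1-alpha and onwards.
        fos.getChannel().force(false);
        out.close();
    }
}
```

Running it prints a length of 0 before the flush and 15 after, which is the same visibility behavior readers of an open HDFS file observe around a sync point.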