Nope, flush just flushes the Java-side buffer to the Linux buffer cache -- not all the way to the media.
Hsync is the API that will eventually go all the way to disk, but it has not yet been implemented.

-Todd

On Wednesday, November 10, 2010, Thanh Do <than...@cs.wisc.edu> wrote:
> Or another way to rephrase my question:
> do data.flush and checksumOut.flush guarantee
> that data is synchronized with the underlying disk,
> just like fsync()?
>
> Thanks
> Thanh
>
> On Wed, Nov 10, 2010 at 10:26 PM, Thanh Do <than...@cs.wisc.edu> wrote:
>
>> Hi all,
>>
>> After reading the appenddesign3.pdf in HDFS-256,
>> and looking at the BlockReceiver.java code in 0.21.0,
>> I am confused by the following.
>>
>> The document says that:
>> *For each packet, a DataNode in the pipeline has to do 3 things.
>> 1. Stream data
>>    a. Receive data from the upstream DataNode or the client
>>    b. Push the data to the downstream DataNode if there is any
>> 2. Write the data/crc to its block file/meta file.
>> 3. Stream ack
>>    a. Receive an ack from the downstream DataNode if there is any
>>    b. Send an ack to the upstream DataNode or the client*
>>
>> And *"...there is no guarantee on the order of (2) and (3)"*
>>
>> In BlockReceiver.receivePacket(), after reading the packet buffer,
>> the DataNode does:
>> 1) put the packet seqno in the ack queue
>> 2) write data and checksum to disk
>> 3) flush data and checksum (to disk)
>>
>> What confuses me is that streaming the ack does not
>> necessarily depend on whether the data has been flushed to disk.
>> So my question is:
>> why does the DataNode need to flush data and checksum
>> every time it receives a packet? This flush may be costly.
>> Why can't the DataNode just batch several writes (after receiving
>> several packets) and flush all at once?
>> Is there any particular reason for doing so?
>>
>> Can somebody clarify this for me?
>>
>> Thanks so much.
>> Thanh

-- 
Todd Lipcon
Software Engineer, Cloudera
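The flush-vs-sync distinction above can be sketched with plain java.io, outside the HDFS API: flush() only drains the Java-side buffer into the OS buffer cache, while FileDescriptor.sync() is the fsync analogue that forces the data onto the media. The class and method names here (FlushVsSync, writeAndSync) are illustrative, not from the HDFS codebase.

```java
import java.io.*;

public class FlushVsSync {
    // Write data, flush it to the OS buffer cache, then force it to disk.
    public static long writeAndSync(File f, byte[] data) throws IOException {
        FileOutputStream out = new FileOutputStream(f);
        try {
            out.write(data);
            out.flush();         // Java-side buffer -> OS buffer cache.
                                 // Other readers now see the data, but a
                                 // power failure could still lose it.
            out.getFD().sync();  // OS buffer cache -> media (like fsync()).
        } finally {
            out.close();
        }
        return f.length();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("flush-demo", ".bin");
        f.deleteOnExit();
        byte[] payload = "hello".getBytes("UTF-8");
        System.out.println(writeAndSync(f, payload));
    }
}
```

java.nio's FileChannel.force(true) offers the same durability guarantee; the point of the thread is that neither is what OutputStream.flush() gives you.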