Hi all,

After reading the appenddesign3.pdf in HDFS-256 and looking at the BlockReceiver.java code in 0.21.0, I am confused by the following.

The document says:

*For each packet, a DataNode in the pipeline has to do 3 things.
1. Stream data
   a. Receive data from the upstream DataNode or the client
   b. Push the data to the downstream DataNode if there is any
2. Write the data/crc to its block file/meta file.
3. Stream ack
   a. Receive an ack from the downstream DataNode if there is any
   b. Send an ack to the upstream DataNode or the client*

and

*"...there is no guarantee on the order of (2) and (3)"*

In BlockReceiver.receivePacket(), after reading the packet buffer, the DataNode does the following (a rough sketch of this path is below):

1) put the packet's seqno in the ack queue
2) write the data and checksum to disk
3) flush the data and checksum (to disk)
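To make the question concrete, here is a minimal sketch (in Java) of that per-packet path as I understand it. This is my own simplification, not the actual BlockReceiver code; the class name PacketPathSketch and the fields ackQueue, dataOut, and checksumOut are stand-ins for the real thing.

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Simplified model of the per-packet path in BlockReceiver.receivePacket().
 * Names and structure are illustrative, not the real 0.21.0 code.
 */
class PacketPathSketch {
    private final BlockingQueue<Long> ackQueue = new LinkedBlockingQueue<>(); // seqnos awaiting acks
    private final OutputStream dataOut;     // block file stream
    private final OutputStream checksumOut; // meta file stream

    PacketPathSketch(String blockFile, String metaFile) throws IOException {
        this.dataOut = new BufferedOutputStream(new FileOutputStream(blockFile));
        this.checksumOut = new BufferedOutputStream(new FileOutputStream(metaFile));
    }

    void receivePacket(long seqno, byte[] data, byte[] checksum) throws IOException {
        // 1) enqueue the seqno so the responder thread can ack it;
        //    the ack does not wait for the flush below
        ackQueue.add(seqno);

        // 2) write data and checksum to the block/meta files
        dataOut.write(data);
        checksumOut.write(checksum);

        // 3) flush both streams on *every* packet -- this is the step
        //    my question is about: why not batch several packets'
        //    writes and flush once?
        dataOut.flush();
        checksumOut.flush();
    }
}

As the comments note, the flush in step 3 happens once per packet; the batching I have in mind would move that flush out of the per-packet path.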
What confuses me is that streaming the ack does not necessarily depend on whether the data has been flushed to disk. So my question is: why does the DataNode need to flush the data and checksum every time it receives a packet? This flush may be costly. Why can't the DataNode just batch several writes (after receiving several packets) and flush them all at once? Is there any particular reason for doing it this way?

Can somebody clarify this for me? Thanks so much.

Thanh