Nope, flush() just flushes the Java-side buffer to the Linux buffer
cache -- not all the way to the media.

Hsync is the API that will eventually go all the way to disk, but it
has not yet been implemented.
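The difference can be sketched in plain Java. `FlushVsSync` and `writeAndSync` below are hypothetical names for illustration; `BufferedOutputStream.flush()` and `FileDescriptor.sync()` are the actual `java.io` APIs, the latter being the Java equivalent of fsync(2):

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class FlushVsSync {
    // Write data, flush the Java-side buffer, then sync to the media.
    static long writeAndSync(File f, byte[] data) throws IOException {
        FileOutputStream fos = new FileOutputStream(f);
        BufferedOutputStream out = new BufferedOutputStream(fos);
        out.write(data);
        // flush(): drains the BufferedOutputStream into the OS page cache.
        // The data survives a JVM crash, but not a power failure.
        out.flush();
        // sync(): asks the kernel to push the page cache to the media,
        // like fsync(2). This is the guarantee hsync would eventually add.
        fos.getFD().sync();
        out.close();
        return f.length();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("flush-demo", ".dat");
        f.deleteOnExit();
        System.out.println(writeAndSync(f, "hello".getBytes()) + " bytes on disk");
        // → 5 bytes on disk
    }
}
```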

-Todd

On Wednesday, November 10, 2010, Thanh Do <than...@cs.wisc.edu> wrote:
> Or another way to rephrase my question:
> do data.flush and checksumOut.flush guarantee
> that the data is synchronized with the underlying disk,
> just like fsync()?
>
> Thanks
> Thanh
>
> On Wed, Nov 10, 2010 at 10:26 PM, Thanh Do <than...@cs.wisc.edu> wrote:
>
>> Hi all,
>>
>> After reading the appenddesign3.pdf in HDFS-256,
>> and looking at the BlockReceiver.java code in 0.21.0,
>> I am confused by the following.
>>
>> The document says that:
>> *For each packet, a DataNode in the pipeline has to do 3 things.
>> 1. Stream data
>>       a. Receive data from the upstream DataNode or the client
>>       b. Push the data to the downstream DataNode if there is any
>> 2. Write the data/crc to its block file/meta file.
>> 3. Stream ack
>>       a. Receive an ack from the downstream DataNode if there is any
>>       b. Send an ack to the upstream DataNode or the client*
>>
>> And *"...there is no guarantee on the order of (2) and (3)"*
>>
>> In BlockReceiver.receivePacket(), after reading the packet buffer,
>> the DataNode does:
>> 1) put the packet seqno in the ack queue
>> 2) write data and checksum to disk
>> 3) flush data and checksum (to disk)
>>
>> The thing that confuses me is that the streaming of the ack does not
>> necessarily depend on whether the data has been flushed to disk or not.
>> So my question is:
>> Why does the DataNode need to flush the data and checksum
>> every time it receives a packet? This flush may be costly.
>> Why can't the DataNode just batch several writes (after receiving
>> several packets) and flush them all at once?
>> Is there any particular reason for doing so?
>>
>> Can somebody clarify this for me?
>>
>> Thanks so much.
>> Thanh
>>
>
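The per-packet ordering quoted above can be sketched as follows. `ReceiveOrder`, `ackQueue`, and `disk` are hypothetical stand-ins for HDFS's internals, used only to show that the seqno is enqueued for acking before the data is flushed:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class ReceiveOrder {
    final Queue<Long> ackQueue = new ArrayDeque<>();  // seqnos awaiting acks
    final StringBuilder disk = new StringBuilder();   // stand-in for block/meta files

    // Sketch of BlockReceiver.receivePacket(): the seqno is enqueued for
    // the ack responder BEFORE the data is written and flushed, so acking
    // and flushing are not ordered with respect to each other.
    void receivePacket(long seqno, String data) {
        ackQueue.add(seqno);  // 1) enqueue seqno for the responder thread
        disk.append(data);    // 2) write data/checksum
        // 3) flush -- which only reaches the OS page cache, not the media
    }

    public static void main(String[] args) {
        ReceiveOrder r = new ReceiveOrder();
        r.receivePacket(1, "pkt1");
        r.receivePacket(2, "pkt2");
        System.out.println(r.ackQueue.size() + " acks pending, disk=" + r.disk);
        // → 2 acks pending, disk=pkt1pkt2
    }
}
```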

-- 
Todd Lipcon
Software Engineer, Cloudera
