----- Original Message -----
From: kartheek muthyala <kartheek0...@gmail.com>
Date: Thursday, November 3, 2011 11:23 am
Subject: Packets->Block
To: common-user@hadoop.apache.org

> Hi all,
> I need some info related to the code section which handles the
> following operations.
> 
> Basically DataXceiver.c on the client side transmits the block in
> packets and
Actually, DataXceiver runs only on the DN. Whenever you create a file, a
DataStreamer thread starts in DFSClient. As the application writes bytes,
they are enqueued into dataQueue as packets. The streamer thread picks the
packets from dataQueue and writes them to the pipeline sockets. It also writes
the opcodes that tell the DN what kind of operation this is.
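
A minimal sketch of that queue hand-off, in case it helps. This is not the DFSClient code: the Packet class, its fields, and the demo() method are hypothetical names for illustration, and a plain list stands in for the pipeline socket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class StreamerSketch {
    /** Hypothetical packet: sequence number, payload, last-packet flag. */
    static final class Packet {
        final long seqno;
        final byte[] data;
        final boolean lastPacketInBlock;

        Packet(long seqno, byte[] data, boolean lastPacketInBlock) {
            this.seqno = seqno;
            this.data = data;
            this.lastPacketInBlock = lastPacketInBlock;
        }
    }

    /**
     * Enqueues packetCount packets (must be >= 1) while a streamer thread
     * drains them, and returns the sequence numbers in the order "sent".
     */
    static List<Long> demo(int packetCount) {
        if (packetCount < 1) {
            throw new IllegalArgumentException("need at least one packet");
        }
        BlockingQueue<Packet> dataQueue = new LinkedBlockingQueue<>();
        List<Long> sent = new ArrayList<>();

        Thread streamer = new Thread(() -> {
            try {
                while (true) {
                    Packet p = dataQueue.take(); // blocks until the writer enqueues
                    sent.add(p.seqno);           // stand-in for the pipeline socket write
                    if (p.lastPacketInBlock) {
                        break;                   // block is complete, stop streaming
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        streamer.start();

        // The "application" side: write bytes, which get queued as packets.
        for (int i = 0; i < packetCount; i++) {
            boolean last = (i == packetCount - 1);
            dataQueue.add(new Packet(i, new byte[] {(byte) i}, last));
        }
        try {
            streamer.join(); // after join, reading 'sent' from this thread is safe
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(demo(3)); // prints [0, 1, 2]
    }
}
```

The point is only that the writer never touches the socket itself; it hands packets to the queue and the streamer thread does the sending.
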
> on the data node side we have DataXceiver.c and BlockReceiver.c files
> which take care of writing these packets in order to a block file until the
> last packet for the block is received. I want some info around this area
DataXceiverServer runs on the DN and listens for requests. For every request
it receives, it creates a DataXceiver thread and passes the request info to
it. Based on the opcode, DataXceiver creates a BlockReceiver or BlockSender
object and hands control to it.
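
Roughly, the thread-per-request dispatch looks like the sketch below. The Op enum, dispatch() and serve() are hypothetical names, not the DataNode code, and the real protocol has its own opcode constants. The handler is represented just by its class name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class XceiverSketch {
    /** Hypothetical opcodes; the real protocol uses its own constants. */
    enum Op { WRITE_BLOCK, READ_BLOCK }

    /** What a per-request thread would do: pick the handler from the opcode. */
    static String dispatch(Op op) {
        switch (op) {
            case WRITE_BLOCK:
                return "BlockReceiver"; // incoming data is written to a block file
            case READ_BLOCK:
                return "BlockSender";   // block data is sent back to the reader
            default:
                throw new IllegalArgumentException("unknown opcode: " + op);
        }
    }

    /** Stand-in for the server loop: one worker thread per incoming request. */
    static List<String> serve(List<Op> requests) {
        String[] handledBy = new String[requests.size()];
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < requests.size(); i++) {
            final int slot = i;
            final Op op = requests.get(i);
            Thread worker = new Thread(() -> handledBy[slot] = dispatch(op));
            worker.start();
            workers.add(worker);
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new ArrayList<>(Arrays.asList(handledBy));
    }

    public static void main(String[] args) {
        // prints [BlockReceiver, BlockSender]
        System.out.println(serve(Arrays.asList(Op.WRITE_BLOCK, Op.READ_BLOCK)));
    }
}
```
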
> where in BlockReceiver.c, i have seen a PacketResponder class and a
> BlockReceiver class where in two places you are finalizing the block (What
> i understood by finalizing is that when the last packet for the block is
> received, you are closing the block file). In PacketResponder class in two
> places you are using finalizeBlock() function, one in lastDataNodeRun()
> function and the other in run() method and in BlockReceiver.c you are using
> finalizeBlock() in receiveBlock() function. I understood from the comments
> that the finalizeBlock() call from run() method is done for the datanode
> with which client directly interacts and finalizeBlock() call from
> receiveBlock() function is done for all the datanodes where the block is
> sent for replication.
For replication, the DN receives the whole block, and the block length is
known beforehand. So the receivePacket() invocations in the while loop can
read the complete block by themselves. After reading it, the DN needs to
finalize the block to add it into volumesMap.
> But i didn't understand why there is a
> finalizeBlock() call from lastDataNodeRun() function.
This call is for ongoing writes from a client/DN. In that case the DN does not
know the actual size until the client says that a packet is the last one in
the current block. finalizeBlock is called if the packet is the last packet
for that block, and it adds the replica into volumesMap. Also, if the packet
is the last one, the DN needs to close all the block files that were opened
for the write.
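
To make the last-packet rule concrete, here is a small sketch. It is not the BlockReceiver/FSDataset code: the class, the two maps, and the method signatures are assumptions for illustration, and volumesMap is simply named after the wording above.

```java
import java.util.HashMap;
import java.util.Map;

class FinalizeSketch {
    /** Hypothetical: blockId -> bytes received so far for blocks being written. */
    final Map<Long, Long> ongoing = new HashMap<>();
    /** Hypothetical: blockId -> final length of finalized replicas. */
    final Map<Long, Long> volumesMap = new HashMap<>();

    /** Accounts for one packet; finalizes the block only on its last packet. */
    void receivePacket(long blockId, int len, boolean lastPacketInBlock) {
        ongoing.merge(blockId, (long) len, Long::sum);
        if (lastPacketInBlock) {
            finalizeBlock(blockId);
        }
    }

    /** Moves the block from "being written" to the finalized map. */
    void finalizeBlock(long blockId) {
        Long length = ongoing.remove(blockId); // stand-in for closing the block files
        if (length == null) {
            throw new IllegalStateException("block " + blockId + " is not being written");
        }
        volumesMap.put(blockId, length);
    }

    public static void main(String[] args) {
        FinalizeSketch dn = new FinalizeSketch();
        dn.receivePacket(7L, 512, false);
        dn.receivePacket(7L, 512, false);
        System.out.println(dn.volumesMap.containsKey(7L)); // prints false
        dn.receivePacket(7L, 100, true);
        System.out.println(dn.volumesMap.get(7L));         // prints 1124
    }
}
```

Until the last packet arrives, the length is just a running total; only then is the replica recorded as finalized.
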
> Can someone explain me about this? I may be wrong at most of the places of
> my understanding of the workflow. Correct me if i am wrong.
> 
> Thanks,
> Kartheek
> 

Regards,
Uma
