Thanks Uma for the info.

On Thu, Nov 3, 2011 at 4:31 PM, Uma Maheswara Rao G 72686 <mahesw...@huawei.com> wrote:
> Hello Kartheek,
> see inline
> ----- Original Message -----
> From: kartheek muthyala <kartheek0...@gmail.com>
> Date: Thursday, November 3, 2011 4:02 pm
> Subject: Re: Packets->Block
> To: common-user@hadoop.apache.org
>
> > Thanks Uma for the prompt reply.
> > I have one more doubt. As I can see, the Block class contains only
> > metadata information like timestamp and length, but the actual data
> > is in the streams. What I cannot understand is where the data gets
> > written from the streams to the block file (which function takes
> > care of this?).
> Yes, Block contains all the information like block ID, generation
> timestamp, number of bytes...
> Block is Writable, so that we can transfer it through the network
> (e.g., the DN sends block reports, etc.).
> The actual data is on disk in a file named blk_<block id>.
> So, using this block ID, we can identify the block file name directly.
> When the block is created on the DN side, the volumes map maintains
> ReplicaBeingWritten objects with this block ID information.
>
> You can see the code in the BlockReceiver constructor: once it gets the
> replicaInfo, it calls createStreams on that replicaInfo, and that
> creates the FileOutputStreams.
>
> Regards,
> Uma
> > ~Kartheek.
> >
> > On Thu, Nov 3, 2011 at 12:55 PM, Uma Maheswara Rao G 72686
> > <mahesw...@huawei.com> wrote:
> >
> > > ----- Original Message -----
> > > From: kartheek muthyala <kartheek0...@gmail.com>
> > > Date: Thursday, November 3, 2011 11:23 am
> > > Subject: Packets->Block
> > > To: common-user@hadoop.apache.org
> > >
> > > > Hi all,
> > > > I need some info related to the code section which handles the
> > > > following operations.
> > > >
> > > > Basically DataXceiver.java on the client side transmits the
> > > > block in packets and
> > > Actually, DataXceiver runs only in the DN. Whenever you create a file,
> > > a DataStreamer thread starts in DFSClient.
> > > Whenever the application writes bytes, they are enqueued into
> > > dataQueue. The streamer thread picks the packets from dataQueue
> > > and writes them to the pipeline sockets. It also writes the
> > > opcodes that tell the DN which kind of operation is requested.
> > > > on the data node side we have the DataXceiver.java and
> > > > BlockReceiver.java files, which take care of writing these
> > > > packets in order to a block file until the last packet for the
> > > > block is received. I want some info around this area
> > > DataXceiverServer runs and listens for requests. For every request
> > > it receives, it creates a DataXceiver thread and passes the info
> > > to it. Based on the opcode, DataXceiver creates a BlockReceiver or
> > > BlockSender object and hands control to it.
> > > > where in BlockReceiver.java I have seen a PacketResponder class
> > > > and a BlockReceiver class, in which you are finalizing the block
> > > > in several places (what I understood by finalizing is that when
> > > > the last packet for the block is received, you are closing the
> > > > block file). In the PacketResponder class you are using the
> > > > finalizeBlock() function in two places, one in the
> > > > lastDataNodeRun() function and the other in the run() method,
> > > > and in the BlockReceiver class you are using finalizeBlock() in
> > > > the receiveBlock() function. I understood from the comments that
> > > > the finalizeBlock() call from the run() method is done for the
> > > > datanode with which the client directly interacts, and the
> > > > finalizeBlock() call from the receiveBlock() function is done
> > > > for all the datanodes where the block is sent for replication.
> > > As part of replication, when a DN receives a block, the block
> > > length is already known beforehand. So the receivePacket()
> > > invocations in the while loop can read the complete block. After
> > > reading, the DN needs to finalize the block so that it is added
> > > to volumesMap.
> > > > But I didn't understand why there is a finalizeBlock() call
> > > > from the lastDataNodeRun() function.
> > > This call is for ongoing writes from a client/DN. The receiving DN
> > > does not know the actual block size until the client says that a
> > > packet is the last one in the current block. finalizeBlock is
> > > called if the packet is the last packet for that block, and it
> > > adds the replica to volumesMap. Also, if the packet is the last
> > > one, the DN needs to close all the block files that were opened
> > > for writes.
> > > > Can someone explain this to me? I may be wrong in most places
> > > > of my understanding of the workflow. Correct me if I am wrong.
> > > >
> > > > Thanks,
> > > > Kartheek
> > >
> > > Regards,
> > > Uma
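
For anyone reading this thread later, the receive-and-finalize flow discussed above can be modelled with a small toy program. This is only a sketch under stated assumptions, not Hadoop code: the classes Packet, SketchBlockReceiver and PacketsToBlockDemo are hypothetical, the in-memory queue stands in for the pipeline socket, the byte buffer stands in for the blk_<block id> file stream, and volumesMap is reduced to a plain map from block ID to finalized length. Only the names receivePacket, receiveBlock, finalizeBlock and volumesMap are borrowed from the discussion.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical demo driver: feeds three packets and prints the finalized length.
class PacketsToBlockDemo {
    public static void main(String[] args) {
        Queue<Packet> socket = new ArrayDeque<>();
        socket.add(new Packet("abc".getBytes(), false));
        socket.add(new Packet("de".getBytes(), false));
        socket.add(new Packet("f".getBytes(), true)); // client marks the last packet

        Map<Long, Integer> volumesMap = new HashMap<>();
        new SketchBlockReceiver(42L, socket, volumesMap).receiveBlock();
        System.out.println("finalized length = " + volumesMap.get(42L)); // prints 6
    }
}

// Hypothetical stand-in for one packet on the write pipeline.
final class Packet {
    final byte[] data;
    final boolean lastPacketInBlock;

    Packet(byte[] data, boolean lastPacketInBlock) {
        this.data = data;
        this.lastPacketInBlock = lastPacketInBlock;
    }
}

// Simplified model of the receive loop; NOT the real BlockReceiver.
final class SketchBlockReceiver {
    private final long blockId;
    private final Queue<Packet> incoming;          // stands in for the pipeline socket
    private final Map<Long, Integer> volumesMap;   // blockId -> finalized length (assumption)
    private final ByteArrayOutputStream out = new ByteArrayOutputStream(); // stands in for the block file
    private boolean sawLastPacket = false;

    SketchBlockReceiver(long blockId, Queue<Packet> incoming, Map<Long, Integer> volumesMap) {
        this.blockId = blockId;
        this.incoming = incoming;
        this.volumesMap = volumesMap;
    }

    // Writes one packet to the block stream. Returns the packet length,
    // or -1 when the last packet has been consumed or no packet is available.
    int receivePacket() {
        Packet p = incoming.poll();
        if (p == null) {
            return -1;
        }
        out.write(p.data, 0, p.data.length);
        if (p.lastPacketInBlock) {
            sawLastPacket = true;
            return -1;
        }
        return p.data.length;
    }

    // Reads packets until the block ends, then finalizes only if the
    // sender actually marked a last packet.
    void receiveBlock() {
        while (receivePacket() >= 0) {
            // keep reading
        }
        if (sawLastPacket) {
            finalizeBlock();
        }
    }

    // In this sketch, finalizing just records the replica in the map.
    private void finalizeBlock() {
        volumesMap.put(blockId, out.size());
    }
}
```

The point of the sketch is the one Uma makes: the receiver cannot know the block length up front during a client write, so it only finalizes once a packet flagged as last arrives. If the stream ends without that flag, nothing is added to the map.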