Thanks Uma for the info.

On Thu, Nov 3, 2011 at 4:31 PM, Uma Maheswara Rao G 72686 <
mahesw...@huawei.com> wrote:

> Hello Karthik,
>  see inline
> ----- Original Message -----
> From: kartheek muthyala <kartheek0...@gmail.com>
> Date: Thursday, November 3, 2011 4:02 pm
> Subject: Re: Packets->Block
> To: common-user@hadoop.apache.org
>
> > Thanks Uma for the prompt reply.
> > I have one more doubt. As I can see, the Block class contains only
> > metadata such as the timestamp and length, while the actual data is in
> > the streams. What I cannot understand is where the data gets written
> > from the streams to the block file (which function takes care of this?).
> Yes, the block contains all the information such as the block ID,
> generation timestamp, number of bytes, etc.
>  Block is Writable, so we can transfer it over the network (e.g., the DN
> sends block reports).
>  The actual data is on disk in a file named blk_<block id>.
>  So, using the block ID, we can identify the block file name directly.
>  When the block is created on the DN side, the volumes map maintains
> ReplicaBeingWritten objects with this block ID information.
>
> You can see the code in the BlockReceiver constructor: once it gets the
> replicaInfo, it calls createStreams on that replicaInfo, which creates
> the FileOutputStreams.
>
> Regards,
> Uma
> > ~Kartheek.
> >
> > On Thu, Nov 3, 2011 at 12:55 PM, Uma Maheswara Rao G 72686 <
> > mahesw...@huawei.com> wrote:
> >
> > > ----- Original Message -----
> > > From: kartheek muthyala <kartheek0...@gmail.com>
> > > Date: Thursday, November 3, 2011 11:23 am
> > > Subject: Packets->Block
> > > To: common-user@hadoop.apache.org
> > >
> > > > Hi all,
> > > > > I need some info related to the code section which handles the
> > > > > following operations.
> > > > >
> > > > > Basically DataXceiver.c on the client side transmits the block in
> > > > > packets and
> > > > Actually, DataXceiver runs only in the DN. Whenever you create a file,
> > > > a DataStreamer thread starts in DFSClient. Whenever the application
> > > > writes bytes, they are enqueued into dataQueue. The streamer thread
> > > > picks packets from dataQueue and writes them to the pipeline sockets.
> > > > It also writes the opcodes to tell the DN what kind of operation it is.
> > > > > on the data node side we have DataXceiver.c and BlockReceiver.c
> > > > > files which take care of writing these packets in order to a block
> > > > > file until the last packet for the block is received. I want some
> > > > > info around this area
> > > > DataXceiverServer runs and listens for requests. For every request it
> > > > receives, it creates a DataXceiver thread and passes the info to it.
> > > > Based on the opcode, it creates a BlockReceiver or BlockSender object
> > > > and hands control to it.
> > > > > where in BlockReceiver.c, I have seen a PacketResponder class and a
> > > > > BlockReceiver class where in two places you are finalizing the
> > > > > block (what I understood by finalizing is that when the last packet
> > > > > for the block is received, you are closing the block file). In the
> > > > > PacketResponder class you are using the finalizeBlock() function in
> > > > > two places, one in the lastDataNodeRun() function and the other in
> > > > > the run() method, and in BlockReceiver.c you are using
> > > > > finalizeBlock() in the receiveBlock() function. I understood from
> > > > > the comments that the finalizeBlock() call from the run() method is
> > > > > done for the datanode with which the client directly interacts, and
> > > > > the finalizeBlock() call from the receiveBlock() function is done
> > > > > for all the datanodes where the block is sent for replication.
> > > > As part of replication, when a block is received by a DN, the block
> > > > length is known beforehand. So the receivePacket() invocations in the
> > > > while loop can read the complete block. After reading, it needs to
> > > > finalize the block to add it to the volumesMap.
> > > > > But I didn't understand why there is a
> > > > > finalizeBlock() call from the lastDataNodeRun() function.
> > > > This call is for current writes from a client/DN, where the DN does
> > > > not know the actual size until the client says that this is the last
> > > > packet in the current block. finalizeBlock is called if the packet is
> > > > the last packet for that block. finalizeBlock adds the replica into
> > > > the volumesMap. Also, if the packet is the last one, the DN needs to
> > > > close all the block files that were opened for writes.
> > > > > Can someone explain this to me? I may be wrong in most places in
> > > > > my understanding of the workflow. Correct me if I am wrong.
> > > >
> > > > Thanks,
> > > > Kartheek
> > > >
> > >
> > > Regards,
> > > Uma
> > >
> >
>
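
For reference, the flow described in the quoted replies (the application enqueues packets into a dataQueue, a streamer thread forwards them, and the DN writes them to blk_<block id> until the last packet arrives, then finalizes) can be sketched roughly as below. This is only a toy model and not Hadoop's code: PipelineSketch, Packet, startStreamer, receiveBlock and finalizedReplicas are hypothetical names, an in-memory queue stands in for the pipeline socket, and a plain map stands in for the volumes map.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class PipelineSketch {
    /** Hypothetical packet: payload plus a flag marking the last packet of the block. */
    static final class Packet {
        final byte[] data;
        final boolean lastPacketInBlock;
        Packet(byte[] data, boolean lastPacketInBlock) {
            this.data = data;
            this.lastPacketInBlock = lastPacketInBlock;
        }
    }

    /** Assumed stand-in for the DN's volumes map: block ID -> finalized length. */
    static final Map<Long, Long> finalizedReplicas = new ConcurrentHashMap<>();

    /** Client side: a streamer thread drains dataQueue and forwards each packet. */
    static Thread startStreamer(BlockingQueue<Packet> dataQueue,
                                BlockingQueue<Packet> pipeline) {
        Thread t = new Thread(() -> {
            try {
                Packet p;
                do {
                    p = dataQueue.take();
                    pipeline.put(p); // stands in for writing to the pipeline socket
                } while (!p.lastPacketInBlock);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "DataStreamer-sketch");
        t.start();
        return t;
    }

    /** DN side: write packets to blk_<id> until the last packet, then finalize. */
    static Path receiveBlock(long blockId, BlockingQueue<Packet> pipeline, Path dir)
            throws IOException, InterruptedException {
        Path blockFile = dir.resolve("blk_" + blockId);
        long written = 0;
        try (OutputStream out = Files.newOutputStream(blockFile)) {
            Packet p;
            do {
                p = pipeline.take();
                out.write(p.data);
                written += p.data.length;
            } while (!p.lastPacketInBlock);
        } // the block file stream is closed here, after the last packet
        finalizedReplicas.put(blockId, written); // "finalize": record the replica
        return blockFile;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<Packet> dataQueue = new LinkedBlockingQueue<>();
        BlockingQueue<Packet> pipeline = new LinkedBlockingQueue<>();
        Thread streamer = startStreamer(dataQueue, pipeline);

        // The "application" writes two packets; the second one ends the block.
        dataQueue.put(new Packet("hello ".getBytes(StandardCharsets.UTF_8), false));
        dataQueue.put(new Packet("hdfs".getBytes(StandardCharsets.UTF_8), true));

        Path dir = Files.createTempDirectory("dn-sketch");
        Path file = receiveBlock(42L, pipeline, dir);
        streamer.join();
        System.out.println(file.getFileName() + " finalized, bytes="
                + finalizedReplicas.get(42L)); // prints "blk_42 finalized, bytes=10"
    }
}
```

The receiver only learns the block length when it sees the last-packet flag, which is why finalization happens after the loop rather than at a known byte count.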
