Hi Denny,

Thanks a lot, I was able to make my code work.
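For the record, the piece I was originally stuck on, the Token<BlockTokenIdentifier>, turned out to be nothing you construct yourself: every LocatedBlock the NameNode returns already carries one. Roughly (variable names here are just placeholders; 'namenode' stands for the ClientProtocol proxy from DFSClient.createNamenode() as in the full example below):

***************************************************************************
LocatedBlocks blocks = namenode.getBlockLocations(path, 0, Long.MAX_VALUE);
for (LocatedBlock blk : blocks.getLocatedBlocks()) {
    // the access token travels with the block locations; no need to build it
    Token<BlockTokenIdentifier> token = blk.getBlockToken();
    // ... pass 'token' straight to BlockReader.newBlockReader(...)
}
***************************************************************************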
I am posting a small example below, in case somebody in the future has a similar need ;-) It does not handle replica datablocks; see the sketch at the very end of this post for one possible way to add that.

David.

***************************************************************************
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

import javax.net.SocketFactory;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.DFSClient.BlockReader;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
import org.apache.hadoop.net.NetUtils;

public static void main(String args[]) {
    String filename = "/user/hive/warehouse/sample_07/sample_07.csv";
    int DATANODE_PORT = 50010;
    int NAMENODE_PORT = 8020;
    String HOST_IP = "192.168.1.230";
    byte[] buf = new byte[1000];
    try {
        // ask the NameNode where the blocks of the file live
        ClientProtocol client = DFSClient.createNamenode(
                new InetSocketAddress(HOST_IP, NAMENODE_PORT), new Configuration());
        LocatedBlocks located = client.getBlockLocations(filename, 0, Long.MAX_VALUE);
        for (LocatedBlock block : located.getLocatedBlocks()) {
            // connect directly to the DataNode that holds the block
            Socket sock = SocketFactory.getDefault().createSocket();
            InetSocketAddress targetAddr = new InetSocketAddress(HOST_IP, DATANODE_PORT);
            NetUtils.connect(sock, targetAddr, 10000);
            sock.setSoTimeout(10000);
            // the block access token comes straight from the LocatedBlock
            BlockReader reader = BlockReader.newBlockReader(sock, filename,
                    block.getBlock().getBlockId(),
                    block.getBlockToken(),
                    block.getBlock().getGenerationStamp(),
                    0, block.getBlockSize(), 1000);
            int length;
            // read() returns -1 at the end of the block; a short read does
            // not necessarily mean end-of-block, so don't break early on it
            while ((length = reader.read(buf, 0, 1000)) > 0) {
                //System.out.print(new String(buf, 0, length, "UTF-8"));
            }
            reader.close();
            sock.close();
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
***************************************************************************

From: Denny Ye <denny...@gmail.com>
Reply-To: <hdfs-user@hadoop.apache.org>
Date: Mon, 9 Jan 2012 16:29:18 +0800
To: <hdfs-user@hadoop.apache.org>
Subject: Re: How-to use DFSClient's BlockReader from Java

hi David

Please refer to the method "DFSInputStream#blockSeekTo"; it has the same purpose as your code.

***************************************************************************
LocatedBlock targetBlock = getBlockAt(target, true);
assert (target == this.pos) : "Wrong position " + pos + " expect " + target;
long offsetIntoBlock = target - targetBlock.getStartOffset();

DNAddrPair retval = chooseDataNode(targetBlock);
chosenNode = retval.info;
InetSocketAddress targetAddr = retval.addr;

try {
    s = socketFactory.createSocket();
    NetUtils.connect(s, targetAddr, socketTimeout);
    s.setSoTimeout(socketTimeout);
    Block blk = targetBlock.getBlock();
    Token<BlockTokenIdentifier> accessToken = targetBlock.getBlockToken();

    blockReader = BlockReader.newBlockReader(s, src, blk.getBlockId(),
            accessToken,
            blk.getGenerationStamp(),
            offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
            buffersize, verifyChecksum, clientName);
***************************************************************************

-Regards
Denny Ye

2012/1/6 David Pavlis <david.pav...@javlin.eu>

Hi,

I am relatively new to Hadoop and I am trying to utilize HDFS for my own application, where I want to take advantage of the data partitioning HDFS performs. The idea is that I get the list of individual blocks (BlockLocations) of a particular file and then read those directly, going straight to the individual DataNodes.

So far I have found org.apache.hadoop.hdfs.DFSClient.BlockReader to be the way to go. However, I am struggling with instantiating the BlockReader class, namely with creating the "Token<BlockTokenIdentifier>".

Is there any example Java code showing how to access the individual blocks of a file stored on HDFS?

Thanks in advance,

David.
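PS: the sketch promised above. This is a minimal, untested way one might pick a reachable DataNode from the replica locations the NameNode returns, instead of hardcoding HOST_IP as my example does. The method name connectToAnyReplica is made up for illustration; LocatedBlock.getLocations(), DatanodeInfo.getName() and the NetUtils helpers are the actual client API of this Hadoop line (you additionally need to import org.apache.hadoop.hdfs.protocol.DatanodeInfo).

***************************************************************************
// hypothetical helper: connect to the first reachable replica of a block
public static Socket connectToAnyReplica(LocatedBlock block, int timeoutMs)
        throws IOException {
    IOException lastFailure = null;
    for (DatanodeInfo node : block.getLocations()) {
        // getName() is the DataNode's "host:port" data-transfer address
        InetSocketAddress addr = NetUtils.createSocketAddr(node.getName());
        Socket sock = SocketFactory.getDefault().createSocket();
        try {
            NetUtils.connect(sock, addr, timeoutMs);
            sock.setSoTimeout(timeoutMs);
            return sock; // hand this socket to BlockReader.newBlockReader(...)
        } catch (IOException e) {
            lastFailure = e; // this replica is unreachable, try the next one
            sock.close();
        }
    }
    throw (lastFailure != null) ? lastFailure
            : new IOException("no replica locations for block");
}
***************************************************************************

The returned socket can be passed to BlockReader.newBlockReader() exactly as in the example at the top of this message.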