Hi Denny,

Thanks a lot, I was able to make my code work.
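For the record, the piece I was originally stuck on, the Token<BlockTokenIdentifier>, turned out to be nothing you construct yourself: every LocatedBlock the NameNode returns already carries one. Roughly (variable names here are just placeholders; 'namenode' stands for the ClientProtocol proxy from DFSClient.createNamenode() as in the full example below):

***************************************************************************
LocatedBlocks blocks = namenode.getBlockLocations(path, 0, Long.MAX_VALUE);
for (LocatedBlock blk : blocks.getLocatedBlocks()) {
    // the access token travels with the block locations; no need to build it
    Token<BlockTokenIdentifier> token = blk.getBlockToken();
    // ... pass 'token' straight to BlockReader.newBlockReader(...)
}
***************************************************************************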
I am posting a small example below, in case somebody in the future has a similar need ;-) It does not handle replica datablocks; see the sketch at the very end of this post for one possible way to add that.

David.

***************************************************************************
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

import javax.net.SocketFactory;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.DFSClient.BlockReader;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
import org.apache.hadoop.net.NetUtils;

public static void main(String args[]) {
    String filename = "/user/hive/warehouse/sample_07/sample_07.csv";
    int DATANODE_PORT = 50010;
    int NAMENODE_PORT = 8020;
    String HOST_IP = "192.168.1.230";
    byte[] buf = new byte[1000];
    try {
        // ask the NameNode where the blocks of the file live
        ClientProtocol client = DFSClient.createNamenode(
                new InetSocketAddress(HOST_IP, NAMENODE_PORT), new Configuration());
        LocatedBlocks located = client.getBlockLocations(filename, 0, Long.MAX_VALUE);
        for (LocatedBlock block : located.getLocatedBlocks()) {
            // connect directly to the DataNode that holds the block
            Socket sock = SocketFactory.getDefault().createSocket();
            InetSocketAddress targetAddr = new InetSocketAddress(HOST_IP, DATANODE_PORT);
            NetUtils.connect(sock, targetAddr, 10000);
            sock.setSoTimeout(10000);
            // the block access token comes straight from the LocatedBlock
            BlockReader reader = BlockReader.newBlockReader(sock, filename,
                    block.getBlock().getBlockId(),
                    block.getBlockToken(),
                    block.getBlock().getGenerationStamp(),
                    0, block.getBlockSize(), 1000);
            int length;
            // read() returns -1 at the end of the block; a short read does
            // not necessarily mean end-of-block, so don't break early on it
            while ((length = reader.read(buf, 0, 1000)) > 0) {
                //System.out.print(new String(buf, 0, length, "UTF-8"));
            }
            reader.close();
            sock.close();
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
***************************************************************************

From: Denny Ye <denny...@gmail.com>
Reply-To: <hdfs-user@hadoop.apache.org>
Date: Mon, 9 Jan 2012 16:29:18 +0800
To: <hdfs-user@hadoop.apache.org>
Subject: Re: How-to use DFSClient's BlockReader from Java

hi David

Please refer to the method "DFSInputStream#blockSeekTo"; it has the same purpose as your code.

***************************************************************************
LocatedBlock targetBlock = getBlockAt(target, true);
assert (target == this.pos) : "Wrong position " + pos + " expect " + target;
long offsetIntoBlock = target - targetBlock.getStartOffset();

DNAddrPair retval = chooseDataNode(targetBlock);
chosenNode = retval.info;
InetSocketAddress targetAddr = retval.addr;

try {
    s = socketFactory.createSocket();
    NetUtils.connect(s, targetAddr, socketTimeout);
    s.setSoTimeout(socketTimeout);
    Block blk = targetBlock.getBlock();
    Token<BlockTokenIdentifier> accessToken = targetBlock.getBlockToken();

    blockReader = BlockReader.newBlockReader(s, src, blk.getBlockId(),
            accessToken,
            blk.getGenerationStamp(),
            offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
            buffersize, verifyChecksum, clientName);
***************************************************************************

-Regards
Denny Ye

2012/1/6 David Pavlis <david.pav...@javlin.eu>

Hi,

I am relatively new to Hadoop and I am trying to utilize HDFS for my own application, where I want to take advantage of the data partitioning HDFS performs. The idea is that I get the list of individual blocks (BlockLocations) of a particular file and then read those directly, going straight to the individual DataNodes.

So far I have found org.apache.hadoop.hdfs.DFSClient.BlockReader to be the way to go. However, I am struggling with instantiating the BlockReader class, namely with creating the "Token<BlockTokenIdentifier>".

Is there any example Java code showing how to access the individual blocks of a file stored on HDFS?

Thanks in advance,

David.
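PS: the sketch promised above. This is a minimal, untested way one might pick a reachable DataNode from the replica locations the NameNode returns, instead of hardcoding HOST_IP as my example does. The method name connectToAnyReplica is made up for illustration; LocatedBlock.getLocations(), DatanodeInfo.getName() and the NetUtils helpers are the actual client API of this Hadoop line (you additionally need to import org.apache.hadoop.hdfs.protocol.DatanodeInfo).

***************************************************************************
// hypothetical helper: connect to the first reachable replica of a block
public static Socket connectToAnyReplica(LocatedBlock block, int timeoutMs)
        throws IOException {
    IOException lastFailure = null;
    for (DatanodeInfo node : block.getLocations()) {
        // getName() is the DataNode's "host:port" data-transfer address
        InetSocketAddress addr = NetUtils.createSocketAddr(node.getName());
        Socket sock = SocketFactory.getDefault().createSocket();
        try {
            NetUtils.connect(sock, addr, timeoutMs);
            sock.setSoTimeout(timeoutMs);
            return sock; // hand this socket to BlockReader.newBlockReader(...)
        } catch (IOException e) {
            lastFailure = e; // this replica is unreachable, try the next one
            sock.close();
        }
    }
    throw (lastFailure != null) ? lastFailure
            : new IOException("no replica locations for block");
}
***************************************************************************

The returned socket can be passed to BlockReader.newBlockReader() exactly as in the example at the top of this message.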