Hi David, For what it's worth, you should be aware that you're calling internal APIs that have no guarantee of stability between versions. I can practically guarantee that your code will have to be modified for any HDFS upgrade you do. That's why these APIs are undocumented.
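For reference, here is a minimal sketch of the kind of supported route I have in mind, using only the public FileSystem API (the namenode URI and file path below are placeholders borrowed from the example later in this thread):

***************************************************************************
import java.net.URI;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockLocations {
    public static void main(String[] args) throws Exception {
        // Placeholder namenode URI and file path; substitute your own.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://192.168.1.230:8020"), new Configuration());
        Path file = new Path("/user/hive/warehouse/sample_07/sample_07.csv");

        // One BlockLocation per block: its offset and length within the
        // file, plus the hosts carrying each replica.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                    + " length=" + b.getLength()
                    + " hosts=" + Arrays.toString(b.getHosts()));
        }
        fs.close();
    }
}
***************************************************************************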
Perhaps you can explain what your high-level goal is here, and we can suggest a supported mechanism for achieving it.

-Todd

On Mon, Jan 9, 2012 at 9:56 AM, David Pavlis <david.pav...@javlin.eu> wrote:
> Hi Denny,
>
> Thanks a lot. I was able to make my code work.
>
> I am posting a small example below - in case somebody in the future has a
> similar need ;-) (it does not handle block replicas).
>
> David.
>
> ***************************************************************************
> import java.io.IOException;
> import java.net.InetSocketAddress;
> import java.net.Socket;
>
> import javax.net.SocketFactory;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hdfs.DFSClient;
> import org.apache.hadoop.hdfs.DFSClient.BlockReader;
> import org.apache.hadoop.hdfs.protocol.ClientProtocol;
> import org.apache.hadoop.hdfs.protocol.LocatedBlock;
> import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
> import org.apache.hadoop.net.NetUtils;
>
> public class BlockReaderExample {
>
>     public static void main(String[] args) {
>         String filename = "/user/hive/warehouse/sample_07/sample_07.csv";
>         int DATANODE_PORT = 50010;
>         int NAMENODE_PORT = 8020;
>         String HOST_IP = "192.168.1.230";
>
>         byte[] buf = new byte[1000];
>
>         try {
>             // RPC proxy to the namenode (internal, unstable API)
>             ClientProtocol client = DFSClient.createNamenode(
>                     new InetSocketAddress(HOST_IP, NAMENODE_PORT),
>                     new Configuration());
>
>             // ask the namenode for the locations of every block of the file
>             LocatedBlocks located =
>                     client.getBlockLocations(filename, 0, Long.MAX_VALUE);
>
>             for (LocatedBlock block : located.getLocatedBlocks()) {
>                 // open a raw socket to the datanode serving this block
>                 Socket sock = SocketFactory.getDefault().createSocket();
>                 InetSocketAddress targetAddr =
>                         new InetSocketAddress(HOST_IP, DATANODE_PORT);
>                 NetUtils.connect(sock, targetAddr, 10000);
>                 sock.setSoTimeout(10000);
>
>                 // stream the whole block; the block token obtained from the
>                 // namenode authorizes the read
>                 BlockReader reader = BlockReader.newBlockReader(sock, filename,
>                         block.getBlock().getBlockId(),
>                         block.getBlockToken(),
>                         block.getBlock().getGenerationStamp(),
>                         0, block.getBlockSize(), 1000);
>
>                 // read() returns -1 at the end of the block; note that a
>                 // short read does not by itself mean end-of-block
>                 int length;
>                 while ((length = reader.read(buf, 0, 1000)) > 0) {
>                     // System.out.print(new String(buf, 0, length, "UTF-8"));
>                 }
>                 reader.close();
>                 sock.close();
>             }
>         } catch (IOException ex) {
>             ex.printStackTrace();
>         }
>     }
> }
> ***************************************************************************
>
> From: Denny Ye <denny...@gmail.com>
> Reply-To: <hdfs-user@hadoop.apache.org>
> Date: Mon, 9 Jan 2012 16:29:18 +0800
> To: <hdfs-user@hadoop.apache.org>
> Subject: Re: How-to use DFSClient's BlockReader from Java
>
> Hi David,
>
> Please refer to the method "DFSInputStream#blockSeekTo"; it has the same
> purpose as your code.
>
> ***************************************************************************
> LocatedBlock targetBlock = getBlockAt(target, true);
> assert (target==this.pos) : "Wrong postion " + pos + " expect " + target;
> long offsetIntoBlock = target - targetBlock.getStartOffset();
>
> DNAddrPair retval = chooseDataNode(targetBlock);
> chosenNode = retval.info;
> InetSocketAddress targetAddr = retval.addr;
>
> try {
>   s = socketFactory.createSocket();
>   NetUtils.connect(s, targetAddr, socketTimeout);
>   s.setSoTimeout(socketTimeout);
>   Block blk = targetBlock.getBlock();
>   Token<BlockTokenIdentifier> accessToken = targetBlock.getBlockToken();
>
>   blockReader = BlockReader.newBlockReader(s, src, blk.getBlockId(),
>       accessToken,
>       blk.getGenerationStamp(),
>       offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
>       buffersize, verifyChecksum, clientName);
> ***************************************************************************
>
> -Regards
> Denny Ye
>
> 2012/1/6 David Pavlis <david.pav...@javlin.eu>
>
> Hi,
>
> I am relatively new to Hadoop and I am trying to utilize HDFS for my own
> application, where I want to take advantage of the data partitioning HDFS
> performs.
>
> The idea is that I get the list of individual blocks - BlockLocations - of
> a particular file and then read those directly (going to the individual
> datanodes). So far I found org.apache.hadoop.hdfs.DFSClient.BlockReader to
> be the way to go.
>
> However I am struggling with instantiating the BlockReader class, namely
> creating the "Token<BlockTokenIdentifier>".
>
> Is there example Java code showing how to access the individual blocks of
> a particular file stored on HDFS?
>
> Thanks in advance,
>
> David.

--
Todd Lipcon
Software Engineer, Cloudera
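For readers who find this thread later: the block-by-block read above can also be done entirely through the public API, by seeking an ordinary FSDataInputStream to each block's offset; the client then chooses a datanode holding a replica and handles block tokens and checksums internally. A rough sketch, reusing the same placeholder host and path as the examples above:

***************************************************************************
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadBlockAligned {
    public static void main(String[] args) throws Exception {
        // Placeholder namenode URI and file path; substitute your own.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://192.168.1.230:8020"), new Configuration());
        Path file = new Path("/user/hive/warehouse/sample_07/sample_07.csv");
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

        byte[] buf = new byte[64 * 1024];
        FSDataInputStream in = fs.open(file);
        for (BlockLocation b : blocks) {
            // Read exactly this block's byte range; block tokens and
            // checksum verification are handled by the client.
            in.seek(b.getOffset());
            long remaining = b.getLength();
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) break;
                remaining -= n;
                // process buf[0..n) here
            }
        }
        in.close();
        fs.close();
    }
}
***************************************************************************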