Thank you very much for your answers. I will probably go with searching for the blockID and then reading the block from the local file system directly. I need it for a specific purpose!
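The local-file-system route mentioned above can be sketched in plain Java with no Hadoop dependency: walk a DataNode data directory and look for the file named `blk_<blockId>`, which is how HDFS stores each finalized replica on disk. The data directory path in `main` is a placeholder; substitute your cluster's `dfs.datanode.data.dir`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.stream.Stream;

public class BlockFinder {
    // Search a DataNode data directory tree for the file backing a block ID.
    // Finalized replicas live under the configured dfs.datanode.data.dir,
    // typically at current/<block-pool>/current/finalized/subdir*/subdir*/blk_<id>.
    static Optional<Path> findBlockFile(Path dataDir, long blockId) throws IOException {
        String name = "blk_" + blockId;
        try (Stream<Path> files = Files.walk(dataDir)) {
            return files.filter(p -> p.getFileName().toString().equals(name))
                        .findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical data directory; adjust to your dfs.datanode.data.dir.
        Path dataDir = Paths.get("/data/hdfs/dn");
        if (Files.isDirectory(dataDir)) {
            findBlockFile(dataDir, 1073741825L)
                .ifPresent(p -> System.out.println("replica at " + p));
        } else {
            System.out.println("no such data dir: " + dataDir);
        }
    }
}
```

Each replica also has a companion `blk_<id>_<genstamp>.meta` checksum file beside it; matching on the `blk_<id>` prefix instead of exact equality would find both.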
Thank you very much for your answers!
- Thodoris

On Tue, 2018-04-24 at 05:54 +0000, Takanobu Asanuma wrote:
> In addition to others' comments, I think the fsck command below is the
> easiest way to find the block locations of the file.
>
> $ hdfs fsck /path/to/the/data -blocks -files -locations
>
> Thanks,
> - Takanobu
>
> -----Original Message-----
> From: Jim Clampffer [mailto:james.clampf...@gmail.com]
> Sent: Tuesday, April 24, 2018 10:42 AM
> To: Arpit Agarwal <aagar...@hortonworks.com>
> Cc: hdfs-dev@hadoop.apache.org
> Subject: Re: Read or Save specific blocks of a file
>
> If you want to read replicas from a specific DN after determining the
> block bounds via getFileBlockLocations, you could abuse the rack
> locality infrastructure by generating a dummy topology script so that
> the NN orders replicas such that the client tries to read from the DNs
> you prefer first. It is not going to guarantee a read from a specific
> DN, and it is a terrible idea in a multi-tenant/production cluster,
> but if you have a very specific goal in mind or want to learn more
> about the storage layer, it may be an interesting exercise.
>
> On Mon, Apr 23, 2018 at 9:14 PM, Arpit Agarwal <aagarwal@hortonworks.com> wrote:
> > Hi,
> >
> > Perhaps I missed something in the question.
> > FileSystem#getFileBlockLocations, followed by open, seek to the start
> > of the target block, and read. This will let you read the contents of
> > a specific block using public APIs.
> >
> > On 4/23/18, 5:26 PM, "Daniel Templeton" <dan...@cloudera.com> wrote:
> >
> >     I'm not aware of a way to work with blocks using the public APIs.
> >     The easiest way to do it is probably to retrieve the block IDs
> >     and then go grab those blocks from the DataNodes' local file
> >     systems directly.
> >
> >     Daniel
> >
> >     On 4/23/18 9:05 AM, Thodoris Zois wrote:
> >     > Hello list,
> >     >
> >     > I have a file on HDFS that is divided into 10 blocks
> >     > (partitions).
> >     >
> >     > Is there any way to retrieve data from a specific block?
> >     > (e.g.: using the blockID)
> >     >
> >     > Except that, is there any option to write the contents of each
> >     > block (or of one block) into separate files?
> >     >
> >     > Thank you very much,
> >     > Thodoris
> >     >
> >     > ---------------------------------------------------------------------
> >     > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> >     > For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
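Arpit's open/seek/read approach, and the question about writing each block to its own file, can be sketched as below. This is a plain-Java illustration against a local file using RandomAccessFile; on an actual cluster you would open the path via Hadoop's FileSystem.open() and take each block's offset and length from the BlockLocation objects returned by getFileBlockLocations() rather than computing them from a fixed block size (dfs.blocksize, 128 MB by default). The `<file>.block<i>` output naming is invented for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BlockReader {
    // Byte offset where block index i starts, given the block size.
    static long blockStart(int i, long blockSize) {
        return i * blockSize;
    }

    // Number of bytes in block i; the last block is usually shorter.
    static long blockLength(int i, long blockSize, long fileLen) {
        long start = blockStart(i, blockSize);
        return Math.max(0, Math.min(blockSize, fileLen - start));
    }

    // Seek to the start of block i and read exactly its bytes.
    static byte[] readBlock(Path file, int i, long blockSize) throws IOException {
        long fileLen = Files.size(file);
        byte[] buf = new byte[(int) blockLength(i, blockSize, fileLen)];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(blockStart(i, blockSize));
            raf.readFully(buf);
        }
        return buf;
    }

    // Write each block's bytes to its own file: <file>.block0, <file>.block1, ...
    static void splitIntoBlockFiles(Path file, long blockSize) throws IOException {
        long fileLen = Files.size(file);
        int nBlocks = (int) ((fileLen + blockSize - 1) / blockSize);
        for (int i = 0; i < nBlocks; i++) {
            Files.write(Paths.get(file + ".block" + i), readBlock(file, i, blockSize));
        }
    }
}
```

Note that HDFS block boundaries are purely byte offsets; a record (e.g. a text line) can straddle two blocks, so a per-block dump like this may split records at the boundaries.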