Re: Create a block - file map

2020-01-01 Thread Amith sha
Enable DEBUG logging for org.apache.hadoop.hdfs.server.blockmanagement on the
NameNode.
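
For example (a sketch assuming the stock log4j setup and the default NameNode
HTTP port 50070 from Hadoop 2.7; adjust both for your cluster), either of
these should do it:

    # In the NameNode's log4j.properties, effective after a restart:
    log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement=DEBUG

    # Or at runtime, per class, without a restart:
    hadoop daemonlog -setlevel <namenode-host>:50070 \
        org.apache.hadoop.hdfs.server.blockmanagement.BlockManager DEBUG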

Thanks & Regards
Amithsha


Re: Create a block - file map

2019-12-31 Thread Arpit Agarwal
That is the only way to do it using the client API.

Just curious why you need the mapping.


Create a block - file map

2019-12-31 Thread Davide Vergari
Hi all,
I need to create a block map for all files in a specific directory (and its
subdirectories) in HDFS.

I'm using the fs.listFiles API, then I loop over the
RemoteIterator[LocatedFileStatus] it returns and, for each LocatedFileStatus,
call the getFileBlockLocations API to get all the block IDs of that file. But
this takes a long time, because I have millions of files in the HDFS
directory.
I also tried to use Spark to parallelize the work, but the HDFS client APIs
are not serializable.
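
A minimal sketch of that loop, in case it is useful (Scala 2.11 against the
Hadoop 2.7 client API; the downcast to HdfsBlockLocation to reach the numeric
block IDs is an assumption about what DistributedFileSystem hands back, and
/data is a placeholder path):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, HdfsBlockLocation, Path}

    val fs = FileSystem.get(new Configuration())
    // listFiles(recursive = true) also walks subdirectories, and each
    // LocatedFileStatus already carries its block locations, so no extra
    // getFileBlockLocations call is needed per file.
    val it = fs.listFiles(new Path("/data"), true)
    while (it.hasNext) {
      val status = it.next()
      status.getBlockLocations.foreach {
        case loc: HdfsBlockLocation => // assumed concrete type on HDFS
          println(s"${loc.getLocatedBlock.getBlock.getBlockId}\t${status.getPath}")
        case loc => // fallback: only datanode hosts, no block ID
          println(s"${loc.getNames.mkString(",")}\t${status.getPath}")
      }
    }

As for Spark, the usual workaround for the serialization problem is to ship
plain path strings and re-open the (non-serializable) FileSystem inside each
task, e.g. with mapPartitions. A hedged sketch (the partition count and
output path are illustrative only):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, HdfsBlockLocation, Path}
    import org.apache.spark.SparkContext

    def blockMap(sc: SparkContext, paths: Seq[String]): Unit =
      sc.parallelize(paths, numSlices = 64)
        .mapPartitions { part =>
          // Built inside the task, so nothing non-serializable is shipped.
          val fs = FileSystem.get(new Configuration())
          part.flatMap { p =>
            val st = fs.getFileStatus(new Path(p))
            fs.getFileBlockLocations(st, 0, st.getLen).collect {
              case loc: HdfsBlockLocation =>
                s"${loc.getLocatedBlock.getBlock.getBlockId}\t$p"
            }
          }
        }
        .saveAsTextFile("/tmp/block-map")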

Is there a better way? I know there is the "hdfs oiv" command, but I can't
access the NameNode directory directly; besides, the fsimage file could be
outdated, and I can't put the cluster into safe mode to run the saveNamespace
command.

I'm using Scala 2.11 with Hadoop 2.7.1 (HDP 2.6.3).

Thank you