Re: Create a block - file map
Enable DEBUG logging for org.apache.hadoop.hdfs.server.blockmanagement on the NameNode.

Thanks & Regards,
Amithsha

On Wed, Jan 1, 2020 at 4:55 AM Arpit Agarwal wrote:
> That is the only way to do it using the client API.
>
> Just curious why you need the mapping.
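For reference, a sketch of one way to enable that logger without restarting the NameNode, using the hadoop daemonlog command (nn-host:50070 is a placeholder; 50070 is the Hadoop 2.x default NameNode HTTP port — adjust for your cluster):

```shell
# Temporary (lasts until the NameNode restarts), set via its HTTP address.
# nn-host:50070 is a placeholder for your NameNode's host and HTTP port.
hadoop daemonlog -setlevel nn-host:50070 \
    org.apache.hadoop.hdfs.server.blockmanagement.BlockManager DEBUG
```

For a persistent change, the equivalent line in the NameNode's log4j.properties would be `log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement=DEBUG`.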
Re: Create a block - file map
That is the only way to do it using the client API.

Just curious why you need the mapping.

On Tue, Dec 31, 2019, 00:41 Davide Vergari wrote:
> Hi all,
> I need to create a block map for all files in a specific directory (and
> subdir) in HDFS.
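Since the client API is the only route, the Spark serialization problem from the original question can be sidestepped by distributing plain path strings and constructing the FileSystem inside each task rather than capturing it in a closure. A minimal sketch, assuming a running SparkSession named `spark`, a pre-collected `Seq[String]` of file paths named `paths`, and Hadoop client configuration on the executors' classpath:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Distribute serializable path strings, never FileSystem/FileStatus objects.
val blockMap = spark.sparkContext
  .parallelize(paths, 64)   // 64 partitions is an arbitrary placeholder
  .mapPartitions { iter =>
    // FileSystem is not serializable, so create it on the executor,
    // once per partition, instead of shipping it from the driver.
    val fs = FileSystem.get(new Configuration())
    iter.map { p =>
      val st   = fs.getFileStatus(new Path(p))
      val locs = fs.getFileBlockLocations(st, 0, st.getLen)
      p -> locs.map(_.getHosts.mkString(","))
    }
  }
  .collect()
```

Note this parallelizes the NameNode RPCs rather than reducing them (each file still costs a getFileStatus plus a getFileBlockLocations call), so a modest partition count is kinder to the NameNode.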
Create a block - file map
Hi all,
I need to create a block map for all files in a specific directory (and its subdirectories) in HDFS.

I'm using the fs.listFiles API, looping over the RemoteIterator[LocatedFileStatus] it returns, and calling the getFileBlockLocations API on each LocatedFileStatus to collect that file's block locations. This takes a very long time because the directory holds millions of files. I also tried to parallelize the work with Spark, but the HDFS API classes are not serializable.

Is there a better way? I know there is the "hdfs oiv" command, but I can't access the NameNode directories directly; besides, the fsimage file could be outdated, and I can't force safemode to run the saveNamespace command.

I'm using Scala 2.11 with Hadoop 2.7.1 (HDP 2.6.3).

Thank you
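For reference, the loop described above can be sketched as follows. One detail worth noting: listFiles fetches block locations together with the listing, so LocatedFileStatus.getBlockLocations returns them directly and the extra getFileBlockLocations round trip per file can be skipped. This assumes Hadoop client jars and cluster configuration on the classpath; the root path and output map are placeholders:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable

val fs      = FileSystem.get(new Configuration())
val rootDir = new Path("/data")   // placeholder root directory
val blockMap = mutable.Map.empty[String, Array[String]]

val it = fs.listFiles(rootDir, true)   // true = recurse into subdirectories
while (it.hasNext) {
  val status = it.next()               // LocatedFileStatus
  // The listing already carried the block locations, so reading them
  // here avoids a second NameNode RPC per file.
  val locs = status.getBlockLocations
  blockMap(status.getPath.toString) = locs.map(_.getHosts.mkString(","))
}
```

Even with the saved round trip this is still one pass over millions of entries, but each listFiles response batches many files, so it should be noticeably faster than a per-file getFileBlockLocations call.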