fanlinqian commented on PR #1725: URL: https://github.com/apache/hadoop/pull/1725#issuecomment-1336373809
Hello, I encountered a bug when using the batched listing method: when I pass in a directory containing more than 1000 files, with 2 replicas of each file's data blocks, only the first 500 files of the directory are returned and then the listing stops. I think the fix belongs in the `getBatchedListing()` method of `hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java`, as follows:

```java
for (; srcsIndex < srcs.length; srcsIndex++) {
  String src = srcs[srcsIndex];
  HdfsPartialListing listing;
  try {
    DirectoryListing dirListing =
        getListingInt(dir, pc, src, indexStartAfter, needLocation);
    if (dirListing == null) {
      throw new FileNotFoundException("Path " + src + " does not exist");
    }
    listing = new HdfsPartialListing(srcsIndex,
        Lists.newArrayList(dirListing.getPartialListing()));
    numEntries += listing.getPartialListing().size();
    lastListing = dirListing;
  } catch (Exception e) {
    if (e instanceof AccessControlException) {
      logAuditEvent(false, operationName, src);
    }
    listing = new HdfsPartialListing(srcsIndex,
        new RemoteException(e.getClass().getCanonicalName(), e.getMessage()));
    lastListing = null;
    LOG.info("Exception listing src {}", src, e);
  }
  listings.put(srcsIndex, listing);
  // My modification: stop as soon as this call's listing was truncated,
  // so the remaining entries can be fetched on the next batch.
  if (lastListing != null && lastListing.getRemainingEntries() != 0) {
    break;
  }
  if (indexStartAfter.length != 0) {
    indexStartAfter = new byte[0];
  }
  // Terminate if we've reached the maximum listing size
  if (numEntries >= dir.getListLimit()) {
    break;
  }
}
```

The root cause of this bug is that the result returned by `getListingInt(dir, pc, src, indexStartAfter, needLocation)` is limited both by the number of files in the directory and by the number of data blocks and replicas of those files, whereas `getBatchedListing()` only exits its loop once the number of returned entries reaches 1000. So a single call can be truncated (here, at 500 files because of the extra block/replica payload) without the batch loop ever noticing. Looking forward to your reply.
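The failure mode described above can be illustrated with a small self-contained toy model (this is not Hadoop code; `listOnce`, `buggyBatch`, and `fixedBatch` are hypothetical stand-ins for `getListingInt()` and the batch loop, and the per-call cap of 500 mirrors the truncation seen in the report):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the truncation bug: a per-call listing that may be capped
// below the batch limit, and two batch loops that consume it.
public class BatchedListingSketch {
  // Hypothetical per-call result: the entries returned plus a count of
  // entries the server left behind (analogous to getRemainingEntries()).
  static class Partial {
    final List<String> entries;
    final int remaining;
    Partial(List<String> entries, int remaining) {
      this.entries = entries;
      this.remaining = remaining;
    }
  }

  // Stand-in for getListingInt(): the server caps each call at
  // perCallLimit entries (e.g. because block locations and replicas
  // inflate the response size).
  static Partial listOnce(int dirSize, int startAfter, int perCallLimit) {
    List<String> out = new ArrayList<>();
    int end = Math.min(dirSize, startAfter + perCallLimit);
    for (int i = startAfter; i < end; i++) {
      out.add("file-" + i);
    }
    return new Partial(out, dirSize - end);
  }

  // Buggy loop: checks only the total-entry limit, so it never notices
  // that the single call was truncated and drops the directory's tail.
  static int buggyBatch(int dirSize, int perCallLimit, int batchLimit) {
    int numEntries = 0;
    Partial p = listOnce(dirSize, 0, perCallLimit);
    numEntries += p.entries.size();
    // p.remaining is ignored here; the loop would move on to the next src.
    return numEntries;
  }

  // Fixed loop: keeps fetching while the per-call result reports that
  // entries remain, up to the overall batch limit.
  static int fixedBatch(int dirSize, int perCallLimit, int batchLimit) {
    int numEntries = 0;
    int cursor = 0;
    while (numEntries < batchLimit) {
      Partial p = listOnce(dirSize, cursor, perCallLimit);
      numEntries += p.entries.size();
      cursor += p.entries.size();
      if (p.remaining == 0) {
        break;
      }
    }
    return numEntries;
  }

  public static void main(String[] args) {
    // 1000 files, server truncates each call at 500, batch limit 1000.
    System.out.println(buggyBatch(1000, 500, 1000)); // prints 500
    System.out.println(fixedBatch(1000, 500, 1000)); // prints 1000
  }
}
```

With 1000 files and a per-call cap of 500, the buggy loop returns only 500 entries even though the batch limit of 1000 was never reached, which matches the observed behavior; checking the remaining-entries signal recovers the full listing.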