fanlinqian commented on PR #1725: URL: https://github.com/apache/hadoop/pull/1725#issuecomment-1336373809
Hello, I encountered a bug when using the batched listing method: when I pass in a directory containing more than 1000 files, with 2 replicas of each file's data blocks, only the first 500 files of the directory are returned and then the listing stops. I think the fix belongs in the `getBatchedListing()` method of `hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java`, as follows:

```java
for (; srcsIndex < srcs.length; srcsIndex++) {
  String src = srcs[srcsIndex];
  HdfsPartialListing listing;
  try {
    DirectoryListing dirListing =
        getListingInt(dir, pc, src, indexStartAfter, needLocation);
    if (dirListing == null) {
      throw new FileNotFoundException("Path " + src + " does not exist");
    }
    listing = new HdfsPartialListing(srcsIndex,
        Lists.newArrayList(dirListing.getPartialListing()));
    numEntries += listing.getPartialListing().size();
    lastListing = dirListing;
  } catch (Exception e) {
    if (e instanceof AccessControlException) {
      logAuditEvent(false, operationName, src);
    }
    listing = new HdfsPartialListing(srcsIndex,
        new RemoteException(e.getClass().getCanonicalName(), e.getMessage()));
    lastListing = null;
    LOG.info("Exception listing src {}", src, e);
  }
  listings.put(srcsIndex, listing);
  // My modification: stop as soon as this call's listing was truncated,
  // so the remaining entries can be fetched on the next batch.
  if (lastListing != null && lastListing.getRemainingEntries() != 0) {
    break;
  }
  if (indexStartAfter.length != 0) {
    indexStartAfter = new byte[0];
  }
  // Terminate if we've reached the maximum listing size
  if (numEntries >= dir.getListLimit()) {
    break;
  }
}
```

The root cause of this bug is that the result returned by `getListingInt(dir, pc, src, indexStartAfter, needLocation)` is limited both by the number of files in the directory and by the number of data blocks and replicas of those files, whereas `getBatchedListing()` only exits its loop once the number of returned entries reaches 1000. So a single call can be truncated (here, at 500 files because of the extra block/replica payload) without the batch loop ever noticing. Looking forward to your reply.
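The failure mode described above can be illustrated with a small self-contained toy model (this is not Hadoop code; `listOnce`, `buggyBatch`, and `fixedBatch` are hypothetical stand-ins for `getListingInt()` and the batch loop, and the per-call cap of 500 mirrors the truncation seen in the report):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the truncation bug: a per-call listing that may be capped
// below the batch limit, and two batch loops that consume it.
public class BatchedListingSketch {
  // Hypothetical per-call result: the entries returned plus a count of
  // entries the server left behind (analogous to getRemainingEntries()).
  static class Partial {
    final List<String> entries;
    final int remaining;
    Partial(List<String> entries, int remaining) {
      this.entries = entries;
      this.remaining = remaining;
    }
  }

  // Stand-in for getListingInt(): the server caps each call at
  // perCallLimit entries (e.g. because block locations and replicas
  // inflate the response size).
  static Partial listOnce(int dirSize, int startAfter, int perCallLimit) {
    List<String> out = new ArrayList<>();
    int end = Math.min(dirSize, startAfter + perCallLimit);
    for (int i = startAfter; i < end; i++) {
      out.add("file-" + i);
    }
    return new Partial(out, dirSize - end);
  }

  // Buggy loop: checks only the total-entry limit, so it never notices
  // that the single call was truncated and drops the directory's tail.
  static int buggyBatch(int dirSize, int perCallLimit, int batchLimit) {
    int numEntries = 0;
    Partial p = listOnce(dirSize, 0, perCallLimit);
    numEntries += p.entries.size();
    // p.remaining is ignored here; the loop would move on to the next src.
    return numEntries;
  }

  // Fixed loop: keeps fetching while the per-call result reports that
  // entries remain, up to the overall batch limit.
  static int fixedBatch(int dirSize, int perCallLimit, int batchLimit) {
    int numEntries = 0;
    int cursor = 0;
    while (numEntries < batchLimit) {
      Partial p = listOnce(dirSize, cursor, perCallLimit);
      numEntries += p.entries.size();
      cursor += p.entries.size();
      if (p.remaining == 0) {
        break;
      }
    }
    return numEntries;
  }

  public static void main(String[] args) {
    // 1000 files, server truncates each call at 500, batch limit 1000.
    System.out.println(buggyBatch(1000, 500, 1000)); // prints 500
    System.out.println(fixedBatch(1000, 500, 1000)); // prints 1000
  }
}
```

With 1000 files and a per-call cap of 500, the buggy loop returns only 500 entries even though the batch limit of 1000 was never reached, which matches the observed behavior; checking the remaining-entries signal recovers the full listing.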