steveloughran commented on a change in pull request #1208: HADOOP-16423. 
S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#discussion_r322796262
 
 

 ##########
 File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3GuardFsck.java
 ##########
 @@ -89,50 +89,53 @@
    * The violations are listed in Enums: {@link Violation}
    *
    * @param p the root path to start the traversal
-   * @throws IOException
    * @return a list of {@link ComparePair}
+   * @throws IOException
    */
   public List<ComparePair> compareS3ToMs(Path p) throws IOException {
     Stopwatch stopwatch = Stopwatch.createStarted();
     int scannedItems = 0;
 
     final Path rootPath = rawFS.qualify(p);
-    S3AFileStatus root = null;
-    try {
-      root = (S3AFileStatus) rawFS.getFileStatus(rootPath);
-    } catch (AWSBadRequestException e) {
-      throw new IOException(e.getMessage());
-    }
+    S3AFileStatus root = (S3AFileStatus) rawFS.getFileStatus(rootPath);
     final List<ComparePair> comparePairs = new ArrayList<>();
     final Queue<S3AFileStatus> queue = new ArrayDeque<>();
     queue.add(root);
 
     while (!queue.isEmpty()) {
       final S3AFileStatus currentDir = queue.poll();
-      scannedItems++;
+
 
       final Path currentDirPath = currentDir.getPath();
-      List<FileStatus> s3DirListing = 
Arrays.asList(rawFS.listStatus(currentDirPath));
-
-      // DIRECTORIES
-      // Check directory authoritativeness consistency
-      compareAuthoritativeDirectoryFlag(comparePairs, currentDirPath, 
s3DirListing);
-      // Add all descendant directory to the queue
-      s3DirListing.stream().filter(pm -> pm.isDirectory())
-              .map(S3AFileStatus.class::cast)
-              .forEach(pm -> queue.add(pm));
-
-      // FILES
-      // check files for consistency
-      final List<S3AFileStatus> children = s3DirListing.stream()
-              .filter(status -> !status.isDirectory())
-              .map(S3AFileStatus.class::cast).collect(toList());
-      final List<ComparePair> compareResult =
-          compareS3DirToMs(currentDir, children).stream()
-              .filter(comparePair -> comparePair.containsViolation())
-              .collect(toList());
-      comparePairs.addAll(compareResult);
-      scannedItems += children.size();
+      try {
+        List<FileStatus> s3DirListing = Arrays.asList(
+            rawFS.listStatus(currentDirPath));
+
+        // Check authoritative directory flag.
+        compareAuthoritativeDirectoryFlag(comparePairs, currentDirPath,
+            s3DirListing);
+        // Add all descendant directories to the queue
+        s3DirListing.stream().filter(pm -> pm.isDirectory())
+            .map(S3AFileStatus.class::cast)
+            .forEach(pm -> queue.add(pm));
+
+        // Check file and directory metadata for consistency.
+        final List<S3AFileStatus> children = s3DirListing.stream()
+            .filter(status -> !status.isDirectory())
+            .map(S3AFileStatus.class::cast).collect(toList());
+        final List<ComparePair> compareResult =
+            compareS3DirContentToMs(currentDir, children);
+        comparePairs.addAll(compareResult);
+
+        // Increase the scanned item count:
+        // one for the directory, one per child.
+        scannedItems++;
+        scannedItems += children.size();
+      } catch (FileNotFoundException e) {
+        LOG.error("The path has been deleted since it was queued: "
 
Review comment:
   Should this be `error` or `warn`? Normally I'd go for an slf4j `{}` placeholder, but as this message is intended to always be logged, string concatenation will do as is.
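To illustrate the slf4j `{}` style mentioned above: the placeholder form defers rendering the argument until the logger has decided the message will actually be emitted, whereas `+` concatenation always builds the string. The tiny formatter below is a stand-in for slf4j's internal `MessageFormatter`, written for illustration only; it is not the real slf4j API, and the class and method names here are invented for the example.

```java
public class PlaceholderDemo {
  // Tiny stand-in for slf4j's MessageFormatter: replaces each "{}" in the
  // pattern with the next argument, in order. Illustration only.
  static String format(String pattern, Object... args) {
    StringBuilder sb = new StringBuilder();
    int argIdx = 0;
    int i = 0;
    while (i < pattern.length()) {
      if (argIdx < args.length && pattern.startsWith("{}", i)) {
        sb.append(args[argIdx++]);
        i += 2; // skip past the "{}" placeholder
      } else {
        sb.append(pattern.charAt(i++));
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // With "{}", the argument is only rendered when the message is logged;
    // with '+' concatenation the string is built unconditionally.
    System.out.println(format(
        "The path has been deleted since it was queued: {}", "/bucket/dir"));
  }
}
```

Since the FNFE branch here is always expected to log, the reviewer's point stands: the lazy-formatting benefit of `{}` is moot, so plain concatenation is acceptable in this spot.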
