[ 
https://issues.apache.org/jira/browse/HIVE-27135?focusedWorklogId=853132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853132
 ]

ASF GitHub Bot logged work on HIVE-27135:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Mar/23 10:52
            Start Date: 27/Mar/23 10:52
    Worklog Time Spent: 10m 
      Work Description: mdayakar commented on code in PR #4114:
URL: https://github.com/apache/hive/pull/4114#discussion_r1149137172


##########
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java:
##########
@@ -1538,32 +1538,36 @@ private static HdfsDirSnapshot addToSnapshot(Map<Path, HdfsDirSnapshot> dirToSna
   public static Map<Path, HdfsDirSnapshot> getHdfsDirSnapshots(final FileSystem fs, final Path path)
       throws IOException {
     Map<Path, HdfsDirSnapshot> dirToSnapshots = new HashMap<>();
-    RemoteIterator<LocatedFileStatus> itr = FileUtils.listFiles(fs, path, true, acidHiddenFileFilter);
-    while (itr.hasNext()) {
-      FileStatus fStatus = itr.next();
-      Path fPath = fStatus.getPath();
-      if (fStatus.isDirectory() && acidTempDirFilter.accept(fPath)) {
-        addToSnapshot(dirToSnapshots, fPath);
-      } else {
-        Path parentDirPath = fPath.getParent();
-        if (acidTempDirFilter.accept(parentDirPath)) {
-          while (isChildOfDelta(parentDirPath, path)) {
-            // Some cases there are other directory layers between the delta and the datafiles
-            // (export-import mm table, insert with union all to mm table, skewed tables).
-            // But it does not matter for the AcidState, we just need the deltas and the data files
-            // So build the snapshot with the files inside the delta directory
-            parentDirPath = parentDirPath.getParent();
-          }
-          HdfsDirSnapshot dirSnapshot = addToSnapshot(dirToSnapshots, parentDirPath);
-          // We're not filtering out the metadata file and acid format file,
-          // as they represent parts of a valid snapshot
-          // We're not using the cached values downstream, but we can potentially optimize more in a follow-up task
-          if (fStatus.getPath().toString().contains(MetaDataFile.METADATA_FILE)) {
-            dirSnapshot.addMetadataFile(fStatus);
-          } else if (fStatus.getPath().toString().contains(OrcAcidVersion.ACID_FORMAT)) {
-            dirSnapshot.addOrcAcidFormatFile(fStatus);
-          } else {
-            dirSnapshot.addFile(fStatus);
+    Deque<RemoteIterator<LocatedFileStatus>> stack = new ArrayDeque<>();
+    stack.push(FileUtils.listLocatedStatusIterator(fs, path, acidHiddenFileFilter));
+    while (!stack.isEmpty()) {
+      RemoteIterator<LocatedFileStatus> itr = stack.pop();
+      while (itr.hasNext()) {
+        FileStatus fStatus = itr.next();
+        Path fPath = fStatus.getPath();
+        if (fStatus.isDirectory()) {
+          stack.push(FileUtils.listLocatedStatusIterator(fs, fPath, acidHiddenFileFilter));

Review Comment:
   No, `addToSnapshot(dirToSnapshots, fPath)` only needs to be called when a folder contains a file, and that case is handled in the else branch. The same logic exists in the existing code.
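
   For context, a minimal sketch of how the complete iterative loop could look once the else branch is included. It is assembled from the diff above plus the pre-existing per-file logic; the final patch may differ, and the existing `AcidUtils` members (`HdfsDirSnapshot`, `addToSnapshot`, `acidHiddenFileFilter`, `acidTempDirFilter`, `isChildOfDelta`, `MetaDataFile`, `OrcAcidVersion`) are assumed to be in scope:

   ```java
   public static Map<Path, HdfsDirSnapshot> getHdfsDirSnapshots(final FileSystem fs, final Path path)
       throws IOException {
     Map<Path, HdfsDirSnapshot> dirToSnapshots = new HashMap<>();
     // Iterative traversal: each directory gets its own non-recursive listing,
     // so a directory deleted mid-scan only affects its own iterator.
     Deque<RemoteIterator<LocatedFileStatus>> stack = new ArrayDeque<>();
     stack.push(FileUtils.listLocatedStatusIterator(fs, path, acidHiddenFileFilter));
     while (!stack.isEmpty()) {
       RemoteIterator<LocatedFileStatus> itr = stack.pop();
       while (itr.hasNext()) {
         FileStatus fStatus = itr.next();
         Path fPath = fStatus.getPath();
         if (fStatus.isDirectory()) {
           // Directories are only queued for listing here; a snapshot entry is
           // created in the else branch once a file is found inside them.
           stack.push(FileUtils.listLocatedStatusIterator(fs, fPath, acidHiddenFileFilter));
         } else {
           Path parentDirPath = fPath.getParent();
           if (acidTempDirFilter.accept(parentDirPath)) {
             // Skip intermediate layers between the delta and its data files.
             while (isChildOfDelta(parentDirPath, path)) {
               parentDirPath = parentDirPath.getParent();
             }
             HdfsDirSnapshot dirSnapshot = addToSnapshot(dirToSnapshots, parentDirPath);
             if (fPath.toString().contains(MetaDataFile.METADATA_FILE)) {
               dirSnapshot.addMetadataFile(fStatus);
             } else if (fPath.toString().contains(OrcAcidVersion.ACID_FORMAT)) {
               dirSnapshot.addOrcAcidFormatFile(fStatus);
             } else {
               dirSnapshot.addFile(fStatus);
             }
           }
         }
       }
     }
     return dirToSnapshots;
   }
   ```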





Issue Time Tracking
-------------------

    Worklog Id:     (was: 853132)
    Time Spent: 4h 40m  (was: 4.5h)

> AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in HDFS
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-27135
>                 URL: https://issues.apache.org/jira/browse/HIVE-27135
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Dayakar M
>            Assignee: Dayakar M
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> AcidUtils#getHdfsDirSnapshots() throws FileNotFoundException when a directory is removed in HDFS while it is fetching HDFS snapshots.
> The test code below can be used to reproduce this issue.
> {code:java}
>   @Test
>   public void testShouldNotThrowFNFEWhenHiveStagingDirectoryIsRemovedWhileFetchingHDFSSnapshots() throws Exception {
>     MockFileSystem fs = new MockFileSystem(new HiveConf(),
>         new MockFile("mock:/tbl/part1/.hive-staging_dir/-ext-10002", 500, new byte[0]),
>         new MockFile("mock:/tbl/part2/.hive-staging_dir", 500, new byte[0]),
>         new MockFile("mock:/tbl/part1/_tmp_space.db", 500, new byte[0]),
>         new MockFile("mock:/tbl/part1/delta_1_1/bucket-0000-0000", 500, new byte[0]));
>     Path path = new MockPath(fs, "/tbl");
>     Path stageDir = new MockPath(fs, "mock:/tbl/part1/.hive-staging_dir");
>     FileSystem mockFs = spy(fs);
>     Mockito.doThrow(new FileNotFoundException("")).when(mockFs).listLocatedStatus(eq(stageDir));
>     try {
>       Map<Path, AcidUtils.HdfsDirSnapshot> hdfsDirSnapshots = AcidUtils.getHdfsDirSnapshots(mockFs, path);
>       Assert.assertEquals(1, hdfsDirSnapshots.size());
>     } catch (FileNotFoundException fnf) {
>       fail("Should not throw FileNotFoundException when a directory is removed while fetching HDFSSnapshots");
>     }
>   }
> {code}
> This issue was partially fixed as part of HIVE-26481, but it is not fixed completely.
> [Here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1541]
> the FileUtils.listFiles() API returns a RemoteIterator<LocatedFileStatus> that lists recursively.
> While iterating, when it reaches a directory it tries to list the files inside it; if that
> directory has been removed by another thread/task in the meantime, it throws
> FileNotFoundException. Here the removed directory is the .staging directory, which needs to be
> excluded using the passed filter.
>  
> So here we can use the same logic written in the
> _org.apache.hadoop.hive.ql.io.AcidUtils#getHdfsDirSnapshotsForCleaner()_ API
> to avoid the FileNotFoundException.
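>
> For illustration, a minimal sketch of that defensive pattern (hypothetical snippet, not the actual getHdfsDirSnapshotsForCleaner() implementation): each directory is listed with its own non-recursive FileSystem.listLocatedStatus() call, and a directory that disappears mid-scan is simply skipped. The handleFile() helper is a placeholder for the per-file snapshot logic.
> {code:java}
> try {
>   // One non-recursive listing per directory, as the cleaner-side API does.
>   RemoteIterator<LocatedFileStatus> itr = fs.listLocatedStatus(dirPath);
>   while (itr.hasNext()) {
>     handleFile(itr.next()); // hypothetical per-file snapshot handling
>   }
> } catch (FileNotFoundException fnfe) {
>   // The directory (e.g. a .hive-staging dir) was removed concurrently by
>   // another task; skip it instead of failing the whole snapshot build.
> }
> {code}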



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
