[ https://issues.apache.org/jira/browse/HIVE-27135?focusedWorklogId=853132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853132 ]
ASF GitHub Bot logged work on HIVE-27135:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Mar/23 10:52
            Start Date: 27/Mar/23 10:52
    Worklog Time Spent: 10m
      Work Description: mdayakar commented on code in PR #4114:
URL: https://github.com/apache/hive/pull/4114#discussion_r1149137172


##########
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java:
##########

@@ -1538,32 +1538,36 @@ private static HdfsDirSnapshot addToSnapshot(Map<Path, HdfsDirSnapshot> dirToSna
   public static Map<Path, HdfsDirSnapshot> getHdfsDirSnapshots(final FileSystem fs, final Path path)
       throws IOException {
     Map<Path, HdfsDirSnapshot> dirToSnapshots = new HashMap<>();
-    RemoteIterator<LocatedFileStatus> itr = FileUtils.listFiles(fs, path, true, acidHiddenFileFilter);
-    while (itr.hasNext()) {
-      FileStatus fStatus = itr.next();
-      Path fPath = fStatus.getPath();
-      if (fStatus.isDirectory() && acidTempDirFilter.accept(fPath)) {
-        addToSnapshot(dirToSnapshots, fPath);
-      } else {
-        Path parentDirPath = fPath.getParent();
-        if (acidTempDirFilter.accept(parentDirPath)) {
-          while (isChildOfDelta(parentDirPath, path)) {
-            // Some cases there are other directory layers between the delta and the datafiles
-            // (export-import mm table, insert with union all to mm table, skewed tables).
-            // But it does not matter for the AcidState, we just need the deltas and the data files
-            // So build the snapshot with the files inside the delta directory
-            parentDirPath = parentDirPath.getParent();
-          }
-          HdfsDirSnapshot dirSnapshot = addToSnapshot(dirToSnapshots, parentDirPath);
-          // We're not filtering out the metadata file and acid format file,
-          // as they represent parts of a valid snapshot
-          // We're not using the cached values downstream, but we can potentially optimize more in a follow-up task
-          if (fStatus.getPath().toString().contains(MetaDataFile.METADATA_FILE)) {
-            dirSnapshot.addMetadataFile(fStatus);
-          } else if (fStatus.getPath().toString().contains(OrcAcidVersion.ACID_FORMAT)) {
-            dirSnapshot.addOrcAcidFormatFile(fStatus);
-          } else {
-            dirSnapshot.addFile(fStatus);
+    Deque<RemoteIterator<LocatedFileStatus>> stack = new ArrayDeque<>();
+    stack.push(FileUtils.listLocatedStatusIterator(fs, path, acidHiddenFileFilter));
+    while (!stack.isEmpty()) {
+      RemoteIterator<LocatedFileStatus> itr = stack.pop();
+      while (itr.hasNext()) {
+        FileStatus fStatus = itr.next();
+        Path fPath = fStatus.getPath();
+        if (fStatus.isDirectory()) {
+          stack.push(FileUtils.listLocatedStatusIterator(fs, fPath, acidHiddenFileFilter));

Review Comment:
   No, `addToSnapshot(dirToSnapshots, fPath)` needs to be called only when a folder contains a file, which is taken care of in the else part. The same logic exists in the existing code.
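For context, here is a minimal, self-contained sketch of the traversal pattern the hunk above introduces, written against the stock Hadoop FileSystem API only. It is an illustration, not the PR's actual code: the class name IterativeWalkSketch, the plain fs.listLocatedStatus() calls (standing in for FileUtils.listLocatedStatusIterator() with the ACID filters), and the Map<Path, List<FileStatus>> result (standing in for HdfsDirSnapshot bookkeeping) are all assumptions made to keep the example runnable.

{code:java}
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public final class IterativeWalkSketch {
  // Walk the tree without FileSystem-driven recursion: each directory's
  // listing iterator is pushed onto an explicit stack, and a map entry is
  // created only when a file is seen (the "else part" the review comment
  // refers to), so a directory never adds itself to the result.
  static Map<Path, List<FileStatus>> walk(FileSystem fs, Path root) throws IOException {
    Map<Path, List<FileStatus>> filesPerDir = new HashMap<>();
    Deque<RemoteIterator<LocatedFileStatus>> stack = new ArrayDeque<>();
    stack.push(fs.listLocatedStatus(root));
    while (!stack.isEmpty()) {
      RemoteIterator<LocatedFileStatus> itr = stack.pop();
      while (itr.hasNext()) {
        LocatedFileStatus status = itr.next();
        if (status.isDirectory()) {
          // Defer the directory: no map entry yet; empty dirs stay invisible.
          stack.push(fs.listLocatedStatus(status.getPath()));
        } else {
          // A file proves its parent directory matters, so record it now.
          filesPerDir.computeIfAbsent(status.getPath().getParent(), k -> new ArrayList<>())
              .add(status);
        }
      }
    }
    return filesPerDir;
  }
}
{code}

The reviewer's point is visible in the else branch: only a contained file creates an entry for its parent directory, matching the behavior of the existing code.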
Issue Time Tracking
-------------------

    Worklog Id:     (was: 853132)
    Time Spent: 4h 40m  (was: 4.5h)

> AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in
> HDFS
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-27135
>                 URL: https://issues.apache.org/jira/browse/HIVE-27135
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Dayakar M
>            Assignee: Dayakar M
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> AcidUtils#getHdfsDirSnapshots() throws FileNotFoundException when a directory
> is removed in HDFS while fetching HDFS snapshots.
> The test code below can be used to reproduce this issue.
> {code:java}
> @Test
> public void testShouldNotThrowFNFEWhenHiveStagingDirectoryIsRemovedWhileFetchingHDFSSnapshots()
>     throws Exception {
>   MockFileSystem fs = new MockFileSystem(new HiveConf(),
>       new MockFile("mock:/tbl/part1/.hive-staging_dir/-ext-10002", 500, new byte[0]),
>       new MockFile("mock:/tbl/part2/.hive-staging_dir", 500, new byte[0]),
>       new MockFile("mock:/tbl/part1/_tmp_space.db", 500, new byte[0]),
>       new MockFile("mock:/tbl/part1/delta_1_1/bucket-0000-0000", 500, new byte[0]));
>   Path path = new MockPath(fs, "/tbl");
>   Path stageDir = new MockPath(fs, "mock:/tbl/part1/.hive-staging_dir");
>   FileSystem mockFs = spy(fs);
>   Mockito.doThrow(new FileNotFoundException("")).when(mockFs).listLocatedStatus(eq(stageDir));
>   try {
>     Map<Path, AcidUtils.HdfsDirSnapshot> hdfsDirSnapshots = AcidUtils.getHdfsDirSnapshots(mockFs, path);
>     Assert.assertEquals(1, hdfsDirSnapshots.size());
>   } catch (FileNotFoundException fnf) {
>     fail("Should not throw FileNotFoundException when a directory is removed while fetching HDFS snapshots");
>   }
> }{code}
> This issue was partially fixed as part of HIVE-26481, but the fix is not
> complete.
> [Here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1541]
> the FileUtils.listFiles() API returns a RemoteIterator<LocatedFileStatus>.
> While iterating, when it encounters a directory during the recursive listing
> it tries to list that directory's files; if the directory has been removed by
> another thread/task in the meantime, it throws a FileNotFoundException. In
> this case the removed directory is the .hive-staging directory, which needs
> to be excluded by the passed filter.
>
> So here we can use the same logic as in the
> _org.apache.hadoop.hive.ql.io.AcidUtils#getHdfsDirSnapshotsForCleaner()_ API
> to avoid the FileNotFoundException.
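A hedged sketch of the direction that last paragraph suggests, assuming the relevant property of getHdfsDirSnapshotsForCleaner() is that it lists one directory at a time and tolerates a directory vanishing mid-scan. The helper name listOrEmpty and the empty-iterator fallback below are illustrative assumptions, not Hive's actual code:

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.NoSuchElementException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public final class FnfeTolerantListing {
  // Hypothetical helper: list a single directory, treating "directory no
  // longer exists" as an empty listing instead of a failure. Plugged into a
  // stack-based walk like the one in the PR, a concurrently deleted
  // .hive-staging directory would contribute no entries rather than aborting
  // the whole snapshot scan with a FileNotFoundException.
  static RemoteIterator<LocatedFileStatus> listOrEmpty(FileSystem fs, Path dir) throws IOException {
    try {
      return fs.listLocatedStatus(dir);
    } catch (FileNotFoundException e) {
      // The directory was removed between discovery and listing; skip it.
      return new RemoteIterator<LocatedFileStatus>() {
        @Override
        public boolean hasNext() {
          return false;
        }

        @Override
        public LocatedFileStatus next() {
          throw new NoSuchElementException();
        }
      };
    }
  }
}
{code}

Note that some FileSystem implementations list lazily, so a full fix would also need to guard the iterator's hasNext()/next() calls, not just the initial listing.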