[ https://issues.apache.org/jira/browse/HIVE-28106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823578#comment-17823578 ]

Taraka Rama Rao Lethavadla commented on HIVE-28106:
---------------------------------------------------

It seems that some code refactoring done as part of https://issues.apache.org/jira/browse/HIVE-24581 may have caused this behaviour. However, I have not been able to reproduce the problem, either on a cluster or with JUnit test cases.

> Parallel select queries are failing on external tables with FNF due to staging directory
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-28106
>                 URL: https://issues.apache.org/jira/browse/HIVE-28106
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Taraka Rama Rao Lethavadla
>            Priority: Major
>
> The issue reported here is similar to HIVE-26481, but here it happens between simultaneous queries on external tables.
> Query 1:
>
> {noformat}
> 2024-02-27 09:41:59,349 INFO org.apache.hadoop.hive.common.FileUtils: [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-395]: Creating directory if it doesn't exist: hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
> ..
> ..
> 2024-02-27 09:42:42,859 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-416]: Executing command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8): SELECT COUNT(*) FROM database.tbl WHERE XXXX IS NULL OR YYYY=''
> ..
> ..
> 2024-02-27 09:42:54,407 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-416]: Completed executing command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8); Time taken: 11.548 seconds
> {noformat}
> This query completed and deleted its staging directory.
> {noformat}
> 2024-02-27 09:42:54,565 DEBUG hive.ql.Context: [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting result dir: hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20/-mr-10001
> ..
> ..
> 2024-02-27 09:42:54,566 DEBUG hive.ql.Context: [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting scratch dir: hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
> {noformat}
> Query 2 started executing on the same table at the same time:
> {noformat}
> 2024-02-27 09:42:53,989 INFO org.apache.tez.client.TezClient: [HiveServer2-Background-Pool: Thread-457]: Submitting dag to TezSession, sessionName=HIVE-08b22263-8e80-470f-81b7-f70bb5561487, applicationId=application_1708662665640_1222, dagName=SELECT ABS(((XXXX - YYYY... (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User:UUUU }
> {noformat}
> Tez AM logs (syslog_dag_1708662665640_1222_1):
>
> {noformat}
> 2024-02-27 09:42:54,053 [INFO] [IPC Server handler 1 on 46229] |app.DAGAppMaster|: Running DAG: SELECT ABS(((XXXX - YYYY... (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User:UUUU }
> ..
> ..
> 2024-02-27 09:42:54,443 [INFO] [App Shared Pool - #1] |exec.Utilities|: Adding 1 inputs; the first input is hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
> ..
> ..
> 2024-02-27 09:42:54,445 [INFO] [App Shared Pool - #1] |io.HiveInputFormat|: Generating splits for dirs: hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
> ..
> ..
> 2024-02-27 09:42:54,487 [INFO] [App Shared Pool - #2] |tez.HiveSplitGenerator|: The preferred split size is 33554432
> ..
> ..
> 2024-02-27 09:42:54,488 [INFO] [App Shared Pool - #2] |exec.Utilities|: Adding 1 inputs; the first input is hdfs://namespace/data/eisds/apps/qlys/final/history/tbl/partition_year=2023/partition_month=12/partition_date=2023-12-30
> ..
> ..
> 2024-02-27 09:42:54,631 [TRACE] [ORC_GET_SPLITS #0] |ipc.ProtobufRpcEngine|: 111: Call -> xx-yy-zz.net/170.42.154.76:8020: getListing {src: "/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20" startAfter: "" needLocation: true}
> {noformat}
> The query then failed because that directory was removed at the same time:
> {noformat}
> 2024-02-27 09:42:54,634 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: tbl initializer failed, vertex=vertex_1708662665640_1222_1_00 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.RuntimeException: ORC split generation failed with exception: java.io.FileNotFoundException: File hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20 does not exist.
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:188)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda(RootInputInitializerManager.java:171)
> 	at java.util.concurrent.Executors.call(Executors.java:511)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask.runInterruptibly(TrustedListenableFutureTask.java:125)
> 	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.RuntimeException: ORC split generation failed with exception: java.io.FileNotFoundException: File hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20 does not exist.
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1853)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1940)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:543)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:851)
> 	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:289)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda(RootInputInitializerManager.java:203)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:196)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:177)
> 	... 8 more
> Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20 does not exist.
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1785)
> 	... 18 more
> Caused by: java.io.FileNotFoundException: File hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20 does not exist.
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.<init>(DistributedFileSystem.java:1280)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.<init>(DistributedFileSystem.java:1254)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1199)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1195)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1213)
> 	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144)
> 	at org.apache.hadoop.fs.FileSystem.handleFileStat(FileSystem.java:2332)
> 	at org.apache.hadoop.fs.FileSystem.hasNext(FileSystem.java:2309)
> 	at org.apache.hadoop.hive.ql.io.HdfsUtils.listLocatedFileStatus(HdfsUtils.java:104)
> 	at org.apache.hadoop.hive.ql.io.HdfsUtils.listFileStatusWithId(HdfsUtils.java:215)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.listOriginalFiles(OrcInputFormat.java:1281)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.callInternal(OrcInputFormat.java:1271)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.lambda-zsh(OrcInputFormat.java:1245)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.call(OrcInputFormat.java:1245)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.call(OrcInputFormat.java:1210)
> {noformat}
> So the table directory is traversed recursively, and unwanted files are filtered out of the listing before the query runs. The staging directory existed while the traversal was listing the table directory, but it was deleted before it could be filtered out, which caused the exception.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
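The race described at the end of the issue (the table directory is listed recursively and hidden entries are filtered out only afterwards) can be illustrated with a small local sketch. It is a toy model, not Hive code: it uses `java.nio.file` on a temporary directory instead of HDFS and `OrcInputFormat`, and the class `StagingDirRace`, its methods, and the rule that names starting with "." or "_" are hidden are all hypothetical, chosen for illustration. The `Runnable` passed to `collect` deletes the staging directory right after the table directory has been listed, standing in for Query 1's scratch-dir cleanup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical toy model of the race on a local filesystem; not Hive or HDFS code.
public class StagingDirRace {
    // Directory flag captured when the parent is listed (like a FileStatus), never re-checked.
    static final class Entry {
        final Path path;
        final boolean isDir;
        Entry(Path path, boolean isDir) { this.path = path; this.isDir = isDir; }
    }

    // Assumed rule for this sketch: names starting with "." or "_" are not table data.
    static boolean isHidden(Path p) {
        String name = p.getFileName().toString();
        return name.startsWith(".") || name.startsWith("_");
    }

    // True if any component of a table-relative path is hidden.
    static boolean hasHiddenPart(Path rel) {
        for (Path part : rel) if (isHidden(part)) return true;
        return false;
    }

    // Lists the direct children of dir; throws NoSuchFileException if dir no longer exists.
    static List<Entry> list(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.map(p -> new Entry(p, Files.isDirectory(p))).collect(Collectors.toList());
        }
    }

    // Recursively collects data files under table. afterTopLevel runs once, right after the
    // table directory is listed, to simulate another query deleting its staging directory.
    static List<String> collect(Path table, boolean filterFirst, Runnable afterTopLevel)
            throws IOException {
        Deque<Entry> pending = new ArrayDeque<>(list(table));
        afterTopLevel.run();
        List<Path> files = new ArrayList<>();
        while (!pending.isEmpty()) {
            Entry e = pending.pop();
            if (filterFirst && isHidden(e.path)) continue; // skipped before it is ever listed
            if (e.isDir) pending.addAll(list(e.path));      // a stale directory entry fails here
            else files.add(table.relativize(e.path));
        }
        files.removeIf(StagingDirRace::hasHiddenPart); // late filter, applied in both modes
        return files.stream().map(Path::toString).sorted().collect(Collectors.toList());
    }

    // Deletes root and everything below it, deepest paths first.
    static void deleteRecursively(Path root) {
        try (Stream<Path> s = Files.walk(root)) {
            s.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            });
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Temp "table" with one data file plus a staging dir that is deleted mid-traversal.
    static List<String> demo(boolean filterFirst) throws IOException {
        Path table = Files.createTempDirectory("tbl");
        Path staging = table.resolve(".hive-staging_hive_demo");
        Files.createFile(table.resolve("000000_0"));
        Files.createDirectories(staging.resolve("-mr-10001"));
        try {
            return collect(table, filterFirst, () -> deleteRecursively(staging));
        } finally {
            deleteRecursively(table);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("filter first: " + demo(true));
        try {
            demo(false);
        } catch (NoSuchFileException e) {
            System.out.println("filter late: NoSuchFileException for " + e.getFile());
        }
    }
}
```

With `filterFirst = false` the traversal descends into the stale staging entry and fails with `NoSuchFileException`, the local-filesystem counterpart of the `FileNotFoundException` above. With `filterFirst = true` the hidden entry is skipped before it is ever listed, so the concurrent deletion is harmless in this model. This only illustrates why the ordering matters; it does not show what HIVE-24581 changed or how this issue should be fixed.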