Taraka Rama Rao Lethavadla created HIVE-28106:
-------------------------------------------------
Summary: Parallel select queries are failing on external tables with FNF due to staging directory
Key: HIVE-28106
URL: https://issues.apache.org/jira/browse/HIVE-28106
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Taraka Rama Rao Lethavadla

The issue reported here is similar to HIVE-26481, but here it happens between simultaneous queries on external tables.

Query 1:
{noformat}
2024-02-27 09:41:59,349 INFO org.apache.hadoop.hive.common.FileUtils: [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-395]: Creating directory if it doesn't exist: hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
..
..
2024-02-27 09:42:42,859 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-416]: Executing command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8): SELECT COUNT(*) FROM database.tbl WHERE XXXX IS NULL OR YYYY=''
..
..
2024-02-27 09:42:54,407 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-416]: Completed executing command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8); Time taken: 11.548 seconds
{noformat}

This query completed and then deleted its staging directory:
{noformat}
2024-02-27 09:42:54,565 DEBUG hive.ql.Context: [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting result dir: hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20/-mr-10001
..
..
2024-02-27 09:42:54,566 DEBUG hive.ql.Context: [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting scratch dir: hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
{noformat}

Query 2 started to execute at the same time on the same table:
{noformat}
2024-02-27 09:42:53,989 INFO org.apache.tez.client.TezClient: [HiveServer2-Background-Pool: Thread-457]: Submitting dag to TezSession, sessionName=HIVE-08b22263-8e80-470f-81b7-f70bb5561487, applicationId=application_1708662665640_1222, dagName=SELECT ABS(((XXXX - YYYY... (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User:UUUU }
{noformat}

Tez AM logs (syslog_dag_1708662665640_1222_1):
{noformat}
2024-02-27 09:42:54,053 [INFO] [IPC Server handler 1 on 46229] |app.DAGAppMaster|: Running DAG: SELECT ABS(((XXXX - YYYY... (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User:UUUU }
..
..
2024-02-27 09:42:54,443 [INFO] [App Shared Pool - #1] |exec.Utilities|: Adding 1 inputs; the first input is hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
..
..
2024-02-27 09:42:54,445 [INFO] [App Shared Pool - #1] |io.HiveInputFormat|: Generating splits for dirs: hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
..
..
2024-02-27 09:42:54,487 [INFO] [App Shared Pool - #2] |tez.HiveSplitGenerator|: The preferred split size is 33554432
..
..
2024-02-27 09:42:54,488 [INFO] [App Shared Pool - #2] |exec.Utilities|: Adding 1 inputs; the first input is hdfs://namespace/data/eisds/apps/qlys/final/history/qualys_authentication/partition_year=2023/partition_month=12/partition_date=2023-12-30
..
..
2024-02-27 09:42:54,631 [TRACE] [ORC_GET_SPLITS #0] |ipc.ProtobufRpcEngine|: 111: Call -> xx-yy-zz.net/170.42.154.76:8020: getListing {src: "/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20" startAfter: "" needLocation: true}
{noformat}

Query 2 failed because that directory was removed at the same time:
{noformat}
2024-02-27 09:42:54,634 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: qualys_authentication initializer failed, vertex=vertex_1708662665640_1222_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.RuntimeException: ORC split generation failed with exception: java.io.FileNotFoundException: File hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20 does not exist.
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:188)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda(RootInputInitializerManager.java:171)
    at java.util.concurrent.Executors.call(Executors.java:511)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: ORC split generation failed with exception: java.io.FileNotFoundException: File hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20 does not exist.
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1853)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1940)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:543)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:851)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:289)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda(RootInputInitializerManager.java:203)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:196)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:177)
    ... 8 more
Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20 does not exist.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1785)
    ... 18 more
Caused by: java.io.FileNotFoundException: File hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20 does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem.<init>(DistributedFileSystem.java:1280)
    at org.apache.hadoop.hdfs.DistributedFileSystem.<init>(DistributedFileSystem.java:1254)
    at org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1199)
    at org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1195)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1213)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144)
    at org.apache.hadoop.fs.FileSystem.handleFileStat(FileSystem.java:2332)
    at org.apache.hadoop.fs.FileSystem.hasNext(FileSystem.java:2309)
    at org.apache.hadoop.hive.ql.io.HdfsUtils.listLocatedFileStatus(HdfsUtils.java:104)
    at org.apache.hadoop.hive.ql.io.HdfsUtils.listFileStatusWithId(HdfsUtils.java:215)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.listOriginalFiles(OrcInputFormat.java:1281)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.callInternal(OrcInputFormat.java:1271)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.lambda-zsh(OrcInputFormat.java:1245)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.call(OrcInputFormat.java:1245)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.call(OrcInputFormat.java:1210)
{noformat}

To execute the query, the table directory is traversed recursively and unwanted files are filtered out afterwards. Here, Query 1's staging directory still existed when Query 2's traversal found it, but it was deleted before it could be filtered out, which caused the exception.
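
The race described in this report can be reproduced in miniature on a local filesystem. The sketch below is only an illustration, not Hive code: the function names, the `on_listed` hook (which stands in for Query 1's cleanup running between two listing calls) and the rule that names starting with "." or "_" are hidden are all assumptions made for the example. `list_then_filter` mimics the failing order (traverse everything, filter at the end), while `filter_then_list` shows one possible mitigation: skip hidden directories before descending and treat a directory that vanished mid-traversal as empty.

```python
import os
import shutil
import tempfile


def is_hidden(name):
    # Assumption for this sketch: "." and "_" prefixes mark hidden entries,
    # which covers names like ".hive-staging_...".
    return name.startswith((".", "_"))


def snapshot(directory):
    """One listing call: returns (name, is_dir) pairs as seen at this instant."""
    with os.scandir(directory) as entries:
        return sorted((e.name, e.is_dir()) for e in entries)


def list_then_filter(root, on_listed=lambda: None):
    """Traverse everything first and drop hidden paths only at the end."""
    def walk(directory, hook):
        entries = snapshot(directory)
        hook()  # a concurrent query may delete its staging dir right here
        for name, is_dir in entries:
            path = os.path.join(directory, name)
            if is_dir:
                # Raises FileNotFoundError if the directory has vanished.
                yield from walk(path, lambda: None)
            else:
                yield path

    files = list(walk(root, on_listed))
    return [f for f in files
            if not any(is_hidden(part)
                       for part in os.path.relpath(f, root).split(os.sep))]


def filter_then_list(root, on_listed=lambda: None):
    """Skip hidden entries before descending; tolerate vanished directories."""
    entries = [(n, d) for n, d in snapshot(root) if not is_hidden(n)]
    on_listed()
    files = []
    for name, is_dir in entries:
        path = os.path.join(root, name)
        if is_dir:
            try:
                files.extend(filter_then_list(path))
            except FileNotFoundError:
                continue  # removed mid-traversal: treat as empty
        else:
            files.append(path)
    return files


def make_table():
    """Build a throwaway 'table' with one data file and one staging dir."""
    table = tempfile.mkdtemp()
    os.makedirs(os.path.join(table, "part=1"))
    open(os.path.join(table, "part=1", "000000_0"), "w").close()
    staging = os.path.join(table, ".hive-staging_query1")  # hypothetical name
    os.makedirs(os.path.join(staging, "-mr-10001"))
    return table, staging
```

Calling `list_then_filter(table, lambda: shutil.rmtree(staging))` on a table built with `make_table()` fails with `FileNotFoundError`, mirroring the stack trace above, whereas `filter_then_list` with the same hook returns only the data file. Whether filtering before descending is acceptable for the real split-generation path is a design question for the fix, not something this sketch settles.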