Taraka Rama Rao Lethavadla created HIVE-28106:
-------------------------------------------------

             Summary: Parallel select queries are failing on external tables 
with FNF due to staging directory
                 Key: HIVE-28106
                 URL: https://issues.apache.org/jira/browse/HIVE-28106
             Project: Hive
          Issue Type: Bug
          Components: Hive
            Reporter: Taraka Rama Rao Lethavadla


The issue reported here is similar to HIVE-26481, but here it happens between simultaneous queries on an external table.

Query 1:

 
{noformat}
2024-02-27 09:41:59,349 INFO org.apache.hadoop.hive.common.FileUtils: 
[d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-395]: Creating directory if 
it doesn't exist: 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
..
..
2024-02-27 09:42:42,859 INFO org.apache.hadoop.hive.ql.Driver: 
[HiveServer2-Background-Pool: Thread-416]: Executing 
command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8): 
SELECT COUNT(*) FROM database.tbl WHERE XXXX IS NULL OR YYYY=''
..
..
2024-02-27 09:42:54,407 INFO org.apache.hadoop.hive.ql.Driver: 
[HiveServer2-Background-Pool: Thread-416]: Completed executing 
command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8); 
Time taken: 11.548 seconds
{noformat}
This query completed and then deleted its staging directory:
{noformat}
2024-02-27 09:42:54,565 DEBUG hive.ql.Context: 
[d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting result dir: 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20/-mr-10001
 
..   
..
2024-02-27 09:42:54,566 DEBUG hive.ql.Context: 
[d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting scratch dir: 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
  {noformat}
Query 2 was executing on the same table at the same time:
{noformat}
2024-02-27 09:42:53,989 INFO org.apache.tez.client.TezClient: 
[HiveServer2-Background-Pool: Thread-457]: Submitting dag to TezSession, 
sessionName=HIVE-08b22263-8e80-470f-81b7-f70bb5561487, 
applicationId=application_1708662665640_1222, dagName=SELECT ABS(((XXXX - 
YYYY... (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, 
callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User:UUUU 
}  {noformat}
Tez AM logs (syslog_dag_1708662665640_1222_1):
 
{noformat}
2024-02-27 09:42:54,053 [INFO] [IPC Server handler 1 on 46229] 
|app.DAGAppMaster|: Running DAG: SELECT ABS(((XXXX - YYYY...  (Stage-1), 
callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, 
callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User:UUUU 
} 
.. 
..
2024-02-27 09:42:54,443 [INFO] [App Shared Pool - #1] |exec.Utilities|: Adding 
1 inputs; the first input is 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
..
..
2024-02-27 09:42:54,445 [INFO] [App Shared Pool - #1] |io.HiveInputFormat|: 
Generating splits for dirs: 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
..
..
2024-02-27 09:42:54,487 [INFO] [App Shared Pool - #2] |tez.HiveSplitGenerator|: 
The preferred split size is 33554432
..
..
2024-02-27 09:42:54,488 [INFO] [App Shared Pool - #2] |exec.Utilities|: Adding 
1 inputs; the first input is 
hdfs://namespace/data/eisds/apps/qlys/final/history/qualys_authentication/partition_year=2023/partition_month=12/partition_date=2023-12-30
..
..
2024-02-27 09:42:54,631 [TRACE] [ORC_GET_SPLITS #0] |ipc.ProtobufRpcEngine|: 
111: Call -> xx-yy-zz.net/170.42.154.76:8020: getListing {src: 
"/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20"
 startAfter: "" needLocation: true}  {noformat}
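Query 2 then failed because that staging directory had been removed in the meantime: by the log timestamps, Query 1 deleted it at 09:42:54,566 and the getListing call above was issued at 09:42:54,631.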
{noformat}
2024-02-27 09:42:54,634 [ERROR] [Dispatcher thread {Central}] 
|impl.VertexImpl|: Vertex Input: qualys_authentication initializer failed, 
vertex=vertex_1708662665640_1222_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
java.lang.RuntimeException: ORC split generation failed with exception: 
java.io.FileNotFoundException: File 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
 does not exist.
    at 
org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:188)
    at 
org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda(RootInputInitializerManager.java:171)
    at java.util.concurrent.Executors.call(Executors.java:511)
    at 
com.google.common.util.concurrent.TrustedListenableFutureTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: ORC split generation failed with 
exception: java.io.FileNotFoundException: File 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
 does not exist.
    at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1853)
    at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1940)
    at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:543)
    at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:851)
    at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:289)
    at 
org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda(RootInputInitializerManager.java:203)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at 
org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:196)
    at 
org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:177)
    ... 8 more
Caused by: java.util.concurrent.ExecutionException: 
java.io.FileNotFoundException: File 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
 does not exist.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1785)
    ... 18 more
Caused by: java.io.FileNotFoundException: File 
hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
 does not exist.
    at 
org.apache.hadoop.hdfs.DistributedFileSystem.<init>(DistributedFileSystem.java:1280)
    at 
org.apache.hadoop.hdfs.DistributedFileSystem.<init>(DistributedFileSystem.java:1254)
    at 
org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1199)
    at 
org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1195)
    at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at 
org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1213)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144)
    at org.apache.hadoop.fs.FileSystem.handleFileStat(FileSystem.java:2332)
    at org.apache.hadoop.fs.FileSystem.hasNext(FileSystem.java:2309)
    at 
org.apache.hadoop.hive.ql.io.HdfsUtils.listLocatedFileStatus(HdfsUtils.java:104)
    at 
org.apache.hadoop.hive.ql.io.HdfsUtils.listFileStatusWithId(HdfsUtils.java:215)
    at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.listOriginalFiles(OrcInputFormat.java:1281)
    at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.callInternal(OrcInputFormat.java:1271)
    at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.lambda-zsh(OrcInputFormat.java:1245)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.call(OrcInputFormat.java:1245)
    at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.call(OrcInputFormat.java:1210){noformat}
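During split generation, the table directory is traversed recursively and 
unwanted files are filtered out before the query is executed. Here the staging 
directory existed when the table directory was traversed but was deleted before 
it could be filtered out, so the attempt to list it threw the exception.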
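
The race can be illustrated with a minimal sketch. This is not Hive's code and not a proposed patch; it is plain Python against a local filesystem, and the names is_hidden and list_data_files are hypothetical. It assumes hidden entries are those whose names start with "." or "_" (which covers .hive-staging_* directories). It shows two defensive ideas: skip hidden entries before descending into them, and treat a subdirectory that vanishes between the parent listing and its own listing as empty rather than as a failure.

```python
import os


def is_hidden(name):
    # Assumption for this sketch: hidden entries start with "." or "_",
    # which covers ".hive-staging_..." directories.
    return name.startswith(".") or name.startswith("_")


def list_data_files(root):
    """Recursively list visible files under root.

    Hidden entries are skipped before descending, and a subdirectory
    that disappears mid-traversal is ignored instead of raising.
    """
    files = []
    pending = [root]
    while pending:
        current = pending.pop()
        try:
            with os.scandir(current) as it:
                entries = list(it)
        except FileNotFoundError:
            if current == root:
                raise  # a missing table directory is a real error
            continue  # subdirectory deleted concurrently: skip it
        for entry in entries:
            if is_hidden(entry.name):
                continue  # filter first, so staging dirs are never listed
            if entry.is_dir(follow_symlinks=False):
                pending.append(entry.path)
            else:
                files.append(entry.path)
    return sorted(files)
```

With this ordering, a concurrently deleted staging directory is never listed at all, and the FileNotFoundError handling only matters for visible subdirectories that are removed while the traversal is in progress.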

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
