[ 
https://issues.apache.org/jira/browse/HIVE-28106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823578#comment-17823578
 ] 

Taraka Rama Rao Lethavadla commented on HIVE-28106:
---------------------------------------------------

It seems that some code refactoring done as part of 
https://issues.apache.org/jira/browse/HIVE-24581 caused this 
behaviour. However, I have not been able to reproduce the problem, either on a 
cluster or with JUnit test cases.

> Parallel select queries are failing on external tables with FNF due to 
> staging directory
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-28106
>                 URL: https://issues.apache.org/jira/browse/HIVE-28106
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Taraka Rama Rao Lethavadla
>            Priority: Major
>
> The issue reported here is similar to HIVE-26481,
> but here it happens between simultaneous queries on external tables.
> Query1:
>  
> {noformat}
> 2024-02-27 09:41:59,349 INFO org.apache.hadoop.hive.common.FileUtils: 
> [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-395]: Creating directory 
> if it doesn't exist: 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
> ..
> ..
> 2024-02-27 09:42:42,859 INFO org.apache.hadoop.hive.ql.Driver: 
> [HiveServer2-Background-Pool: Thread-416]: Executing 
> command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8): 
> SELECT COUNT(*) FROM database.tbl WHERE XXXX IS NULL OR YYYY=''
> ..
> ..
> 2024-02-27 09:42:54,407 INFO org.apache.hadoop.hive.ql.Driver: 
> [HiveServer2-Background-Pool: Thread-416]: Completed executing 
> command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8); 
> Time taken: 11.548 seconds
> {noformat}
> This query completed and then deleted its staging directory.
> {noformat}
> 2024-02-27 09:42:54,565 DEBUG hive.ql.Context: 
> [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting result dir: 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20/-mr-10001
>  
> ..   
> ..
> 2024-02-27 09:42:54,566 DEBUG hive.ql.Context: 
> [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting scratch 
> dir: 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
>   {noformat}
> Query 2 started executing on the same table at the same time:
> {noformat}
> 2024-02-27 09:42:53,989 INFO org.apache.tez.client.TezClient: 
> [HiveServer2-Background-Pool: Thread-457]: Submitting dag to TezSession, 
> sessionName=HIVE-08b22263-8e80-470f-81b7-f70bb5561487, 
> applicationId=application_1708662665640_1222, dagName=SELECT ABS(((XXXX - 
> YYYY... (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, 
> callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User:UUUU
>  }  {noformat}
> Tez AM logs (syslog_dag_1708662665640_1222_1)
>  
> {noformat}
> 2024-02-27 09:42:54,053 [INFO] [IPC Server handler 1 on 46229] 
> |app.DAGAppMaster|: Running DAG: SELECT ABS(((XXXX - YYYY...  (Stage-1), 
> callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, 
> callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User:UUUU
>  } 
> .. 
> ..
> 2024-02-27 09:42:54,443 [INFO] [App Shared Pool - #1] |exec.Utilities|: 
> Adding 1 inputs; the first input is 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
> ..
> ..
> 2024-02-27 09:42:54,445 [INFO] [App Shared Pool - #1] |io.HiveInputFormat|: 
> Generating splits for dirs: 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
> ..
> ..
> 2024-02-27 09:42:54,487 [INFO] [App Shared Pool - #2] 
> |tez.HiveSplitGenerator|: The preferred split size is 33554432
> ..
> ..
> 2024-02-27 09:42:54,488 [INFO] [App Shared Pool - #2] |exec.Utilities|: 
> Adding 1 inputs; the first input is 
> hdfs://namespace/data/eisds/apps/qlys/final/history/tbl/partition_year=2023/partition_month=12/partition_date=2023-12-30
> ..
> ..
> 2024-02-27 09:42:54,631 [TRACE] [ORC_GET_SPLITS #0] |ipc.ProtobufRpcEngine|: 
> 111: Call -> xx-yy-zz.net/170.42.154.76:8020: getListing {src: 
> "/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20"
>  startAfter: "" needLocation: true}  {noformat}
> The query then failed because that directory was removed at the same time:
> {noformat}
> 2024-02-27 09:42:54,634 [ERROR] [Dispatcher thread {Central}] 
> |impl.VertexImpl|: Vertex Input: tbl initializer failed, 
> vertex=vertex_1708662665640_1222_1_00 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.io.FileNotFoundException: File 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
>  does not exist.
>     at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:188)
>     at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda(RootInputInitializerManager.java:171)
>     at java.util.concurrent.Executors.call(Executors.java:511)
>     at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>     at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>     at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.RuntimeException: ORC split generation failed with 
> exception: java.io.FileNotFoundException: File 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
>  does not exist.
>     at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1853)
>     at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1940)
>     at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:543)
>     at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:851)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:289)
>     at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda(RootInputInitializerManager.java:203)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:196)
>     at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:177)
>     ... 8 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.io.FileNotFoundException: File 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
>  does not exist.
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1785)
>     ... 18 more
> Caused by: java.io.FileNotFoundException: File 
> hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
>  does not exist.
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem.<init>(DistributedFileSystem.java:1280)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem.<init>(DistributedFileSystem.java:1254)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1199)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1195)
>     at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1213)
>     at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144)
>     at org.apache.hadoop.fs.FileSystem.handleFileStat(FileSystem.java:2332)
>     at org.apache.hadoop.fs.FileSystem.hasNext(FileSystem.java:2309)
>     at 
> org.apache.hadoop.hive.ql.io.HdfsUtils.listLocatedFileStatus(HdfsUtils.java:104)
>     at 
> org.apache.hadoop.hive.ql.io.HdfsUtils.listFileStatusWithId(HdfsUtils.java:215)
>     at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.listOriginalFiles(OrcInputFormat.java:1281)
>     at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.callInternal(OrcInputFormat.java:1271)
>     at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.lambda-zsh(OrcInputFormat.java:1245)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.call(OrcInputFormat.java:1245)
>     at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.call(OrcInputFormat.java:1210){noformat}
> The table directory is traversed recursively, and unwanted files are then 
> filtered out before the query executes. Here, the staging directory existed 
> when the traversal listed it but was deleted before it could be filtered 
> out, which caused the exception.
>  
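The race described above can be sketched outside Hive. The following is a minimal illustration only, not Hive's code: it uses java.nio.file on a local temporary directory as a stand-in for HDFS, and the names StagingDirRace, listDataFiles, pruneHiddenFirst and afterListing are hypothetical. The afterListing hook simulates query 1 deleting its staging directory in the window between the moment query 2 lists the table directory and the moment it descends into the children. The sketch assumes hidden entries are those whose names start with '.' or '_'.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical stand-in for the split-generation listing; not Hive's implementation.
class StagingDirRace {

    // Assumption: entries whose names start with '.' or '_' are not table data.
    static boolean isHidden(Path p) {
        String name = p.getFileName().toString();
        return name.startsWith(".") || name.startsWith("_");
    }

    // Lists data files under dir. afterListing runs once the children of dir
    // have been listed and before any of them is visited, which is the window
    // in which a concurrent query may remove its staging directory.
    static List<Path> listDataFiles(Path dir, boolean pruneHiddenFirst,
                                    Runnable afterListing) throws IOException {
        Map<Path, Boolean> children = new LinkedHashMap<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path child : stream) {
                // Remember whether the child was a directory at listing time.
                children.put(child, Files.isDirectory(child));
            }
        }
        afterListing.run();

        List<Path> dataFiles = new ArrayList<>();
        for (Map.Entry<Path, Boolean> entry : children.entrySet()) {
            Path child = entry.getKey();
            if (pruneHiddenFirst && isHidden(child)) {
                continue; // hidden entries are skipped before any descent
            }
            if (entry.getValue()) {
                // Listing a directory that has vanished throws NoSuchFileException.
                dataFiles.addAll(listDataFiles(child, pruneHiddenFirst, () -> { }));
            } else if (!isHidden(child)) {
                dataFiles.add(child); // late filtering, applied to files only
            }
        }
        return dataFiles;
    }

    // Deletes root and everything below it, children before parents.
    static void deleteRecursively(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a table directory with one data file and one staging directory,
    // then lists it while the staging directory is removed mid-traversal.
    static String demo(boolean pruneHiddenFirst) {
        try {
            Path table = Files.createTempDirectory("tbl");
            try {
                Files.createFile(table.resolve("000000_0"));
                Path staging = table.resolve(".hive-staging_demo");
                Files.createDirectories(staging.resolve("-mr-10001"));
                List<Path> files = listDataFiles(table, pruneHiddenFirst,
                        () -> deleteRecursively(staging));
                return "ok:" + files.size();
            } catch (NoSuchFileException e) {
                return "FNF";
            } finally {
                deleteRecursively(table);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("descend, then filter: " + demo(false));
        System.out.println("filter, then descend: " + demo(true));
    }
}
```

In this sketch, demo(false) descends into every listed directory and filters hidden names only afterwards, so it hits the missing staging directory; demo(true) skips hidden entries before descending and never touches it. Whether the actual listing path in OrcInputFormat behaves like the first variant after HIVE-24581 is not confirmed here; the sketch only shows why the order of filtering and descent matters.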



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
