[ https://issues.apache.org/jira/browse/HIVE-23444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104875#comment-17104875 ]

Marta Kuczora commented on HIVE-23444:
--------------------------------------

The missing directory (_tmp.delta_0000001_0000001_0000) is the manifest 
directory which is written when inserting into an ACID table with 
hive.acid.direct.insert.enabled=true. 
The exception occurs in the AcidUtils.getHdfsDirSnapshots method when it 
tries to list the newly written files in the partition directory. For 
static partitions, the manifest directory is located directly in the 
partition folder. When inserts run concurrently, one thread may have 
already written its manifest directory but not yet deleted it. Another 
thread then calls AcidUtils.getHdfsDirSnapshots, which lists all files and 
directories in the partition folder, including the manifest directory. If 
the first thread deletes the manifest directory after the listing but 
before the second thread iterates over the results, the iterator throws a 
FileNotFoundException when it tries to access the already-deleted manifest 
directory.
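
To make the race concrete, here is a minimal sketch of a listing that 
tolerates a concurrently deleted directory. It is a simplified model, not 
the actual AcidUtils code (the real method iterates a RemoteIterator from 
FileSystem.listFiles) and not the fix committed for HIVE-23444; the helper 
name snapshotPartitionDir is hypothetical:

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TolerantDirSnapshot {

  // Hypothetical helper: snapshot the contents of a partition directory
  // one level deep, skipping directories that a concurrent writer deletes
  // between the initial listing and the follow-up listStatus call.
  static List<FileStatus> snapshotPartitionDir(FileSystem fs, Path partitionDir)
      throws IOException {
    List<FileStatus> snapshot = new ArrayList<>();
    for (FileStatus child : fs.listStatus(partitionDir)) {
      try {
        if (child.isDirectory()) {
          // This is where the race bites: a _tmp.delta_* manifest
          // directory returned by the listing above may already have been
          // deleted by the other insert, so this listStatus call throws
          // FileNotFoundException.
          Collections.addAll(snapshot, fs.listStatus(child.getPath()));
        } else {
          snapshot.add(child);
        }
      } catch (FileNotFoundException e) {
        // The vanished entry was transient bookkeeping of a concurrent
        // direct insert, not table data; skipping it is safe, failing
        // the whole load is not.
      }
    }
    return snapshot;
  }
}
{code}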

> Concurrent ACID direct inserts may fail with FileNotFoundException
> ------------------------------------------------------------------
>
>                 Key: HIVE-23444
>                 URL: https://issues.apache.org/jira/browse/HIVE-23444
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>             Fix For: 4.0.0
>
>
> The following exception may occur when concurrently inserting into an ACID 
> table with static partitions while the 'hive.acid.direct.insert.enabled' 
> parameter is set to true. The issue occurs intermittently.
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.FileNotFoundException: File hdfs://ns1/warehouse/tablespace/managed/hive/tpch_unbucketed.db/concurrent_insert_partitioned/l_tax=0.0/_tmp.delta_0000001_0000001_0000 does not exist.
>       at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2465) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2228) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.exec.MoveTask.handleStaticParts(MoveTask.java:522) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:442) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) ~[hive-service-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       ... 13 more
> Caused by: java.io.IOException: java.io.FileNotFoundException: File hdfs://ns1/warehouse/tablespace/managed/hive/tpch_unbucketed.db/concurrent_insert_partitioned/l_tax=0.0/_tmp.delta_0000001_0000001_0000 does not exist.
>       at org.apache.hadoop.hive.ql.io.AcidUtils.getHdfsDirSnapshots(AcidUtils.java:1472) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:1297) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidFilesForStats(AcidUtils.java:2695) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2448) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2228) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.exec.MoveTask.handleStaticParts(MoveTask.java:522) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:442) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>       at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) ~[hive-service-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
> {noformat}
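
A hypothetical reproduction sketch, not part of the original report: two 
JDBC sessions insert concurrently into the same static partition with 
direct insert enabled. The connection URL, the single-column layout of 
concurrent_insert_partitioned, and the iteration count are assumptions 
modeled on the HDFS path in the stack trace; since the race is 
timing-dependent, many iterations may be needed before one MoveTask fails.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ConcurrentDirectInsertRepro {

  // Assumed HiveServer2 endpoint; the database name is taken from the
  // HDFS path in the stack trace (tpch_unbucketed.db).
  private static final String URL =
      "jdbc:hive2://localhost:10000/tpch_unbucketed";

  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    Runnable insert = () -> {
      try (Connection conn = DriverManager.getConnection(URL);
           Statement stmt = conn.createStatement()) {
        stmt.execute("SET hive.acid.direct.insert.enabled=true");
        for (int i = 0; i < 100; i++) {
          // Both sessions write into the same static partition, so their
          // _tmp.delta_* manifest directories share one partition folder.
          stmt.execute("INSERT INTO concurrent_insert_partitioned"
              + " PARTITION (l_tax=0.0) VALUES (" + i + ")");
        }
      } catch (Exception e) {
        // Intermittently surfaces the FileNotFoundException shown above.
        e.printStackTrace();
      }
    };

    Thread t1 = new Thread(insert);
    Thread t2 = new Thread(insert);
    t1.start();
    t2.start();
    t1.join();
    t2.join();
  }
}
{code}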


