[ https://issues.apache.org/jira/browse/HIVE-23444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104875#comment-17104875 ]
Marta Kuczora commented on HIVE-23444:
--------------------------------------

The missing directory (_tmp.delta_0000001_0000001_0000) is the manifest directory, which is written when inserting into an ACID table with hive.acid.direct.insert.enabled=true. The exception occurs in the AcidUtils.getHdfsDirSnapshots method when it tries to list the newly written files from the partition directory. For static partitions, the manifest directory is located in the partition folder.

If inserts happen concurrently, one thread may have already written the manifest file but not yet deleted it. Another thread then calls the AcidUtils.getHdfsDirSnapshots method, which lists all the files and directories in the partition folder, including the manifest directory. If the first thread deletes the manifest file after the listing but before the second thread iterates over the listed files and directories, the iterator throws a FileNotFoundException when it tries to get the deleted manifest directory.

> Concurrent ACID direct inserts may fail with FileNotFoundException
> ------------------------------------------------------------------
>
>                 Key: HIVE-23444
>                 URL: https://issues.apache.org/jira/browse/HIVE-23444
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>             Fix For: 4.0.0
>
>
> The following exception may occur when concurrently inserting into an ACID
> table with static partitions and the 'hive.acid.direct.insert.enabled'
> parameter is true. This issue occurs intermittently.
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.FileNotFoundException: File hdfs://ns1/warehouse/tablespace/managed/hive/tpch_unbucketed.db/concurrent_insert_partitioned/l_tax=0.0/_tmp.delta_0000001_0000001_0000 does not exist.
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2465) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2228) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.exec.MoveTask.handleStaticParts(MoveTask.java:522) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:442) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) ~[hive-service-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     ... 13 more
> Caused by: java.io.IOException: java.io.FileNotFoundException: File hdfs://ns1/warehouse/tablespace/managed/hive/tpch_unbucketed.db/concurrent_insert_partitioned/l_tax=0.0/_tmp.delta_0000001_0000001_0000 does not exist.
>     at org.apache.hadoop.hive.ql.io.AcidUtils.getHdfsDirSnapshots(AcidUtils.java:1472) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:1297) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidFilesForStats(AcidUtils.java:2695) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2448) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2228) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.exec.MoveTask.handleStaticParts(MoveTask.java:522) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:442) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) ~[hive-service-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
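
The interleaving described in the comment (list the partition folder, a concurrent delete of the manifest directory, then iterate over the stale listing) can be illustrated with a small local-filesystem sketch. This is only an analogy under stated assumptions: it uses java.nio.file rather than the Hadoop FileSystem API, the class TolerantListing and its methods listChildren and statSurvivors are hypothetical names, and skipping vanished entries is one possible way to tolerate the race, not necessarily the change made for this issue in Hive.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TolerantListing {

    /** Step 1: take a point-in-time listing of the directory's children. */
    public static List<Path> listChildren(Path dir) throws IOException {
        try (Stream<Path> children = Files.list(dir)) {
            return children.sorted().collect(Collectors.toList());
        }
    }

    /**
     * Step 2: look up each listed entry again. An entry deleted after the
     * listing raises NoSuchFileException here; it is skipped instead of
     * failing the whole operation.
     */
    public static List<Path> statSurvivors(List<Path> listed) throws IOException {
        List<Path> survivors = new ArrayList<>();
        for (Path p : listed) {
            try {
                Files.readAttributes(p, BasicFileAttributes.class);
                survivors.add(p);
            } catch (NoSuchFileException e) {
                // The entry vanished between listing and lookup (the race
                // from the comment); ignore it and continue.
            }
        }
        return survivors;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the partition folder; names mirror the report.
        Path partition = Files.createTempDirectory("partition");
        Path manifest = Files.createDirectory(
                partition.resolve("_tmp.delta_0000001_0000001_0000"));
        Files.createDirectory(partition.resolve("delta_0000002_0000002_0000"));

        List<Path> listed = listChildren(partition);   // "second thread" lists
        Files.delete(manifest);                        // "first thread" deletes
        List<Path> survivors = statSurvivors(listed);  // iteration tolerates it

        System.out.println("listed=" + listed.size()
                + " survivors=" + survivors.size());   // listed=2 survivors=1
    }
}
```

Without the catch block in statSurvivors, the second step would fail on the deleted entry, which corresponds to the FileNotFoundException in the stack trace above.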