[jira] [Comment Edited] (SPARK-32536) Spark SQL deletes non-existent HDFS locations when executing an "insert overwrite" statement with dynamic partitions
[ https://issues.apache.org/jira/browse/SPARK-32536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172889#comment-17172889 ]

yx91490 edited comment on SPARK-32536 at 8/9/20, 2:55 AM:
--

The method org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace() is in standalone-metastore-1.21.2.3.1.4.0-315-hive3.jar; the source code is at [Hive.deleteOldPathForReplace()|https://github.com/hortonworks/hive-release/blob/HDP-3.1.4.0-315-tag/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4647]

was (Author: yx91490):
The method org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace() is in standalone-metastore-1.21.2.3.1.4.0-315-hive3.jar, but I could not find the source code. Btw, I will try to reproduce it this weekend :)

> Spark SQL deletes non-existent HDFS locations when executing an "insert
> overwrite" statement with dynamic partitions
> ---
>
> Key: SPARK-32536
> URL: https://issues.apache.org/jira/browse/SPARK-32536
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.2
> Environment: HDP version 2.3.2.3.1.4.0-315
> Reporter: yx91490
> Priority: Major
> Attachments: SPARK-32536.full.log
>
>
> When executing an insert overwrite table statement with dynamic partitions:
>
> {code:java}
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.id_name2 partition(dt)
> select * from tmp.id_name where dt='2001';
> {code}
> the output log is:
> {code:java}
> 20/08/05 14:38:05 ERROR Hive: Exception when loading partition with parameters
> partPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-1/dt=2001,
> table=id_name2, partSpec={dt=2001}, loadFileType=REPLACE_ALL,
> listBucketingLevel=0, isAcid=false, resetStatistics=false
> org.apache.hadoop.hive.ql.metadata.HiveException: Directory
> hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 could not be cleaned up.
>   at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4666)
>   at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:4597)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2132)
>   at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2588)
>   at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2579)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: File
> hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 does not exist.
>   at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1053)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1113)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1110)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1120)
>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910)
>   at org.apache.hadoop.hive.ql.metadata.Hive.cleanUpOneDirectoryForReplace(Hive.java:4681)
>   at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4661)
>   ... 8 more
> Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Exception
> when loading 1 in table id_name2 with
> loadPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-1;
> {code}
> It seems that Spark doesn't check whether a partition's HDFS location exists
> before deleting it, whereas Hive executes the same SQL successfully.
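The FileNotFoundException above arises because Hive.deleteOldPathForReplace() (via cleanUpOneDirectoryForReplace()) lists the old partition directory unconditionally before cleaning it up, and the directory for a brand-new dynamic partition may not exist yet. Below is a minimal, hypothetical sketch of the existence guard the reporter describes, assuming only the standard Hadoop FileSystem API; the class and method names are invented for illustration and this is not the actual Hive or Spark fix.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper, not Hive's real code: guard the cleanup with an
// existence check so that a partition whose directory was never created
// becomes a no-op instead of failing the whole INSERT OVERWRITE.
public class PartitionCleanupSketch {

    public static void deleteOldPathIfExists(Configuration conf, Path oldPartPath)
            throws IOException {
        FileSystem fs = oldPartPath.getFileSystem(conf);
        // The stack trace shows fs.listStatus() being called on the old
        // partition path; on HDFS that throws FileNotFoundException when the
        // directory is missing. Checking existence first avoids the failure.
        if (fs.exists(oldPartPath)) {
            fs.delete(oldPartPath, true /* recursive */);
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Path taken from the error log above.
        Path oldPartPath = new Path(
            "hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001");
        deleteOldPathIfExists(conf, oldPartPath);
    }
}
{code}

Whether such a check belongs in Spark's load path or in Hive's deleteOldPathForReplace() itself is exactly what the linked HDP source would settle; the sketch only shows why an existence test makes the reported error disappear.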
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org