[jira] [Comment Edited] (SPARK-32536) deleting non-existent HDFS locations when using Spark SQL to execute an "insert overwrite" statement into a dynamic partition

2020-08-08 Thread yx91490 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172889#comment-17172889
 ] 

yx91490 edited comment on SPARK-32536 at 8/9/20, 2:55 AM:
--

The method org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace() is 
in standalone-metastore-1.21.2.3.1.4.0-315-hive3.jar; the source code is at 
[Hive.deleteOldPathForReplace()|https://github.com/hortonworks/hive-release/blob/HDP-3.1.4.0-315-tag/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4647].
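
Paraphrasing that code path (a simplified sketch, not the verbatim Hive source): deleteOldPathForReplace() passes the old partition location to cleanUpOneDirectoryForReplace(), which lists the directory before deleting its contents; the listing is where the FileNotFoundException in the stack trace below originates.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical paraphrase of the cleanup path; the method name follows
// the stack trace, but the body is simplified for illustration.
class OldPathCleanupSketch {
  // There is no fs.exists() guard, so listStatus() on a missing
  // directory throws FileNotFoundException, which Hive wraps as
  // "Directory ... could not be cleaned up".
  static void cleanUpOneDirectoryForReplace(FileSystem fs, Path oldPartPath) throws IOException {
    for (FileStatus stat : fs.listStatus(oldPartPath)) {
      fs.delete(stat.getPath(), true); // remove old files before the replace
    }
  }
}
{code}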


was (Author: yx91490):
The method org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace() is 
in standalone-metastore-1.21.2.3.1.4.0-315-hive3.jar, but I cannot find the 
source code.

btw, I will try to reproduce it this weekend :)

> deleting non-existent HDFS locations when using Spark SQL to execute an 
> "insert overwrite" statement into a dynamic partition
> ---
>
> Key: SPARK-32536
> URL: https://issues.apache.org/jira/browse/SPARK-32536
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
> Environment: HDP version 2.3.2.3.1.4.0-315
>Reporter: yx91490
>Priority: Major
> Attachments: SPARK-32536.full.log
>
>
> When executing an insert overwrite table statement into a dynamic partition:
> {code:java}
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.id_name2 partition(dt) select * from tmp.id_name where dt='2001';
> {code}
> output log:
> {code:java}
> 20/08/05 14:38:05 ERROR Hive: Exception when loading partition with parameters partPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-1/dt=2001, table=id_name2, partSpec={dt=2001}, loadFileType=REPLACE_ALL, listBucketingLevel=0, isAcid=false, resetStatistics=false
> org.apache.hadoop.hive.ql.metadata.HiveException: Directory hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 could not be cleaned up.
> at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4666)
> at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:4597)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2132)
> at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2588)
> at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2579)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: File hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 does not exist.
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1053)
> at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
> at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1113)
> at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1110)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1120)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910)
> at org.apache.hadoop.hive.ql.metadata.Hive.cleanUpOneDirectoryForReplace(Hive.java:4681)
> at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4661)
> ... 8 more
> Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 1 in table id_name2 with loadPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-1;
> {code}
> It seems that Spark does not check whether the partition's HDFS location 
> exists before deleting it, whereas Hive can execute the same SQL successfully.
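
One way to make the cleanup tolerant of an already-missing location is to check existence before listing, as in this minimal sketch against the Hadoop FileSystem API (reusing the simplified helper shape from the comment above; hypothetical, not a proposed Spark or Hive patch):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: skip the cleanup when the old partition directory is already
// gone, instead of letting listStatus() throw FileNotFoundException.
class GuardedCleanupSketch {
  static void cleanUpOneDirectoryForReplace(FileSystem fs, Path oldPartPath) throws IOException {
    if (!fs.exists(oldPartPath)) {
      return; // nothing to clean up: the old location does not exist
    }
    for (FileStatus stat : fs.listStatus(oldPartPath)) {
      fs.delete(stat.getPath(), true);
    }
  }
}
{code}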



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


