[ https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898685#comment-16898685 ]
Hui An commented on HIVE-22077:
-------------------------------

This issue is caused by the method loadPartitionInternal in Hive.java:
{code:java}
Path oldPartPath = (oldPart != null) ? oldPart.getDataLocation() : null;
Path newPartPath = null;

if (inheritLocation) {
  newPartPath = genPartPathFromTable(tbl, partSpec, tblDataLocationPath);

  if (oldPart != null) {
    /*
     * If we are moving the partition across filesystem boundaries
     * inherit from the table properties. Otherwise (same filesystem) use the
     * original partition location.
     *
     * See: HIVE-1707 and HIVE-2117 for background
     */
    FileSystem oldPartPathFS = oldPartPath.getFileSystem(getConf());
    FileSystem loadPathFS = loadPath.getFileSystem(getConf());
    if (FileUtils.equalsFileSystem(oldPartPathFS, loadPathFS)) {
      newPartPath = oldPartPath;
    }
  }
} else {
  newPartPath = oldPartPath == null
    ? genPartPathFromTable(tbl, partSpec, tblDataLocationPath) : oldPartPath;
}
{code}
A null oldPart does not mean that the partition directory does not exist in HDFS. The code nevertheless sets oldPartPath to null in that case and passes that null value to the subsequent replaceFiles call.

> Inserting overwrite partitions clause does not clean directories while
> partitions' info is not stored in metadata
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-22077
>                 URL: https://issues.apache.org/jira/browse/HIVE-22077
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.1, 4.0.0, 2.3.4
>            Reporter: Hui An
>            Assignee: Hui An
>            Priority: Major
>
> An INSERT OVERWRITE into a static partition may not clean the related HDFS
> location if the partition's info is not stored in the metadata.
> Steps to reproduce this issue:
> ------------------------------------------------
> 1. Create a managed table:
> ------------------------------------------------
> {code:sql}
> CREATE TABLE `test`(
>   `id` string)
> PARTITIONED BY (
>   `dayno` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1564731656')
> {code}
> ------------------------------------------------
> 2. Create the partition's directory and put some data under it:
> ------------------------------------------------
> {code}
> hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> {code}
> ------------------------------------------------
> 3. Insert overwrite partition dayno=20190802:
> ------------------------------------------------
> {code:sql}
> INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
> SELECT 1;
> {code}
> ------------------------------------------------
> 4. The test.data file under the partition directory is not deleted.
> ------------------------------------------------

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
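
To illustrate the root cause described in the comment above, here is a minimal sketch of one possible way to pick the directory to clean when the partition is missing from the metastore. It is not the patch attached to HIVE-22077 and not Hive code: the class OverwriteTargetSketch, the method dirToClean, and its parameters are hypothetical names, and the Predicate merely stands in for a filesystem existence check such as Hadoop's FileSystem.exists.

```java
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical helper for illustration only; not part of Hive. */
final class OverwriteTargetSketch {
    private OverwriteTargetSketch() {}

    /**
     * Decide which directory, if any, should be cleaned before an overwrite.
     *
     * @param oldPartLocation     location recorded in the metastore, or null
     *                            when the partition is not registered
     * @param defaultPartLocation location derived from the table location and
     *                            the partition spec (assumed to be non-null)
     * @param existsOnFs          stand-in for a filesystem existence check
     */
    static Optional<String> dirToClean(String oldPartLocation,
                                       String defaultPartLocation,
                                       Predicate<String> existsOnFs) {
        // Registered partition: clean the location the metastore knows about.
        if (oldPartLocation != null) {
            return Optional.of(oldPartLocation);
        }
        // Unregistered partition: the directory may still exist on disk,
        // so fall back to the derived location instead of returning nothing.
        if (existsOnFs.test(defaultPartLocation)) {
            return Optional.of(defaultPartLocation);
        }
        // Nothing on disk, nothing to clean.
        return Optional.empty();
    }
}
```

With the reproduction steps from the issue, oldPartLocation would be null and the derived location would be the dayno=20190802 directory, so this sketch would select that directory for cleaning instead of skipping it.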