[ https://issues.apache.org/jira/browse/HIVE-19830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507843#comment-16507843 ]

Gabor Kaszab commented on HIVE-19830:
-------------------------------------

I have heard about users taking advantage of multiple partitions pointing to
the same location. For example, with a table partitioned by date it is common
to create an extra partition called 'latest', point it to another partition's
location, and change that location as new partitions are introduced.
Since this pattern is used in production, I feel it should be guaranteed that
queries on these partitions return consistent results, i.e. listing the
partitions should not show a partition that no longer exists.
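A minimal HiveQL sketch of the 'latest' pattern described above, for illustration only. The table `events`, its `dt` partition column, and the HDFS paths are hypothetical; the statements use Hive's standard `add partition ... location` and `partition ... set location` DDL.

```sql
-- Hypothetical table partitioned by a string date column.
create table events (id int) partitioned by (dt string) stored as parquet;

-- Regular daily partitions (paths are illustrative).
alter table events add partition (dt='2018-06-07')
  location 'hdfs://localhost:20500/test-warehouse/events/dt=2018-06-07';
alter table events add partition (dt='2018-06-08')
  location 'hdfs://localhost:20500/test-warehouse/events/dt=2018-06-08';

-- The extra 'latest' partition shares the location of the newest daily partition.
alter table events add partition (dt='latest')
  location 'hdfs://localhost:20500/test-warehouse/events/dt=2018-06-08';

-- When a newer daily partition arrives, add it and repoint 'latest' at it.
alter table events add partition (dt='2018-06-09')
  location 'hdfs://localhost:20500/test-warehouse/events/dt=2018-06-09';
alter table events partition (dt='latest')
  set location 'hdfs://localhost:20500/test-warehouse/events/dt=2018-06-09';
```

At any point in time, 'latest' and one daily partition share a directory, so the behavior reported in this issue applies to tables using this pattern.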

> Inconsistent behavior when multiple partitions point to the same location
> -------------------------------------------------------------------------
>
>                 Key: HIVE-19830
>                 URL: https://issues.apache.org/jira/browse/HIVE-19830
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.4.0
>            Reporter: Gabor Kaszab
>            Assignee: Adam Szita
>            Priority: Major
>
> // Create a table with 2 partitions that share the same location, and insert
> // a single row into one of them.
> create table test (i int) partitioned by (j int) stored as parquet;
> alter table test add partition (j=1) location 'hdfs://localhost:20500/test-warehouse/test/j=1';
> alter table test add partition (j=2) location 'hdfs://localhost:20500/test-warehouse/test/j=1';
> insert into table test partition (j=1) values (1);
> // select * shows this single row in both partitions, as expected.
> select * from test;
> 1 1
> 1 2
> // However, sum() does not add up the row from both partitions (expected
> // result: 2 3). This is +Issue #1+.
> select sum( i), sum(j) from test;
> 1 2
> // As expected, the file system has a single directory shared by the 2
> // partitions.
> hdfs dfs -ls hdfs://localhost:20500/test-warehouse/test/
> Found 1 items
> drwxr-xr-x - gaborkaszab supergroup 0 2018-06-08 10:54 hdfs://localhost:20500/test-warehouse/test/j=1
> // Let's drop one of the partitions now!
> alter table test drop partition (j=2);
> // Running the same hdfs dfs -ls command shows that the j=1 directory has
> // been deleted. I think this is good behavior; we just have to document that
> // it is expected.
> // select * from test; returns zero rows, which is still as expected.
> // Even though the directory has been deleted, the j=1 partition is still
> // listed by show partitions. This is +Issue #2+.
> show partitions test;
> j=1
> After Hive has dropped the directory, Impala reloads its partitions by asking
> Hive which partitions exist. Apparently, Hive returns a list that still
> includes the j=1 partition, so Impala treats it as existing and does not drop
> it from the Catalog's cache. Hive should not return that partition here. This
> is +Issue #3+.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
