[ 
https://issues.apache.org/jira/browse/HIVE-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15666323#comment-15666323
 ] 

Rui Li commented on HIVE-15202:
-------------------------------

Prior to HIVE-13040, selecting from such a table fails with an NPE during split 
generation. With HIVE-13040, the select returns properly. However, I'm not sure 
it fully solves the problem, because this wasn't the original goal of HIVE-13040.

The root cause is in {{CompactorOutputCommitter::commitJob}}, where we call 
rename to move the output from the tmp location to the final location. However, 
if the final location already exists, i.e. it was produced by another compaction 
task, the rename will merge the two outputs, resulting in the nested base dir we 
see.
A mitigation is to delete the existing final location before the rename, but I 
guess that won't fully eliminate the race condition here.
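
To make the failure mode concrete, here is a minimal sketch that simulates it on 
the local filesystem. It is not the Hive code: {{renameLikeHdfs}}, 
{{commitNaive}}, {{commitWithDelete}} and the other names are hypothetical, and 
it assumes the rename behaves like an HDFS directory rename, i.e. when the 
destination already exists as a directory, the source is moved underneath it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem simulation only; all names here are hypothetical.
class RenameRaceSketch {

    // Assumption: mimics an HDFS-style directory rename. If dst already
    // exists as a directory, src ends up at dst/<name of src>.
    static Path renameLikeHdfs(Path src, Path dst) throws IOException {
        Path target = Files.isDirectory(dst) ? dst.resolve(src.getFileName()) : dst;
        return Files.move(src, target);
    }

    // Deletes a directory tree, children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Stand-in for one compaction task writing its output to a tmp location.
    static Path writeTmpOutput(Path root, String task) throws IOException {
        Path base = Files.createDirectories(root.resolve("tmp_" + task).resolve("base_0000007"));
        Files.write(base.resolve("bucket_00000"), task.getBytes(StandardCharsets.UTF_8));
        return base;
    }

    // Commit as described in the comment: a plain rename to the final location.
    static void commitNaive(Path tmpBase, Path finalBase) throws IOException {
        renameLikeHdfs(tmpBase, finalBase);
    }

    // Proposed mitigation: delete the final location first. The delete and the
    // rename are two separate steps, so another task can still recreate the
    // final location in between.
    static void commitWithDelete(Path tmpBase, Path finalBase) throws IOException {
        deleteRecursively(finalBase);
        renameLikeHdfs(tmpBase, finalBase);
    }

    // Commits the output of two tasks one after the other and reports whether
    // a nested base_0000007/base_0000007 directory was produced.
    static boolean runScenario(boolean deleteFirst) throws IOException {
        Path root = Files.createTempDirectory("compaction");
        Path finalBase = Files.createDirectories(root.resolve("z=1")).resolve("base_0000007");
        for (String task : new String[] {"a", "b"}) {
            Path tmpBase = writeTmpOutput(root, task);
            if (deleteFirst) {
                commitWithDelete(tmpBase, finalBase);
            } else {
                commitNaive(tmpBase, finalBase);
            }
        }
        return Files.isDirectory(finalBase.resolve("base_0000007"));
    }

    public static void main(String[] args) throws IOException {
        System.out.println("nested after plain rename: " + runScenario(false));
        System.out.println("nested after delete-then-rename: " + runScenario(true));
    }
}
```

The sketch commits the two outputs sequentially, so it only shows that 
delete-then-rename avoids the nested directory in the non-overlapping case. With 
truly concurrent tasks the window between the delete and the rename remains, 
which is why the mitigation is probably not a complete fix.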

> Concurrent compactions for the same partition may generate malformed folder 
> structure
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-15202
>                 URL: https://issues.apache.org/jira/browse/HIVE-15202
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>
> If two compactions run concurrently on a single partition, they may generate 
> a folder structure like this (nested base dir):
> {noformat}
> drwxr-xr-x   - root supergroup          0 2016-11-14 22:23 /user/hive/warehouse/test/z=1/base_0000007/base_0000007
> -rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_0000007/bucket_00000
> -rw-r--r--   3 root supergroup        611 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_0000007/bucket_00001
> -rw-r--r--   3 root supergroup        614 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_0000007/bucket_00002
> -rw-r--r--   3 root supergroup        621 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_0000007/bucket_00003
> -rw-r--r--   3 root supergroup        621 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_0000007/bucket_00004
> -rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_0000007/bucket_00005
> -rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_0000007/bucket_00006
> -rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_0000007/bucket_00007
> -rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_0000007/bucket_00008
> -rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_0000007/bucket_00009
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
