[jira] [Comment Edited] (HIVE-24715) Increase bucketId range

Attila Magyar (Jira) Mon, 01 Feb 2021 08:05:44 -0800


    [ https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276424#comment-17276424 ]


Attila Magyar edited comment on HIVE-24715 at 2/1/21, 4:04 PM:
---------------------------------------------------------------

Currently the bucketId field is stored in 12 bits. When Tez starts more than 
4095 tasks, it overflows. See TEZ-4271 and TEZ-4130 for more context.

 
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits - reserved for future
{code}
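To make the layout above concrete, here is a minimal Java sketch of packing and unpacking such a 32-bit value. The class and method names are hypothetical, and placing the statement id in the remaining low 12 bits is an assumption, since the excerpt above does not describe those bits. It is an illustration of the bit arithmetic, not Hive's actual codec.

```java
// Hypothetical sketch of the 32-bit "bucket" property layout described above.
final class BucketLayoutSketch {
    static final int VERSION_SHIFT = 29;      // top 3 bits: version code
    static final int BUCKET_SHIFT = 16;       // 12 bits below version (3) + reserved (1)
    static final int BUCKET_MASK = 0xFFF;     // 12 bits, so bucket ids 0..4095
    static final int STATEMENT_MASK = 0xFFF;  // assumption: low 12 bits hold the statement id

    static int encode(int version, int bucketId, int statementId) {
        if (version < 0 || version > 7) {
            throw new IllegalArgumentException("version out of range: " + version);
        }
        if (bucketId < 0 || bucketId > BUCKET_MASK) {
            // This is the overflow case: 12 bits cannot hold ids above 4095.
            throw new IllegalArgumentException("bucketId out of range: " + bucketId);
        }
        if (statementId < 0 || statementId > STATEMENT_MASK) {
            throw new IllegalArgumentException("statementId out of range: " + statementId);
        }
        // Reserved bits (bit 28 and bits 12..15) are left as zero.
        return (version << VERSION_SHIFT) | (bucketId << BUCKET_SHIFT) | statementId;
    }

    static int decodeVersion(int encoded) {
        return encoded >>> VERSION_SHIFT;
    }

    static int decodeBucketId(int encoded) {
        return (encoded >>> BUCKET_SHIFT) & BUCKET_MASK;
    }

    static int decodeStatementId(int encoded) {
        return encoded & STATEMENT_MASK;
    }

    public static void main(String[] args) {
        int encoded = encode(1, 4095, 0);
        System.out.println("encoded (hex): " + Integer.toHexString(encoded));
        System.out.println("bucketId: " + decodeBucketId(encoded));
    }
}
```

With this layout, 4095 is the largest bucket id that fits; the sketch rejects anything larger instead of letting it spill into neighbouring bits.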
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundreds of thousands of tasks are started, then we would end up 
with hundreds of thousands of files, and since compaction merges files of the 
same bucket across statement ids, it wouldn't merge files that belong to 
different buckets.

Instead of increasing the range, the proposed solution is to let the bucket id 
overflow into the statement id, so that the 4096th bucket will be bucket_0 and it 
will look like it was created by statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.
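The overflow idea can be sketched roughly as follows. This is only an illustration of the arithmetic, assuming a 0-based task id, 4096 buckets per statement id, and a 12-bit statement id; the class and method names are hypothetical and the actual patch may differ.

```java
// Hypothetical sketch: let a task id that does not fit in 12 bits
// wrap around into the bucket range and bump the statement id instead.
final class BucketOverflowSketch {
    static final int BUCKETS_PER_STATEMENT = 1 << 12;   // 4096 bucket ids: 0..4095
    static final int MAX_STATEMENT_ID = (1 << 12) - 1;  // assumption: 12-bit statement id

    /** Returns {bucketId, effectiveStatementId} for a 0-based task id. */
    static int[] spill(int taskId, int statementId) {
        if (taskId < 0 || statementId < 0) {
            throw new IllegalArgumentException("ids must be non-negative");
        }
        int bucketId = taskId % BUCKETS_PER_STATEMENT;
        int effectiveStatementId = statementId + taskId / BUCKETS_PER_STATEMENT;
        if (effectiveStatementId > MAX_STATEMENT_ID) {
            throw new IllegalArgumentException("statement id overflow: " + effectiveStatementId);
        }
        return new int[] {bucketId, effectiveStatementId};
    }

    public static void main(String[] args) {
        int[] result = spill(4096, 0);
        System.out.println("bucket_" + result[0] + " statement " + result[1]);
    }
}
```

Under these assumptions the number of distinct bucket ids stays at 4096 no matter how many tasks run, so compaction can still merge each bucket across the statement ids it was spread over.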

 

 


> Increase bucketId range
> -----------------------
>
>                 Key: HIVE-24715
>                 URL: https://issues.apache.org/jira/browse/HIVE-24715
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>             Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
