[ 
https://issues.apache.org/jira/browse/CARBONDATA-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala reassigned CARBONDATA-466:
------------------------------------------

    Assignee: Ravindra Pesala

> Implement bucketing table in carbondata
> ---------------------------------------
>
>                 Key: CARBONDATA-466
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-466
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Ravindra Pesala
>            Assignee: Ravindra Pesala
>
> Bucketing is a useful feature when users want to join big tables. It is also 
> useful for driver-level partition pruning to improve query performance.
> Users can add buckets on any dimension column (except complex types) as follows:
> {code}
> CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING)
> CLUSTERED BY(user_id) INTO 32 BUCKETS
> STORED BY 'carbondata';
> {code}
> In the above example, column user_id is hash partitioned, which creates 32 bucket 
> files in carbondata. When joining with another table on the bucketed 
> column, the matching buckets can be selected and joined without shuffling.
> Carbon format changes:
> 1. Bucketing information needs to be stored inside the schema thrift file.
> 2. The bucket id can be stored inside every carbondata index file.
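
To make the shuffle-free join in the description more concrete, here is a minimal Python sketch of hash bucketing followed by a bucket-by-bucket join. It is an illustration only, not CarbonData code: the modulo hash and the names bucket_id, write_buckets and bucketed_join are assumptions made for this example, and the sample rows are invented.

```python
from collections import defaultdict

NUM_BUCKETS = 32  # mirrors "INTO 32 BUCKETS" in the DDL above


def bucket_id(key, num_buckets=NUM_BUCKETS):
    """Hypothetical hash: plain modulo. CarbonData's actual hash may differ."""
    return key % num_buckets


def write_buckets(rows, key_index=0, num_buckets=NUM_BUCKETS):
    """Group rows by the bucket of their bucketing column (one list per bucket)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[key_index], num_buckets)].append(row)
    return buckets


def bucketed_join(left, right):
    """Join two tables bucketed on the same key with the same bucket count.

    Only buckets with the same id are compared, so no row has to be
    moved (shuffled) between buckets.
    """
    joined = []
    for b, left_rows in left.items():
        # Build a lookup table for the matching right-side bucket only.
        lookup = defaultdict(list)
        for r in right.get(b, []):
            lookup[r[0]].append(r)
        for l in left_rows:
            for r in lookup.get(l[0], []):
                joined.append(l + r[1:])
    return joined


# Invented sample data: (user_id, firstname, lastname) and (user_id, item).
users = write_buckets([(1, "Ann", "Lee"), (33, "Bob", "Ray"), (2, "Cy", "Oz")])
orders = write_buckets([(33, "book"), (2, "pen"), (7, "cup")])
result = sorted(bucketed_join(users, orders))
```

Both sides must use the same bucketing column, hash and bucket count for this to work; under that assumption, equal keys always land in the same bucket id, so each pair of buckets can be joined independently.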



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
