[ https://issues.apache.org/jira/browse/CARBONDATA-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ravindra Pesala reassigned CARBONDATA-466:
------------------------------------------

    Assignee: Ravindra Pesala

> Implement bucketing table in carbondata
> ---------------------------------------
>
>                 Key: CARBONDATA-466
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-466
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Ravindra Pesala
>            Assignee: Ravindra Pesala
>
> Bucketing is a useful feature when a user wants to join big tables. It is also
> useful for driver-level partition pruning to improve query performance.
> Users can add buckets on any dimension column (except complex types) as follows:
> {code}
> CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING)
> CLUSTERED BY(user_id) INTO 32 BUCKETS
> STORED BY 'carbondata';
> {code}
> In the above example, the column user_id is hash partitioned, which creates 32
> bucket files in carbondata. When joining with another table on the bucketed
> column, the same buckets can be selected and joined without shuffling.
> Carbon format changes
> 1. Bucketing information needs to be stored inside the schema thrift file.
> 2. The bucket id can be stored inside every carbondata index file.
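
The idea behind the shuffle-free join described in the issue can be illustrated with a small Python sketch. This is not CarbonData code: the modulo hash in bucket_id, the helper names, and the sample rows are all assumptions made for illustration, and the hash function CarbonData actually uses may differ. It only shows why two tables bucketed on the same column with the same bucket count can be joined one bucket pair at a time.

```python
from collections import defaultdict

NUM_BUCKETS = 32  # mirrors "INTO 32 BUCKETS" in the DDL from the issue


def bucket_id(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Hypothetical hash: plain modulo. The real hash function may differ."""
    return key % num_buckets


def bucketize(rows, num_buckets=NUM_BUCKETS):
    """Group rows into buckets by hashing the key in column 0."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[0], num_buckets)].append(row)
    return buckets


def bucketed_join(left_rows, right_rows, num_buckets=NUM_BUCKETS):
    """Inner join on column 0, processed one bucket pair at a time."""
    left = bucketize(left_rows, num_buckets)
    right = bucketize(right_rows, num_buckets)
    joined = []
    # Equal keys always hash to the same bucket id, so only buckets with
    # matching ids need to be compared; no rows move between buckets.
    for b in sorted(left.keys() & right.keys()):
        lookup = defaultdict(list)
        for r in right[b]:
            lookup[r[0]].append(r)
        for l in left[b]:
            for r in lookup.get(l[0], []):
                joined.append(l + r[1:])
    return joined


# Made-up sample data: (user_id, firstname, lastname) and (user_id, item).
users = [(1, "Ann", "Lee"), (33, "Bob", "Ray"), (2, "Cy", "Poe")]
orders = [(33, "book"), (2, "pen"), (7, "ink")]
result = bucketed_join(users, orders)
```

In this sketch, user_id 1 and 33 land in the same bucket, yet only the row with a matching key is joined; the order for user_id 7 sits in a bucket that has no counterpart on the other side, so it is never read.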