Hi All,

Bucketing is based on hash partitioning. The bucketed column is hashed, and each record is assigned to one of the configured number of buckets. Records with the same value in the bucketed column always go to the same bucket. Physically, each bucket is stored as one or more files in the table directory.

Advantages:
A bucketed table is useful for map-side joins because it avoids shuffling data. CarbonData can also do driver-level pruning on the bucketed column to improve query performance.
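To make the idea concrete, here is a minimal Python sketch. It is not CarbonData's implementation. It assumes `zlib.crc32` as a stand-in hash (CarbonData's actual hash function may differ), represents tables as lists of dicts, and uses hypothetical helper names (`bucket_id`, `bucketize`, `scan_equals`, `bucketed_join`). It shows three properties: equal keys always land in the same bucket, an equality filter on the bucketed column needs to read only one bucket, and two tables bucketed the same way can be joined bucket by bucket without a shuffle.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 32  # mirrors INTO 32 BUCKETS in the example DDL


def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Stand-in hash (assumption): CarbonData's real hash function may differ.
    # Any deterministic hash gives the key property: equal keys -> same bucket.
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def bucketize(rows, key_col, num_buckets=NUM_BUCKETS):
    # Group rows into buckets; in CarbonData each bucket would be file(s) on disk.
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[key_col], num_buckets)].append(row)
    return buckets


def scan_equals(buckets, key_col, key, num_buckets=NUM_BUCKETS):
    # Pruning: for "key_col = key" only one bucket can hold matches,
    # so every other bucket is skipped without being read.
    target = buckets.get(bucket_id(key, num_buckets), [])
    return [row for row in target if row[key_col] == key]


def bucketed_join(left, right, key_col, num_buckets=NUM_BUCKETS):
    # Both sides were bucketed on key_col with the same hash and bucket count,
    # so bucket i on the left can only match bucket i on the right: no shuffle.
    result = []
    for b in range(num_buckets):
        index = defaultdict(list)  # per-bucket hash join on the right side
        for rrow in right.get(b, []):
            index[rrow[key_col]].append(rrow)
        for lrow in left.get(b, []):
            for rrow in index.get(lrow[key_col], []):
                result.append({**lrow, **rrow})
    return result


users = [{"user_id": i, "firstname": f"f{i}", "lastname": f"l{i}"} for i in range(100)]
orders = [{"user_id": i % 10, "amount": i} for i in range(50)]

user_buckets = bucketize(users, "user_id")
order_buckets = bucketize(orders, "user_id")

print(scan_equals(user_buckets, "user_id", 42))
print(len(bucketed_join(user_buckets, order_buckets, "user_id")))  # 50
```

The per-bucket loop in `bucketed_join` is where the shuffle is saved: without bucketing, both tables would first have to be redistributed on `user_id` before they could be joined.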
A user can create a bucketed table as follows:

    CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING)
    CLUSTERED BY(user_id) INTO 32 BUCKETS;

In the above example, the user_id column is hash partitioned, and CarbonData creates 32 buckets (partitions), each stored as one or more files. So when joining with another table on the bucketed column (provided that table is bucketed on the join column into the same number of buckets), each bucket can be joined with the matching bucket of the other table without shuffling.

Since CarbonData already supports partitioning in its file format, it currently creates the following folder structure:

    dbName
      -> tableName
           -> Fact
                -> Part0 -> Segment_id -> carbondatafiles
                -> Part1 -> Segment_id -> carbondatafiles

We could also move the partition ID into the file metadata, but doing so would complicate backward compatibility.

--
Thanks & Regards,
Ravindra