[jira] [Closed] (CARBONDATA-910) Implement Partition feature

David Cai (Jira) Wed, 06 May 2020 19:13:19 -0700


     [ https://issues.apache.org/jira/browse/CARBONDATA-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


David Cai closed CARBONDATA-910.
--------------------------------
    Resolution: Invalid

Deprecated since 2.0.

> Implement Partition feature
> ---------------------------
>
>                 Key: CARBONDATA-910
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-910
>             Project: CarbonData
>          Issue Type: New Feature
>          Components: core, data-load, data-query
>            Reporter: Cao, Lionel
>            Assignee: Cao, Lionel
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Why we need partition tables
> A partitioned table divides a table into smaller pieces.
> With partitioned tables:
>       1. Data can be better managed, organized and stored.
>       2. A full table scan can be avoided in some scenarios, improving query
>          performance (for example, when the partition column appears in a
>          filter, or when multiple partitioned tables are joined on the same
>          partition column).
> Partitioning design
> Range Partitioning
>        Range partitioning maps data to partitions according to ranges of
>        partition column values. The operator '<' defines the non-inclusive
>        upper bound of the current partition.
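A minimal Python sketch of the range rule described above, assuming the '<' bounds are kept in a sorted list. `range_partition` is a hypothetical helper, not a CarbonData API:

```python
from bisect import bisect_right


def range_partition(value, upper_bounds):
    """Return the index of the range partition that should hold `value`.

    Hypothetical helper. `upper_bounds` must be sorted ascending. Partition i
    holds values v with upper_bounds[i-1] <= v < upper_bounds[i]; because '<'
    is non-inclusive, a value equal to a bound belongs to the next partition.
    """
    idx = bisect_right(upper_bounds, value)
    if idx == len(upper_bounds):
        # The value is >= the last bound, so no partition covers it.
        raise ValueError(f"{value!r} is not below any partition bound")
    return idx


bounds = [10, 20, 30]
print(range_partition(5, bounds))   # 0
print(range_partition(10, bounds))  # 1 (10 is not < 10)
```

Using `bisect_right` rather than `bisect_left` is what sends a value equal to a bound into the next partition, matching the non-inclusive upper bound.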
> List Partitioning
>        List partitioning maps data to partitions according to specific lists
>        of values.
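List partitioning amounts to a value-to-partition lookup. The sketch below builds that lookup once per table. `make_list_partitioner` is a hypothetical name, and rejecting a value that matches no list is an assumption, since the design does not say how such rows are handled:

```python
def make_list_partitioner(value_lists):
    """Build a lookup from each listed value to its partition index.

    Hypothetical helper: `value_lists` holds one list of values per partition.
    A value listed under two partitions is rejected because it is ambiguous.
    """
    lookup = {}
    for idx, values in enumerate(value_lists):
        for v in values:
            if v in lookup:
                raise ValueError(f"{v!r} is listed in more than one partition")
            lookup[v] = idx

    def partition_of(value):
        if value not in lookup:
            raise ValueError(f"{value!r} does not belong to any list partition")
        return lookup[value]

    return partition_of


# 'Africa' and 'Oceania' share one partition here purely for illustration.
area_partition = make_list_partitioner([["Asia"], ["Europe"], ["Africa", "Oceania"]])
print(area_partition("Europe"))   # 1
print(area_partition("Oceania"))  # 2
```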
> Hash Partitioning
>        Hash partitioning maps data to partitions with a hash algorithm,
>        spreading rows across the given number of partitions.
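A sketch of hash partitioning as "hash the key, then take it modulo the partition count". `hash_partition` is hypothetical, and CRC32 is a stand-in chosen only because it is deterministic; the design does not name the hash function:

```python
import zlib


def hash_partition(key, num_partitions):
    """Return a partition index in range(num_partitions) for `key`.

    Hypothetical helper. CRC32 over the key's string form is used only because
    it is stable across runs (Python's built-in hash() of str is randomized per
    process); it is an assumption, not the function CarbonData would use.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


# Analogue of "partition by hash(itemid, 9)": nine partitions keyed on itemid.
for itemid in (1001, 1002, 1003):
    print(itemid, hash_partition(itemid, 9))
```

Whatever hash is chosen, it has to be stable across processes and releases, otherwise rows written earlier would no longer be found in the partition a later query computes.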
> Composite Partitioning (at most 2 levels for now)
>        Range-Range, Range-List, Range-Hash, List-Range, List-List, List-Hash,
>        Hash-Range, Hash-List, Hash-Hash
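A two-level composite is a pair of single-level partitioners applied in turn. The sketch below uses hypothetical names and toy bounds to show a Range-List combination; the other pairs above compose the same way. The flat numbering of sub-partitions is one possible scheme, assumed here:

```python
from bisect import bisect_right


def composite_partitioner(outer, inner, num_inner):
    """Combine two single-level partitioners into a two-level one.

    `outer` and `inner` each map a row to a partition index; `num_inner` is the
    number of sub-partitions under every outer partition. The returned function
    gives (partition, subpartition, flat_id), where flat_id numbers all
    sub-partitions consecutively (an assumed scheme).
    """
    def partition_of(row):
        p, s = outer(row), inner(row)
        return p, s, p * num_inner + s

    return partition_of


# Hypothetical Range-List example: range on "logdate" (ISO date strings sort
# correctly as text), list on "area".
DATE_BOUNDS = ["2016-01-01", "2017-01-01"]
AREAS = {"Asia": 0, "Europe": 1}


def by_logdate(row):
    idx = bisect_right(DATE_BOUNDS, row["logdate"])
    if idx == len(DATE_BOUNDS):
        raise ValueError(f"logdate {row['logdate']!r} is past the last bound")
    return idx


def by_area(row):
    return AREAS[row["area"]]  # KeyError for an area with no partition


sales_partition = composite_partitioner(by_logdate, by_area, num_inner=len(AREAS))
print(sales_partition({"logdate": "2016-06-15", "area": "Europe"}))  # (1, 1, 3)
```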
> DDL-Create 
> Create table sales(
>      itemid long, 
>      logdate datetime, 
>      customerid int
>      ...
>      ...)
> [partition by range logdate(...)]
> [subpartition by list area(...)]
> Stored By 'carbondata'
> [tblproperties(...)];
> range partition:
>      partition by range logdate(< '2016-01-01', < '2017-01-01',
>                                 < '2017-02-01', < '2017-03-01', < '2099-01-01')
> list partition:
>      partition by list area('Asia', 'Europe', 'North America', 'Africa',
>                             'Oceania')
> hash partition:
>      partition by hash(itemid, 9) 
> composite partition:
>      partition by range logdate(< '2016-01-01', < '2017-01-01',
>                                 < '2017-02-01', < '2017-03-01', < '2099-01-01')
>      subpartition by list area('Asia', 'Europe', 'North America', 'Africa',
>                                'Oceania')
> DDL-Rebuild, Add
> Alter table sales rebuild partition by (range|list|hash)(...);
> #Add is only supported for range partitioning and list partitioning:
> Alter table sales add partition (< '2018-01-01');
> Alter table sales add partition ('South America');
> #Note: there is no delete operation for partitions; use rebuild instead.
> To delete data, use a delete statement; the partition definition itself
> will not be deleted.
> Partition Table Data Store
> [Option One]
> Use the current design and keep partition folders outside of segments.
> Fact
>    |___Part0
>    |          |___Segment_0
>    |                         |___ *******-[bucketId]-.carbondata
>    |                         |___ *******-[bucketId]-.carbondata
>    |          |___Segment_1
>    |          ...
>    |___Part1
>    |          |___Segment_0
>    |          |___Segment_1
>    |...
> [Option Two]
> Remove the partition folders, add the partition ID to the file name, and
> build a B-tree on the driver side.
> Fact
>    |___Segment_0
>    |                  |___ *******-[bucketId]-[partitionId].carbondata
>    |                  |___ *******-[bucketId]-[partitionId].carbondata
>    |___Segment_1
>    |___Segment_2
>    ...
> Pros & Cons: 
> Option one would be faster at locating target files.
> Option two needs to store more metadata about folders.
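To make the trade-off concrete, here is a sketch of selecting files for a set of partition IDs under each layout. The `Part<N>` directory names and the file-name prefixes are hypothetical stand-ins for the elided `*******` parts. For simplicity both functions filter a flat path list; in practice Option One presumably lets a reader skip whole `Part` directories without listing them, while Option Two has to inspect every file name in the segments (or consult the driver-side B-tree):

```python
import re
from pathlib import PurePosixPath


def prune_option_one(paths, wanted):
    """Option One: keep files whose directory under Fact/ is Part<N>, N in `wanted`."""
    selected = []
    for p in paths:
        parts = PurePosixPath(p).parts  # ('Fact', 'Part1', 'Segment_0', '<file>')
        m = re.fullmatch(r"Part(\d+)", parts[1]) if len(parts) > 1 else None
        if m and int(m.group(1)) in wanted:
            selected.append(p)
    return selected


def prune_option_two(paths, wanted):
    """Option Two: the partition ID is the last '-'-separated field of the
    file name, as in *******-[bucketId]-[partitionId].carbondata."""
    selected = []
    for p in paths:
        stem = PurePosixPath(p).stem  # file name without '.carbondata'
        if int(stem.rsplit("-", 1)[-1]) in wanted:
            selected.append(p)
    return selected


one = ["Fact/Part0/Segment_0/f0-0-.carbondata", "Fact/Part1/Segment_0/f1-0-.carbondata"]
two = ["Fact/Segment_0/f0-0-0.carbondata", "Fact/Segment_0/f1-0-1.carbondata",
       "Fact/Segment_1/f2-1-1.carbondata"]
print(prune_option_one(one, {1}))
print(prune_option_two(two, {1}))
```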
> Partition Table MetaData Store
> Partition info should be stored in the file footer/index file and loaded
> into memory before user queries run.
> Relationship with Bucket
> Buckets should sit at a lower level than partitions.
> Partition Table Query
> Example:
> Select * from sales
> where logdate <= date '2016-12-01';
> Users should remember to add a partition filter when writing SQL against a
> partitioned table.
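The pruning a partition filter enables can be sketched with the range bounds from the DDL example above. `partitions_for_le` is a hypothetical helper that only handles a `<=` filter on the range column:

```python
from bisect import bisect_right
from datetime import date

# Upper bounds from the range-partition example: partition i holds
# BOUNDS[i-1] <= logdate < BOUNDS[i]; partition 0 has no lower bound.
BOUNDS = [date(2016, 1, 1), date(2017, 1, 1), date(2017, 2, 1),
          date(2017, 3, 1), date(2099, 1, 1)]


def partitions_for_le(upper_value, bounds=BOUNDS):
    """Indices of range partitions that may hold rows with logdate <= upper_value.

    Partition i can match only if its lower bound bounds[i-1] is <= upper_value,
    i.e. i <= bisect_right(bounds, upper_value); later partitions are pruned.
    """
    last = min(bisect_right(bounds, upper_value), len(bounds) - 1)
    return list(range(last + 1))


print(partitions_for_le(date(2016, 12, 1)))  # [0, 1]
```

For `logdate <= date '2016-12-01'`, only partition 0 (before 2016-01-01) and partition 1 (2016) need to be scanned; the other three are skipped. Without such a filter, every partition has to be read.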



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
