Tag columns as partitioning columns
-----------------------------------

                 Key: HADOOP-4591
                 URL: https://issues.apache.org/jira/browse/HADOOP-4591
             Project: Hadoop Core
          Issue Type: Wish
          Components: contrib/hive
            Reporter: Venky Iyer



    CREATE TABLE tname (cname1 INT, pcol INT PARTITIONING)
    COMMENT 'This is a table' 
    PARTITIONED BY(dt STRING) 
    STORED AS SEQUENCEFILE; 

The goal here is to annotate a column as a "partitioning" column.
Consider pcol in the above example. It is annotated with 'PARTITIONING', which
implies that the CREATE TABLE statement effectively has

    PARTITIONED BY (dt STRING, pcol INT)

and every write to this table implicitly carries

    INSERT OVERWRITE TABLE tname PARTITION (pcol='X')
    WHERE output.pcol = 'X'

for every distinct value X that pcol takes.
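A minimal Python sketch of this rewrite follows. It assumes that the rows produced by the query are modeled as dicts, and it reuses the `WHERE output.pcol = ...` notation from above. `implied_inserts` is a hypothetical helper for illustration and is not part of Hive.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def implied_inserts(table: str, pcol: str, rows: Iterable[Row]) -> List[str]:
    """Hypothetical helper: list the INSERT OVERWRITE statements that a
    PARTITIONING column `pcol` would implicitly stand for, one per distinct
    value of `pcol` in the query output `rows`."""
    # Collect the distinct values the partitioning column takes
    # (compared and sorted as strings, since partition values are strings).
    distinct = sorted({str(row[pcol]) for row in rows})
    # Emit one partition-specific INSERT per distinct value.
    return [
        f"INSERT OVERWRITE TABLE {table} PARTITION ({pcol}='{v}') "
        f"WHERE output.{pcol} = '{v}'"
        for v in distinct
    ]


if __name__ == "__main__":
    rows = [{"cname1": 1, "pcol": 7}, {"cname1": 2, "pcol": 3}, {"cname1": 3, "pcol": 7}]
    for stmt in implied_inserts("tname", "pcol", rows):
        print(stmt)
    # INSERT OVERWRITE TABLE tname PARTITION (pcol='3') WHERE output.pcol = '3'
    # INSERT OVERWRITE TABLE tname PARTITION (pcol='7') WHERE output.pcol = '7'
```

The number of generated statements equals the number of distinct pcol values. This is why the column's cardinality matters.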

Ideally this would work in addition to the explicit partitioning already
supported by the syntax, so that if I said

    INSERT OVERWRITE TABLE tname PARTITION (dt='D')

a row would still go into the partition (dt='D', pcol='Y') whenever its value
of pcol is Y.
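The routing described above can be sketched in Python as follows. The explicit spec (dt='D') comes from the INSERT statement, and each row contributes its own pcol value. Rows are assumed to be dicts, and `route_rows` is a hypothetical name used only for illustration. This models the proposed semantics and does not describe how Hive implements them.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
PartitionKey = Tuple[Tuple[str, str], ...]


def route_rows(
    static_spec: Dict[str, str], pcols: List[str], rows: List[Row]
) -> Dict[PartitionKey, List[Row]]:
    """Hypothetical sketch: group rows by the explicit (static) partition spec,
    e.g. dt='D', extended with each row's own PARTITIONING column values."""
    partitions: Dict[PartitionKey, List[Row]] = defaultdict(list)
    for row in rows:
        # Static part from the INSERT ... PARTITION (dt='D') clause,
        # followed by the implicit part taken from the row itself.
        key = tuple(static_spec.items()) + tuple((c, str(row[c])) for c in pcols)
        partitions[key].append(row)
    return dict(partitions)


if __name__ == "__main__":
    rows = [{"cname1": 1, "pcol": "Y"}, {"cname1": 2, "pcol": "Z"}, {"cname1": 3, "pcol": "Y"}]
    for key, part_rows in route_rows({"dt": "D"}, ["pcol"], rows).items():
        print(dict(key), len(part_rows))
    # {'dt': 'D', 'pcol': 'Y'} 2
    # {'dt': 'D', 'pcol': 'Z'} 1
```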

It would be up to the user to ensure that these columns have reasonable
cardinality, and that enough data goes into each partition for there to be
some net benefit (just as in the explicit case).
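One way a user might sanity-check a candidate column before tagging it is to count the partitions it would create and flag the ones that are too small. The sketch below is in Python. The `min_rows` threshold and the `partition_report` helper are hypothetical assumptions, not anything defined by Hive.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def partition_report(pcol: str, rows: Iterable[Dict[str, Any]], min_rows: int) -> Dict[str, Any]:
    """Hypothetical check: how many partitions would `pcol` create, and which
    of them would hold fewer than `min_rows` rows (an assumed threshold)?"""
    # Rows per distinct partition value.
    counts = Counter(str(row[pcol]) for row in rows)
    small = sorted(v for v, n in counts.items() if n < min_rows)
    return {"num_partitions": len(counts), "small_partitions": small}


if __name__ == "__main__":
    rows = [{"pcol": "Y"}] * 3 + [{"pcol": "Z"}]
    print(partition_report("pcol", rows, min_rows=2))
    # {'num_partitions': 2, 'small_partitions': ['Z']}
```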
