Tag columns as partitioning columns
-----------------------------------
Key: HADOOP-4591
URL: https://issues.apache.org/jira/browse/HADOOP-4591
Project: Hadoop Core
Issue Type: Wish
Components: contrib/hive
Reporter: Venky Iyer
CREATE TABLE tname (cname1 INT, pcol INT PARTITIONING)
COMMENT 'This is a table'
PARTITIONED BY(dt STRING)
STORED AS SEQUENCEFILE;
The goal here is to annotate a column as being a "partitioning" column.
Consider pcol in the example above. It is annotated with PARTITIONING, which
implies that the CREATE TABLE statement effectively has
PARTITIONED BY (dt, pcol)
and that every write to this table implicitly performs
INSERT OVERWRITE tname PARTITION (pcol='X')
WHERE output.pcol = 'X'
for every distinct value X that pcol takes.
This is ideally an addition on top of the explicit partitioning that is already
in the syntax, so that if I said
INSERT OVERWRITE tname PARTITION (dt='D')
each row would still go into the partition (dt='D', pcol='Y'), where Y is that
row's value of pcol.
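
The intended routing can be sketched in Python. This is only an illustration
of the semantics described above, not Hive code: the function name route_rows,
the dict-based rows, and the tuple-shaped partition keys are all assumptions
made for the example.

```python
from collections import defaultdict

def route_rows(rows, explicit_spec, tagged_cols):
    """Group rows by the full partition spec they would be written to.

    explicit_spec: partition values given in the INSERT, e.g. {"dt": "D"}.
    tagged_cols:   columns annotated PARTITIONING, e.g. ["pcol"].
    (Both parameter names are hypothetical.)
    """
    partitions = defaultdict(list)
    for row in rows:
        # Start from the explicit spec, then add each tagged column's value.
        spec = dict(explicit_spec)
        for col in tagged_cols:
            spec[col] = row[col]
        partitions[tuple(spec.items())].append(row)
    return dict(partitions)

rows = [
    {"cname1": 1, "pcol": 10},
    {"cname1": 2, "pcol": 20},
    {"cname1": 3, "pcol": 10},
]
result = route_rows(rows, {"dt": "D"}, ["pcol"])
for spec, part_rows in result.items():
    print(spec, len(part_rows))
```

With the sample rows, the write with dt='D' fans out into two partitions, one
per distinct value of pcol, without the user naming pcol in the INSERT.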
It would be up to the user to make sure the cardinality of these columns is
reasonable, and that enough data goes into each partition that there is some
net benefit (just as it is in the explicit case).
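
A user could sanity-check a candidate column before tagging it. The sketch
below is one possible check, not part of the proposal: the function name
partition_stats and the thresholds max_partitions and min_rows are
hypothetical and would need tuning for real data.

```python
from collections import Counter

def partition_stats(rows, col, max_partitions=100, min_rows=2):
    """Summarize how a column would split the data into partitions.

    max_partitions and min_rows are illustrative thresholds only.
    """
    # Number of rows that would land in each partition of `col`.
    counts = Counter(row[col] for row in rows)
    return {
        "distinct": len(counts),
        "too_many": len(counts) > max_partitions,
        # Values whose partitions would hold fewer than min_rows rows.
        "small": sorted(v for v, n in counts.items() if n < min_rows),
    }

sample = [{"pcol": 10}, {"pcol": 10}, {"pcol": 20}]
stats = partition_stats(sample, "pcol")
print(stats)
```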