[
https://issues.apache.org/jira/browse/HIVE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650281#action_12650281
]
Ashish Thusoo commented on HIVE-50:
-----------------------------------
This sounds good to me. I guess the PARTITION clause is just syntactic sugar at
this point, and the same effect could be achieved implicitly with
INSERT OVERWRITE tname(dt, pcol, cname)
SELECT '2008-11-04', another.a, another.b FROM another;
We could always infer from this that a constant value is being put in the
partition column dt and apply any optimizations that could be done for the
PARTITION clause.
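
To make the inference concrete, here is a minimal sketch in Python (not Hive code). It assumes the select list is available as plain expression strings and treats only quoted strings and plain numbers as constants. The helper name split_partition_spec and its arguments are hypothetical.

```python
import re

# Hypothetical helper, not part of Hive: matches a single-quoted
# string or a plain number, and nothing else.
_LITERAL = re.compile(r"^\s*(?:'([^']*)'|(-?\d+(?:\.\d+)?))\s*$")


def split_partition_spec(target_cols, select_exprs, partition_cols):
    """Split partition columns into those fed by a constant (static)
    and those fed by a per-row expression (dynamic)."""
    if len(target_cols) != len(select_exprs):
        raise ValueError("column list and select list differ in length")
    static, dynamic = {}, []
    for col, expr in zip(target_cols, select_exprs):
        if col not in partition_cols:
            continue  # ordinary data column, nothing to infer
        m = _LITERAL.match(expr)
        if m:
            # Constant value: equivalent to an explicit PARTITION (col=value)
            static[col] = m.group(1) if m.group(1) is not None else m.group(2)
        else:
            dynamic.append(col)
    return static, dynamic


static, dynamic = split_partition_spec(
    ["dt", "pcol", "cname"],
    ["'2008-11-04'", "another.a", "another.b"],
    {"dt", "pcol"},
)
print(static)   # {'dt': '2008-11-04'}
print(dynamic)  # ['pcol']
```

Under these assumptions, dt would be handled exactly as if PARTITION (dt='2008-11-04') had been written, while pcol would still have to be resolved per row.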
> Tag columns as partitioning columns
> -----------------------------------
>
> Key: HIVE-50
> URL: https://issues.apache.org/jira/browse/HIVE-50
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: Venky Iyer
>
> CREATE TABLE tname (INT cname1, INT pcol PARTITIONING )
> COMMENT 'This is a table'
> PARTITIONED BY(dt STRING)
> STORED AS SEQUENCEFILE;
> The goal here is to annotate a column as being a "partitioning" column.
> Consider pcol in the above example. It is annotated with 'PARTITIONING',
> which implies that the create table has
> PARTITIONED BY (dt, pcol)
> and every write to this table has implicitly
> INSERT OVERWRITE tname PARTITION (pcol='X')
> WHERE output.pcol = 'X'
> for every distinct value X that pcol takes.
> This is ideally an addition on top of the explicit partitioning that is
> already in the syntax, so that if I said
> INSERT OVERWRITE tname PARTITION (dt='D')
> it would still go into the partition (dt='D', pcol='Y') when the value of
> pcol is Y.
> It would be up to the user to make sure the cardinality of these columns is
> reasonable, and that enough data goes into each partition that there is some
> net benefit (just as it is in the explicit case).
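
The routing described in the issue can be sketched in Python as follows. This is only an illustration, not Hive's implementation: rows are modeled as dictionaries, and the function name route_rows and its arguments are hypothetical.

```python
from collections import defaultdict


def route_rows(rows, static_spec, tagged_cols):
    """Group rows by their full partition key: the explicit PARTITION
    values plus each row's own values for the tagged columns."""
    partitions = defaultdict(list)
    for row in rows:
        key = dict(static_spec)       # e.g. {"dt": "D"} from PARTITION (dt='D')
        for col in tagged_cols:
            key[col] = row[col]       # implicit PARTITION (pcol=<row value>)
        partitions[tuple(sorted(key.items()))].append(row)
    return dict(partitions)


rows = [
    {"cname1": 1, "pcol": "X"},
    {"cname1": 2, "pcol": "Y"},
    {"cname1": 3, "pcol": "X"},
]
parts = route_rows(rows, {"dt": "D"}, ["pcol"])
for key in sorted(parts):
    print(dict(key), len(parts[key]))
```

Each distinct value of pcol yields its own partition under dt='D', which is why the cardinality of a tagged column matters.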