[jira] Commented: (HIVE-50) Tag columns as partitioning columns

Ashish Thusoo (JIRA) Mon, 24 Nov 2008 11:12:39 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650281#action_12650281
 ]


Ashish Thusoo commented on HIVE-50:
-----------------------------------

This sounds good to me. I guess the PARTITION clause is just syntactic sugar 
at this point, and the same effect could be achieved implicitly by

INSERT OVERWRITE tname(dt, pcol, cname)
SELECT '2008-11-04', another.a, another.b FROM another;

We could always infer from this that a constant value is being put in the 
partition column dt, and apply any optimizations that would be done for an 
explicit PARTITION clause.
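
To make the inference concrete, here is a rough sketch (in Python, not Hive code) of how a planner might spot that a partition column is fed by a literal. The helper name constant_partition_values and the idea of passing the select list as plain strings are assumptions for illustration only; a real implementation would work on the parsed query tree.

```python
import re

# Hypothetical helper, not part of Hive: matches a quoted string or a plain number.
_LITERAL = re.compile(r"^(?:'[^']*'|-?\d+(?:\.\d+)?)$")


def constant_partition_values(target_cols, select_exprs, partition_cols):
    """Return {partition column: literal value} for columns fed by a constant."""
    if len(target_cols) != len(select_exprs):
        raise ValueError("column list and select list differ in length")
    constants = {}
    for col, expr in zip(target_cols, select_exprs):
        expr = expr.strip()
        # Only partition columns whose expression is a bare literal qualify.
        if col in partition_cols and _LITERAL.match(expr):
            constants[col] = expr.strip("'")
    return constants


# The INSERT from the comment above, with dt and pcol as partition columns.
static = constant_partition_values(
    ["dt", "pcol", "cname"],
    ["'2008-11-04'", "another.a", "another.b"],
    partition_cols={"dt", "pcol"},
)
print(static)
```

Here dt is reported as a constant partition value, while pcol is not, so its partition would have to be decided per row.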



> Tag columns as partitioning columns
> -----------------------------------
>
>                 Key: HIVE-50
>                 URL: https://issues.apache.org/jira/browse/HIVE-50
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Venky Iyer
>
>     CREATE TABLE tname (INT cname1, INT pcol PARTITIONING )
>     COMMENT 'This is a table' 
>     PARTITIONED BY(dt STRING) 
>     STORED AS SEQUENCEFILE; 
> The goal here is to annotate a column as being a "partitioning" column. 
> Consider pcol in the above example. It is annotated with 'PARTITIONING', 
> which implies that the create table has 
> PARTITIONED BY (dt, pcol)
> and every write to this table has implicitly
> INSERT OVERWRITE tname PARTITION (pcol='X')
> WHERE output.pcol = 'X'
> for every distinct value X that pcol takes.
> This is ideally an addition on top of the explicit partitioning that is 
> already in the syntax, so that if I said
> INSERT OVERWRITE tname PARTITION (dt='D')
> it would still go into the partition (dt='D', pcol='Y') when the value of 
> pcol is Y.
> It would be up to the user to make sure the cardinality of these columns is 
> reasonable, and that enough data goes into each partition that there is some 
> net benefit (just as it is in the explicit case).
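
The routing described in the issue can be sketched as follows. This is only an illustration in Python under assumed names (route_rows, explicit, tagged_cols are hypothetical and not Hive APIs): each row lands in the partition formed by the explicitly given values plus the row's own value for every tagged column.

```python
from collections import defaultdict


def route_rows(rows, explicit, tagged_cols):
    """Group rows by full partition spec (explicit values + tagged column values)."""
    partitions = defaultdict(list)
    for row in rows:
        spec = dict(explicit)          # e.g. {"dt": "D"} from PARTITION (dt='D')
        for col in tagged_cols:
            spec[col] = row[col]       # value taken from the row itself
        partitions[tuple(sorted(spec.items()))].append(row)
    return dict(partitions)


rows = [
    {"cname1": 1, "pcol": "X"},
    {"cname1": 2, "pcol": "Y"},
    {"cname1": 3, "pcol": "X"},
]
parts = route_rows(rows, {"dt": "D"}, ["pcol"])
for spec, members in sorted(parts.items()):
    print(dict(spec), len(members))
```

One partition is produced per distinct pcol value, which is why the cardinality of a tagged column matters.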

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
