[jira] Commented: (HIVE-493) automatically infer existing partitions of table from HDFS files.

Prasad Chakka (JIRA) Sun, 11 Oct 2009 21:37:00 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764542#action_12764542
 ]


Prasad Chakka commented on HIVE-493:
------------------------------------

Cyrus, 

Thanks for providing this patch. Very useful.

On HDFS with permissions enabled, a partition or table directory may not be 
accessible to the current user, yet its metadata would still be deleted here. 
That makes me a little uncomfortable with removing partitions. I am not sure 
that removing partitions is useful enough to justify the risk of losing 
partitions permanently. What do you think? 
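One way to address this risk is a rule that only removes metadata when a directory is confirmed missing, and never when it merely could not be checked (for example, because of a permission error). The Java below is a minimal sketch of that rule. It assumes a hypothetical probe result per partition. SafeDropSketch, PartitionProbe and partitionsSafeToDrop are illustrative names, not Hive or HDFS APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SafeDropSketch {
    // Hypothetical result of checking one partition directory (not a Hive/HDFS type).
    enum PartitionProbe { EXISTS, MISSING, ACCESS_DENIED }

    // Returns the partitions whose metadata could be removed: only those whose
    // directory is confirmed missing. ACCESS_DENIED is not evidence that the
    // data is gone, so those partitions are kept.
    static List<String> partitionsSafeToDrop(Map<String, PartitionProbe> probes) {
        List<String> drop = new ArrayList<>();
        for (Map.Entry<String, PartitionProbe> e : probes.entrySet()) {
            if (e.getValue() == PartitionProbe.MISSING) {
                drop.add(e.getKey());
            }
        }
        return drop;
    }
}
```

Under this rule, the case described above (a directory the current user cannot read) comes back as ACCESS_DENIED. Its partition metadata survives.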

A couple of comments on the code:
1) Can you add a test or two to the msck test package?
2) REPAIR should be an optional keyword in the MSCK ANTLR clause instead of 
being a whole separate clause. Look at how KW_EXTERNAL is used in the 
createStatement clause.
3) The following line should be outside of the for loop, since there is only 
one table here.
{code}
Table table = db.getTable(MetaStoreUtils.DEFAULT_DATABASE_NAME,
                msckDesc.getTableName());
{code}
4) Is this cast '(Map <String, String>)' really needed?
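Point 2 can be illustrated with a small parser. If REPAIR is an optional keyword, 'MSCK TABLE t' and 'MSCK REPAIR TABLE t' parse into the same statement type and differ only by a flag. They do not need two clauses and two code paths. The sketch below is a hand-rolled Java tokenizer, not ANTLR and not Hive's actual grammar. MsckParseSketch, MsckStatement and parse are hypothetical names.

```java
public class MsckParseSketch {
    // One statement type for both forms; REPAIR only sets a flag.
    static final class MsckStatement {
        final boolean repair;
        final String tableName; // null when no TABLE clause was given

        MsckStatement(boolean repair, String tableName) {
            this.repair = repair;
            this.tableName = tableName;
        }
    }

    // Accepts: MSCK [REPAIR] [TABLE <name>]  (keywords are case-insensitive).
    static MsckStatement parse(String sql) {
        String[] tok = sql.trim().split("\\s+");
        int i = 0;
        if (!tok[i++].equalsIgnoreCase("MSCK")) {
            throw new IllegalArgumentException("expected MSCK");
        }
        boolean repair = false;
        if (i < tok.length && tok[i].equalsIgnoreCase("REPAIR")) { // optional keyword
            repair = true;
            i++;
        }
        String table = null;
        if (i < tok.length && tok[i].equalsIgnoreCase("TABLE")) {
            i++;
            if (i >= tok.length) {
                throw new IllegalArgumentException("expected table name after TABLE");
            }
            table = tok[i++];
        }
        if (i != tok.length) {
            throw new IllegalArgumentException("unexpected token: " + tok[i]);
        }
        return new MsckStatement(repair, table);
    }
}
```

Downstream code then checks a single boolean instead of branching on two statement kinds.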
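Point 3 is loop-invariant code motion. The table name does not change between iterations, so the lookup can happen once, before the loop. The Java below is a minimal sketch of this. FakeDb is a hypothetical stand-in for the metastore client that only counts lookups. Its getTable method is not Hive's Hive.getTable, and partitionPaths is an illustrative helper.

```java
import java.util.ArrayList;
import java.util.List;

public class HoistLookupSketch {
    // Hypothetical stand-in for the metastore client; counts lookups.
    static final class FakeDb {
        int lookups = 0;

        String getTable(String dbName, String tableName) {
            lookups++;
            return dbName + "." + tableName;
        }
    }

    // The table is fetched once, outside the loop, because every iteration
    // refers to the same table; only the partition spec varies.
    static List<String> partitionPaths(FakeDb db, String tableName, List<String> partSpecs) {
        String table = db.getTable("default", tableName);
        List<String> paths = new ArrayList<>();
        for (String spec : partSpecs) {
            paths.add(table + "/" + spec);
        }
        return paths;
    }
}
```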
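On point 4, whether the cast matters depends on the static type of the expression being cast:
- If the expression already has type Map<String, String>, the cast is redundant.
- If it is a raw Map, assigning it compiles with only an unchecked warning, even without the cast.
- If it is Object, the cast is actually required, and it is an unchecked cast.

The patch code in question is not quoted here, so both methods below are hypothetical stand-ins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CastSketch {
    // Already generically typed: callers need no cast.
    static Map<String, String> typedSpec() {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("ds", "2009-10-11");
        return spec;
    }

    // Declared as Object: here a cast is required to use it as a Map.
    static Object untypedSpec() {
        return typedSpec();
    }

    static String describe() {
        Map<String, String> a = typedSpec(); // a cast here would be redundant
        @SuppressWarnings("unchecked")
        Map<String, String> b = (Map<String, String>) untypedSpec(); // cast required, unchecked
        return a.get("ds") + " " + b.get("ds");
    }
}
```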



> automatically infer existing partitions of table from HDFS files.
> -----------------------------------------------------------------
>
>                 Key: HIVE-493
>                 URL: https://issues.apache.org/jira/browse/HIVE-493
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.3.0, 0.3.1, 0.4.0
>            Reporter: Prasad Chakka
>         Attachments: HIVE-493.patch
>
>
> Initially, the partition list for a table was inferred from the HDFS directory 
> structure instead of being looked up in the metastore (where partitions are 
> created using 'alter table ... add partition'). This automatic inference was 
> removed in favor of the latter approach when the metastore checker feature was 
> checked in, and also to facilitate external partitions.
> Joydeep and Frederick mentioned that it would be simpler for users to create the 
> HDFS directory and let Hive infer the partition rather than explicitly add it. 
> But doing that raises the following issues:
> 1) External partitions: we have to mix both approaches, so the partition list 
> is the merged list of inferred and registered partitions, and duplicates have 
> to be resolved.
> 2) Partition-level schemas can't be supported. Which schema should be chosen for 
> the inferred partitions: the table schema at the time the inferred partition was 
> created, or the latest table schema? And how do we know the table schema at the 
> time the inferred partition was created?
> 3) If partitions have to be registered, they can be disabled without actually 
> deleting the data. This feature is not supported and may not be that useful, 
> but nevertheless it can't be supported with inferred partitions.
> 4) Indexes are being added. If partitions are not registered, indexes for such 
> partitions cannot be maintained automatically.
> I would like to know the general thinking about this among users of Hive. If 
> inferred partitions are preferred, can we live with the restricted 
> functionality that this imposes?
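The inference approach in the description reads partitions off the directory layout. The Java below is a minimal sketch of turning a relative path into a partition spec and checking it against the table's partition columns. It assumes Hive's usual key=value directory naming under the table location. InferSpecSketch and inferSpec are hypothetical names. The sketch ignores the escaping of special characters in values that real Hive performs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InferSpecSketch {
    // Turns "ds=2009-10-11/hr=12" into {ds=2009-10-11, hr=12}, requiring the
    // path components to match the table's partition columns, in order.
    // Returns null if the directory does not look like a partition.
    static Map<String, String> inferSpec(String relPath, List<String> partCols) {
        String[] parts = relPath.split("/");
        if (parts.length != partCols.size()) {
            return null;
        }
        Map<String, String> spec = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            // Reject a missing key, an empty value, or the wrong column name.
            if (eq <= 0 || eq == parts[i].length() - 1
                    || !parts[i].substring(0, eq).equals(partCols.get(i))) {
                return null;
            }
            spec.put(partCols.get(i), parts[i].substring(eq + 1));
        }
        return spec;
    }
}
```

A stray directory such as '_tmp' is rejected, not registered as a partition. This is one of the judgment calls automatic inference has to make.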
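Point 1 of the description calls for a merged partition list with duplicates resolved. The Java below sketches one possible resolution rule. When a partition is both registered and inferred, it keeps the registered entry, since only that entry can carry state such as the disabled flag from point 3. MergePartitionsSketch, Partition and visiblePartitions are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergePartitionsSketch {
    // Hypothetical partition record; only registered partitions can be disabled.
    static final class Partition {
        final String name;      // e.g. "ds=2009-10-11"
        final boolean disabled; // always false for inferred partitions

        Partition(String name, boolean disabled) {
            this.name = name;
            this.disabled = disabled;
        }
    }

    // Merges both sources by partition name. A registered partition replaces an
    // inferred one with the same name; disabled partitions are then left out.
    static List<String> visiblePartitions(List<Partition> registered, List<String> inferredNames) {
        Map<String, Partition> byName = new LinkedHashMap<>();
        for (String name : inferredNames) {
            byName.put(name, new Partition(name, false));
        }
        for (Partition p : registered) {
            byName.put(p.name, p); // overrides any inferred duplicate
        }
        List<String> result = new ArrayList<>();
        for (Partition p : byName.values()) {
            if (!p.disabled) {
                result.add(p.name);
            }
        }
        return result;
    }
}
```

Take a partition whose directory exists on HDFS and whose registered entry is disabled. It stays hidden, because the registered entry wins the duplicate resolution. A purely inferred partition has nowhere to record that state.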

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
