[ https://issues.apache.org/jira/browse/HIVE-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463058#comment-16463058 ]

Steve Hoffman commented on HIVE-6589:
-------------------------------------

I agree that `MSCK REPAIR TABLE your_table_name;` will fix up the table and 
even remove expired partitions, but it still takes time to scan, and if a new 
partition lands every hour, you still have to run a job every hour to add that 
partition.
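
For concreteness, the hourly job in that scenario amounts to something like the following (a sketch against the `my_data` table from the issue description below; the date values are illustrative):

{code}
-- Run once per hour (e.g. from cron or Oozie) to register the newest folder.
-- IF NOT EXISTS makes the statement safe to re-run.
ALTER TABLE my_data ADD IF NOT EXISTS
  PARTITION (dt = '2014-03-02', hour = '01')
  LOCATION '/flume/my_data/2014/03/02/01';
{code}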

For example, on AWS Athena this repair table command takes 800 seconds to run, 
and it will only get slower as the partitions grow, eventually taking more than 
an hour, at which point the hourly job starts a new run before the old one 
finishes – madness.

The point of this ticket isn't to come up with simpler ways to insert records 
into an internal metadata store, but to NOT insert individual records because 
the paths follow a pattern.
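
In other words, under the proposed `hive.partition.spec` properties quoted below, a hypothetical query like this would need no partition entries in the metastore at all; Hive would compute the folder from the predicate:

{code}
-- Hypothetical behaviour under the proposed table properties:
-- 'hive.partition.spec.location' = '$Y/$M/$D/$H' would let Hive resolve
-- this predicate directly to /flume/my_data/2014/03/02/01,
-- with no per-partition metastore records to insert or scan.
SELECT col1, col2
FROM my_data
WHERE dt = '2014-03-02' AND hour = '01';
{code}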

> Automatically add partitions for external tables
> ------------------------------------------------
>
>                 Key: HIVE-6589
>                 URL: https://issues.apache.org/jira/browse/HIVE-6589
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.14.0
>            Reporter: Ken Dallmeyer
>            Assignee: Dharmendra Pratap Singh
>            Priority: Major
>
> I have a data stream being loaded into Hadoop via Flume. It loads into a date 
> partition folder in HDFS.  The path looks like this:
> {code}/flume/my_data/YYYY/MM/DD/HH
> /flume/my_data/2014/03/02/01
> /flume/my_data/2014/03/02/02
> /flume/my_data/2014/03/02/03{code}
> On top of it I create an EXTERNAL hive table to do querying.  As of now, I 
> have to manually add partitions.  What I want is for EXTERNAL tables, Hive 
> should "discover" those partitions.  Additionally I would like to specify a 
> partition pattern so that when I query Hive will know to use the partition 
> pattern to find the HDFS folder.
> So something like this:
> {code}CREATE EXTERNAL TABLE my_data (
>   col1 STRING,
>   col2 INT
> )
> PARTITIONED BY (
>   dt STRING,
>   hour STRING
> )
> LOCATION 
>   '/flume/mydata'
> TBLPROPERTIES (
>   'hive.partition.spec' = 'dt=$Y-$M-$D, hour=$H',
>   'hive.partition.spec.location' = '$Y/$M/$D/$H'
> );
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
