[ https://issues.apache.org/jira/browse/FALCON-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887687#comment-13887687 ]

Satish Mittal commented on FALCON-284:
--------------------------------------

Suppose we have an HCatalog table, table1, that is PARTITIONED BY (year STRING, 
month STRING, day STRING, hour STRING, minute STRING).

We then submit a Falcon feed for table1 with a retention limit of 2 hours:

    <clusters>
        <cluster name="hcat-cluster">
            <validity start="2013-01-01T00:00Z" end="2030-01-01T00:00Z"/>
            <retention limit="hours(2)" action="delete"/>
        </cluster>
    </clusters>

    <table uri="catalog:default:table1#year=${YEAR};month=${MONTH};day=${DAY};hour=${HOUR};minute=${MINUTE}"/>

The retention jobs for this feed succeed; however, the partition filter that 
retention computes considers only *year* and ignores the month, day, hour and 
minute keys. Here is a snippet of the task log:

*2014-01-30 12:12:10,940 INFO  - List partitions for : table1, partition filter: year < '2014' (HiveCatalogService:134)*
2014-01-30 12:12:11,519 WARN  - DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore. (HiveConf:1231)
2014-01-30 12:12:11,844 INFO  - Trying to connect to metastore with URI thrift://localhost:5055 (metastore:249)
2014-01-30 12:12:11,881 INFO  - Waiting 1 seconds before next connection attempt. (metastore:327)
2014-01-30 12:12:12,881 INFO  - Connected to metastore. (metastore:337)
2014-01-30 12:12:12,930 INFO  - Caching HCatalog client object for thrift://localhost:5055 (HiveCatalogService:61)
2014-01-30 12:12:12,971 INFO  - No partitions to delete. (FeedEvictor:389)
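
To make the symptom concrete, here is a minimal Python sketch (not Falcon code) 
that applies the logged filter to a few partitions. The sample partitions are 
made up for illustration, and the cutoff is assumed to be the job time from 
the log minus the 2-hour retention limit; how Falcon actually picks that 
instant is not shown in the log.

```python
from datetime import datetime, timedelta

# Hypothetical partitions of table1 as (year, month, day, hour, minute).
partitions = [
    ("2014", "01", "29", "23", "00"),
    ("2014", "01", "30", "08", "00"),
    ("2014", "01", "30", "11", "30"),
]

# Assumed cutoff: job time taken from the log, minus hours(2).
job_time = datetime(2014, 1, 30, 12, 12)
cutoff = job_time - timedelta(hours=2)
cutoff_key = tuple(cutoff.strftime("%Y %m %d %H %M").split())

# Filter as logged: only the year key is compared (year < '2014').
year_only = [p for p in partitions if p[0] < cutoff_key[0]]

# Comparing all five keys in order; tuple comparison is lexicographic,
# and zero-padded strings sort the same way as the numbers they hold.
all_keys = [p for p in partitions if p < cutoff_key]

print("year-only filter matches:", year_only)
print("all-key comparison matches:", all_keys)
```

With these sample values the year-only filter selects nothing, which matches 
the "No partitions to delete" line, while a comparison over all five keys 
selects the two partitions older than the cutoff.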




> Hcatalog based feed retention doesn't work when partition filter spans across 
> multiple partition keys
> -----------------------------------------------------------------------------------------------------
>
>                 Key: FALCON-284
>                 URL: https://issues.apache.org/jira/browse/FALCON-284
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Satish Mittal
>
> When an HCatalog-based feed is scheduled in Falcon, retention only looks at 
> the first partition key that matches one of the date patterns yyyy | MM | dd 
> | HH | mm. As a result, it calculates a partition filter that contains only 
> one of these patterns. However, if the HCatalog table is defined in such a way 
> that the date spans multiple partition keys (year/month/day/hour/minute), 
> then feed retention doesn't delete any partitions that are more granular than 
> the first level (year).
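
One way a filter could span all the date keys is a lexicographic expansion: 
a partition is older than the cutoff if its year is smaller, or the year is 
equal and the month is smaller, and so on. The Python sketch below only 
illustrates that idea; build_filter is a hypothetical helper, not Falcon's 
implementation or necessarily the fix adopted for this issue, and whether the 
metastore accepts the resulting expression should be verified separately.

```python
def build_filter(keys, values):
    """Build a filter string matching partitions strictly older than `values`.

    `keys` are partition key names ordered from coarsest to finest and
    `values` are the zero-padded cutoff values for those keys.
    """
    clauses = []
    for i, (key, value) in enumerate(zip(keys, values)):
        # All coarser keys must equal the cutoff; this key must be smaller.
        equal_prefix = [f"{k} = '{v}'" for k, v in zip(keys[:i], values[:i])]
        clauses.append("(" + " and ".join(equal_prefix + [f"{key} < '{value}'"]) + ")")
    return " or ".join(clauses)


keys = ["year", "month", "day", "hour", "minute"]
cutoff = ["2014", "01", "30", "10", "12"]  # example cutoff values
print(build_filter(keys, cutoff))
```

For a single key this reduces to the filter seen in the task log, 
year < '2014' (wrapped in parentheses); each additional key adds one more 
"or" clause.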



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
