[ https://issues.apache.org/jira/browse/FALCON-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887687#comment-13887687 ]
Satish Mittal commented on FALCON-284:
--------------------------------------
Suppose we have an HCatalog table, table1, that is PARTITIONED BY (year STRING,
month STRING, day STRING, hour STRING, minute STRING),
and we submit a Falcon feed for table1 with a retention of 2
hours:
<clusters>
    <cluster name="hcat-cluster">
        <validity start="2013-01-01T00:00Z" end="2030-01-01T00:00Z"/>
        <retention limit="hours(2)" action="delete"/>
    </cluster>
</clusters>
<table uri="catalog:default:table1#year=${YEAR};month=${MONTH};day=${DAY};hour=${HOUR};minute=${MINUTE}"/>
The retention jobs for this feed succeed; however, the partition filter
that retention builds only considers *year*, so no partitions are selected
for deletion. Here is a snippet of the task log:
*2014-01-30 12:12:10,940 INFO - List partitions for : table1, partition
filter: year < '2014' (HiveCatalogService:134)*
2014-01-30 12:12:11,519 WARN - DEPRECATED: Configuration property
hive.metastore.local no longer has any effect. Make sure to provide a valid
value for hive.metastore.uris if you are connecting to a remote metastore.
(HiveConf:1231)
2014-01-30 12:12:11,844 INFO - Trying to connect to metastore with URI
thrift://localhost:5055 (metastore:249)
2014-01-30 12:12:11,881 INFO - Waiting 1 seconds before next connection
attempt. (metastore:327)
2014-01-30 12:12:12,881 INFO - Connected to metastore. (metastore:337)
2014-01-30 12:12:12,930 INFO - Caching HCatalog client object for
thrift://localhost:5055 (HiveCatalogService:61)
2014-01-30 12:12:12,971 INFO - No partitions to delete. (FeedEvictor:389)
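To illustrate what a filter spanning all the date partition keys could look
like, here is a minimal Python sketch. It is not Falcon's code: the function
name build_retention_filter, the DATE_KEYS mapping, and the cutoff value are
hypothetical, and it assumes that partition values are zero-padded strings
(so string comparison matches date order) and that the metastore filter
accepts "and"/"or" with parentheses.

```python
from datetime import datetime

# Hypothetical mapping of partition key -> strftime pattern, ordered from
# coarsest to finest. These names mirror table1 above; they are not Falcon's.
DATE_KEYS = [
    ("year", "%Y"),
    ("month", "%m"),
    ("day", "%d"),
    ("hour", "%H"),
    ("minute", "%M"),
]


def build_retention_filter(date_keys, cutoff):
    """Build a filter matching partitions strictly older than `cutoff`.

    The result is a lexicographic comparison spelled out key by key:
    (k1 < v1) or (k1 = v1 and k2 < v2) or (k1 = v1 and k2 = v2 and k3 < v3) ...
    """
    clauses = []
    equal_prefix = []  # equality terms for all coarser keys seen so far
    for key, pattern in date_keys:
        value = cutoff.strftime(pattern)
        terms = equal_prefix + [f"{key} < '{value}'"]
        clauses.append("(" + " and ".join(terms) + ")")
        equal_prefix.append(f"{key} = '{value}'")
    return " or ".join(clauses)


if __name__ == "__main__":
    # Assumed cutoff: the log timestamp above minus the 2 hour retention limit.
    cutoff = datetime(2014, 1, 30, 10, 12)
    print(build_retention_filter(DATE_KEYS, cutoff))
```

With only the year key, this reduces to the filter seen in the log above,
which is why nothing from 2014 is ever selected. Adding the finer keys lets
the filter also match partitions from earlier months, days, hours, and
minutes of the cutoff year.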
> Hcatalog based feed retention doesn't work when partition filter spans across
> multiple partition keys
> -----------------------------------------------------------------------------------------------------
>
> Key: FALCON-284
> URL: https://issues.apache.org/jira/browse/FALCON-284
> Project: Falcon
> Issue Type: Bug
> Affects Versions: 0.5
> Reporter: Satish Mittal
>
> When an HCatalog-based feed is scheduled in Falcon, retention only looks at
> the first partition key that matches one of the date patterns: yyyy | MM | dd
> | HH | mm. As a result, it computes a partition filter that contains only
> one of these patterns. However, if the HCatalog table is defined in such a way
> that the date spans multiple partition keys (year/month/day/hour/minute),
> then feed retention doesn't delete any partitions that are more granular than
> the first level (year).