[ 
https://issues.apache.org/jira/browse/HBASE-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ryan rawson updated HBASE-2999:
-------------------------------

    Summary: hbase TTL can be suboptimal and leave small regions after 
compaction  (was: hbase TTL behavior depends on key design)

> hbase TTL can be suboptimal and leave small regions after compaction
> --------------------------------------------------------------------
>
>                 Key: HBASE-2999
>                 URL: https://issues.apache.org/jira/browse/HBASE-2999
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.89.20100621
>         Environment: All
>            Reporter: Jimmy Hu
>
> The current compaction-based TTL works as advertised if the keys 
> distribute the incoming data randomly among all regions. However, if the 
> keys are designed in chronological order, the TTL doesn't really work, 
> because no compaction will happen for data already written. So we can't 
> say that the current TTL really works as advertised, since it depends on 
> the key structure.
> This is a pity, because a major use case for HBase is storing history or 
> log data. Normally people only want to retain the data for a fixed period; 
> for example, the US government's default data retention policy is 7 years. 
> Such data is saved in chronological order, and the current TTL 
> implementation doesn't work at all for this kind of use case.
> For that use case to really work, HBase needs an active thread that runs 
> periodically, checks whether there is data older than the TTL, deletes 
> that data if necessary, and compacts small regions older than a certain 
> time period into larger ones to save system resources. It can optimize the 
> deletion by deleting the whole region if it detects that the last 
> timestamp for the region is older than the TTL. The following parameters 
> should be configurable for HBase:
> 1. Whether to enable or disable the TTL thread.
> 2. The interval at which the TTL thread runs. Maybe we can use a special 
> value like 0 to indicate that the TTL thread should not run, thus saving 
> one configuration parameter. The default should probably be set to 1 day.
> 3. How small regions must be before they are merged. This should be a 
> percentage of the store size. For example, if 2 consecutive regions are 
> only 10% of the store size (default is 256M), we can initiate a region 
> merge. We probably need a parameter to reduce merging too. For example, we 
> only merge regions whose largest timestamp is older than half of the TTL.
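
The merge rule proposed in item 3 can be sketched roughly as follows. This is
only an illustration of the idea in the description, not existing HBase code:
the class, constants, and method names are hypothetical, and it assumes that
"10% of the store size" refers to the combined size of the two adjacent
regions.

```java
/** Hypothetical sketch of the proposed TTL merge rule; not an HBase API. */
final class TtlMergeSketch {
    // Assumed defaults taken from the numbers mentioned in the issue text.
    static final long DEFAULT_CHECK_INTERVAL_MS = 24L * 60 * 60 * 1000; // 1 day
    static final long DEFAULT_MAX_STORE_SIZE = 256L * 1024 * 1024;      // 256M
    static final double DEFAULT_MERGE_FRACTION = 0.10;                  // 10%

    /** An interval of 0 (or less) means the TTL thread does not run. */
    static boolean ttlThreadEnabled(long checkIntervalMs) {
        return checkIntervalMs > 0;
    }

    /**
     * Two adjacent regions qualify for a merge when their combined size is
     * at most the configured fraction of the store size AND the newest cell
     * in either region is older than half of the TTL.
     */
    static boolean shouldMerge(long sizeA, long sizeB,
                               long maxTsA, long maxTsB,
                               long ttlMs, long nowMs) {
        long threshold = (long) (DEFAULT_MERGE_FRACTION * DEFAULT_MAX_STORE_SIZE);
        boolean smallEnough = sizeA + sizeB <= threshold;
        long newest = Math.max(maxTsA, maxTsB);
        boolean oldEnough = nowMs - newest > ttlMs / 2;
        return smallEnough && oldEnough;
    }
}
```

The "older than half of the TTL" condition keeps regions that are still
receiving recent writes from being merged repeatedly.
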
> We currently track min/max timestamps in storefiles, so it's possible 
> that we could expire some files of a region as well, even if the region 
> is not completely expired. So at a minimum, we should be able to implement 
> dropping the stores that are older than the TTL. If all stores for a 
> region are dropped, we should drop the whole region and update the key 
> range of the adjacent region, so that no hole is left in the key space.
>  
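
A minimal sketch of the timestamp-based expiry described in the last
paragraph, assuming each store file exposes the max timestamp it tracks. The
FileInfo class and both method names are hypothetical stand-ins for
illustration, not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of whole-file TTL expiry; not an HBase API. */
final class StoreFileExpirySketch {

    /** Stand-in for a store file and its tracked timestamp range. */
    static final class FileInfo {
        final String name;
        final long minTs;
        final long maxTs;

        FileInfo(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** Returns the files whose newest cell is already older than the TTL. */
    static List<FileInfo> expired(List<FileInfo> files, long ttlMs, long nowMs) {
        long cutoff = nowMs - ttlMs;
        List<FileInfo> result = new ArrayList<>();
        for (FileInfo f : files) {
            // If even the newest cell is past the cutoff, the whole file is.
            if (f.maxTs < cutoff) {
                result.add(f);
            }
        }
        return result;
    }

    /** A region with files can be dropped once every file has expired. */
    static boolean regionFullyExpired(List<FileInfo> files, long ttlMs, long nowMs) {
        return !files.isEmpty() && expired(files, ttlMs, nowMs).size() == files.size();
    }
}
```

Only the max timestamp matters for this check: a file can be dropped without
reading it once its newest cell is past the TTL. Updating the adjacent
region's key range after a full drop is not covered by this sketch.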

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
