[ https://issues.apache.org/jira/browse/HADOOP-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Billy Pearson updated HADOOP-2222:
----------------------------------

    Fix Version/s: 0.17.0

> option to set TTL for columns in hbase
> --------------------------------------
>
>                 Key: HADOOP-2222
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2222
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>            Priority: Minor
>             Fix For: 0.17.0
>
>
> I would like to see an option to set a TTL on columns in HBase. This
> feature would help remove stale data from large datasets without having
> to do a full scan of the dataset and then issue deletes.
> Example
> Say I am crawling pages and only refreshing pages based on a set score. If
> some pages do not get updated within X days, the old version of the page
> gets removed from the dataset.
> Say I am stripping links out of HTML and storing them. If a link is removed
> from a page, I would need to issue a delete statement to remove that link
> from the dataset. With a TTL, the link data would remove itself if not
> updated within X seconds.
> These are just examples based on crawling, as with Nutch, but I can foresee
> many apps using this option.
> This is a feature in Bigtable that is handled when Bigtable does garbage
> collection.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.