[ https://issues.apache.org/jira/browse/ACCUMULO-452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225436#comment-13225436 ]

Todd Lipcon commented on ACCUMULO-452:
--------------------------------------

bq. If they want to scan the last 6 months of data, for example, and the largest 
file overlaps this time range but only 10% of the data in the file matches the 
range, then a lot of data needs to be filtered. Does HBase do anything special 
to deal with this case?
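The minimal Python sketch below makes the cost in the question concrete. It assumes each file records the minimum and maximum timestamp of its entries. The `FileMeta` type and both functions are hypothetical and are not an Accumulo or HBase API. A file whose time range does not overlap the query can be skipped outright. A file that does overlap must be read and filtered entry by entry, even when only a small fraction of it matches.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file metadata: a name plus the timestamp span of its entries."""
    name: str
    min_ts: int
    max_ts: int


def files_to_read(files: Iterable[FileMeta], start: int, end: int) -> List[FileMeta]:
    """Keep only files whose [min_ts, max_ts] overlaps the query range [start, end]."""
    return [f for f in files if f.max_ts >= start and f.min_ts <= end]


def filter_entries(entries: Iterable[Tuple[int, str]], start: int, end: int) -> List[Tuple[int, str]]:
    """Inside an overlapping file, every entry is read but only in-range ones are returned."""
    return [(ts, value) for ts, value in entries if start <= ts <= end]


# One large file spans timestamps 0..999; a query over 900..999 must still read all of it.
files = [FileMeta("big", 0, 999), FileMeta("old", 0, 100), FileMeta("recent", 950, 999)]
big_entries = [(ts, f"v{ts}") for ts in range(1000)]

selected = [f.name for f in files_to_read(files, 900, 999)]
matched = filter_entries(big_entries, 900, 999)
print(selected, f"{len(matched)}/{len(big_entries)} entries of 'big' match")
```

In this example `old` is skipped entirely. `big` is read in full, although only 100 of its 1000 entries (10%) fall in the queried range.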

We have a setting for "max file size" beyond which a file won't be included in 
compactions. Setting that to a few GB would be prudent in a case where most of 
your queries are time-bound. Of course, there's an associated cost against 
scanners which aren't time-bound, as they'll have to merge all files, but in 
some cases it's fine.
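The policy described above can be sketched in Python as follows. This is a hypothetical illustration of the idea, not HBase's actual compaction code. The names `select_for_compaction` and `merge_all_files` are invented, and sizes are plain byte counts.

Files larger than the threshold are left out of compaction, so they keep their original time ranges. The trade-off is that a scan which is not time-bound still has to merge-read every file. The sketch models that merge with `heapq.merge` over each file's sorted contents.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

GB = 1024 ** 3


def select_for_compaction(file_sizes: Dict[str, int], max_file_size: int) -> List[str]:
    """Return compaction candidates, excluding any file larger than max_file_size."""
    return sorted(name for name, size in file_sizes.items() if size <= max_file_size)


def merge_all_files(files: Iterable[List[Tuple[str, int]]]) -> List[Tuple[str, int]]:
    """Unbounded scan: k-way merge of every file's already-sorted (key, value) entries."""
    return list(heapq.merge(*files))


sizes = {"f1": 5 * GB, "f2": 200 * 1024 ** 2, "f3": 1 * GB}
# With a 3 GB cap, the 5 GB file stays out of compaction.
candidates = select_for_compaction(sizes, max_file_size=3 * GB)

# Each inner list is one file's sorted contents; a full scan must merge them all.
per_file = [[("a", 1), ("d", 4)], [("b", 2)], [("c", 3)]]
print(candidates, merge_all_files(per_file))
```

The more files are excluded from compaction, the more inputs an unbounded scan has to merge.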

You can see more discussion about this in HBASE-4717.
                
> Generalize locality groups
> --------------------------
>
>                 Key: ACCUMULO-452
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-452
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Keith Turner
>             Fix For: 1.5.0
>
>         Attachments: PartitionerDesign.txt
>
>
> Locality groups are a neat feature, but there is no reason to limit 
> partitioning to column families.  Data could be partitioned based on any 
> criteria.  For example, if a user is interested in querying recent data and 
> ageing off old data, partitioning locality groups based on timestamp would be 
> useful.  This could be accomplished by letting users specify a partitioner 
> plugin that is used at compaction and scan time.  Scans would need an ability 
> to pass options to the partitioner.
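The Python sketch below shows roughly what such a partitioner plugin might look like. Everything in it is hypothetical and is not part of Accumulo: the `Partitioner` interface, the `TimestampPartitioner` with a fixed cutoff, the `min_ts` scan option, and the `compact`/`scan` helpers are all assumptions. At compaction time the partitioner assigns each entry to a group. At scan time it uses the scan's options to decide which groups can be skipped.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set, Tuple

# Simplified key/value entry: (row, timestamp, value).
Entry = Tuple[str, int, str]


class Partitioner(ABC):
    """Hypothetical plugin used at both compaction time and scan time."""

    @abstractmethod
    def partition(self, entry: Entry) -> str:
        """Name of the group this entry is written to during compaction."""

    @abstractmethod
    def groups_for_scan(self, options: Mapping[str, str]) -> Set[str]:
        """Groups a scan with these options has to read."""


class TimestampPartitioner(Partitioner):
    """Splits entries into "recent" and "old" groups around a fixed cutoff timestamp."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff

    def partition(self, entry: Entry) -> str:
        _, ts, _ = entry
        return "recent" if ts >= self.cutoff else "old"

    def groups_for_scan(self, options: Mapping[str, str]) -> Set[str]:
        # A scan that only wants data at or after the cutoff can skip the "old" group.
        min_ts = int(options.get("min_ts", "0"))
        return {"recent"} if min_ts >= self.cutoff else {"recent", "old"}


def compact(entries: Iterable[Entry], partitioner: Partitioner) -> Dict[str, List[Entry]]:
    """Write sorted entries into per-group outputs chosen by the partitioner."""
    groups: Dict[str, List[Entry]] = defaultdict(list)
    for entry in sorted(entries):
        groups[partitioner.partition(entry)].append(entry)
    return dict(groups)


def scan(groups: Dict[str, List[Entry]], partitioner: Partitioner,
         options: Mapping[str, str]) -> List[Entry]:
    """Read only the groups the partitioner selects, then filter by min_ts."""
    min_ts = int(options.get("min_ts", "0"))
    wanted = partitioner.groups_for_scan(options)
    return sorted(e for g in wanted for e in groups.get(g, []) if e[1] >= min_ts)


p = TimestampPartitioner(cutoff=100)
groups = compact([("r1", 50, "a"), ("r2", 150, "b"), ("r3", 120, "c")], p)
print(scan(groups, p, {"min_ts": "110"}))
```

A fixed cutoff is a simplification. For the recent-data use case the group boundaries would have to move over time. With that in place, ageing off old data could amount to dropping the "old" group instead of filtering it.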
