[
https://issues.apache.org/jira/browse/ACCUMULO-452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225436#comment-13225436
]
Todd Lipcon commented on ACCUMULO-452:
--------------------------------------
bq. If they want to scan the last 6 months of data, for example, and the
largest file overlaps this time range but only 10% of the data in the file
matches the range, then a lot of data needs to be filtered. Does HBase do
anything special to deal with this case?
We have a setting for "max file size" beyond which a file won't be included in
compactions. Setting that to a few GB would be prudent in a case where most of
your queries are time-bound. Of course, there's an associated cost for
scanners that aren't time-bound, as they'll have to merge all files, but in
some cases that's fine.
You can see more discussion about this in HBASE-4717.
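As a rough illustration of why a size cap helps time-bound queries: files above the cap are never rewritten, so each keeps its own timestamp range, and a scan restricted to recent data can skip the ones that don't overlap. The sketch below is not HBase code. StoreFileInfo, selectForCompaction and filesForScan are hypothetical names made up for this example, and the per-file min/max timestamps are assumed to be available as metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a store file: its size and the timestamp range it covers.
final class StoreFileInfo {
    final String name;
    final long sizeBytes;
    final long minTs;
    final long maxTs;

    StoreFileInfo(String name, long sizeBytes, long minTs, long maxTs) {
        this.name = name;
        this.sizeBytes = sizeBytes;
        this.minTs = minTs;
        this.maxTs = maxTs;
    }

    // True if this file's timestamp range intersects [startTs, endTs].
    boolean overlaps(long startTs, long endTs) {
        return minTs <= endTs && maxTs >= startTs;
    }
}

final class CompactionSketch {
    // Files larger than maxCompactSize are left out of the compaction
    // candidates, so they are never merged with newer data.
    static List<StoreFileInfo> selectForCompaction(List<StoreFileInfo> files,
                                                   long maxCompactSize) {
        List<StoreFileInfo> selected = new ArrayList<>();
        for (StoreFileInfo f : files) {
            if (f.sizeBytes <= maxCompactSize) {
                selected.add(f);
            }
        }
        return selected;
    }

    // A time-bound scan only needs the files whose range overlaps the query.
    // An unbounded scan (Long.MIN_VALUE..Long.MAX_VALUE) gets every file,
    // which is the merge cost mentioned above.
    static List<StoreFileInfo> filesForScan(List<StoreFileInfo> files,
                                            long startTs, long endTs) {
        List<StoreFileInfo> needed = new ArrayList<>();
        for (StoreFileInfo f : files) {
            if (f.overlaps(startTs, endTs)) {
                needed.add(f);
            }
        }
        return needed;
    }
}
```

With an 8 GiB file of old data, a 1 GiB file and a 64 MiB file of recent data, a 4 GiB cap leaves the old file alone, and a scan over only the newest timestamp range touches just the smallest file.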
> Generalize locality groups
> --------------------------
>
> Key: ACCUMULO-452
> URL: https://issues.apache.org/jira/browse/ACCUMULO-452
> Project: Accumulo
> Issue Type: New Feature
> Reporter: Keith Turner
> Fix For: 1.5.0
>
> Attachments: PartitionerDesign.txt
>
>
> Locality groups are a neat feature, but there is no reason to limit
> partitioning to column families. Data could be partitioned based on any
> criteria. For example, if a user is interested in querying recent data and
> ageing off old data, partitioning locality groups based on timestamp would be
> useful. This could be accomplished by letting users specify a partitioner
> plugin that is used at compaction and scan time. Scans would need the ability
> to pass options to the partitioner.
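
To make the proposal above concrete, here is one possible shape for such a plugin. This is only a sketch under assumptions: the Partitioner interface, its method names, and the "minTimestamp"/"maxTimestamp" option keys are hypothetical and are not taken from Accumulo or from the attached PartitionerDesign.txt.

```java
import java.util.Map;

// Hypothetical plugin interface; not an existing Accumulo API.
interface Partitioner {
    // Compaction time: decide which partition (generalized locality group)
    // a key is written to.
    String partitionFor(String row, String columnFamily, long timestamp);

    // Scan time: decide from the scan's options whether a partition
    // needs to be read at all.
    boolean shouldRead(String partitionId, Map<String, String> scanOptions);
}

// Example implementation: partitions data into fixed-width time buckets.
final class TimeBucketPartitioner implements Partitioner {
    private final long bucketMillis;

    TimeBucketPartitioner(long bucketMillis) {
        if (bucketMillis <= 0) {
            throw new IllegalArgumentException("bucketMillis must be positive");
        }
        this.bucketMillis = bucketMillis;
    }

    @Override
    public String partitionFor(String row, String columnFamily, long timestamp) {
        // The partition id is simply the bucket index of the timestamp.
        return Long.toString(Math.floorDiv(timestamp, bucketMillis));
    }

    @Override
    public boolean shouldRead(String partitionId, Map<String, String> scanOptions) {
        long bucketStart = Long.parseLong(partitionId) * bucketMillis;
        long bucketEnd = bucketStart + bucketMillis - 1;
        // Missing options mean the scan is unbounded on that side.
        long minTs = parseOrDefault(scanOptions.get("minTimestamp"), Long.MIN_VALUE);
        long maxTs = parseOrDefault(scanOptions.get("maxTimestamp"), Long.MAX_VALUE);
        return bucketEnd >= minTs && bucketStart <= maxTs;
    }

    private static long parseOrDefault(String value, long fallback) {
        return value == null ? fallback : Long.parseLong(value);
    }
}
```

A scan that passes minTimestamp would then skip every bucket that ends before that time, and ageing off old data could amount to dropping whole partitions.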