[ https://issues.apache.org/jira/browse/ACCUMULO-452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225464#comment-13225464 ]
Aaron Cordova commented on ACCUMULO-452:
----------------------------------------
Just another comment on the type of complexity I'd like to avoid.
Specifically, it's good to have orthogonality in your features.
Locality groups are for physical partitioning; timestamps are for data
versioning. If someone wants to partition their data into time ranges, they are
free to do so, using locality groups. They simply have to decide on what their
column families will be, building some information about time ranges into them,
and assign them to locality groups.
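A minimal sketch of that approach, in Python for brevity. The function names, the `_YYYYMM` suffix convention, and the `lg_` group names are all hypothetical choices for illustration, not anything Accumulo prescribes. The resulting mapping of group name to a set of column families is the general shape a table's locality group configuration takes.

```python
from collections import defaultdict
from datetime import datetime, timezone


def time_bucketed_family(base_family, ts_millis):
    """Hypothetical naming scheme: append a UTC year-month bucket to the family."""
    dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    return f"{base_family}_{dt:%Y%m}"


def locality_groups_by_bucket(families):
    """Assign every family that shares a time bucket to the same locality group."""
    groups = defaultdict(set)
    for family in families:
        # The bucket is whatever follows the last underscore in the family name.
        bucket = family.rsplit("_", 1)[1]
        groups[f"lg_{bucket}"].add(family)
    return dict(groups)
```

With this layout, a scan that fetches only the column families for one month would touch only that month's locality group, and timestamps remain free for versioning.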
Another kind of partitioning happens with row IDs, allowing accesses to a small
range of rows to involve one or a small number of servers. This kind of
partitioning is nice because it's automatic: one doesn't have to worry about
whether the ranges are the right granularity, since Accumulo splits based on size.
Now we're talking about adding a third way to physically split data,
timestamps, and basing it on something designed for some other purpose, which
is data versioning.
Timestamps do allow users to only get data for a particular time period, but
the intent is to limit the data after the row and columns have been selected,
or maybe for short scans. I'm guessing your users want to scan over a lot of
rows and columns, but only for data that falls within a particular time period.
For this they should build time ranges into their rows or columns.
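As a rough sketch of the row-based variant, again in Python: the row layout (a UTC day bucket, an underscore, then an entity ID) and both function names are assumptions made up for illustration. The point is only that once the time bucket leads the row ID, a time period becomes an ordinary contiguous row range.

```python
from datetime import datetime, timezone


def time_prefixed_row(ts_millis, entity_id):
    """Hypothetical row layout: UTC day bucket first, then the entity ID."""
    dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    return f"{dt:%Y%m%d}_{entity_id}"


def row_range_for_days(start_day, end_day):
    """Return (start_row, end_row) covering start_day..end_day inclusive.

    Days are 'YYYYMMDD' strings. start_row is inclusive and end_row is
    exclusive: it uses the character that sorts immediately after '_', so
    every row in end_day's bucket sorts before it.
    """
    return start_day + "_", end_day + chr(ord("_") + 1)
```

A scan over the returned range covers every row in the requested days regardless of column, without involving timestamps at all.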
There are already two ways to let users do this. I think adding a third will
just add complexity and could interfere with the original versioning
functionality.
> Generalize locality groups
> --------------------------
>
> Key: ACCUMULO-452
> URL: https://issues.apache.org/jira/browse/ACCUMULO-452
> Project: Accumulo
> Issue Type: New Feature
> Reporter: Keith Turner
> Fix For: 1.5.0
>
> Attachments: PartitionerDesign.txt
>
>
> Locality groups are a neat feature, but there is no reason to limit
> partitioning to column families. Data could be partitioned based on any
> criteria. For example, if a user is interested in querying recent data and
> ageing off old data, partitioning locality groups based on timestamp would be
> useful. This could be accomplished by letting users specify a partitioner
> plugin that is used at compaction and scan time. Scans would need an ability
> to pass options to the partitioner.