[ 
https://issues.apache.org/jira/browse/ACCUMULO-452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225464#comment-13225464
 ] 

Aaron Cordova commented on ACCUMULO-452:
----------------------------------------

Just another comment on the type of complexity I'd like to avoid. 

Specifically, it's good to have orthogonality in your features. 

Locality groups are for physical partitioning, timestamps are for data 
versioning. If someone wants to partition their data into time-ranges they are 
free to do so, using locality groups. They simply have to decide on what their 
column families will be, building some information about time ranges into them, 
and assign them to locality groups. 
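
To make that concrete, here is a minimal sketch of what I mean, not a definitive implementation. The "<base>_<yyyyMM>" family naming, the "lg_" group prefix, and the class and method names are all made up for illustration. It uses plain Java collections only; in a real client the family names would be wrapped in Text, and a map of this shape is what TableOperations.setLocalityGroups takes.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: class, method, and naming scheme are assumptions.
class TimeBucketedFamilies {
    // One bucket per calendar month, in UTC.
    private static final DateTimeFormatter MONTH =
        DateTimeFormatter.ofPattern("yyyyMM").withZone(ZoneOffset.UTC);

    // Hypothetical scheme: "<base>_<yyyyMM>", e.g. "events_201203".
    static String familyFor(String base, Instant when) {
        return base + "_" + MONTH.format(when);
    }

    // Builds locality group name -> column families, one group per month.
    static Map<String, Set<String>> localityGroups(String base, Iterable<Instant> samples) {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (Instant t : samples) {
            String group = "lg_" + MONTH.format(t);
            groups.computeIfAbsent(group, k -> new TreeSet<>()).add(familyFor(base, t));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Instant> samples = List.of(
            Instant.parse("2012-02-14T00:00:00Z"),
            Instant.parse("2012-03-08T12:00:00Z"));
        // Prints the group -> families mapping for the two sample months.
        System.out.println(localityGroups("events", samples));
    }
}
```

A scan restricted to one month then only needs to fetch that month's column family, and so only touches that locality group.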

Another kind of partitioning happens with row IDs, allowing accesses to a small 
range of rows to involve one or a small number of servers. This kind of 
partitioning is nice because it's automatic: one doesn't have to worry about 
whether the ranges are the right granularity, since Accumulo splits based on size.

Now we're talking about adding a third way to physically split data, 
timestamps, and basing it on something designed for some other purpose, which 
is data versioning.

Timestamps do allow users to only get data for a particular time period, but 
the intent is to limit the data after the row and columns have been selected, 
or maybe for short scans. I'm guessing your users want to scan over a lot of 
rows and columns, but only those that fall within a particular time period. For 
this they should build time ranges into their rows or columns.
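
For the row case, a rough sketch of the idea, under assumptions of my own: the "<yyyyMMdd>_<entityId>" row layout and all names below are hypothetical, and a TreeMap stands in for a table sorted by row ID. The point is only that a time period becomes a contiguous range of rows.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: a sorted map stands in for a table sorted by row ID.
class TimePrefixedRows {
    // Hypothetical row layout: "<yyyyMMdd>_<entityId>".
    static String rowId(String day, String entityId) {
        return day + "_" + entityId;
    }

    // Returns rows whose day falls in [startDay, endDay), as one range.
    static NavigableMap<String, String> scan(
            NavigableMap<String, String> table, String startDay, String endDay) {
        return table.subMap(startDay, true, endDay, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put(rowId("20120307", "a"), "v1");
        table.put(rowId("20120308", "a"), "v2");
        table.put(rowId("20120308", "b"), "v3");
        table.put(rowId("20120309", "a"), "v4");
        // Selects only the two rows for 20120308.
        System.out.println(scan(table, "20120308", "20120309").keySet());
    }
}
```

Because fixed-width dates sort lexicographically in time order, the scan for a period is a single start/end row range and needs no timestamp filtering.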

There are already two ways to let users do this. I think adding a third will 
just add complexity and could interfere with the original versioning 
functionality.

                
> Generalize locality groups
> --------------------------
>
>                 Key: ACCUMULO-452
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-452
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Keith Turner
>             Fix For: 1.5.0
>
>         Attachments: PartitionerDesign.txt
>
>
> Locality groups are a neat feature, but there is no reason to limit 
> partitioning to column families.  Data could be partitioned based on any 
> criteria.  For example, if a user is interested in querying recent data and 
> ageing off old data, partitioning locality groups based on timestamp would be 
> useful.  This could be accomplished by letting users specify a partitioner 
> plugin that is used at compaction and scan time.  Scans would need an ability 
> to pass options to the partitioner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
