[ 
https://issues.apache.org/jira/browse/CASSANDRA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099124#comment-13099124
 ] 

Mck SembWever edited comment on CASSANDRA-3137 at 9/27/11 9:19 AM:
-------------------------------------------------------------------

Indeed. I could be using this asap.

The use case is...
We're using a ByteOrderedPartitioner because we run incremental Hadoop jobs over 
one of our column families where "events" initially come in. This cf has RF=1 
and time-based UUID keys that are manipulated so that their byte ordering is 
time ordered (the timestamp is put up front). Each column has a TTL of 3 months.
After 3 months of data we saw all data on one node. Now I understand why: the 
token range is the timestamp range, which runs from 1970 to 2270, so of course our 
3 month period fell on one node (with a 3 node cluster even 100 years would 
fall on one node).
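
A rough back-of-the-envelope check of that claim, as a sketch only. It assumes the three nodes' tokens split the 1970 to 2270 range evenly and that "3 months" is about 90 days; the variable names are illustrative.

```python
# Assumption: the token space maps linearly onto timestamps from 1970 to 2270
# and the three nodes own equally sized slices of it.
TOKEN_RANGE_YEARS = 2270 - 1970          # 300 years of key space
NODES = 3

years_per_node = TOKEN_RANGE_YEARS / NODES

# Assumption: the 3 month TTL is roughly 90 days.
retention_years = 90 / 365.25

# Fraction of a single node's slice that the live data can ever occupy.
fraction_of_one_node = retention_years / years_per_node

print(f"each node covers {years_per_node:.0f} years of keys")
print(f"live data spans {fraction_of_one_node:.2%} of one node's range")
```

Under those assumptions the live data covers well under 1% of a single node's range, so it cannot straddle more than one token boundary at a time.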

To properly manage this cf we need to either continuously move nodes around, a 
cumbersome operation, or change the key so it's prefixed with {{timestamp % 
3months}}. This would allow 3 months of data to cycle over the whole cluster 
and wrap around again. Obviously we're leaning towards the latter solution as 
it simplifies operations. But it does require this patch.
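
A minimal sketch of the prefixing idea, not the actual key code we run. It assumes millisecond timestamps, a 90 day period, and an 8-byte big-endian prefix; `PERIOD_MS` and `prefixed_key` are hypothetical names.

```python
import struct

# Assumption: "3 months" is taken as 90 days, expressed in milliseconds.
PERIOD_MS = 90 * 24 * 60 * 60 * 1000


def prefixed_key(timestamp_ms: int, original_key: bytes) -> bytes:
    """Prefix the key with (timestamp % period) as 8 unsigned big-endian bytes.

    Big-endian packing keeps byte order equal to numeric order, so keys inside
    one period still sort by time under a byte-ordered partitioner.
    """
    prefix = struct.pack(">Q", timestamp_ms % PERIOD_MS)
    return prefix + original_key


# Two events in the same period keep their time order.
base = 170 * PERIOD_MS
early = prefixed_key(base + 1_000, b"event")
late = prefixed_key(base + 2_000, b"event")
print(early < late)

# An event exactly one period later lands on the same prefix (wrap-around).
wrapped = prefixed_key(base + PERIOD_MS + 1_000, b"event")
print(wrapped[:8] == early[:8])
```

With the prefix bounded by the period, node tokens can be spread over that smaller range. A 3 month window of data then generally wraps past the end of the range, which is why the job's input key range has to support wrapping.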

(When CFIF (ColumnFamilyInputFormat) supports IndexClause everything changes: we 
switch our cluster to RandomPartitioner, use secondary indexes, and never look 
back...)
                
      was (Author: michaelsembwever):
    Indeed. I could be using this asap.

The use case is...
We're using a ByteOrderedPartitioner because we run incremental Hadoop jobs over 
one of our column families where "events" initially come in. This cf has RF=1 
and time-based UUID keys that are manipulated so that their byte ordering is 
time ordered (the byte-unsigned timestamp is put up front). Each column has a TTL 
of 3 months.
After 3 months of data we saw all data on one node. Now I understand why: the 
token range is the timestamp range, which runs from 1970 to 2270, so of course our 
3 month period fell on one node (with a 3 node cluster even 100 years would 
fall on one node).

To properly manage this cf we need to either continuously move nodes around, a 
cumbersome operation, or change the key so it's prefixed with {{timestamp % 
3months}}. This would allow 3 months of data to cycle over the whole cluster 
and wrap around again. Obviously we're leaning towards the latter solution as 
it simplifies operations. But it does require this patch.

(When CFIF (ColumnFamilyInputFormat) supports IndexClause everything changes: we 
switch our cluster to RandomPartitioner, use secondary indexes, and never look 
back...)
                  
> Implement wrapping intersections for ConfigHelper's InputKeyRange
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-3137
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>    Affects Versions: 0.8.5
>            Reporter: Mck SembWever
>            Assignee: Mck SembWever
>            Priority: Minor
>             Fix For: 0.8.7
>
>         Attachments: CASSANDRA-3137.patch, CASSANDRA-3137.patch
>
>
> Before there was no support for multiple intersections between the split's 
> range and the job's configured range.
> After CASSANDRA-3108 it is now possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
