about key sorting and token partitioning

2010-11-10 Thread zangds
Hi,
I am using Cassandra to store a message stream, and want to use timestamps (like 
mmddhhMIss or something similar) as the keys.
If I use RandomPartitioner, I will lose the ordering when using 
get_range_slices().
If I use OrderPreservingPartitioner, how should I configure Cassandra to 
balance the load between the nodes?

Thanks!

Re: about key sorting and token partitioning

2010-11-10 Thread Peter Schuller
 I am using Cassandra to store a message stream, and want to use timestamps
 (like mmddhhMIss or something similar) as the keys.
 If I use RandomPartitioner, I will lose the ordering when using
 get_range_slices().
 If I use OrderPreservingPartitioner, how should I configure Cassandra to
 balance the load between the nodes?

AFAIK there's no silver bullet to making the order preserving
partitioner easy to use w.r.t. node balancing in the situation you're
describing.

One thing to consider is to use the random partitioner (for its
simplicity in managing the cluster) and use a granular subset of the
timestamp as the row key. For example, you could have the row key be
mmddhh to get an entire hour per row.

A reasonable granularity would depend on your use-case; but the idea
is to be able to take advantage of the simplicity of using the random
partitioner, while having reasonable efficiency on range slices by
making each row contain a pretty large range such that any additional
overhead in jumping across nodes is negligible in comparison to the
other work done.
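
To make the bucketing idea concrete, here is a minimal sketch in Python. It
does not talk to Cassandra at all: the dict named store is only an in-memory
stand-in for a column family, and every function name below (row_key,
column_name, row_keys_for_range, insert, read_range) is made up for
illustration. It assumes hourly buckets (mmddhh) as row keys and the full
timestamp (mmddhhMMSS) as the column name within each row. Since the key
format has no year, the sketch only orders correctly within a single
calendar year.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def row_key(ts):
    """Hour bucket used as the row key, e.g. '111014' for Nov 10, 14:xx."""
    return ts.strftime("%m%d%H")


def column_name(ts):
    """Full timestamp used as the column name inside the hourly row."""
    return ts.strftime("%m%d%H%M%S")


def row_keys_for_range(start, end):
    """List every hourly row key covering [start, end], in time order."""
    current = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while current <= end:
        keys.append(row_key(current))
        current += timedelta(hours=1)
    return keys


# In-memory stand-in for a column family: row key -> {column name: value}.
# This is an assumption for the sketch, not a Cassandra client API.
store = defaultdict(dict)


def insert(ts, message):
    """Store a message under its hourly row, keyed by full timestamp."""
    store[row_key(ts)][column_name(ts)] = message


def read_range(start, end):
    """Return (column name, message) pairs in time order for [start, end].

    The rows are visited in the order computed by row_keys_for_range, so the
    result is ordered even though the rows themselves may be scattered
    across nodes by a hashing partitioner.
    """
    lo, hi = column_name(start), column_name(end)
    result = []
    for key in row_keys_for_range(start, end):
        row = store.get(key, {})
        for name in sorted(row):
            if lo <= name <= hi:
                result.append((name, row[name]))
    return result
```

The point of the sketch is that the client, not the partitioner, knows which
row keys cover a time range, so it can fetch those rows by key and still get
the messages back in order. With a coarser bucket there are fewer rows to
fetch per query but each row grows larger; the right trade-off depends on
the message rate.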

-- 
/ Peter Schuller