Re: Number of client connections
As far as I know, only the OS-level limitations, e.g. typically ~60k. On Thu, Jun 3, 2010 at 9:34 AM, Lev Stesin wrote: > Hi, > > Is there a limit on the number of client connections to a node? Thanks. > > -- > Lev >
Number of client connections
Hi, Is there a limit on the number of client connections to a node? Thanks. -- Lev
Re: nodetool cleanup isn't cleaning up?
getRangeToEndpointMap is very useful, thanks, I didn't know about it... however, I've reconfigured my cluster since (moved some nodes and tokens) so now the problem is gone. I guess I'll use getRangeToEndpointMap next time I see something like this... On Thu, Jun 3, 2010 at 9:15 AM, Jonathan Ellis wrote: > Then the next step is to check StorageService.getRangeToEndpointMap via jmx > > On Tue, Jun 1, 2010 at 11:56 AM, Ran Tavory wrote: > > I'm using RackAwareStrategy. But it still doesn't make sense I think... > > let's see what did I miss... > > According to http://wiki.apache.org/cassandra/Operations > > > > RackAwareStrategy: replica 2 is placed in the first node along the ring > the > > belongs in another data center than the first; the remaining N-2 > replicas, > > if any, are placed on the first nodes along the ring in the same rack as > the > > first > > > > 192.168.252.124 Up 803.33 MB > > 56713727820156410577229101238628035242 |<--| > > 192.168.252.99 Up 352.85 MB > > 56713727820156410577229101238628035243 | ^ > > 192.168.252.125 Up 134.24 MB > > 85070591730234615865843651857942052863 v | > > 192.168.254.57 Up 676.41 MB > > 113427455640312821154458202477256070485 | ^ > > 192.168.254.58 Up 99.74 MB > > 141784319550391026443072753096570088106 v | > > 192.168.254.59 Up 99.94 MB > > 170141183460469231731687303715884105727 |-->| > > Alright, so I made a mistake and didn't use the alternate-datacenter > > suggestion on the page so the first node of every DC is overloaded with > > replicas. However, the current situation still doesn't make sense to me. > > .252.124 will be overloaded b/c it has the first token in the 252 dc. > > .254.57 will also be overloaded since it has the first token in the .254 > DC. > > But for which node does 252.99 serve as a replicator? It's not the first > in > > the DC and it's just one single token more than it's predecessor (which > is > > in the same DC). 
> > On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis > wrote: > >> > >> I'm saying that .99 is getting a copy of all the data for which .124 > >> is the primary. (If you are using RackUnawarePartitioner. If you are > >> using RackAware it is some other node.) > >> > >> On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory wrote: > >> > ok, let me try and translate your answer ;) > >> > Are you saying that the data that was left on the node is > >> > non-primary-replicas of rows from the time before the move? > >> > So this implies that when a node moves in the ring, it will affect > >> > distribution of: > >> > - new keys > >> > - old keys primary node > >> > -- but will not affect distribution of old keys non-primary replicas. > >> > If so, still I don't understand something... I would expect even the > >> > non-primary replicas of keys to be moved since if they don't, how > would > >> > they > >> > be found? I mean upon reads the serving node should not care about > >> > whether > >> > the row is new or old, it should have a consistent and global mapping > of > >> > tokens. So I guess this ruins my theory... > >> > What did you mean then? Is this deletions of non-primary replicated > >> > data? > >> > How does the replication factor affect the load on the moved host > then? > >> > > >> > On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis > >> > wrote: > >> >> > >> >> well, there you are then. > >> >> > >> >> On Mon, May 31, 2010 at 2:34 PM, Ran Tavory > wrote: > >> >> > yes, replication factor = 2 > >> >> > > >> >> > On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis < > jbel...@gmail.com> > >> >> > wrote: > >> >> >> > >> >> >> you have replication factor > 1 ? > >> >> >> > >> >> >> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory > >> >> >> wrote: > >> >> >> > I hope I understand nodetool cleanup correctly - it should clean > >> >> >> > up > >> >> >> > all > >> >> >> > data > >> >> >> > that does not (currently) belong to this node. 
If so, I think it > >> >> >> > might > >> >> >> > not > >> >> >> > be working correctly. > >> >> >> > Look at nodes 192.168.252.124 and 192.168.252.99 below > >> >> >> > 192.168.252.99Up 279.35 MB > >> >> >> > 3544607988759775661076818827414252202 > >> >> >> > |<--| > >> >> >> > 192.168.252.124Up 167.23 MB > >> >> >> > 56713727820156410577229101238628035242 | ^ > >> >> >> > 192.168.252.125Up 82.91 MB > >> >> >> > 85070591730234615865843651857942052863 v | > >> >> >> > 192.168.254.57Up 366.6 MB > >> >> >> > 113427455640312821154458202477256070485| ^ > >> >> >> > 192.168.254.58Up 88.44 MB > >> >> >> > 141784319550391026443072753096570088106v | > >> >> >> > 192.168.254.59Up 88.45 MB > >> >> >> > 170141183460469231731687303715884105727|-->| > >> >> >> > I wanted 124 to take all the load from 99. So I issued a move > >> >> >> > command. > >> >> >> > $ nodetool -h cass99 -p 9004 move > >> >> >> > 56713727820156410577229101238628035243 > >> >> >> > > >>
Re: Effective cache size
On Wed, Jun 2, 2010 at 10:39 PM, David King wrote: > If I go to fetch some row given the rack-unaware placement strategy, the > default snitch and CL==ONE, the node that is asked is the first node in the > ring with the datum that is currently up, then a checksum is sent to the > replicas to trigger read repair as appropriate. Yes > So with the row cache, that first node (the primary replica) is the one that > has that row cached, yes? No, it's the closest node as determined by snitch.sortByProximity. any given node X will never know whether another node Y has a row cached or not. the overhead for communicating that level of detail would be totally prohibitive. all caching does is speed the read, once a request is received for data local to a given node. no more, no less. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: ColumnFamilyInputFormat with super columns
We don't support supercolumns in CFIF yet. Peng Guo added this in his patchset at http://files.cnblogs.com/gpcuster/CassandraInputFormat.rar but it's mixed in with a ton of other changes. Honestly it's probably easier to start fresh, but it might be useful to look at his code for inspiration. On Wed, Jun 2, 2010 at 2:41 PM, Torsten Curdt wrote: > I have a super column along he lines of > > => { => { att: value }} > > Now I would like to process a set of rows [from_time..until_time] with Hadoop. > I've setup the hadoop job like this > > job.setInputFormatClass(ColumnFamilyInputFormat.class); > ConfigHelper.setColumnFamily(job.getConfiguration(), "family", > "events"); > > SlicePredicate predicate = new SlicePredicate(); > predicate.setSlice_range(new SliceRange(new byte[0], new > byte[0], > false, 1000)); > > ConfigHelper.setSlicePredicate(job.getConfiguration(), > predicate); > > but I don't see how I could say what rows the job should process. > Any pointers? > > cheers > -- > Torsten > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
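[Editor's note: the row-selection question above is a key-range question. With an order-preserving key layout, picking the rows in [from_time..until_time] is just a range query over a sorted map. A toy, self-contained sketch of that idea using plain Java collections — this is not the Hadoop/Cassandra API, and the key format is an assumption:]

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class RowRangeSketch {
    // Toy stand-in for an order-preserving row layout: row keys sort, so a
    // time window [from, until] is a sorted-map range query. The
    // "yyyyMMdd.HH.mm.ss" key format here is a hypothetical choice.
    public static SortedMap<String, String> rowsBetween(
            TreeMap<String, String> rows, String from, String until) {
        return rows.subMap(from, true, until, true); // inclusive on both ends
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = new TreeMap<String, String>();
        rows.put("20100601.05.30.00", "event-a");
        rows.put("20100601.09.15.00", "event-b");
        rows.put("20100602.01.00.00", "event-c");
        // Select everything from 06:00 on June 1 through end of June 2.
        System.out.println(
            rowsBetween(rows, "20100601.06.00.00", "20100602.23.59.59").keySet());
        // -> [20100601.09.15.00, 20100602.01.00.00]
    }
}
```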
Re: Nodes dropping out of cluster due to GC
remember: you get concurrent mode failures, when the old gen fills up with garbage before it can finish the CMS. so adding capacity = reducing load per machine is the easiest way to make this a non-issue. On Wed, Jun 2, 2010 at 12:45 PM, Eric Halpern wrote: > > > Ryan King wrote: >> >> Why run with so few nodes? >> >> -ryan >> >> On Tue, Jun 1, 2010 at 4:20 PM, Eric Halpern wrote: >>> >>> Hello, >>> >>> We're running a 4 node cluster on beefy EC2 virtual instances (8 core, 32 >>> GB) using EBS storage with 8 GB of heap allocated to the JVM. >>> >>> Every couple of hours, each of the nodes does a concurrent mark/sweep >>> that >>> takes around 30 seconds to complete. During that GC, the node >>> temporarily >>> drops out of the cluster, usually for about 15 seconds. >>> >>> The frequency of the concurrent mark sweeps seems reasonable, but the >>> fact >>> that the node drops out of the cluster temporarily is a major problem >>> since >>> this has significant impact on the performance and stability of our >>> service. >>> >>> Has anyone experienced this sort of problem? It would be great to hear >>> from >>> anyone who has had experience with this sort of issue and/or suggestions >>> for >>> how to deal with it. >>> >>> Thanks, Eric >>> -- >> >> > > We wanted to start with a small number of nodes to test things out before > going big. Is there some reason that a small cluster would cause more > problems in this regard. The actual request load is actually pretty light > for the cluster. > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Nodes-dropping-out-of-cluster-due-to-GC-tp5128481p5132279.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com. > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Heterogeneous Cassandra Cluster
No. And if we did it would be a bad idea: good ops practice is to _minimize_ variability. On Wed, Jun 2, 2010 at 3:18 AM, David Boxenhorn wrote: > Is it possible to make a heterogeneous Cassandra cluster, with both Linux > and Windows nodes? I tried doing it and got > > Error in ThreadPoolExecutor > java.lang.NullPointerException > > Not sure if this is due to the Linux/Windows mix or something else. > > > Details below: > > > > [r...@iqdev01 cassandra]# bin/cassandra -f > > INFO 20:32:26,431 Auto DiskAccessMode determined to be mmap > > INFO 20:32:27,085 Sampling index for > /var/lib/cassandra/data/system/LocationInfo-1-Data.db > > INFO 20:32:27,095 Sampling index for > /var/lib/cassandra/data/system/LocationInfo-2-Data.db > > INFO 20:32:27,104 Replaying > /var/lib/cassandra/commitlog/CommitLog-1275412410865.log > > INFO 20:32:27,129 Creating new commitlog segment > /var/lib/cassandra/commitlog/CommitLog-1275413547129.log > > INFO 20:32:27,138 LocationInfo has reached its threshold; switching in a > fresh Memtable at > CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1275413547129.log', > position=173) > > INFO 20:32:27,138 Enqueuing flush of Memtable(LocationInfo)@1491010616 > > INFO 20:32:27,139 Writing Memtable(LocationInfo)@1491010616 > > INFO 20:32:27,187 Completed flushing > /var/lib/cassandra/data/system/LocationInfo-3-Data.db > > INFO 20:32:27,207 Log replay complete > > INFO 20:32:27,239 Saved Token found: 25870423804996813139937576731363583348 > > INFO 20:32:27,239 Saved ClusterName found: Lookin2 > > INFO 20:32:27,247 Starting up server gossip > > INFO 20:32:27,266 Joining: getting load information > > INFO 20:32:27,267 Sleeping 9 ms to wait for load information... 
> > INFO 20:32:27,327 Node /192.168.80.12 is now part of the cluster > > INFO 20:32:27,332 Node /192.168.80.234 is now part of the cluster > > INFO 20:32:27,864 InetAddress /192.168.80.12 is now UP > > INFO 20:32:27,872 InetAddress /192.168.80.234 is now UP > > INFO 20:33:57,269 Joining: getting bootstrap token > > INFO 20:33:57,278 New token will be 25870423804996813139937576731363583348 > to assume load from /192.168.80.12 > > INFO 20:33:57,279 Joining: sleeping 3 for pending range setup > > INFO 20:34:27,280 Bootstrapping > > INFO 21:32:27,867 Compacting [] > > INFO 21:38:27,118 LocationInfo has reached its threshold; switching in a > fresh Memtable at > CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1275413547129.log', > position=824) > > INFO 21:38:27,118 Enqueuing flush of Memtable(LocationInfo)@993374707 > > INFO 21:38:27,118 Writing Memtable(LocationInfo)@993374707 > > INFO 21:38:27,158 Completed flushing > /var/lib/cassandra/data/system/LocationInfo-4-Data.db > > INFO 21:38:27,160 Compacting > [org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-1-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-2-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-3-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-4-Data.db')] > > INFO 21:38:27,217 Compacted to > /var/lib/cassandra/data/system/LocationInfo-5-Data.db. 1294/358 bytes for 1 > keys. Time: 56ms. 
> > [r...@iqdev01 cassandra]# bin/cassandra -f > > INFO 21:40:07,519 Auto DiskAccessMode determined to be mmap > > INFO 21:40:07,972 Deleted > /var/lib/cassandra/data/system/LocationInfo-1-Data.db > > INFO 21:40:07,973 Deleted > /var/lib/cassandra/data/system/LocationInfo-2-Data.db > > INFO 21:40:07,974 Deleted > /var/lib/cassandra/data/system/LocationInfo-3-Data.db > > INFO 21:40:07,982 Sampling index for > /var/lib/cassandra/data/system/LocationInfo-5-Data.db > > INFO 21:40:07,991 Deleted > /var/lib/cassandra/data/system/LocationInfo-4-Data.db > > INFO 21:40:08,000 Replaying > /var/lib/cassandra/commitlog/CommitLog-1275413547129.log > > INFO 21:40:08,001 Log replay complete > > INFO 21:40:08,038 Saved Token found: 25870423804996813139937576731363583348 > > INFO 21:40:08,040 Saved ClusterName found: Lookin2 > > INFO 21:40:08,042 Creating new commitlog segment > /var/lib/cassandra/commitlog/CommitLog-1275417608042.log > > INFO 21:40:08,059 Starting up server gossip > > INFO 21:40:08,071 Joining: getting load information > > INFO 21:40:08,071 Sleeping 9 ms to wait for load information... > > INFO 21:40:10,372 Node /192.168.80.12 is now part of the cluster > > INFO 21:40:10,374 Node /192.168.80.234 is now part of the cluster > > INFO 21:40:11,091 InetAddress /192.168.80.234 is now UP > > INFO 21:40:12,078 InetAddress /192.168.80.12 is now UP > > INFO 21:41:38,072 Joining: getting bootstrap token > > INFO 21:41:38,088 New token will be 25870423804996813139937576731363583348 > to assume load from /192.168.80.12 > > INFO 21:41:38,089 Joining: sleeping 3 for pending range setup > > INFO 21:42:08,091 Bootstrapping > >
Re: Handling disk-full scenarios
this is why JBOD configuration is contraindicated for cassandra. http://wiki.apache.org/cassandra/CassandraHardware On Tue, Jun 1, 2010 at 1:08 PM, Ian Soboroff wrote: > My nodes have 5 disks and are using them separately as data disks. The > usage on the disks is not uniform, and one is nearly full. Is there some > way to manually balance the files across the disks? Pretty much anything > done via nodetool incurs an anticompaction with obviously fails. system/ is > not the problem, it's in my data's keyspace. > > Ian > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Start key must sort before (or equal to) finish key in your partitioner
that would be reasonable On Wed, Jun 2, 2010 at 6:41 AM, David Boxenhorn wrote: > Would it be better to use an SQL-style timestamp ("YYYY-MM-DD HH:MM:SS.MMM") > + unique id, then? They sort lexically the same as they sort > chronologically. > > On Wed, Jun 2, 2010 at 4:37 PM, Leslie Viljoen > wrote: >> >> On Mon, May 31, 2010 at 8:52 PM, Jonathan Ellis wrote: >> > OPP uses lexical ordering on the keys, which isn't going to be the >> > same as the natural order for a time-based uuid. >> >> *palmface* > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
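[Editor's note: the property David is relying on — fixed-width timestamp strings sort lexically the same as chronologically — can be checked with a plain string sort. A minimal sketch; the "|" separator and id suffix are assumptions, not from the thread:]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TimestampKeySort {
    // Fixed-width "YYYY-MM-DD HH:MM:SS.MMM" timestamps: lexicographic order
    // equals chronological order, so OPP range scans over such keys work.
    public static List<String> sorted(List<String> keys) {
        List<String> copy = new ArrayList<String>(keys);
        Collections.sort(copy); // plain lexicographic sort
        return copy;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<String>();
        keys.add("2010-06-02 16:41:00.123|id-a"); // "|id" suffix keeps keys unique (assumed format)
        keys.add("2010-06-01 08:00:00.000|id-b");
        keys.add("2010-06-02 16:41:00.122|id-c");
        System.out.println(sorted(keys).get(0)); // earliest event sorts first
        // -> 2010-06-01 08:00:00.000|id-b
    }
}
```

Note this only holds because every field is zero-padded to a fixed width; a bare epoch-seconds string, for example, would break at the 10-to-11-digit boundary.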
Re: Is there any way to detect when a node is down so I can failover more effectively?
you're overcomplicating things. just connect to *a* node, and if it happens to be down, try a different one. nodes being down should be a rare event, not a normal condition. no need to optimize for it so much. also see http://wiki.apache.org/cassandra/FAQ#node_clients_connect_to 2010/6/1 Patricio Echagüe : > Hi all, I'm using Hector framework to interact with Cassandra and at trying > to handle failover more effectively I found it a bit complicated to fetch > all cassandra nodes that are up and running. > > My goal is to keep an up-to-date list of active/up Cassandra servers to > provide HEctor every time I need to execute against the db. > > I've seen this Thrift method: get_string_property("token map") but it > returns the nodes in the ring no matter is the node is down. > > > > Any advice? > > -- > Patricio.- > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
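[Editor's note: the "connect to a node, and if it's down try a different one" advice reduces to a few lines of client-side code. A minimal sketch; the host names are hypothetical and the reachability check is stubbed out — a real client would attempt a Thrift connection there:]

```java
import java.util.Arrays;
import java.util.List;

public class SimpleFailover {
    // Stub for "can I reach this node?". Simulated here: only cass2 is up.
    static boolean isUp(String host) {
        return "cass2".equals(host);
    }

    // Try nodes in order. Down nodes are the rare case, so no need to
    // track cluster state client-side; just fall through to the next host.
    public static String connectToAny(List<String> hosts) {
        for (String host : hosts) {
            if (isUp(host)) {
                return host;
            }
        }
        throw new IllegalStateException("no Cassandra node reachable");
    }

    public static void main(String[] args) {
        System.out.println(connectToAny(Arrays.asList("cass1", "cass2", "cass3")));
        // -> cass2 (cass1 is simulated as down)
    }
}
```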
Re: nodetool cleanup isn't cleaning up?
Then the next step is to check StorageService.getRangeToEndpointMap via jmx On Tue, Jun 1, 2010 at 11:56 AM, Ran Tavory wrote: > I'm using RackAwareStrategy. But it still doesn't make sense I think... > let's see what did I miss... > According to http://wiki.apache.org/cassandra/Operations > > RackAwareStrategy: replica 2 is placed in the first node along the ring the > belongs in another data center than the first; the remaining N-2 replicas, > if any, are placed on the first nodes along the ring in the same rack as the > first > > 192.168.252.124Up 803.33 MB > 56713727820156410577229101238628035242 |<--| > 192.168.252.99Up 352.85 MB > 56713727820156410577229101238628035243 | ^ > 192.168.252.125Up 134.24 MB > 85070591730234615865843651857942052863 v | > 192.168.254.57Up 676.41 MB > 113427455640312821154458202477256070485 | ^ > 192.168.254.58Up 99.74 MB > 141784319550391026443072753096570088106 v | > 192.168.254.59Up 99.94 MB > 170141183460469231731687303715884105727 |-->| > Alright, so I made a mistake and didn't use the alternate-datacenter > suggestion on the page so the first node of every DC is overloaded with > replicas. However, the current situation still doesn't make sense to me. > .252.124 will be overloaded b/c it has the first token in the 252 dc. > .254.57 will also be overloaded since it has the first token in the .254 DC. > But for which node does 252.99 serve as a replicator? It's not the first in > the DC and it's just one single token more than it's predecessor (which is > in the same DC). > On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis wrote: >> >> I'm saying that .99 is getting a copy of all the data for which .124 >> is the primary. (If you are using RackUnawarePartitioner. If you are >> using RackAware it is some other node.) 
>> >> On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory wrote: >> > ok, let me try and translate your answer ;) >> > Are you saying that the data that was left on the node is >> > non-primary-replicas of rows from the time before the move? >> > So this implies that when a node moves in the ring, it will affect >> > distribution of: >> > - new keys >> > - old keys primary node >> > -- but will not affect distribution of old keys non-primary replicas. >> > If so, still I don't understand something... I would expect even the >> > non-primary replicas of keys to be moved since if they don't, how would >> > they >> > be found? I mean upon reads the serving node should not care about >> > whether >> > the row is new or old, it should have a consistent and global mapping of >> > tokens. So I guess this ruins my theory... >> > What did you mean then? Is this deletions of non-primary replicated >> > data? >> > How does the replication factor affect the load on the moved host then? >> > >> > On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis >> > wrote: >> >> >> >> well, there you are then. >> >> >> >> On Mon, May 31, 2010 at 2:34 PM, Ran Tavory wrote: >> >> > yes, replication factor = 2 >> >> > >> >> > On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis >> >> > wrote: >> >> >> >> >> >> you have replication factor > 1 ? >> >> >> >> >> >> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory >> >> >> wrote: >> >> >> > I hope I understand nodetool cleanup correctly - it should clean >> >> >> > up >> >> >> > all >> >> >> > data >> >> >> > that does not (currently) belong to this node. If so, I think it >> >> >> > might >> >> >> > not >> >> >> > be working correctly. 
>> >> >> > Look at nodes 192.168.252.124 and 192.168.252.99 below >> >> >> > 192.168.252.99Up 279.35 MB >> >> >> > 3544607988759775661076818827414252202 >> >> >> > |<--| >> >> >> > 192.168.252.124Up 167.23 MB >> >> >> > 56713727820156410577229101238628035242 | ^ >> >> >> > 192.168.252.125Up 82.91 MB >> >> >> > 85070591730234615865843651857942052863 v | >> >> >> > 192.168.254.57Up 366.6 MB >> >> >> > 113427455640312821154458202477256070485 | ^ >> >> >> > 192.168.254.58Up 88.44 MB >> >> >> > 141784319550391026443072753096570088106 v | >> >> >> > 192.168.254.59Up 88.45 MB >> >> >> > 170141183460469231731687303715884105727 |-->| >> >> >> > I wanted 124 to take all the load from 99. So I issued a move >> >> >> > command. >> >> >> > $ nodetool -h cass99 -p 9004 move >> >> >> > 56713727820156410577229101238628035243 >> >> >> > >> >> >> > This command tells 99 to take the space b/w >> >> >> > >> >> >> > >> >> >> > >> >> >> > (56713727820156410577229101238628035242, 56713727820156410577229101238628035243] >> >> >> > which is basically just one item in the token space, almost >> >> >> > nothing... I >> >> >> > wanted it to be very slim (just playing around). >> >> >> > So, next I get this: >> >> >> > 192.168.252.124Up 803.33 MB >> >> >> > 56713727820156410577229101238628035242 |<--| >> >> >> > 192.168.252.99Up 352.85 MB >> >> >> > 56713727820156410577229101
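[Editor's note: the RackAwareStrategy rule quoted in this thread — primary at the first token at/after the key, replica 2 in the first node along the ring in another data center, remaining replicas on the next nodes near the primary — can be reduced to a toy model. This is a DC-only simplification for illustration, not the real implementation (which also considers racks):]

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class RackAwarePlacementSketch {
    static class Node {
        final BigInteger token;
        final String dc;
        Node(BigInteger token, String dc) { this.token = token; this.dc = dc; }
    }

    // ring must be sorted by token. Primary = first node at/after keyToken;
    // replica 2 = first node along the ring in a *different* DC; remaining
    // replicas = next nodes in the *same* DC as the primary.
    public static List<Node> replicas(List<Node> ring, BigInteger keyToken, int rf) {
        int start = 0;
        while (start < ring.size() && ring.get(start).token.compareTo(keyToken) < 0) {
            start++;
        }
        start = start % ring.size(); // wrap around the ring
        Node primary = ring.get(start);
        List<Node> chosen = new ArrayList<Node>();
        chosen.add(primary);
        for (int i = 1; i < ring.size() && chosen.size() < 2; i++) {
            Node n = ring.get((start + i) % ring.size());
            if (!n.dc.equals(primary.dc)) chosen.add(n); // replica 2: other DC
        }
        for (int i = 1; i < ring.size() && chosen.size() < rf; i++) {
            Node n = ring.get((start + i) % ring.size());
            if (n.dc.equals(primary.dc) && !chosen.contains(n)) chosen.add(n);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Node> ring = new ArrayList<Node>();
        ring.add(new Node(BigInteger.valueOf(10), "dc1"));
        ring.add(new Node(BigInteger.valueOf(20), "dc1"));
        ring.add(new Node(BigInteger.valueOf(30), "dc2"));
        ring.add(new Node(BigInteger.valueOf(40), "dc2"));
        List<Node> r = replicas(ring, BigInteger.valueOf(5), 2);
        System.out.println(r.get(0).token + " -> " + r.get(1).token);
        // -> 10 -> 30 (primary in dc1; replica 2 is the first dc2 node)
    }
}
```

Walking the ring this way also shows why a node one token after a same-DC neighbor (like .99 after .124 in the thread) still receives replicas for ranges whose primaries sit elsewhere in its DC.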
Re: Monitoring compaction
Sure, patching CM stats into nodetool is fine. On Tue, Jun 1, 2010 at 9:50 AM, Ian Soboroff wrote: > Regarding compaction thresholds... the BMT example says to set the threshold > to 0 during an import. Is this advisable during any bulk import (say using > batch mutations or just lots and lots of thrift inserts)? > > Also, when I asked "are folks open to..." I meant that I'm happy to code a > patch if anyone's interested. > Ian > > On Tue, Jun 1, 2010 at 12:41 PM, Ian Soboroff wrote: >> >> Thanks. Are folks open to exposing this via nodetool? I've been trying >> to figure out a decent way to aggregate and expose all this information that >> is easier than nodetool and less noisy than nagios... suggestions >> appreciated. >> >> (My cluster only exposes a master node and everything else is private, so >> running a pile of jconsoles is not even possible...) >> >> Ian >> >> On Tue, Jun 1, 2010 at 12:33 PM, Dylan Egan / WildfireApp.com >> wrote: >>> >>> Hi Ian, >>> >>> On Tue, Jun 1, 2010 at 9:27 AM, Ian Soboroff wrote: >>> > Are stats exposed over JMX for compaction? >>> >>> You can view them via the >>> org.apache.cassandra.db:type=CompactionManager MBean. The PendingTasks >>> attribute might suit you best. >>> >>> Cheers, >>> >>> Dylan. >> > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Effective cache size
If I go to fetch some row given the rack-unaware placement strategy, the default snitch and CL==ONE, the node that is asked is the first node in the ring with the datum that is currently up, then a checksum is sent to the replicas to trigger read repair as appropriate. So with the row cache, that first node (the primary replica) is the one that has that row cached, yes? So if i have six nodes, CL==ONE, RF==3, row cache of 3 million on each node. Do I have an effective 6 million row cache (3m*6/3)? Or 18m? And is that changed by doing CL==QUORUM reads?
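[Editor's note: the arithmetic in this question has two bounding cases, depending on a caching assumption. The reply elsewhere in this digest points out the cached copy lives on whichever replica actually serves the read, so reality lands between these bounds depending on how requests are routed. A back-of-envelope sketch:]

```java
public class EffectiveCacheEstimate {
    // Distinct rows cached cluster-wide, under two bounding assumptions:
    //  - every replica of a hot row caches it: perNode * nodes / rf
    //  - each hot row is cached on exactly one replica: perNode * nodes
    public static long distinctIfAllReplicasCache(long perNode, int nodes, int rf) {
        return perNode * nodes / rf;
    }

    public static long distinctIfOneReplicaCaches(long perNode, int nodes) {
        return perNode * nodes;
    }

    public static void main(String[] args) {
        long perNode = 3_000_000L; // the question's 3M-row cache per node
        System.out.println(distinctIfAllReplicasCache(perNode, 6, 3)); // -> 6000000
        System.out.println(distinctIfOneReplicaCaches(perNode, 6));    // -> 18000000
    }
}
```

QUORUM reads touch more replicas per request, which pushes behavior toward the "all replicas cache the same hot rows" bound.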
Re: Continuously increasing RAM usage
I've started seeing this issue as well. Running 0.6.2. One interesting thing I happened upon, I explicitly called the GC via jconsole and the heap dropped completely fixing the issue. When you explicitly call System.gc() it does a full sweep. I'm wondering if this issue is to do with the GC flags used. -Jake On Wed, Jun 2, 2010 at 3:09 PM, Torsten Curdt wrote: > We've also seen something like this. Will soon investigate and try > again with 0.6.2 > > On Wed, Jun 2, 2010 at 20:27, Paul Brown wrote: > > > > FWIW, I'm seeing similar issues on a cluster. Three nodes, Cassandra > 0.6.1, SUN JDK 1.6.0_b20. I will try to get some heap dumps to see what's > building up. > > > > I've seen this sort of issue in systems that make heavy use of > java.util.concurrent queues/executors, e.g.: > > > > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6236036 > > > > That bug is long fixed, but it is an instance of how it can be harder to > do nothing than something. > > > > -- Paul > > > > > > On May 26, 2010, at 11:32 PM, James Golick wrote: > > > >> We're seeing RAM usage continually climb until eventually, cassandra > becomes unresponsive. > >> > >> The JVM isn't OOM'ing. It has only committed 14/24GB of memory. So, I am > assuming that the memory usage is related to mmap'd IO. Fair assumption? > >> > >> I tried setting the IO mode to standard, but it seemed to be a little > slower and couldn't get the machine to come back online with adequate read > performance, so I set it back. I'll have to write a solid cache warming > script if I'm going to try that again. > >> > >> Any other ideas for what might be causing the issue? Is there something > I should monitor or look at next time it happens? > >> > >> Thanks > > > > >
Re: Read operation with CL.ALL, not yet supported?
Gary, Thanks for reply. I've opened an issue at https://issues.apache.org/jira/browse/CASSANDRA-1152 Yuki 2010/6/3 Gary Dusbabek : > Yuki, > > Can you file a jira ticket for this > (https://issues.apache.org/jira/browse/CASSANDRA)? The wiki indicates > that this should be allowed: http://wiki.apache.org/cassandra/API > > Regards, > > Gary. > > > On Tue, Jun 1, 2010 at 21:50, Yuki Morishita wrote: >> Hi, >> >> I'm testing several read operations(get, get_slice, get_count, etc.) with >> various ConsistencyLevel and noticed that ConsistencyLevel.ALL is >> "not yet supported" in most of read ops (other than get_range_slice). >> >> I've looked up code in StorageProxy#readProtocol and it seems >> to be able to handle CL.ALL, but in thrift.CassandraServer#readColumnFamily, >> there is code that just throws exception when consistency_level == ALL. >> Is there any reason that CL.ALL is "not yet supported"? >> >> >> Yuki Morishita >> t:yukim (http://twitter.com/yukim) >> > -- Yuki Morishita t:yukim (http://twitter.com/yukim)
ColumnFamilyInputFormat with super columns
I have a super column along the lines of => { => { att: value }} Now I would like to process a set of rows [from_time..until_time] with Hadoop. I've set up the hadoop job like this job.setInputFormatClass(ColumnFamilyInputFormat.class); ConfigHelper.setColumnFamily(job.getConfiguration(), "family", "events"); SlicePredicate predicate = new SlicePredicate(); predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 1000)); ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate); but I don't see how I could say what rows the job should process. Any pointers? cheers -- Torsten
Re: Changing replication factor from 2 to 3
On 6/2/10 12:49 PM, Eric Halpern wrote: We'd like to double our cluster size from 4 to 8 and increase our replication factor from 2 to 3. Is there any special procedure we need to follow to increase replication? Is it sufficient to just start the new nodes with the replication factor of 3 and then reconfigure the existing nodes to the replication factor one at a time? http://wiki.apache.org/cassandra/Operations " Replication factor is not really intended to be changed in a live cluster either, but increasing it may be done if you (a) use ConsistencyLevel.QUORUM or ALL (depending on your existing replication factor) to make sure that a replica that actually has the data is consulted, (b) are willing to accept downtime while anti-entropy repair runs (see below), or (c) are willing to live with some clients potentially being told no data exists if they read from the new replica location(s) until repair is done. " Please feel free to update this wiki page if the above information is incomplete in any way. :) =Rob
Changing replication factor from 2 to 3
We'd like to double our cluster size from 4 to 8 and increase our replication factor from 2 to 3. Is there any special procedure we need to follow to increase replication? Is it sufficient to just start the new nodes with the replication factor of 3 and then reconfigure the existing nodes to the replication factor one at a time?
Re: Nodes dropping out of cluster due to GC
Ryan King wrote: > > Why run with so few nodes? > > -ryan > > On Tue, Jun 1, 2010 at 4:20 PM, Eric Halpern wrote: >> >> Hello, >> >> We're running a 4 node cluster on beefy EC2 virtual instances (8 core, 32 >> GB) using EBS storage with 8 GB of heap allocated to the JVM. >> >> Every couple of hours, each of the nodes does a concurrent mark/sweep >> that >> takes around 30 seconds to complete. During that GC, the node >> temporarily >> drops out of the cluster, usually for about 15 seconds. >> >> The frequency of the concurrent mark sweeps seems reasonable, but the >> fact >> that the node drops out of the cluster temporarily is a major problem >> since >> this has significant impact on the performance and stability of our >> service. >> >> Has anyone experienced this sort of problem? It would be great to hear >> from >> anyone who has had experience with this sort of issue and/or suggestions >> for >> how to deal with it. >> >> Thanks, Eric >> -- > > We wanted to start with a small number of nodes to test things out before going big. Is there some reason that a small cluster would cause more problems in this regard. The actual request load is actually pretty light for the cluster. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Nodes-dropping-out-of-cluster-due-to-GC-tp5128481p5132279.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Nodes dropping out of cluster due to GC
Oleg Anastasjev wrote: > >> >> Has anyone experienced this sort of problem? It would be great to hear >> from >> anyone who has had experience with this sort of issue and/or suggestions >> for >> how to deal with it. >> >> Thanks, Eric > > Yes, i did. Symptoms you described point to concurrent GC FAILURE. During > this > failure concurrent GC completely stops java program (i.e. cassandra) and > does a > GC cycle. Other cassandra nodes discover, that node is not responding and > considering it dead. > If concurrent GC is properly tuned, it should never do stop-the-world and > GC ( > thats why it is called concurrent ;-) ). > Reasons for concurrent GC failures can be several: > 1. Not enought java heap - try to raise max java heap limit > 2. Improperly sized java heap regions. > > To help you to narrow the problem, pass -XX:+PrintGCDetails option to JVM > launching cassandra node. This will log information about internal GC > activities. Let it run till it will be thrown out of cluster again and > search > for "concurrent mode failure" or "promotion failed" strings. > We did indeed have a problem with our GC settings. The survivor ratio was too low. After changing that things are better but we are still seeing GC that takes 5-10 seconds, which is enough for the node to drop out of the cluster briefly. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Nodes-dropping-out-of-cluster-due-to-GC-tp5128481p5132267.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
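[Editor's note: the diagnostics and tuning knobs mentioned in this thread, collected as a config fragment in the style of the JVM_OPTS settings in bin/cassandra.in.sh. The values are illustrative examples only, not recommendations; the right survivor ratio and heap size depend on the workload:]

```shell
# Illustrative CMS-related JVM options discussed in this thread.
JVM_OPTS="$JVM_OPTS -Xms8G -Xmx8G"            # heap size (the thread uses 8 GB)
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"  # concurrent mark/sweep collector
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"      # survivor spaces; "too low" was the problem above
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"      # log GC activity; grep the log for
                                              # "concurrent mode failure" / "promotion failed"
echo "$JVM_OPTS"
```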
Re: Continuously increasing RAM usage
We've also seen something like this. Will soon investigate and try again with 0.6.2 On Wed, Jun 2, 2010 at 20:27, Paul Brown wrote: > > FWIW, I'm seeing similar issues on a cluster. Three nodes, Cassandra 0.6.1, > SUN JDK 1.6.0_b20. I will try to get some heap dumps to see what's building > up. > > I've seen this sort of issue in systems that make heavy use of > java.util.concurrent queues/executors, e.g.: > > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6236036 > > That bug is long fixed, but it is an instance of how it can be harder to do > nothing than something. > > -- Paul > > > On May 26, 2010, at 11:32 PM, James Golick wrote: > >> We're seeing RAM usage continually climb until eventually, cassandra becomes >> unresponsive. >> >> The JVM isn't OOM'ing. It has only committed 14/24GB of memory. So, I am >> assuming that the memory usage is related to mmap'd IO. Fair assumption? >> >> I tried setting the IO mode to standard, but it seemed to be a little slower >> and couldn't get the machine to come back online with adequate read >> performance, so I set it back. I'll have to write a solid cache warming >> script if I'm going to try that again. >> >> Any other ideas for what might be causing the issue? Is there something I >> should monitor or look at next time it happens? >> >> Thanks > >
Re: Giant sets of ordered data
Insert "if you want to use long values for keys and column names" above paragraph 2. I forgot that part. On Wed, Jun 2, 2010 at 1:29 PM, Jonathan Shook wrote: > If you want to do range queries on the keys, you can use OPP to do this: > (example using UTF-8 lexicographic keys, with bursts split across rows > according to row size limits) > > Events: { > "20100601.05.30.003": { > "20100601.05.30.003": > "20100601.05.30.007": > ... > } > } > > With a future version of Cassandra, you may be able to use the same > basic datatype for both key and column name, as keys will be binary > like the rest, I believe. > > I'm not aware of specific performance improvements when using OPP > range queries on keys vs iterating over known keys. I suspect (hope) > that round-tripping to the server should be reduced, which may be > significant. Does anybody have decent benchmarks that tell the > difference? > > > On Wed, Jun 2, 2010 at 11:53 AM, Ben Browning wrote: >> With a traffic pattern like that, you may be better off storing the >> events of each burst (I'll call them group) in one or more keys and >> then storing these keys in the day key. >> >> EventGroupsPerDay: { >> "20100601": { >> 123456789: "group123", // column name is timestamp group was >> received, column value is key >> 123456790: "group124" >> } >> } >> >> EventGroups: { >> "group123": { >> 123456789: "value1", >> 123456799: "value2" >> } >> } >> >> If you think of Cassandra as a toolkit for building scalable indexes >> it seems to make the modeling a bit easier. In this case, you're >> building an index by day to lookup events that come in as groups. So, >> first you'd fetch the slice of columns for the day you're interested >> in to figure out which groups to look at then you'd fetch the events >> in those groups. >> >> There are plenty of alternate ways to divide up the data among rows >> also - you could use hour keys instead of days as an example. 
>> >> On Wed, Jun 2, 2010 at 11:57 AM, David Boxenhorn wrote: >>> Let's say you're logging events, and you have billions of events. What if >>> the events come in bursts, so within a day there are millions of events, but >>> they all come within microseconds of each other a few times a day? How do >>> you find the events that happened on a particular day if you can't store >>> them all in one row? >>> >>> On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook wrote: Either OPP by key, or within a row by column name. I'd suggest the latter. If you have structured data to stick under a column (named by the timestamp), then you can serialize and unserialize it yourself, or you can use a supercolumn. It's effectively the same thing. Cassandra only provides the super column support as a convenience layer as it is currently implemented. That may change in the future. You didn't make clear in your question why a standard column would be less suitable. I presumed you had layered structure within the timestamp, hence my response. How would you logically partition your dataset according to natural application boundaries? This will answer most of your question. If you have a dataset which can't be partitioned into a reasonable size row, then you may want to use OPP and key concatenation. What do you mean by giant? On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn wrote: > How do I handle giant sets of ordered data, e.g. by timestamps, which I > want > to access by range? > > I can't put all the data into a supercolumn, because it's loaded into > memory > at once, and it's too much data. > > Am I forced to use an order-preserving partitioner? I don't want the > headache. Is there any other way? > >>> >>> >> >
Re: Giant sets of ordered data
If you want to do range queries on the keys, you can use OPP to do this: (example using UTF-8 lexicographic keys, with bursts split across rows according to row size limits) Events: { "20100601.05.30.003": { "20100601.05.30.003": "20100601.05.30.007": ... } } With a future version of Cassandra, you may be able to use the same basic datatype for both key and column name, as keys will be binary like the rest, I believe. I'm not aware of specific performance improvements when using OPP range queries on keys vs iterating over known keys. I suspect (hope) that round-tripping to the server should be reduced, which may be significant. Does anybody have decent benchmarks that tell the difference? On Wed, Jun 2, 2010 at 11:53 AM, Ben Browning wrote: > With a traffic pattern like that, you may be better off storing the > events of each burst (I'll call them group) in one or more keys and > then storing these keys in the day key. > > EventGroupsPerDay: { > "20100601": { > 123456789: "group123", // column name is timestamp group was > received, column value is key > 123456790: "group124" > } > } > > EventGroups: { > "group123": { > 123456789: "value1", > 123456799: "value2" > } > } > > If you think of Cassandra as a toolkit for building scalable indexes > it seems to make the modeling a bit easier. In this case, you're > building an index by day to lookup events that come in as groups. So, > first you'd fetch the slice of columns for the day you're interested > in to figure out which groups to look at then you'd fetch the events > in those groups. > > There are plenty of alternate ways to divide up the data among rows > also - you could use hour keys instead of days as an example. > > On Wed, Jun 2, 2010 at 11:57 AM, David Boxenhorn wrote: >> Let's say you're logging events, and you have billions of events. What if >> the events come in bursts, so within a day there are millions of events, but >> they all come within microseconds of each other a few times a day? 
How do >> you find the events that happened on a particular day if you can't store >> them all in one row? >> >> On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook wrote: >>> >>> Either OPP by key, or within a row by column name. I'd suggest the latter. >>> If you have structured data to stick under a column (named by the >>> timestamp), then you can serialize and unserialize it yourself, or you >>> can use a supercolumn. It's effectively the same thing. Cassandra >>> only provides the super column support as a convenience layer as it is >>> currently implemented. That may change in the future. >>> >>> You didn't make clear in your question why a standard column would be >>> less suitable. I presumed you had layered structure within the >>> timestamp, hence my response. >>> How would you logically partition your dataset according to natural >>> application boundaries? This will answer most of your question. >>> If you have a dataset which can't be partitioned into a reasonable >>> size row, then you may want to use OPP and key concatenation. >>> >>> What do you mean by giant? >>> >>> On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn >>> wrote: >>> > How do I handle giant sets of ordered data, e.g. by timestamps, which I >>> > want >>> > to access by range? >>> > >>> > I can't put all the data into a supercolumn, because it's loaded into >>> > memory >>> > at once, and it's too much data. >>> > >>> > Am I forced to use an order-preserving partitioner? I don't want the >>> > headache. Is there any other way? >>> > >> >> >
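[Editor's note] Jonathan's burst-splitting scheme above (lexicographic timestamp keys under OPP, with a burst split across rows once a row-size limit is hit) can be mimicked with plain Python dicts to make the key layout concrete; the `.NNN` sequence suffix and the 200-column row cap are illustrative assumptions, not Cassandra values:

```python
from datetime import datetime

MAX_COLUMNS_PER_ROW = 200  # illustrative row-size cap, not a Cassandra constant


def make_row_key(ts: datetime, seq: int) -> str:
    # Lexicographically sortable UTF-8 key plus a zero-padded sequence
    # number, so a burst split across several rows stays in order under OPP.
    return f"{ts:%Y%m%d.%H.%M}.{seq:03d}"


def store_burst(events, ts):
    """Split one burst of (column_name, value) pairs across ordered rows."""
    rows = {}
    for seq, start in enumerate(range(0, len(events), MAX_COLUMNS_PER_ROW)):
        rows[make_row_key(ts, seq)] = dict(events[start:start + MAX_COLUMNS_PER_ROW])
    return rows


events = [(f"evt{i:04d}", i) for i in range(450)]
rows = store_burst(events, datetime(2010, 6, 1, 5, 30))
print(sorted(rows))  # three consecutive row keys holding one burst
```

Because the keys sort lexicographically, an OPP range query from the first key of a burst to the last returns the whole burst in order.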
Re: Continuously increasing RAM usage
FWIW, I'm seeing similar issues on a cluster. Three nodes, Cassandra 0.6.1, SUN JDK 1.6.0_b20. I will try to get some heap dumps to see what's building up. I've seen this sort of issue in systems that make heavy use of java.util.concurrent queues/executors, e.g.: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6236036 That bug is long fixed, but it is an instance of how it can be harder to do nothing than something. -- Paul On May 26, 2010, at 11:32 PM, James Golick wrote: > We're seeing RAM usage continually climb until eventually, cassandra becomes > unresponsive. > > The JVM isn't OOM'ing. It has only committed 14/24GB of memory. So, I am > assuming that the memory usage is related to mmap'd IO. Fair assumption? > > I tried setting the IO mode to standard, but it seemed to be a little slower > and couldn't get the machine to come back online with adequate read > performance, so I set it back. I'll have to write a solid cache warming > script if I'm going to try that again. > > Any other ideas for what might be causing the issue? Is there something I > should monitor or look at next time it happens? > > Thanks
Capacity planning and Re: Handling disk-full scenarios
Reading some more (someone break in when I lose my clue ;-) Reading the streams page in the wiki about anticompaction, I think the best approach to take when a node's disks get overfull is to set the compaction thresholds to 0 on all nodes, decommission the overfull node, wait for the data to get redistributed, and then clean off the decommissioned node and bootstrap it. Since the disks are too full for an anticompaction, you can't move the token on that node. Given this, I wonder about the right approach to capacity planning. If I want to store, say, 500M rows, and I know based on current cfstats that the mean compacted row size is 27k, how much overhead is there on top of the 13.5 TB of raw data? Trying to compute from what I have: in cfstats I have a total "Space used (total)" of around 1.6TB (this is only a subset of the data loaded so far), but when I total the data directories using du(1) I get around 23TB already used. On Wed, Jun 2, 2010 at 11:29 AM, Ian Soboroff wrote: > Ok, answered part of this myself. You can stop a node, move files around > on the data disks, as long as they stay in the right keyspace directories, > and all is fine. > > Now, I have a single Data.db file which is 900GB and is compacted. The > drive it's on is only 1.5TB, so it can't anticompact at all. Is there > anything I can do? The replication factor is 3, so one idea is to take down > the node, blow away the huge file, adjust the token, and restart the node. > At that point I'm not sure what to tell the new node or other nodes to do... > do I need to run a repair, or a cleanup, or a loadbalance, or ... what? > > It would be great to be able to fix a storage quota on a per-data-directory > basis, to ensure that enough capacity is retained for anticompaction. > Default 45% quota, adjustable for the brave. > > Ian > > > On Tue, Jun 1, 2010 at 4:08 PM, Ian Soboroff wrote: > >> My nodes have 5 disks and are using them separately as data disks.
The >> usage on the disks is not uniform, and one is nearly full. Is there some >> way to manually balance the files across the disks? Pretty much anything >> done via nodetool incurs an anticompaction, which obviously fails. system/ is >> not the problem, it's in my data's keyspace. >> >> Ian >> >> >
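[Editor's note] Ian's back-of-the-envelope numbers (500M rows at a mean compacted row size of ~27k) can be extended into a rough capacity sketch; the replication factor comes from his post, but the 2x compaction headroom is an assumption for illustration (compaction and anticompaction can temporarily need free space on the order of the data being rewritten), not settled guidance:

```python
rows = 500_000_000
mean_row_bytes = 27_000        # ~27k mean compacted row size, from cfstats
replication_factor = 3         # RF=3, as mentioned in the thread
compaction_headroom = 2.0      # assumption: (anti)compaction can need ~2x space

raw_tb = rows * mean_row_bytes / 1e12
cluster_tb = raw_tb * replication_factor
provisioned_tb = cluster_tb * compaction_headroom

print(f"raw: {raw_tb:.1f} TB, replicated: {cluster_tb:.1f} TB, "
      f"provisioned: {provisioned_tb:.1f} TB")
```

This also suggests why du(1) across the cluster can report far more than the cfstats "Space used (total)" on any one node: replication and not-yet-compacted SSTables both multiply the on-disk footprint.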
Re: Read operation with CL.ALL, not yet supported?
Yuki, Can you file a jira ticket for this (https://issues.apache.org/jira/browse/CASSANDRA)? The wiki indicates that this should be allowed: http://wiki.apache.org/cassandra/API Regards, Gary. On Tue, Jun 1, 2010 at 21:50, Yuki Morishita wrote: > Hi, > > I'm testing several read operations (get, get_slice, get_count, etc.) with > various ConsistencyLevels and noticed that ConsistencyLevel.ALL is > "not yet supported" in most read ops (other than get_range_slice). > > I've looked at the code in StorageProxy#readProtocol and it seems > to be able to handle CL.ALL, but in thrift.CassandraServer#readColumnFamily > there is code that just throws an exception when consistency_level == ALL. > Is there any reason that CL.ALL is "not yet supported"? > > > Yuki Morishita > t:yukim (http://twitter.com/yukim) >
Re: Error during startup
I was able to reproduce the error by starting up a node using RandomPartitioner, kill it, switch to OrderPreservingPartitioner, restart, kill, switch back to RandomPartitioner, BANG! So it looks like you tinkered with the partitioner at some point. This has the unfortunate effect of corrupting your system table. I'm trying to figure out a way to detect this and abort before data is overwritten. Gary. On Sun, May 30, 2010 at 06:49, David Boxenhorn wrote: > I deleted the system/LocationInfo files, and now everything works. > > Yay! (...what happened?) > > On Sun, May 30, 2010 at 4:18 PM, David Boxenhorn wrote: >> >> I'm getting an "Expected both token and generation columns; found >> ColumnFamily" error during startup; can anyone tell me what it is? Details >> below. >> >> Starting Cassandra Server >> Listening for transport dt_socket at address: >> INFO 16:14:33,459 Auto DiskAccessMode determined to be standard >> INFO 16:14:33,615 Sampling index for >> C:\var\lib\cassandra\data\system\LocationInfo-1-Data.db >> INFO 16:14:33,631 Removing orphan >> C:\var\lib\cassandra\data\Lookin2\Users-tmp-27-Index.db >> INFO 16:14:33,631 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Users-19-Data.db >> INFO 16:14:33,662 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Users-18-Data.db >> INFO 16:14:33,818 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Users-20-Data.db >> INFO 16:14:33,850 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Users-21-Data.db >> INFO 16:14:33,865 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Users-22-Data.db >> INFO 16:14:33,881 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-580-Data.db >> INFO 16:14:33,896 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-672-Data.db >> INFO 16:14:33,912 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-681-Data.db >> INFO 16:14:33,912 Sampling index for >> 
C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-691-Data.db >> INFO 16:14:33,928 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-696-Data.db >> INFO 16:14:33,943 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Attractions-17-Data.db >> INFO 16:14:34,006 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestTrendsetterIdx-5-Data.db >> INFO 16:14:34,006 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestTrendsetterIdx-6-Data.db >> INFO 16:14:34,021 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-29-Data.db >> INFO 16:14:34,350 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-51-Data.db >> INFO 16:14:34,693 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-72-Data.db >> INFO 16:14:35,021 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-77-Data.db >> INFO 16:14:35,225 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-78-Data.db >> INFO 16:14:35,350 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-79-Data.db >> INFO 16:14:35,459 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-80-Data.db >> INFO 16:14:35,459 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Taxonomy-1-Data.db >> INFO 16:14:35,475 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Taxonomy-2-Data.db >> INFO 16:14:35,475 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Content-30-Data.db >> INFO 16:14:35,631 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Content-35-Data.db >> INFO 16:14:35,771 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Content-40-Data.db >> INFO 16:14:35,959 Compacting >> 
[org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-19-Data.db'),org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-20-Data.db'),org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-21-Data.db'),org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-22-Data.db')] >> ERROR 16:14:35,975 Exception encountered during startup. >> java.lang.RuntimeException: Expected both token and generation columns; >> found ColumnFamily(LocationInfo [Generation:false:4...@4,]) >> at >> org.apache.cassandra.db.SystemTable.initMetadata(SystemTable.java:159) >> at >> org.apache.cassandra.service.StorageService.initServer(StorageService.java:305) >> at >> org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:99) >> at >> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:177) >> Exception encountered during startup. >> > >
Re: Heterogeneous Cassandra Cluster
Our replication factor was 1, so that wasn't the problem. (We tried other replication factors too, just in case, but it didn't help.) On Wed, Jun 2, 2010 at 7:51 PM, Nahor > wrote: > On 2010-06-02 3:18, David Boxenhorn wrote: > >> Is it possible to make a heterogeneous Cassandra cluster, with both Linux >> and Windows nodes? I tried doing it and got >> >> Error in ThreadPoolExecutor >> java.lang.NullPointerException >> >> Not sure if this is due to the Linux/Windows mix or something else. >> >> >> Details below: >> > [...] > > >> INFO 21:42:08,091 Bootstrapping >> >> ERROR 21:49:03,526 Error in ThreadPoolExecutor >> >> java.lang.NullPointerException >> >>at >> org.apache.cassandra.streaming.StreamInitiateVerbHandler.getNewNames(StreamInitiateVerbHandler.java:154) >> >>at >> org.apache.cassandra.streaming.StreamInitiateVerbHandler.doVerb(StreamInitiateVerbHandler.java:76) >> >>at >> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40) >> >>at >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >> >>at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >> >>at java.lang.Thread.run(Thread.java:619) >> >> ERROR 21:49:03,527 Fatal exception in thread >> Thread[MESSAGE-DESERIALIZER-POOL:1,5,main] >> >> java.lang.NullPointerException >> >>at >> org.apache.cassandra.streaming.StreamInitiateVerbHandler.getNewNames(StreamInitiateVerbHandler.java:154) >> >>at >> org.apache.cassandra.streaming.StreamInitiateVerbHandler.doVerb(StreamInitiateVerbHandler.java:76) >> >>at >> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40) >> >>at >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >> >>at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >> >>at java.lang.Thread.run(Thread.java:619) >> >> >> > Looks like https://issues.apache.org/jira/browse/CASSANDRA-1136 > > Make sure you have enough nodes in 
your cluster to satisfy your replication > factor before you add any data. This is what seems to be the source of the > problem in my case. > > That said, I was also using a heterogeneous system (Linux + Windows), but I > think I tested it with only Linux nodes too. > > >
Re: Giant sets of ordered data
With a traffic pattern like that, you may be better off storing the events of each burst (I'll call them group) in one or more keys and then storing these keys in the day key. EventGroupsPerDay: { "20100601": { 123456789: "group123", // column name is timestamp group was received, column value is key 123456790: "group124" } } EventGroups: { "group123": { 123456789: "value1", 123456799: "value2" } } If you think of Cassandra as a toolkit for building scalable indexes it seems to make the modeling a bit easier. In this case, you're building an index by day to lookup events that come in as groups. So, first you'd fetch the slice of columns for the day you're interested in to figure out which groups to look at then you'd fetch the events in those groups. There are plenty of alternate ways to divide up the data among rows also - you could use hour keys instead of days as an example. On Wed, Jun 2, 2010 at 11:57 AM, David Boxenhorn wrote: > Let's say you're logging events, and you have billions of events. What if > the events come in bursts, so within a day there are millions of events, but > they all come within microseconds of each other a few times a day? How do > you find the events that happened on a particular day if you can't store > them all in one row? > > On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook wrote: >> >> Either OPP by key, or within a row by column name. I'd suggest the latter. >> If you have structured data to stick under a column (named by the >> timestamp), then you can serialize and unserialize it yourself, or you >> can use a supercolumn. It's effectively the same thing. Cassandra >> only provides the super column support as a convenience layer as it is >> currently implemented. That may change in the future. >> >> You didn't make clear in your question why a standard column would be >> less suitable. I presumed you had layered structure within the >> timestamp, hence my response. 
>> How would you logically partition your dataset according to natural >> application boundaries? This will answer most of your question. >> If you have a dataset which can't be partitioned into a reasonable >> size row, then you may want to use OPP and key concatenation. >> >> What do you mean by giant? >> >> On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn >> wrote: >> > How do I handle giant sets of ordered data, e.g. by timestamps, which I >> > want >> > to access by range? >> > >> > I can't put all the data into a supercolumn, because it's loaded into >> > memory >> > at once, and it's too much data. >> > >> > Am I forced to use an order-preserving partitioner? I don't want the >> > headache. Is there any other way? >> > > >
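[Editor's note] Ben's two-level layout (a per-day index row pointing at one row per burst) can be simulated with plain dictionaries to show the two-step read path; the names mirror his example, and group124's contents are invented here to complete it:

```python
# Index CF: day key -> {arrival_timestamp: group_key}
event_groups_per_day = {
    "20100601": {123456789: "group123", 123456790: "group124"},
}

# Data CF: group key -> {event_timestamp: value}
event_groups = {
    "group123": {123456789: "value1", 123456799: "value2"},
    "group124": {123456800: "value3"},  # invented contents for group124
}


def events_for_day(day):
    """Read path: slice the day's index row, then fetch each group row."""
    merged = {}
    for group_key in event_groups_per_day.get(day, {}).values():
        merged.update(event_groups.get(group_key, {}))
    return merged


print(events_for_day("20100601"))
```

The point of the indirection is that each burst lives in its own modestly sized row, while the day row stays small because it holds only group keys, not millions of events.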
Re: Heterogeneous Cassandra Cluster
On 2010-06-02 3:18, David Boxenhorn wrote: Is it possible to make a heterogeneous Cassandra cluster, with both Linux and Windows nodes? I tried doing it and got Error in ThreadPoolExecutor java.lang.NullPointerException Not sure if this is due to the Linux/Windows mix or something else. Details below: [...] INFO 21:42:08,091 Bootstrapping ERROR 21:49:03,526 Error in ThreadPoolExecutor java.lang.NullPointerException at org.apache.cassandra.streaming.StreamInitiateVerbHandler.getNewNames(StreamInitiateVerbHandler.java:154) at org.apache.cassandra.streaming.StreamInitiateVerbHandler.doVerb(StreamInitiateVerbHandler.java:76) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) ERROR 21:49:03,527 Fatal exception in thread Thread[MESSAGE-DESERIALIZER-POOL:1,5,main] java.lang.NullPointerException at org.apache.cassandra.streaming.StreamInitiateVerbHandler.getNewNames(StreamInitiateVerbHandler.java:154) at org.apache.cassandra.streaming.StreamInitiateVerbHandler.doVerb(StreamInitiateVerbHandler.java:76) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Looks like https://issues.apache.org/jira/browse/CASSANDRA-1136 Make sure you have enough nodes in your cluster to satisfy your replication factor before you add any data. This is what seems to be source of the problem in my case. That said, I was also using an heterogeneous system (Linux + Windows) but I think I tested it with only Linux nodes too.
Re: Nodes dropping out of cluster due to GC
Why run with so few nodes? -ryan On Tue, Jun 1, 2010 at 4:20 PM, Eric Halpern wrote: > > Hello, > > We're running a 4 node cluster on beefy EC2 virtual instances (8 core, 32 > GB) using EBS storage with 8 GB of heap allocated to the JVM. > > Every couple of hours, each of the nodes does a concurrent mark/sweep that > takes around 30 seconds to complete. During that GC, the node temporarily > drops out of the cluster, usually for about 15 seconds. > > The frequency of the concurrent mark sweeps seems reasonable, but the fact > that the node drops out of the cluster temporarily is a major problem since > this has significant impact on the performance and stability of our service. > > Has anyone experienced this sort of problem? It would be great to hear from > anyone who has had experience with this sort of issue and/or suggestions for > how to deal with it. > > Thanks, Eric > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Nodes-dropping-out-of-cluster-due-to-GC-tp5128481p5128481.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com. >
Re: Giant sets of ordered data
Let's say you're logging events, and you have billions of events. What if the events come in bursts, so within a day there are millions of events, but they all come within microseconds of each other a few times a day? How do you find the events that happened on a particular day if you can't store them all in one row? On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook wrote: > Either OPP by key, or within a row by column name. I'd suggest the latter. > If you have structured data to stick under a column (named by the > timestamp), then you can serialize and unserialize it yourself, or you > can use a supercolumn. It's effectively the same thing. Cassandra > only provides the super column support as a convenience layer as it is > currently implemented. That may change in the future. > > You didn't make clear in your question why a standard column would be > less suitable. I presumed you had layered structure within the > timestamp, hence my response. > How would you logically partition your dataset according to natural > application boundaries? This will answer most of your question. > If you have a dataset which can't be partitioned into a reasonable > size row, then you may want to use OPP and key concatenation. > > What do you mean by giant? > > On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn > wrote: > > How do I handle giant sets of ordered data, e.g. by timestamps, which I > want > > to access by range? > > > > I can't put all the data into a supercolumn, because it's loaded into > memory > > at once, and it's too much data. > > > > Am I forced to use an order-preserving partitioner? I don't want the > > headache. Is there any other way? > > >
Re: Giant sets of ordered data
Either OPP by key, or within a row by column name. I'd suggest the latter. If you have structured data to stick under a column (named by the timestamp), then you can serialize and unserialize it yourself, or you can use a supercolumn. It's effectively the same thing. Cassandra only provides the super column support as a convenience layer as it is currently implemented. That may change in the future. You didn't make clear in your question why a standard column would be less suitable. I presumed you had layered structure within the timestamp, hence my response. How would you logically partition your dataset according to natural application boundaries? This will answer most of your question. If you have a dataset which can't be partitioned into a reasonable size row, then you may want to use OPP and key concatenation. What do you mean by giant? On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn wrote: > How do I handle giant sets of ordered data, e.g. by timestamps, which I want > to access by range? > > I can't put all the data into a supercolumn, because it's loaded into memory > at once, and it's too much data. > > Am I forced to use an order-preserving partitioner? I don't want the > headache. Is there any other way? >
Re: Giant sets of ordered data
I like to model this kind of data as columns, where the timestamps are the column name (either longs, TimeUUIDs, or strings depending on your usage). If you have too much data for a single row, you'd need to have multiple rows of these. For time-series data, it makes sense to use one row per minute/hour/day/year depending on the volume of your data. Something like the following: SomeTimeData: { // columnfamily "20100601": { // key, yyyymmdd 123456789: "value1", // column name is milliseconds since epoch 123456799: "value2" }, "20100602": { 12345889: "value3" } } Now you can use column slices to retrieve all values between two time periods on a given day. If you need to support larger ranges you'll either have to slice columns from multiple keys or change the keys from yyyymmdd to yyyymm, yyyy, etc. There's a tradeoff here between row width and read speed. Reading 1000 columns as a continuous slice from a single row will be very fast but reading 1000 columns as slices from 10 keys won't be as fast. Ben On Wed, Jun 2, 2010 at 11:32 AM, David Boxenhorn wrote: > How do I handle giant sets of ordered data, e.g. by timestamps, which I want > to access by range? > > I can't put all the data into a supercolumn, because it's loaded into memory > at once, and it's too much data. > > Am I forced to use an order-preserving partitioner? I don't want the > headache. Is there any other way? >
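[Editor's note] Ben's day-bucket model can be simulated with a sorted dict and bisect (stdlib only) to show what a column slice between two timestamps returns; `slice_columns` is a hypothetical stand-in for a Thrift get_slice with a start/finish SliceRange, not a real client call:

```python
import bisect

# One row per day; column names are millisecond timestamps, kept sorted
# the way Cassandra keeps columns ordered within a row.
some_time_data = {
    "20100601": {123456789: "value1", 123456799: "value2"},
    "20100602": {12345889: "value3"},
}


def slice_columns(row_key, start_ts, end_ts):
    """Columns in [start_ts, end_ts] from one day row, in column order."""
    cols = sorted(some_time_data.get(row_key, {}).items())
    names = [name for name, _ in cols]
    lo = bisect.bisect_left(names, start_ts)
    hi = bisect.bisect_right(names, end_ts)
    return cols[lo:hi]


print(slice_columns("20100601", 123456790, 123456799))
```

A range spanning several days would call this once per day key and concatenate the results, which is the multi-key read-speed tradeoff Ben describes.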
Giant sets of ordered data
How do I handle giant sets of ordered data, e.g. by timestamps, which I want to access by range? I can't put all the data into a supercolumn, because it's loaded into memory at once, and it's too much data. Am I forced to use an order-preserving partitioner? I don't want the headache. Is there any other way?
Re: Handling disk-full scenarios
Ok, answered part of this myself. You can stop a node, move files around on the data disks, as long as they stay in the right keyspace directories, and all is fine. Now, I have a single Data.db file which is 900GB and is compacted. The drive it's on is only 1.5TB, so it can't anticompact at all. Is there anything I can do? The replication factor is 3, so one idea is to take down the node, blow away the huge file, adjust the token, and restart the node. At that point I'm not sure what to tell the new node or other nodes to do... do I need to run a repair, or a cleanup, or a loadbalance, or ... what? It would be great to be able to fix a storage quota on a per-data-directory basis, to ensure that enough capacity is retained for anticompaction. Default 45% quota, adjustable for the brave. Ian On Tue, Jun 1, 2010 at 4:08 PM, Ian Soboroff wrote: > My nodes have 5 disks and are using them separately as data disks. The > usage on the disks is not uniform, and one is nearly full. Is there some > way to manually balance the files across the disks? Pretty much anything > done via nodetool incurs an anticompaction, which obviously fails. system/ is > not the problem, it's in my data's keyspace. > > Ian > >
Re: Range search on keys not working?
Can you clarify what you mean by 'random between nodes'? On Wed, Jun 2, 2010 at 8:15 AM, David Boxenhorn wrote: > I see. But we could make this work if the random partitioner was random only > between nodes, but was still ordered within each node. (Or if there were > another partitioner that did this.) That way we could get everything we need > from each node separately. The results would not be ordered, but they would > be correct. > > On Wed, Jun 2, 2010 at 4:09 PM, Sylvain Lebresne wrote: >> >> > So why do the "start" and "finish" range parameters exist? >> >> Because especially if you want to iterate over all your keys (which as >> stated by Ben above >> is the only meaningful way to use get_range_slices() with the random >> partitioner), you'll >> want to paginate that. And that's where the 'start' and 'finish' are >> useful (to be fair, >> the 'finish' part is not so useful in practice with the random >> partitioner). >> >> -- >> Sylvain >> >> > >> > On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote: >> >> >> >> Martin, >> >> >> >> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller >> >> wrote: >> >> > I think you can specify an end key, but it should be a key which does >> >> > exist >> >> > in your column family. >> >> >> >> >> >> Logically, it doesn't make sense to ever specify an end key with >> >> random partitioner. If you specified a start key of "aaa" and an end >> >> key of "aac" you might get back as results "aaa", "zfc", "hik", etc. >> >> And, even if you have a key of "aab" it might not show up. Key ranges >> >> only make sense with order-preserving partitioner. The only time to >> >> ever use a key range with random partitioner is when you want to >> >> iterate over all keys in the CF. >> >> >> >> Ben >> >> >> >> >> >> > But maybe I'm off the track here and someone else here knows more >> >> > about >> >> > this >> >> > key range stuff. 
>> >> > >> >> > Martin >> >> > >> >> > >> >> > From: David Boxenhorn [mailto:da...@lookin2.com] >> >> > Sent: Wednesday, June 02, 2010 2:30 PM >> >> > To: user@cassandra.apache.org >> >> > Subject: Re: Range search on keys not working? >> >> > >> >> > In other words, I should check the values as I iterate, and stop >> >> > iterating >> >> > when I get out of range? >> >> > >> >> > I'll try that! >> >> > >> >> > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller >> >> > wrote: >> >> >> >> >> >> When not using OOP, you should not use something like 'CATEGORY/' as >> >> >> the >> >> >> end key. >> >> >> Use the empty string as the end key and limit the number of returned >> >> >> keys, >> >> >> as you did with >> >> >> the 'max' value. >> >> >> >> >> >> If I understand correctly, the end key is used to generate an end >> >> >> token >> >> >> by >> >> >> hashing it, and >> >> >> there is not the same correspondence between 'CATEGORY' and >> >> >> 'CATEGORY/' >> >> >> as >> >> >> for >> >> >> hash('CATEGORY') and hash('CATEGORY/'). >> >> >> >> >> >> At least, this was the explanation I gave myself when I had the same >> >> >> problem. >> >> >> >> >> >> The solution is to iterate through the keys by always using the last >> >> >> key >> >> >> returned as the >> >> >> start key for the next call to get_range_slices, and the to drop the >> >> >> first >> >> >> element from >> >> >> the result. >> >> >> >> >> >> HTH, >> >> >> Martin >> >> >> >> >> >> >> >> >> From: David Boxenhorn [mailto:da...@lookin2.com] >> >> >> Sent: Wednesday, June 02, 2010 2:01 PM >> >> >> To: user@cassandra.apache.org >> >> >> Subject: Re: Range search on keys not working? >> >> >> >> >> >> The previous thread where we discussed this is called, "key is >> >> >> sorted?" >> >> >> >> >> >> >> >> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn >> >> >> wrote: >> >> >>> >> >> >>> I'm not using OPP. 
But I was assured on earlier threads (I asked >> >> >>> several >> >> >>> times to be sure) that it would work as stated below: the results >> >> >>> would not >> >> >>> be ordered, but they would be correct. >> >> >>> >> >> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt >> >> >>> wrote: >> >> >> >> Sounds like you are not using an order preserving partitioner? >> >> >> >> On Wed, Jun 2, 2010 at 13:48, David Boxenhorn >> >> wrote: >> >> > Range search on keys is not working for me. I was assured in >> >> > earlier >> >> > threads >> >> > that range search would work, but the results would not be >> >> > ordered. >> >> > >> >> > I'm trying to get all the rows that start with "CATEGORY." >> >> > >> >> > I'm doing: >> >> > >> >> > String start = "CATEGORY."; >> >> > . >> >> > . >> >> > . >> >> > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, >> >> > "CATEGORY/", max) >> >> > . >> >> > . >> >> > . >> >> > >> >> > in a loop, setting start to the last key each time - but I'm getting >> >> > rows >> >> > that don't start with "CATEGORY."!! >> >> > >> >> > How do I get all rows that start with "CATEGORY."?
Re: Start key must sort before (or equal to) finish key in your partitioner
Would it be better to use an SQL-style timestamp ("YYYY-MM-DD HH:MM:SS.MMM") + unique id, then? They sort lexically the same as they sort chronologically. On Wed, Jun 2, 2010 at 4:37 PM, Leslie Viljoen wrote: > On Mon, May 31, 2010 at 8:52 PM, Jonathan Ellis wrote: > > OPP uses lexical ordering on the keys, which isn't going to be the > > same as the natural order for a time-based uuid. > > *palmface* >
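That holds as long as every field is zero-padded to a fixed width. A quick self-contained check (plain Python, nothing Cassandra-specific):

```python
import random
from datetime import datetime, timedelta

def to_key(dt):
    # "YYYY-MM-DD HH:MM:SS.mmm" -- every field zero-padded, fixed width
    return dt.strftime("%Y-%m-%d %H:%M:%S.") + f"{dt.microsecond // 1000:03d}"

random.seed(1)
base = datetime(2010, 6, 2)
times = [base + timedelta(milliseconds=random.randrange(10**10))
         for _ in range(1000)]

# Lexical order of the keys matches chronological order of the instants,
# so OPP's lexical key ordering gives time-ordered rows.
assert sorted(to_key(t) for t in times) == [to_key(t) for t in sorted(times)]
```

This is why a fixed-width timestamp string works under OPP where a time-based UUID's string form does not.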
Re: Start key must sort before (or equal to) finish key in your partitioner
On Mon, May 31, 2010 at 8:52 PM, Jonathan Ellis wrote: > OPP uses lexical ordering on the keys, which isn't going to be the > same as the natural order for a time-based uuid. *palmface*
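Jonathan's point can be demonstrated directly: a version-1 (time-based) UUID's string form starts with time_low, the low 32 bits of the 60-bit timestamp, so lexical order diverges from time order whenever those low bits wrap. A sketch building v1 UUIDs from explicit timestamps (the node and clock-sequence values below are arbitrary placeholders):

```python
import uuid

def uuid1_at(ts_100ns):
    # Assemble a version-1 UUID from a 60-bit timestamp; the string form
    # begins with time_low, NOT the high-order timestamp bits.
    time_low = ts_100ns & 0xFFFFFFFF
    time_mid = (ts_100ns >> 32) & 0xFFFF
    time_hi_version = ((ts_100ns >> 48) & 0x0FFF) | 0x1000  # version 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             0x80, 0x00, 0x123456789ABC))

t0 = 0x1_0000_0000  # the moment time_low wraps around
ids = [str(uuid1_at(t)) for t in (t0 - 2, t0 - 1, t0, t0 + 1)]

# Chronological order is not lexical order.
assert ids != sorted(ids)
```

The UUIDs just before the wrap start with "ffffffff...", the ones just after with "00000000...", so lexical sorting reverses their time order.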
Re: Range search on keys not working?
I see. But we could make this work if the random partitioner was random only between nodes, but was still ordered within each node. (Or if there were another partitioner that did this.) That way we could get everything we need from each node separately. The results would not be ordered, but they would be correct. On Wed, Jun 2, 2010 at 4:09 PM, Sylvain Lebresne wrote: > > So why do the "start" and "finish" range parameters exist? > > Because especially if you want to iterate over all your key (which as > stated by Ben above > is the only meaningful way to use get_range_slices() with the random > partitionner), you'll > want to paginate that. And that's where the 'start' and 'finish' are > useful (to be fair, > the 'finish' part is not so useful in practice with the random > partitioner). > > -- > Sylvain > > > > > On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote: > >> > >> Martin, > >> > >> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller > >> wrote: > >> > I think you can specify an end key, but it should be a key which does > >> > exist > >> > in your column family. > >> > >> > >> Logically, it doesn't make sense to ever specify an end key with > >> random partitioner. If you specified a start key of "aaa" and and end > >> key of "aac" you might get back as results "aaa", "zfc", "hik", etc. > >> And, even if you have a key of "aab" it might not show up. Key ranges > >> only make sense with order-preserving partitioner. The only time to > >> ever use a key range with random partitioner is when you want to > >> iterate over all keys in the CF. > >> > >> Ben > >> > >> > >> > But maybe I'm off the track here and someone else here knows more > about > >> > this > >> > key range stuff. > >> > > >> > Martin > >> > > >> > > >> > From: David Boxenhorn [mailto:da...@lookin2.com] > >> > Sent: Wednesday, June 02, 2010 2:30 PM > >> > To: user@cassandra.apache.org > >> > Subject: Re: Range search on keys not working? 
> >> > > >> > In other words, I should check the values as I iterate, and stop > >> > iterating > >> > when I get out of range? > >> > > >> > I'll try that! > >> > > >> > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller > >> > wrote: > >> >> > >> >> When not using OOP, you should not use something like 'CATEGORY/' as > >> >> the > >> >> end key. > >> >> Use the empty string as the end key and limit the number of returned > >> >> keys, > >> >> as you did with > >> >> the 'max' value. > >> >> > >> >> If I understand correctly, the end key is used to generate an end > token > >> >> by > >> >> hashing it, and > >> >> there is not the same correspondence between 'CATEGORY' and > 'CATEGORY/' > >> >> as > >> >> for > >> >> hash('CATEGORY') and hash('CATEGORY/'). > >> >> > >> >> At least, this was the explanation I gave myself when I had the same > >> >> problem. > >> >> > >> >> The solution is to iterate through the keys by always using the last > >> >> key > >> >> returned as the > >> >> start key for the next call to get_range_slices, and the to drop the > >> >> first > >> >> element from > >> >> the result. > >> >> > >> >> HTH, > >> >> Martin > >> >> > >> >> > >> >> From: David Boxenhorn [mailto:da...@lookin2.com] > >> >> Sent: Wednesday, June 02, 2010 2:01 PM > >> >> To: user@cassandra.apache.org > >> >> Subject: Re: Range search on keys not working? > >> >> > >> >> The previous thread where we discussed this is called, "key is > sorted?" > >> >> > >> >> > >> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn > >> >> wrote: > >> >>> > >> >>> I'm not using OPP. But I was assured on earlier threads (I asked > >> >>> several > >> >>> times to be sure) that it would work as stated below: the results > >> >>> would not > >> >>> be ordered, but they would be correct. > >> >>> > >> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt > >> >>> wrote: > >> > >> Sounds like you are not using an order preserving partitioner? 
> >> > >> On Wed, Jun 2, 2010 at 13:48, David Boxenhorn > >> wrote: > >> > Range search on keys is not working for me. I was assured in > >> > earlier > >> > threads > >> > that range search would work, but the results would not be > ordered. > >> > > >> > I'm trying to get all the rows that start with "CATEGORY." > >> > > >> > I'm doing: > >> > > >> > String start = "CATEGORY."; > >> > . > >> > . > >> > . > >> > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > >> > "CATEGORY/", max) > >> > . > >> > . > >> > . > >> > > >> > in a loop, setting start to the last key each time - but I'm > >> > getting > >> > rows > >> > that don't start with "CATEGORY."!! > >> > > >> > How do I get all rows that start with "CATEGORY."? > >> >>> > >> >> > >> > > >> > > > > > >
Re: Range search on keys not working?
> So why do the "start" and "finish" range parameters exist? Because especially if you want to iterate over all your keys (which as stated by Ben above is the only meaningful way to use get_range_slices() with the random partitioner), you'll want to paginate that. And that's where the 'start' and 'finish' are useful (to be fair, the 'finish' part is not so useful in practice with the random partitioner). -- Sylvain > > On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote: >> >> Martin, >> >> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller >> wrote: >> > I think you can specify an end key, but it should be a key which does >> > exist >> > in your column family. >> >> >> Logically, it doesn't make sense to ever specify an end key with >> random partitioner. If you specified a start key of "aaa" and an end >> key of "aac" you might get back as results "aaa", "zfc", "hik", etc. >> And, even if you have a key of "aab" it might not show up. Key ranges >> only make sense with order-preserving partitioner. The only time to >> ever use a key range with random partitioner is when you want to >> iterate over all keys in the CF. >> >> Ben >> >> >> > But maybe I'm off the track here and someone else here knows more about >> > this >> > key range stuff. >> > >> > Martin >> > >> > >> > From: David Boxenhorn [mailto:da...@lookin2.com] >> > Sent: Wednesday, June 02, 2010 2:30 PM >> > To: user@cassandra.apache.org >> > Subject: Re: Range search on keys not working? >> > >> > In other words, I should check the values as I iterate, and stop >> > iterating >> > when I get out of range? >> > >> > I'll try that! >> > >> > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller >> > wrote: >> >> >> >> When not using OPP, you should not use something like 'CATEGORY/' as >> >> the >> >> end key. >> >> Use the empty string as the end key and limit the number of returned >> >> keys, >> >> as you did with >> >> the 'max' value. 
>> >> >> >> If I understand correctly, the end key is used to generate an end token >> >> by >> >> hashing it, and >> >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' >> >> as >> >> for >> >> hash('CATEGORY') and hash('CATEGORY/'). >> >> >> >> At least, this was the explanation I gave myself when I had the same >> >> problem. >> >> >> >> The solution is to iterate through the keys by always using the last >> >> key >> >> returned as the >> >> start key for the next call to get_range_slices, and the to drop the >> >> first >> >> element from >> >> the result. >> >> >> >> HTH, >> >> Martin >> >> >> >> >> >> From: David Boxenhorn [mailto:da...@lookin2.com] >> >> Sent: Wednesday, June 02, 2010 2:01 PM >> >> To: user@cassandra.apache.org >> >> Subject: Re: Range search on keys not working? >> >> >> >> The previous thread where we discussed this is called, "key is sorted?" >> >> >> >> >> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn >> >> wrote: >> >>> >> >>> I'm not using OPP. But I was assured on earlier threads (I asked >> >>> several >> >>> times to be sure) that it would work as stated below: the results >> >>> would not >> >>> be ordered, but they would be correct. >> >>> >> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt >> >>> wrote: >> >> Sounds like you are not using an order preserving partitioner? >> >> On Wed, Jun 2, 2010 at 13:48, David Boxenhorn >> wrote: >> > Range search on keys is not working for me. I was assured in >> > earlier >> > threads >> > that range search would work, but the results would not be ordered. >> > >> > I'm trying to get all the rows that start with "CATEGORY." >> > >> > I'm doing: >> > >> > String start = "CATEGORY."; >> > . >> > . >> > . >> > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, >> > "CATEGORY/", max) >> > . >> > . >> > . >> > >> > in a loop, setting start to the last key each time - but I'm >> > getting >> > rows >> > that don't start with "CATEGORY."!! 
>> > >> > How do I get all rows that start with "CATEGORY."? >> >>> >> >> >> > >> > > >
Re: Range search on keys not working?
They exist because when using OPP they are useful and make sense. On Wed, Jun 2, 2010 at 8:59 AM, David Boxenhorn wrote: > So why do the "start" and "finish" range parameters exist? > > On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote: >> >> Martin, >> >> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller >> wrote: >> > I think you can specify an end key, but it should be a key which does >> > exist >> > in your column family. >> >> >> Logically, it doesn't make sense to ever specify an end key with >> random partitioner. If you specified a start key of "aaa" and and end >> key of "aac" you might get back as results "aaa", "zfc", "hik", etc. >> And, even if you have a key of "aab" it might not show up. Key ranges >> only make sense with order-preserving partitioner. The only time to >> ever use a key range with random partitioner is when you want to >> iterate over all keys in the CF. >> >> Ben >> >> >> > But maybe I'm off the track here and someone else here knows more about >> > this >> > key range stuff. >> > >> > Martin >> > >> > >> > From: David Boxenhorn [mailto:da...@lookin2.com] >> > Sent: Wednesday, June 02, 2010 2:30 PM >> > To: user@cassandra.apache.org >> > Subject: Re: Range search on keys not working? >> > >> > In other words, I should check the values as I iterate, and stop >> > iterating >> > when I get out of range? >> > >> > I'll try that! >> > >> > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller >> > wrote: >> >> >> >> When not using OOP, you should not use something like 'CATEGORY/' as >> >> the >> >> end key. >> >> Use the empty string as the end key and limit the number of returned >> >> keys, >> >> as you did with >> >> the 'max' value. >> >> >> >> If I understand correctly, the end key is used to generate an end token >> >> by >> >> hashing it, and >> >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' >> >> as >> >> for >> >> hash('CATEGORY') and hash('CATEGORY/'). 
>> >> >> >> At least, this was the explanation I gave myself when I had the same >> >> problem. >> >> >> >> The solution is to iterate through the keys by always using the last >> >> key >> >> returned as the >> >> start key for the next call to get_range_slices, and the to drop the >> >> first >> >> element from >> >> the result. >> >> >> >> HTH, >> >> Martin >> >> >> >> >> >> From: David Boxenhorn [mailto:da...@lookin2.com] >> >> Sent: Wednesday, June 02, 2010 2:01 PM >> >> To: user@cassandra.apache.org >> >> Subject: Re: Range search on keys not working? >> >> >> >> The previous thread where we discussed this is called, "key is sorted?" >> >> >> >> >> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn >> >> wrote: >> >>> >> >>> I'm not using OPP. But I was assured on earlier threads (I asked >> >>> several >> >>> times to be sure) that it would work as stated below: the results >> >>> would not >> >>> be ordered, but they would be correct. >> >>> >> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt >> >>> wrote: >> >> Sounds like you are not using an order preserving partitioner? >> >> On Wed, Jun 2, 2010 at 13:48, David Boxenhorn >> wrote: >> > Range search on keys is not working for me. I was assured in >> > earlier >> > threads >> > that range search would work, but the results would not be ordered. >> > >> > I'm trying to get all the rows that start with "CATEGORY." >> > >> > I'm doing: >> > >> > String start = "CATEGORY."; >> > . >> > . >> > . >> > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, >> > "CATEGORY/", max) >> > . >> > . >> > . >> > >> > in a loop, setting start to the last key each time - but I'm >> > getting >> > rows >> > that don't start with "CATEGORY."!! >> > >> > How do I get all rows that start with "CATEGORY."? >> >>> >> >> >> > >> > > >
Re: Range search on keys not working?
So why do the "start" and "finish" range parameters exist? On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote: > Martin, > > On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller > wrote: > > I think you can specify an end key, but it should be a key which does > exist > > in your column family. > > > Logically, it doesn't make sense to ever specify an end key with > random partitioner. If you specified a start key of "aaa" and and end > key of "aac" you might get back as results "aaa", "zfc", "hik", etc. > And, even if you have a key of "aab" it might not show up. Key ranges > only make sense with order-preserving partitioner. The only time to > ever use a key range with random partitioner is when you want to > iterate over all keys in the CF. > > Ben > > > > But maybe I'm off the track here and someone else here knows more about > this > > key range stuff. > > > > Martin > > > > > > From: David Boxenhorn [mailto:da...@lookin2.com] > > Sent: Wednesday, June 02, 2010 2:30 PM > > To: user@cassandra.apache.org > > Subject: Re: Range search on keys not working? > > > > In other words, I should check the values as I iterate, and stop > iterating > > when I get out of range? > > > > I'll try that! > > > > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller > > wrote: > >> > >> When not using OOP, you should not use something like 'CATEGORY/' as the > >> end key. > >> Use the empty string as the end key and limit the number of returned > keys, > >> as you did with > >> the 'max' value. > >> > >> If I understand correctly, the end key is used to generate an end token > by > >> hashing it, and > >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' > as > >> for > >> hash('CATEGORY') and hash('CATEGORY/'). > >> > >> At least, this was the explanation I gave myself when I had the same > >> problem. 
> >> > >> The solution is to iterate through the keys by always using the last key > >> returned as the > >> start key for the next call to get_range_slices, and the to drop the > first > >> element from > >> the result. > >> > >> HTH, > >> Martin > >> > >> > >> From: David Boxenhorn [mailto:da...@lookin2.com] > >> Sent: Wednesday, June 02, 2010 2:01 PM > >> To: user@cassandra.apache.org > >> Subject: Re: Range search on keys not working? > >> > >> The previous thread where we discussed this is called, "key is sorted?" > >> > >> > >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn > wrote: > >>> > >>> I'm not using OPP. But I was assured on earlier threads (I asked > several > >>> times to be sure) that it would work as stated below: the results would > not > >>> be ordered, but they would be correct. > >>> > >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt > wrote: > > Sounds like you are not using an order preserving partitioner? > > On Wed, Jun 2, 2010 at 13:48, David Boxenhorn > wrote: > > Range search on keys is not working for me. I was assured in earlier > > threads > > that range search would work, but the results would not be ordered. > > > > I'm trying to get all the rows that start with "CATEGORY." > > > > I'm doing: > > > > String start = "CATEGORY."; > > . > > . > > . > > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > > "CATEGORY/", max) > > . > > . > > . > > > > in a loop, setting start to the last key each time - but I'm getting > > rows > > that don't start with "CATEGORY."!! > > > > How do I get all rows that start with "CATEGORY."? > >>> > >> > > > > >
Re: Range search on keys not working?
Martin, On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller wrote: > I think you can specify an end key, but it should be a key which does exist > in your column family. Logically, it doesn't make sense to ever specify an end key with random partitioner. If you specified a start key of "aaa" and an end key of "aac" you might get back as results "aaa", "zfc", "hik", etc. And, even if you have a key of "aab" it might not show up. Key ranges only make sense with order-preserving partitioner. The only time to ever use a key range with random partitioner is when you want to iterate over all keys in the CF. Ben > But maybe I'm off the track here and someone else here knows more about this > key range stuff. > > Martin > > > From: David Boxenhorn [mailto:da...@lookin2.com] > Sent: Wednesday, June 02, 2010 2:30 PM > To: user@cassandra.apache.org > Subject: Re: Range search on keys not working? > > In other words, I should check the values as I iterate, and stop iterating > when I get out of range? > > I'll try that! > > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller > wrote: >> >> When not using OPP, you should not use something like 'CATEGORY/' as the >> end key. >> Use the empty string as the end key and limit the number of returned keys, >> as you did with >> the 'max' value. >> >> If I understand correctly, the end key is used to generate an end token by >> hashing it, and >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' as >> for >> hash('CATEGORY') and hash('CATEGORY/'). >> >> At least, this was the explanation I gave myself when I had the same >> problem. >> >> The solution is to iterate through the keys by always using the last key >> returned as the >> start key for the next call to get_range_slices, and then to drop the first >> element from >> the result. 
>> >> HTH, >> Martin >> >> >> From: David Boxenhorn [mailto:da...@lookin2.com] >> Sent: Wednesday, June 02, 2010 2:01 PM >> To: user@cassandra.apache.org >> Subject: Re: Range search on keys not working? >> >> The previous thread where we discussed this is called, "key is sorted?" >> >> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn wrote: >>> >>> I'm not using OPP. But I was assured on earlier threads (I asked several >>> times to be sure) that it would work as stated below: the results would not >>> be ordered, but they would be correct. >>> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt wrote: Sounds like you are not using an order preserving partitioner? On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: > Range search on keys is not working for me. I was assured in earlier > threads > that range search would work, but the results would not be ordered. > > I'm trying to get all the rows that start with "CATEGORY." > > I'm doing: > > String start = "CATEGORY."; > . > . > . > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > "CATEGORY/", max) > . > . > . > > in a loop, setting start to the last key each time - but I'm getting > rows > that don't start with "CATEGORY."!! > > How do I get all rows that start with "CATEGORY."? >>> >> > >
Re: Range search on keys not working?
Here is the relevant part of the previous thread: Thank you. That is very good news. I can sort the results myself - what is important is that I get them! On Thu, May 13, 2010 at 2:42 AM, Vijay wrote: If you use Random partitioner, You will *NOT* get RowKey's sorted. (Columns are sorted always). Answer: If used Random partitioner True True Regards, On Wed, May 12, 2010 at 1:25 AM, David Boxenhorn wrote: You do any kind of range slice, e.g. keys beginning with "abc"? But the results will not be ordered? Please answer one of the following: True True True False False False Explain? Thanks! On Sun, May 9, 2010 at 8:27 PM, Vijay wrote: True, The Range slice support was enabled in Random Partitioner for Hadoop support. The random partitioner actually hashes the key, and those hashes are sorted, so we cannot have the actual keys in order (hope this doesn't confuse you)... Regards, On Wed, Jun 2, 2010 at 3:40 PM, Ben Browning wrote: > The keys will not be in any specific order when not using OPP, so, you > will never "get out of range" - you have to iterate over every single > key to find all keys that start with "CATEGORY". If you don't iterate > over every single key you run a chance of missing some. Obviously, > this kind of key range scan is not something that will scale well > as the number of keys goes up. If your app needs this kind of behavior > you'd be much better off with OPP. > > Ben > > On Wed, Jun 2, 2010 at 8:29 AM, David Boxenhorn wrote: > > In other words, I should check the values as I iterate, and stop > iterating > > when I get out of range? > > > > I'll try that! > > > > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller > > wrote: > >> > >> When not using OPP, you should not use something like 'CATEGORY/' as the > >> end key. > >> Use the empty string as the end key and limit the number of returned > keys, > >> as you did with > >> the 'max' value. 
> >> > >> If I understand correctly, the end key is used to generate an end token > by > >> hashing it, and > >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' > as > >> for > >> hash('CATEGORY') and hash('CATEGORY/'). > >> > >> At least, this was the explanation I gave myself when I had the same > >> problem. > >> > >> The solution is to iterate through the keys by always using the last key > >> returned as the > >> start key for the next call to get_range_slices, and the to drop the > first > >> element from > >> the result. > >> > >> HTH, > >> Martin > >> > >> > >> From: David Boxenhorn [mailto:da...@lookin2.com] > >> Sent: Wednesday, June 02, 2010 2:01 PM > >> To: user@cassandra.apache.org > >> Subject: Re: Range search on keys not working? > >> > >> The previous thread where we discussed this is called, "key is sorted?" > >> > >> > >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn > wrote: > >>> > >>> I'm not using OPP. But I was assured on earlier threads (I asked > several > >>> times to be sure) that it would work as stated below: the results would > not > >>> be ordered, but they would be correct. > >>> > >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt > wrote: > > Sounds like you are not using an order preserving partitioner? > > On Wed, Jun 2, 2010 at 13:48, David Boxenhorn > wrote: > > Range search on keys is not working for me. I was assured in earlier > > threads > > that range search would work, but the results would not be ordered. > > > > I'm trying to get all the rows that start with "CATEGORY." > > > > I'm doing: > > > > String start = "CATEGORY."; > > . > > . > > . > > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > > "CATEGORY/", max) > > . > > . > > . > > > > in a loop, setting start to the last key each time - but I'm getting > > rows > > that don't start with "CATEGORY."!! > > > > How do I get all rows that start with "CATEGORY."? > >>> > >> > > > > >
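Vijay's explanation above — the key is hashed, and the hashes are what get sorted — can be seen in a few lines of plain Python. MD5 stands in for RandomPartitioner's token computation here (the real token arithmetic differs in detail; this is just an illustration):

```python
import hashlib

def token(key: str) -> int:
    # RandomPartitioner derives each row's token by hashing the key;
    # rows are stored in token order, not key order.
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

keys = [f"CATEGORY.{i:03d}" for i in range(100)]   # already in lexical order
by_token = sorted(keys, key=token)

assert by_token != keys               # token order scrambles key order...
assert set(by_token) == set(keys)     # ...but no key is lost
```

That is exactly why a range slice under RandomPartitioner returns every key exactly once but in apparently arbitrary order.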
Re: Range search on keys not working?
That's crazy! I could artificially insert a key with just the prefix, as a placeholder, but why can't Cassandra do that virtually? On Wed, Jun 2, 2010 at 3:34 PM, Dr. Martin Grabmüller < martin.grabmuel...@eleven.de> wrote: > I think you can specify an end key, but it should be a key which does > exist in your column family. > > But maybe I'm off the track here and someone else here knows more about > this key range stuff. > > Martin > > -- > *From:* David Boxenhorn [mailto:da...@lookin2.com] > *Sent:* Wednesday, June 02, 2010 2:30 PM > > *To:* user@cassandra.apache.org > *Subject:* Re: Range search on keys not working? > > In other words, I should check the values as I iterate, and stop > iterating when I get out of range? > > I'll try that! > > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller < > martin.grabmuel...@eleven.de> wrote: > >> When not using OOP, you should not use something like 'CATEGORY/' as the >> end key. >> Use the empty string as the end key and limit the number of returned keys, >> as you did with >> the 'max' value. >> >> If I understand correctly, the end key is used to generate an end token by >> hashing it, and >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' as >> for >> hash('CATEGORY') and hash('CATEGORY/'). >> >> At least, this was the explanation I gave myself when I had the same >> problem. >> >> The solution is to iterate through the keys by always using the last key >> returned as the >> start key for the next call to get_range_slices, and the to drop the first >> element from >> the result. >> >> HTH, >> Martin >> >> -- >> *From:* David Boxenhorn [mailto:da...@lookin2.com] >> *Sent:* Wednesday, June 02, 2010 2:01 PM >> *To:* user@cassandra.apache.org >> *Subject:* Re: Range search on keys not working? >> >> The previous thread where we discussed this is called, "key is sorted?" >> >> >> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn wrote: >> >>> I'm not using OPP. 
But I was assured on earlier threads (I asked several >>> times to be sure) that it would work as stated below: the results would not >>> be ordered, but they would be correct. >>> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt wrote: >>> Sounds like you are not using an order preserving partitioner? On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: > Range search on keys is not working for me. I was assured in earlier threads > that range search would work, but the results would not be ordered. > > I'm trying to get all the rows that start with "CATEGORY." > > I'm doing: > > String start = "CATEGORY."; > . > . > . > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > "CATEGORY/", max) > . > . > . > > in a loop, setting start to the last key each time - but I'm getting rows > that don't start with "CATEGORY."!! > > How do I get all rows that start with "CATEGORY."? >>> >>> >> >
RE: Range search on keys not working?
I think you can specify an end key, but it should be a key which does exist in your column family. But maybe I'm off the track here and someone else here knows more about this key range stuff. Martin From: David Boxenhorn [mailto:da...@lookin2.com] Sent: Wednesday, June 02, 2010 2:30 PM To: user@cassandra.apache.org Subject: Re: Range search on keys not working? In other words, I should check the values as I iterate, and stop iterating when I get out of range? I'll try that! On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller wrote: When not using OPP, you should not use something like 'CATEGORY/' as the end key. Use the empty string as the end key and limit the number of returned keys, as you did with the 'max' value. If I understand correctly, the end key is used to generate an end token by hashing it, and there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' as for hash('CATEGORY') and hash('CATEGORY/'). At least, this was the explanation I gave myself when I had the same problem. The solution is to iterate through the keys by always using the last key returned as the start key for the next call to get_range_slices, and then to drop the first element from the result. HTH, Martin From: David Boxenhorn [mailto:da...@lookin2.com] Sent: Wednesday, June 02, 2010 2:01 PM To: user@cassandra.apache.org Subject: Re: Range search on keys not working? The previous thread where we discussed this is called, "key is sorted?" On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn wrote: I'm not using OPP. But I was assured on earlier threads (I asked several times to be sure) that it would work as stated below: the results would not be ordered, but they would be correct. On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt wrote: Sounds like you are not using an order preserving partitioner? On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: > Range search on keys is not working for me. 
I was assured in earlier threads > that range search would work, but the results would not be ordered. > > I'm trying to get all the rows that start with "CATEGORY." > > I'm doing: > > String start = "CATEGORY."; > . > . > . > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > "CATEGORY/", max) > . > . > . > > in a loop, setting start to the last key each time - but I'm getting rows > that don't start with "CATEGORY."!! > > How do I get all rows that start with "CATEGORY."?
Re: Range search on keys not working?
The keys will not be in any specific order when not using OPP, so you will never "get out of range" - you have to iterate over every single key to find all keys that start with "CATEGORY". If you don't iterate over every single key you run a chance of missing some. Obviously, this kind of key range scan is not something that will scale well as the number of keys goes up. If your app needs this kind of behavior you'd be much better off with OPP. Ben On Wed, Jun 2, 2010 at 8:29 AM, David Boxenhorn wrote: > In other words, I should check the values as I iterate, and stop iterating > when I get out of range? > > I'll try that! > > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller > wrote: >> >> When not using OPP, you should not use something like 'CATEGORY/' as the >> end key. >> Use the empty string as the end key and limit the number of returned keys, >> as you did with >> the 'max' value.
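A minimal sketch of the iterate-and-filter pattern Ben describes: page over the whole key space with the last key of each page as the next start key, drop the repeated first element, and filter client-side. The fetch function below is a local stand-in for a real Thrift get_range_slices call, and all names are illustrative.

```python
def get_range_slices(start_key, max_keys, rows):
    """Stand-in for Thrift's get_range_slices: returns up to max_keys
    (key, value) pairs from start_key onward, in an arbitrary but
    stable order, the way a random partitioner's token order behaves."""
    ordered = sorted(rows)  # pretend this is token order, not key order
    i = ordered.index(start_key) if start_key in ordered else 0
    return [(k, rows[k]) for k in ordered[i:i + max_keys]]

def rows_with_prefix(rows, prefix, page_size=2):
    """Page over *every* key, filtering client-side; with a random
    partitioner there is no point at which it is safe to stop early."""
    matches, start, first = {}, "", True
    while True:
        page = get_range_slices(start, page_size, rows)
        if not first:
            page = page[1:]  # drop the repeated start key
        if not page:
            break
        for key, value in page:
            if key.startswith(prefix):
                matches[key] = value
        start = page[-1][0]
        first = False
    return matches
```

With an order-preserving partitioner the loop could instead stop as soon as a returned key sorts past the prefix, which is why OPP is the better fit for this access pattern.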
Re: Range search on keys not working?
In other words, I should check the values as I iterate, and stop iterating when I get out of range? I'll try that! On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller < martin.grabmuel...@eleven.de> wrote: > When not using OPP, you should not use something like 'CATEGORY/' as the > end key. > Use the empty string as the end key and limit the number of returned keys, > as you did with > the 'max' value. > > If I understand correctly, the end key is used to generate an end token by > hashing it, and > there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' as > there is between > hash('CATEGORY') and hash('CATEGORY/'). > > At least, this was the explanation I gave myself when I had the same > problem. > > The solution is to iterate through the keys by always using the last key > returned as the > start key for the next call to get_range_slices, and then to drop the first > element from > the result. > > HTH, > Martin
RE: Range search on keys not working?
When not using OPP, you should not use something like 'CATEGORY/' as the end key. Use the empty string as the end key and limit the number of returned keys, as you did with the 'max' value. If I understand correctly, the end key is used to generate an end token by hashing it, and there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' as there is between hash('CATEGORY') and hash('CATEGORY/'). At least, this was the explanation I gave myself when I had the same problem. The solution is to iterate through the keys by always using the last key returned as the start key for the next call to get_range_slices, and then to drop the first element from the result. HTH, Martin From: David Boxenhorn [mailto:da...@lookin2.com] Sent: Wednesday, June 02, 2010 2:01 PM To: user@cassandra.apache.org Subject: Re: Range search on keys not working? The previous thread where we discussed this is called, "key is sorted?" On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn wrote: I'm not using OPP. But I was assured on earlier threads (I asked several times to be sure) that it would work as stated below: the results would not be ordered, but they would be correct.
Re: Range search on keys not working?
The previous thread where we discussed this is called, "key is sorted?" On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn wrote: > I'm not using OPP. But I was assured on earlier threads (I asked several > times to be sure) that it would work as stated below: the results would not > be ordered, but they would be correct. > > On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt wrote: > >> Sounds like you are not using an order preserving partitioner? >> >> On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: >> > Range search on keys is not working for me. I was assured in earlier >> threads >> > that range search would work, but the results would not be ordered. >> > >> > I'm trying to get all the rows that start with "CATEGORY." >> > >> > I'm doing: >> > >> > String start = "CATEGORY."; >> > . >> > . >> > . >> > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, >> > "CATEGORY/", max) >> > . >> > . >> > . >> > >> > in a loop, setting start to the last key each time - but I'm getting >> rows >> > that don't start with "CATEGORY."!! >> > >> > How do I get all rows that start with "CATEGORY."? >> > >
Re: Range search on keys not working?
I'm not using OPP. But I was assured on earlier threads (I asked several times to be sure) that it would work as stated below: the results would not be ordered, but they would be correct. On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt wrote: > Sounds like you are not using an order preserving partitioner? > > On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: > > Range search on keys is not working for me. I was assured in earlier > threads > > that range search would work, but the results would not be ordered. > > > > I'm trying to get all the rows that start with "CATEGORY." > > > > I'm doing: > > > > String start = "CATEGORY."; > > . > > . > > . > > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > > "CATEGORY/", max) > > . > > . > > . > > > > in a loop, setting start to the last key each time - but I'm getting rows > > that don't start with "CATEGORY."!! > > > > How do I get all rows that start with "CATEGORY."? >
Re: Range search on keys not working?
Sounds like you are not using an order preserving partitioner? On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: > Range search on keys is not working for me. I was assured in earlier threads > that range search would work, but the results would not be ordered. > > I'm trying to get all the rows that start with "CATEGORY." > > I'm doing: > > String start = "CATEGORY."; > . > . > . > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > "CATEGORY/", max) > . > . > . > > in a loop, setting start to the last key each time - but I'm getting rows > that don't start with "CATEGORY."!! > > How do I get all rows that start with "CATEGORY."?
Range search on keys not working?
Range search on keys is not working for me. I was assured in earlier threads that range search would work, but the results would not be ordered. I'm trying to get all the rows that start with "CATEGORY." I'm doing: String start = "CATEGORY."; . . . keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, "CATEGORY/", max) . . . in a loop, setting start to the last key each time - but I'm getting rows that don't start with "CATEGORY."!! How do I get all rows that start with "CATEGORY."?
Heterogeneous Cassandra Cluster
Is it possible to make a heterogeneous Cassandra cluster, with both Linux and Windows nodes? I tried doing it and got Error in ThreadPoolExecutor java.lang.NullPointerException Not sure if this is due to the Linux/Windows mix or something else. Details below: [r...@iqdev01 cassandra]# bin/cassandra -f INFO 20:32:26,431 Auto DiskAccessMode determined to be mmap INFO 20:32:27,085 Sampling index for /var/lib/cassandra/data/system/LocationInfo-1-Data.db INFO 20:32:27,095 Sampling index for /var/lib/cassandra/data/system/LocationInfo-2-Data.db INFO 20:32:27,104 Replaying /var/lib/cassandra/commitlog/CommitLog-1275412410865.log INFO 20:32:27,129 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1275413547129.log INFO 20:32:27,138 LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1275413547129.log', position=173) INFO 20:32:27,138 Enqueuing flush of Memtable(LocationInfo)@1491010616 INFO 20:32:27,139 Writing Memtable(LocationInfo)@1491010616 INFO 20:32:27,187 Completed flushing /var/lib/cassandra/data/system/LocationInfo-3-Data.db INFO 20:32:27,207 Log replay complete INFO 20:32:27,239 Saved Token found: 25870423804996813139937576731363583348 INFO 20:32:27,239 Saved ClusterName found: Lookin2 INFO 20:32:27,247 Starting up server gossip INFO 20:32:27,266 Joining: getting load information INFO 20:32:27,267 Sleeping 9 ms to wait for load information... 
INFO 20:32:27,327 Node /192.168.80.12 is now part of the cluster INFO 20:32:27,332 Node /192.168.80.234 is now part of the cluster INFO 20:32:27,864 InetAddress /192.168.80.12 is now UP INFO 20:32:27,872 InetAddress /192.168.80.234 is now UP INFO 20:33:57,269 Joining: getting bootstrap token INFO 20:33:57,278 New token will be 25870423804996813139937576731363583348 to assume load from /192.168.80.12 INFO 20:33:57,279 Joining: sleeping 3 for pending range setup INFO 20:34:27,280 Bootstrapping INFO 21:32:27,867 Compacting [] INFO 21:38:27,118 LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1275413547129.log', position=824) INFO 21:38:27,118 Enqueuing flush of Memtable(LocationInfo)@993374707 INFO 21:38:27,118 Writing Memtable(LocationInfo)@993374707 INFO 21:38:27,158 Completed flushing /var/lib/cassandra/data/system/LocationInfo-4-Data.db INFO 21:38:27,160 Compacting [org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-1-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-2-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-3-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-4-Data.db')] INFO 21:38:27,217 Compacted to /var/lib/cassandra/data/system/LocationInfo-5-Data.db. 1294/358 bytes for 1 keys. Time: 56ms. 
[r...@iqdev01 cassandra]# bin/cassandra -f INFO 21:40:07,519 Auto DiskAccessMode determined to be mmap INFO 21:40:07,972 Deleted /var/lib/cassandra/data/system/LocationInfo-1-Data.db INFO 21:40:07,973 Deleted /var/lib/cassandra/data/system/LocationInfo-2-Data.db INFO 21:40:07,974 Deleted /var/lib/cassandra/data/system/LocationInfo-3-Data.db INFO 21:40:07,982 Sampling index for /var/lib/cassandra/data/system/LocationInfo-5-Data.db INFO 21:40:07,991 Deleted /var/lib/cassandra/data/system/LocationInfo-4-Data.db INFO 21:40:08,000 Replaying /var/lib/cassandra/commitlog/CommitLog-1275413547129.log INFO 21:40:08,001 Log replay complete INFO 21:40:08,038 Saved Token found: 25870423804996813139937576731363583348 INFO 21:40:08,040 Saved ClusterName found: Lookin2 INFO 21:40:08,042 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1275417608042.log INFO 21:40:08,059 Starting up server gossip INFO 21:40:08,071 Joining: getting load information INFO 21:40:08,071 Sleeping 9 ms to wait for load information... INFO 21:40:10,372 Node /192.168.80.12 is now part of the cluster INFO 21:40:10,374 Node /192.168.80.234 is now part of the cluster INFO 21:40:11,091 InetAddress /192.168.80.234 is now UP INFO 21:40:12,078 InetAddress /192.168.80.12 is now UP INFO 21:41:38,072 Joining: getting bootstrap token INFO 21:41:38,088 New token will be 25870423804996813139937576731363583348 to assume load from /192.168.80.12 INFO 21:41:38,089 Joining: sleeping 3 for pending range setup INFO 21:42:08,091 Bootstrapping ERROR 21:49:03,526 Error in ThreadPoolExecutor java.lang.NullPointerException at org.apache.cassandra.streaming.StreamInitiateVerbHandler.getNewNames(StreamInitiateVerbHandler.java:154) at org.apache.cassandra.streaming.StreamInitiateVerbHandler.doVerb(StreamInitiateVerbHandler.java:76) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDel
Re: [***SPAM*** ] Re: writing speed test
Thanks Peter! In my test application, for each record: rowkey -> rand() * 4, about 64B; column * 20 -> rand() * 20, about 320B. I use batch_insert(rowkey, col*20) in thrift. Kevin Yuan From: Peter Schüller To: user@cassandra.apache.org Subject: [***SPAM*** ] Re: writing speed test Date: Wed, 2 Jun 2010 10:44:52 +0200 Since this thread has now gone on for a while... As far as I can tell you never specified the characteristics of your writes. Evaluating expected write throughput in terms of "MB/s to disk" is pretty impossible if one does not know anything about the nature of the writes. If you're expecting 50 MB/s, is that reasonable? I don't know; if you're writing a gazillion one-byte values with shortish keys, 50 MB/s translates to a *huge* amount of writes per second and you're likely to be CPU bound even in the most efficient implementation reasonably possible. If on the other hand you're writing large values (say slabs of 128k) you might more reasonably be expecting higher disk throughput. I don't have enough hands-on experience with cassandra to have a feel for CPU vs. disk in terms of bottlenecking, and when we expect to bottleneck on what, but I can say that it's definitely going to matter quite a lot what *kind* of writes you're doing. This tends to be the case regardless of the database system.
Re: writing speed test
Since this thread has now gone on for a while... As far as I can tell you never specified the characteristics of your writes. Evaluating expected write throughput in terms of "MB/s to disk" is pretty impossible if one does not know anything about the nature of the writes. If you're expecting 50 MB/s, is that reasonable? I don't know; if you're writing a gazillion one-byte values with shortish keys, 50 MB/s translates to a *huge* amount of writes per second and you're likely to be CPU bound even in the most efficient implementation reasonably possible. If on the other hand you're writing large values (say slabs of 128k) you might more reasonably be expecting higher disk throughput. I don't have enough hands-on experience with cassandra to have a feel for CPU vs. disk in terms of bottlenecking, and when we expect to bottleneck on what, but I can say that it's definitely going to matter quite a lot what *kind* of writes you're doing. This tends to be the case regardless of the database system. -- / Peter Schuller aka scode
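Peter's point - that an "MB/s" target says nothing without the record size - can be made concrete with a back-of-the-envelope calculation. The 16-byte per-record key/overhead figure below is an illustrative assumption, not a measured Cassandra number:

```python
def writes_per_second(mb_per_sec, value_bytes, key_bytes=16):
    """Operations per second implied by a raw throughput target and a
    per-record size (value plus an assumed key/overhead of key_bytes)."""
    record_bytes = value_bytes + key_bytes
    return int(mb_per_sec * 1024 * 1024 / record_bytes)

# The same 50 MB/s target implies wildly different operation rates:
tiny = writes_per_second(50, 1)           # one-byte values: ~3 million ops/s (CPU-bound territory)
slab = writes_per_second(50, 128 * 1024)  # 128 KB slabs: ~400 ops/s (disk-bound territory)
```

So before comparing a measured disk write rate against the drive's raw sequential benchmark, it is worth checking whether the per-operation CPU cost at your record size could even sustain that rate.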
Re: [***SPAM*** ] Re: [***SPAM*** ] Re: [***SPAM*** ] Re: [***SPAM*** ] Re: writing speed test
Still seems to be MEM. However, it's hard to accept that constant writing (even of a great amount of data) needs so much MEM (16GB). The process is quite simple: input_data -> memtable -> flush to disk, right? What does cassandra need so much MEM for? Thanks! On 2010-06-02 16:24 +0800, lwl wrote: > No. > But I did some capacity tests on another distributed system. > Your former test cost too much MEM; it was the bottleneck. > Caches and the JVM cost MEM, so I suggested decreasing them. > > What is the bottleneck of your current test now? > > On Jun 2, 2010 at 4:13, Shuai Yuan wrote: > > Hi, > > I tried: > > 1 - consistency level ZERO > > 2 - JVM heap 4GB > > 3 - normal Memtable cache > > and now I have about 30% improvement. > > However I want to know if you have also done a w/r benchmark and > > what the result was.
Re: Nodes dropping out of cluster due to GC
> > Has anyone experienced this sort of problem? It would be great to hear from > anyone who has had experience with this sort of issue and/or suggestions for > how to deal with it. > > Thanks, Eric Yes, I did. The symptoms you described point to a concurrent GC failure. During such a failure the concurrent GC completely stops the java program (i.e. cassandra) and does a full GC cycle. Other cassandra nodes discover that the node is not responding and consider it dead. If the concurrent GC is properly tuned, it should never do a stop-the-world GC (that's why it is called concurrent ;-) ). There can be several reasons for concurrent GC failures: 1. Not enough java heap - try to raise the max java heap limit 2. Improperly sized java heap regions. To help you narrow down the problem, pass the -XX:+PrintGCDetails option to the JVM launching the cassandra node. This will log information about internal GC activities. Let it run until the node is thrown out of the cluster again and search for the "concurrent mode failure" or "promotion failed" strings.
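As a starting point, the diagnostic flag above plus standard HotSpot CMS options can be added to the JVM_OPTS variable in the node's startup script (bin/cassandra.in.sh in Cassandra of this era). This is a sketch, not a tuned configuration - the heap size and log path are placeholders to adjust for your hardware:

```shell
# Pin the heap size, use the concurrent (CMS) collector, start CMS
# early enough that it finishes before the heap fills (which is what
# triggers "concurrent mode failure"), and log GC details for diagnosis.
JVM_OPTS="$JVM_OPTS -Xms4G -Xmx4G"
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -Xloggc:/var/log/cassandra/gc.log"
```

If the GC log still shows "promotion failed" with these settings, that supports reason 1 above (the heap is simply too small for the workload) rather than a tuning problem.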
Re: [***SPAM*** ] Re: [***SPAM*** ] Re: [***SPAM*** ] Re: writing speed test
Hi, I tried: 1 - consistency level ZERO; 2 - JVM heap 4GB; 3 - normal Memtable cache; and now I have about 30% improvement. However I want to know if you have also done a w/r benchmark and what the result was. On 2010-06-02 11:35 +0800, lwl wrote: > and, why did you set "JVM has 8G heap"? > 8g seems too big. > > On Jun 2, 2010 at 11:20, lwl wrote: > 3. 32 concurrent read & 128 write in storage-conf.xml, other > caches enlarged as well. > > Maybe you can try to decrease the size of the caches. > > On Jun 2, 2010 at 11:14, Shuai Yuan wrote: > > On 2010-06-02 10:37 +0800, lwl wrote: > > is all 4 servers' MEM almost 100%? > > Yes > > On Jun 2, 2010 at 10:12, Shuai Yuan wrote: > > Thanks lwl. > > Then is there any way of tuning this - faster flush to disk or > > something else? > > Cheers, > > Kevin > > On 2010-06-02 09:57 +0800, lwl wrote: > > > MEM: almost 100% (16GB) > > > - maybe this is the bottleneck. > > > Writing concerns the Memtable and SSTable in memory. > > > On Jun 2, 2010 at 9:48, Shuai Yuan wrote: > > > On 2010-06-01 15:00 -0500, Jonathan Shook wrote: > > > > Also, what do you mean specifically by 'slow'? Which measurements > > > > are you looking at? What are your baseline constraints for your test > > > > system? > > > Actually, the problem is the utilization of resources (for a single machine): > > > CPU: 700% / 1600% (16 cores) > > > MEM: almost 100% (16GB) > > > Swap: almost 0% > > > Disk IO (write): 20~30MB / 200MB (7.2k raid5, benchmarked previously) > > > NET: up to 100Mbps / 950Mbps (1Gbps, tuned and benchmarked previously) > > > So the speed of generating load, about 15M/s as reported before, seems quite slow to me. I assume the system should get at least about 50MB/s of Disk IO speed. > > > MEM? I don't think it plays a major role in this writing game. What's the bottleneck of the system? > > > P.S. About Consistency Level, I've tried ONE/DCQUORUM and found ONE is about 10-15% faster. However that's not a promising result either. > > > Thanks! > > > Kevin > > > 2010/6/1 ??: > > > > Hi, it would be better if we knew which Consistency Level you chose, > > > > and what the schema of the test is