Re: Number of client connections
As far as I know, only the OS-level limitations, e.g. typically ~60k. On Thu, Jun 3, 2010 at 9:34 AM, Lev Stesin wrote: > Hi, > > Is there a limit on the number of client connections to a node? Thanks. > > -- > Lev >
Number of client connections
Hi, Is there a limit on the number of client connections to a node? Thanks. -- Lev
Re: nodetool cleanup isn't cleaning up?
getRangeToEndpointMap is very useful, thanks, I didn't know about it... however, I've reconfigured my cluster since (moved some nodes and tokens) so now the problem is gone. I guess I'll use getRangeToEndpointMap next time I see something like this... On Thu, Jun 3, 2010 at 9:15 AM, Jonathan Ellis wrote: > Then the next step is to check StorageService.getRangeToEndpointMap via jmx > > On Tue, Jun 1, 2010 at 11:56 AM, Ran Tavory wrote: > > I'm using RackAwareStrategy. But it still doesn't make sense I think... > > let's see what did I miss... > > According to http://wiki.apache.org/cassandra/Operations > > > > RackAwareStrategy: replica 2 is placed in the first node along the ring > the > > belongs in another data center than the first; the remaining N-2 > replicas, > > if any, are placed on the first nodes along the ring in the same rack as > the > > first > > > > 192.168.252.124 Up 803.33 MB > > 56713727820156410577229101238628035242 |<--| > > 192.168.252.99 Up 352.85 MB > > 56713727820156410577229101238628035243 | ^ > > 192.168.252.125 Up 134.24 MB > > 85070591730234615865843651857942052863 v | > > 192.168.254.57 Up 676.41 MB > > 113427455640312821154458202477256070485 | ^ > > 192.168.254.58 Up 99.74 MB > > 141784319550391026443072753096570088106 v | > > 192.168.254.59 Up 99.94 MB > > 170141183460469231731687303715884105727 |-->| > > Alright, so I made a mistake and didn't use the alternate-datacenter > > suggestion on the page so the first node of every DC is overloaded with > > replicas. However, the current situation still doesn't make sense to me. > > .252.124 will be overloaded b/c it has the first token in the 252 dc. > > .254.57 will also be overloaded since it has the first token in the .254 > DC. > > But for which node does 252.99 serve as a replicator? It's not the first > in > > the DC and it's just one single token more than it's predecessor (which > is > > in the same DC). 
> > On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis > wrote: > >> > >> I'm saying that .99 is getting a copy of all the data for which .124 > >> is the primary. (If you are using RackUnawarePartitioner. If you are > >> using RackAware it is some other node.) > >> > >> On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory wrote: > >> > ok, let me try and translate your answer ;) > >> > Are you saying that the data that was left on the node is > >> > non-primary-replicas of rows from the time before the move? > >> > So this implies that when a node moves in the ring, it will affect > >> > distribution of: > >> > - new keys > >> > - old keys primary node > >> > -- but will not affect distribution of old keys non-primary replicas. > >> > If so, still I don't understand something... I would expect even the > >> > non-primary replicas of keys to be moved since if they don't, how > would > >> > they > >> > be found? I mean upon reads the serving node should not care about > >> > whether > >> > the row is new or old, it should have a consistent and global mapping > of > >> > tokens. So I guess this ruins my theory... > >> > What did you mean then? Is this deletions of non-primary replicated > >> > data? > >> > How does the replication factor affect the load on the moved host > then? > >> > > >> > On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis > >> > wrote: > >> >> > >> >> well, there you are then. > >> >> > >> >> On Mon, May 31, 2010 at 2:34 PM, Ran Tavory > wrote: > >> >> > yes, replication factor = 2 > >> >> > > >> >> > On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis < > jbel...@gmail.com> > >> >> > wrote: > >> >> >> > >> >> >> you have replication factor > 1 ? > >> >> >> > >> >> >> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory > >> >> >> wrote: > >> >> >> > I hope I understand nodetool cleanup correctly - it should clean > >> >> >> > up > >> >> >> > all > >> >> >> > data > >> >> >> > that does not (currently) belong to this node. 
If so, I think it > >> >> >> > might > >> >> >> > not > >> >> >> > be working correctly. > >> >> >> > Look at nodes 192.168.252.124 and 192.168.252.99 below > >> >> >> > 192.168.252.99Up 279.35 MB > >> >> >> > 3544607988759775661076818827414252202 > >> >> >> > |<--| > >> >> >> > 192.168.252.124Up 167.23 MB > >> >> >> > 56713727820156410577229101238628035242 | ^ > >> >> >> > 192.168.252.125Up 82.91 MB > >> >> >> > 85070591730234615865843651857942052863 v | > >> >> >> > 192.168.254.57Up 366.6 MB > >> >> >> > 113427455640312821154458202477256070485| ^ > >> >> >> > 192.168.254.58Up 88.44 MB > >> >> >> > 141784319550391026443072753096570088106v | > >> >> >> > 192.168.254.59Up 88.45 MB > >> >> >> > 170141183460469231731687303715884105727|-->| > >> >> >> > I wanted 124 to take all the load from 99. So I issued a move > >> >> >> > command. > >> >> >> > $ nodetool -h cass99 -p 9004 move > >> >> >> > 56713727820156410577229101238628035243 > >> >> >> > > >>
Re: Effective cache size
On Wed, Jun 2, 2010 at 10:39 PM, David King wrote: > If I go to fetch some row given the rack-unaware placement strategy, the > default snitch and CL==ONE, the node that is asked is the first node in the > ring with the datum that is currently up, then a checksum is sent to the > replicas to trigger read repair as appropriate. Yes > So with the row cache, that first node (the primary replica) is the one that > has that row cached, yes? No, it's the closest node as determined by snitch.sortByProximity. any given node X will never know whether another node Y has a row cached or not. the overhead for communicating that level of detail would be totally prohibitive. all caching does is speed the read, once a request is received for data local to a given node. no more, no less. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: ColumnFamilyInputFormat with super columns
We don't support supercolumns in CFIF yet. Peng Guo added this in his patchset at http://files.cnblogs.com/gpcuster/CassandraInputFormat.rar but it's mixed in with a ton of other changes. Honestly it's probably easier to start fresh, but it might be useful to look at his code for inspiration. On Wed, Jun 2, 2010 at 2:41 PM, Torsten Curdt wrote: > I have a super column along he lines of > > => { => { att: value }} > > Now I would like to process a set of rows [from_time..until_time] with Hadoop. > I've setup the hadoop job like this > > job.setInputFormatClass(ColumnFamilyInputFormat.class); > ConfigHelper.setColumnFamily(job.getConfiguration(), "family", > "events"); > > SlicePredicate predicate = new SlicePredicate(); > predicate.setSlice_range(new SliceRange(new byte[0], new > byte[0], > false, 1000)); > > ConfigHelper.setSlicePredicate(job.getConfiguration(), > predicate); > > but I don't see how I could say what rows the job should process. > Any pointers? > > cheers > -- > Torsten > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
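[Editor's note: the row-selection question above is a key-range question. With an order-preserving key layout, picking the rows in [from_time..until_time] is just a range query over a sorted map. A toy, self-contained sketch of that idea using plain Java collections — this is not the Hadoop/Cassandra API, and the key format is an assumption:]

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class RowRangeSketch {
    // Toy stand-in for an order-preserving row layout: row keys sort, so a
    // time window [from, until] is a sorted-map range query. The
    // "yyyyMMdd.HH.mm.ss" key format here is a hypothetical choice.
    public static SortedMap<String, String> rowsBetween(
            TreeMap<String, String> rows, String from, String until) {
        return rows.subMap(from, true, until, true); // inclusive on both ends
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = new TreeMap<String, String>();
        rows.put("20100601.05.30.00", "event-a");
        rows.put("20100601.09.15.00", "event-b");
        rows.put("20100602.01.00.00", "event-c");
        // Select everything from 06:00 on June 1 through end of June 2.
        System.out.println(
            rowsBetween(rows, "20100601.06.00.00", "20100602.23.59.59").keySet());
        // -> [20100601.09.15.00, 20100602.01.00.00]
    }
}
```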
Re: Nodes dropping out of cluster due to GC
remember: you get concurrent mode failures, when the old gen fills up with garbage before it can finish the CMS. so adding capacity = reducing load per machine is the easiest way to make this a non-issue. On Wed, Jun 2, 2010 at 12:45 PM, Eric Halpern wrote: > > > Ryan King wrote: >> >> Why run with so few nodes? >> >> -ryan >> >> On Tue, Jun 1, 2010 at 4:20 PM, Eric Halpern wrote: >>> >>> Hello, >>> >>> We're running a 4 node cluster on beefy EC2 virtual instances (8 core, 32 >>> GB) using EBS storage with 8 GB of heap allocated to the JVM. >>> >>> Every couple of hours, each of the nodes does a concurrent mark/sweep >>> that >>> takes around 30 seconds to complete. During that GC, the node >>> temporarily >>> drops out of the cluster, usually for about 15 seconds. >>> >>> The frequency of the concurrent mark sweeps seems reasonable, but the >>> fact >>> that the node drops out of the cluster temporarily is a major problem >>> since >>> this has significant impact on the performance and stability of our >>> service. >>> >>> Has anyone experienced this sort of problem? It would be great to hear >>> from >>> anyone who has had experience with this sort of issue and/or suggestions >>> for >>> how to deal with it. >>> >>> Thanks, Eric >>> -- >> >> > > We wanted to start with a small number of nodes to test things out before > going big. Is there some reason that a small cluster would cause more > problems in this regard. The actual request load is actually pretty light > for the cluster. > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Nodes-dropping-out-of-cluster-due-to-GC-tp5128481p5132279.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com. > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Heterogeneous Cassandra Cluster
No. And if we did it would be a bad idea: good ops practice is to _minimize_ variability. On Wed, Jun 2, 2010 at 3:18 AM, David Boxenhorn wrote: > Is it possible to make a heterogeneous Cassandra cluster, with both Linux > and Windows nodes? I tried doing it and got > > Error in ThreadPoolExecutor > java.lang.NullPointerException > > Not sure if this is due to the Linux/Windows mix or something else. > > > Details below: > > > > [r...@iqdev01 cassandra]# bin/cassandra -f > > INFO 20:32:26,431 Auto DiskAccessMode determined to be mmap > > INFO 20:32:27,085 Sampling index for > /var/lib/cassandra/data/system/LocationInfo-1-Data.db > > INFO 20:32:27,095 Sampling index for > /var/lib/cassandra/data/system/LocationInfo-2-Data.db > > INFO 20:32:27,104 Replaying > /var/lib/cassandra/commitlog/CommitLog-1275412410865.log > > INFO 20:32:27,129 Creating new commitlog segment > /var/lib/cassandra/commitlog/CommitLog-1275413547129.log > > INFO 20:32:27,138 LocationInfo has reached its threshold; switching in a > fresh Memtable at > CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1275413547129.log', > position=173) > > INFO 20:32:27,138 Enqueuing flush of Memtable(LocationInfo)@1491010616 > > INFO 20:32:27,139 Writing Memtable(LocationInfo)@1491010616 > > INFO 20:32:27,187 Completed flushing > /var/lib/cassandra/data/system/LocationInfo-3-Data.db > > INFO 20:32:27,207 Log replay complete > > INFO 20:32:27,239 Saved Token found: 25870423804996813139937576731363583348 > > INFO 20:32:27,239 Saved ClusterName found: Lookin2 > > INFO 20:32:27,247 Starting up server gossip > > INFO 20:32:27,266 Joining: getting load information > > INFO 20:32:27,267 Sleeping 9 ms to wait for load information... 
> > INFO 20:32:27,327 Node /192.168.80.12 is now part of the cluster > > INFO 20:32:27,332 Node /192.168.80.234 is now part of the cluster > > INFO 20:32:27,864 InetAddress /192.168.80.12 is now UP > > INFO 20:32:27,872 InetAddress /192.168.80.234 is now UP > > INFO 20:33:57,269 Joining: getting bootstrap token > > INFO 20:33:57,278 New token will be 25870423804996813139937576731363583348 > to assume load from /192.168.80.12 > > INFO 20:33:57,279 Joining: sleeping 3 for pending range setup > > INFO 20:34:27,280 Bootstrapping > > INFO 21:32:27,867 Compacting [] > > INFO 21:38:27,118 LocationInfo has reached its threshold; switching in a > fresh Memtable at > CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1275413547129.log', > position=824) > > INFO 21:38:27,118 Enqueuing flush of Memtable(LocationInfo)@993374707 > > INFO 21:38:27,118 Writing Memtable(LocationInfo)@993374707 > > INFO 21:38:27,158 Completed flushing > /var/lib/cassandra/data/system/LocationInfo-4-Data.db > > INFO 21:38:27,160 Compacting > [org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-1-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-2-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-3-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-4-Data.db')] > > INFO 21:38:27,217 Compacted to > /var/lib/cassandra/data/system/LocationInfo-5-Data.db. 1294/358 bytes for 1 > keys. Time: 56ms. 
> > [r...@iqdev01 cassandra]# bin/cassandra -f > > INFO 21:40:07,519 Auto DiskAccessMode determined to be mmap > > INFO 21:40:07,972 Deleted > /var/lib/cassandra/data/system/LocationInfo-1-Data.db > > INFO 21:40:07,973 Deleted > /var/lib/cassandra/data/system/LocationInfo-2-Data.db > > INFO 21:40:07,974 Deleted > /var/lib/cassandra/data/system/LocationInfo-3-Data.db > > INFO 21:40:07,982 Sampling index for > /var/lib/cassandra/data/system/LocationInfo-5-Data.db > > INFO 21:40:07,991 Deleted > /var/lib/cassandra/data/system/LocationInfo-4-Data.db > > INFO 21:40:08,000 Replaying > /var/lib/cassandra/commitlog/CommitLog-1275413547129.log > > INFO 21:40:08,001 Log replay complete > > INFO 21:40:08,038 Saved Token found: 25870423804996813139937576731363583348 > > INFO 21:40:08,040 Saved ClusterName found: Lookin2 > > INFO 21:40:08,042 Creating new commitlog segment > /var/lib/cassandra/commitlog/CommitLog-1275417608042.log > > INFO 21:40:08,059 Starting up server gossip > > INFO 21:40:08,071 Joining: getting load information > > INFO 21:40:08,071 Sleeping 9 ms to wait for load information... > > INFO 21:40:10,372 Node /192.168.80.12 is now part of the cluster > > INFO 21:40:10,374 Node /192.168.80.234 is now part of the cluster > > INFO 21:40:11,091 InetAddress /192.168.80.234 is now UP > > INFO 21:40:12,078 InetAddress /192.168.80.12 is now UP > > INFO 21:41:38,072 Joining: getting bootstrap token > > INFO 21:41:38,088 New token will be 25870423804996813139937576731363583348 > to assume load from /192.168.80.12 > > INFO 21:41:38,089 Joining: sleeping 3 for pending range setup > > INFO 21:42:08,091 Bootstrapping > >
Re: Handling disk-full scenarios
this is why JBOD configuration is contraindicated for cassandra. http://wiki.apache.org/cassandra/CassandraHardware On Tue, Jun 1, 2010 at 1:08 PM, Ian Soboroff wrote: > My nodes have 5 disks and are using them separately as data disks. The > usage on the disks is not uniform, and one is nearly full. Is there some > way to manually balance the files across the disks? Pretty much anything > done via nodetool incurs an anticompaction with obviously fails. system/ is > not the problem, it's in my data's keyspace. > > Ian > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Start key must sort before (or equal to) finish key in your partitioner
that would be reasonable On Wed, Jun 2, 2010 at 6:41 AM, David Boxenhorn wrote: > Would it be better to use an SQL-style timestamp ("YYYY-MM-DD HH:MM:SS.MMM") > + unique id, then? They sort lexically the same as they sort > chronologically. > > On Wed, Jun 2, 2010 at 4:37 PM, Leslie Viljoen > wrote: >> >> On Mon, May 31, 2010 at 8:52 PM, Jonathan Ellis wrote: >> > OPP uses lexical ordering on the keys, which isn't going to be the >> > same as the natural order for a time-based uuid. >> >> *palmface* > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
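[Editor's note: the property David is relying on — fixed-width timestamp strings sort lexically the same as chronologically — can be checked with a plain string sort. A minimal sketch; the "|" separator and id suffix are assumptions, not from the thread:]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TimestampKeySort {
    // Fixed-width "YYYY-MM-DD HH:MM:SS.MMM" timestamps: lexicographic order
    // equals chronological order, so OPP range scans over such keys work.
    public static List<String> sorted(List<String> keys) {
        List<String> copy = new ArrayList<String>(keys);
        Collections.sort(copy); // plain lexicographic sort
        return copy;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<String>();
        keys.add("2010-06-02 16:41:00.123|id-a"); // "|id" suffix keeps keys unique (assumed format)
        keys.add("2010-06-01 08:00:00.000|id-b");
        keys.add("2010-06-02 16:41:00.122|id-c");
        System.out.println(sorted(keys).get(0)); // earliest event sorts first
        // -> 2010-06-01 08:00:00.000|id-b
    }
}
```

Note this only holds because every field is zero-padded to a fixed width; a bare epoch-seconds string, for example, would break at the 10-to-11-digit boundary.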
Re: Is there any way to detect when a node is down so I can failover more effectively?
you're overcomplicating things. just connect to *a* node, and if it happens to be down, try a different one. nodes being down should be a rare event, not a normal condition. no need to optimize for it so much. also see http://wiki.apache.org/cassandra/FAQ#node_clients_connect_to 2010/6/1 Patricio Echagüe : > Hi all, I'm using Hector framework to interact with Cassandra and at trying > to handle failover more effectively I found it a bit complicated to fetch > all cassandra nodes that are up and running. > > My goal is to keep an up-to-date list of active/up Cassandra servers to > provide HEctor every time I need to execute against the db. > > I've seen this Thrift method: get_string_property("token map") but it > returns the nodes in the ring no matter is the node is down. > > > > Any advice? > > -- > Patricio.- > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
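[Editor's note: the "connect to a node, and if it's down try a different one" advice reduces to a few lines of client-side code. A minimal sketch; the host names are hypothetical and the reachability check is stubbed out — a real client would attempt a Thrift connection there:]

```java
import java.util.Arrays;
import java.util.List;

public class SimpleFailover {
    // Stub for "can I reach this node?". Simulated here: only cass2 is up.
    static boolean isUp(String host) {
        return "cass2".equals(host);
    }

    // Try nodes in order. Down nodes are the rare case, so no need to
    // track cluster state client-side; just fall through to the next host.
    public static String connectToAny(List<String> hosts) {
        for (String host : hosts) {
            if (isUp(host)) {
                return host;
            }
        }
        throw new IllegalStateException("no Cassandra node reachable");
    }

    public static void main(String[] args) {
        System.out.println(connectToAny(Arrays.asList("cass1", "cass2", "cass3")));
        // -> cass2 (cass1 is simulated as down)
    }
}
```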
Re: nodetool cleanup isn't cleaning up?
Then the next step is to check StorageService.getRangeToEndpointMap via jmx On Tue, Jun 1, 2010 at 11:56 AM, Ran Tavory wrote: > I'm using RackAwareStrategy. But it still doesn't make sense I think... > let's see what did I miss... > According to http://wiki.apache.org/cassandra/Operations > > RackAwareStrategy: replica 2 is placed in the first node along the ring the > belongs in another data center than the first; the remaining N-2 replicas, > if any, are placed on the first nodes along the ring in the same rack as the > first > > 192.168.252.124Up 803.33 MB > 56713727820156410577229101238628035242 |<--| > 192.168.252.99Up 352.85 MB > 56713727820156410577229101238628035243 | ^ > 192.168.252.125Up 134.24 MB > 85070591730234615865843651857942052863 v | > 192.168.254.57Up 676.41 MB > 113427455640312821154458202477256070485 | ^ > 192.168.254.58Up 99.74 MB > 141784319550391026443072753096570088106 v | > 192.168.254.59Up 99.94 MB > 170141183460469231731687303715884105727 |-->| > Alright, so I made a mistake and didn't use the alternate-datacenter > suggestion on the page so the first node of every DC is overloaded with > replicas. However, the current situation still doesn't make sense to me. > .252.124 will be overloaded b/c it has the first token in the 252 dc. > .254.57 will also be overloaded since it has the first token in the .254 DC. > But for which node does 252.99 serve as a replicator? It's not the first in > the DC and it's just one single token more than it's predecessor (which is > in the same DC). > On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis wrote: >> >> I'm saying that .99 is getting a copy of all the data for which .124 >> is the primary. (If you are using RackUnawarePartitioner. If you are >> using RackAware it is some other node.) 
>> >> On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory wrote: >> > ok, let me try and translate your answer ;) >> > Are you saying that the data that was left on the node is >> > non-primary-replicas of rows from the time before the move? >> > So this implies that when a node moves in the ring, it will affect >> > distribution of: >> > - new keys >> > - old keys primary node >> > -- but will not affect distribution of old keys non-primary replicas. >> > If so, still I don't understand something... I would expect even the >> > non-primary replicas of keys to be moved since if they don't, how would >> > they >> > be found? I mean upon reads the serving node should not care about >> > whether >> > the row is new or old, it should have a consistent and global mapping of >> > tokens. So I guess this ruins my theory... >> > What did you mean then? Is this deletions of non-primary replicated >> > data? >> > How does the replication factor affect the load on the moved host then? >> > >> > On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis >> > wrote: >> >> >> >> well, there you are then. >> >> >> >> On Mon, May 31, 2010 at 2:34 PM, Ran Tavory wrote: >> >> > yes, replication factor = 2 >> >> > >> >> > On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis >> >> > wrote: >> >> >> >> >> >> you have replication factor > 1 ? >> >> >> >> >> >> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory >> >> >> wrote: >> >> >> > I hope I understand nodetool cleanup correctly - it should clean >> >> >> > up >> >> >> > all >> >> >> > data >> >> >> > that does not (currently) belong to this node. If so, I think it >> >> >> > might >> >> >> > not >> >> >> > be working correctly. 
>> >> >> > Look at nodes 192.168.252.124 and 192.168.252.99 below >> >> >> > 192.168.252.99Up 279.35 MB >> >> >> > 3544607988759775661076818827414252202 >> >> >> > |<--| >> >> >> > 192.168.252.124Up 167.23 MB >> >> >> > 56713727820156410577229101238628035242 | ^ >> >> >> > 192.168.252.125Up 82.91 MB >> >> >> > 85070591730234615865843651857942052863 v | >> >> >> > 192.168.254.57Up 366.6 MB >> >> >> > 113427455640312821154458202477256070485 | ^ >> >> >> > 192.168.254.58Up 88.44 MB >> >> >> > 141784319550391026443072753096570088106 v | >> >> >> > 192.168.254.59Up 88.45 MB >> >> >> > 170141183460469231731687303715884105727 |-->| >> >> >> > I wanted 124 to take all the load from 99. So I issued a move >> >> >> > command. >> >> >> > $ nodetool -h cass99 -p 9004 move >> >> >> > 56713727820156410577229101238628035243 >> >> >> > >> >> >> > This command tells 99 to take the space b/w >> >> >> > >> >> >> > >> >> >> > >> >> >> > (56713727820156410577229101238628035242, 56713727820156410577229101238628035243] >> >> >> > which is basically just one item in the token space, almost >> >> >> > nothing... I >> >> >> > wanted it to be very slim (just playing around). >> >> >> > So, next I get this: >> >> >> > 192.168.252.124Up 803.33 MB >> >> >> > 56713727820156410577229101238628035242 |<--| >> >> >> > 192.168.252.99Up 352.85 MB >> >> >> > 56713727820156410577229101
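[Editor's note: the RackAwareStrategy rule quoted in this thread — primary at the first token at/after the key, replica 2 in the first node along the ring in another data center, remaining replicas on the next nodes near the primary — can be reduced to a toy model. This is a DC-only simplification for illustration, not the real implementation (which also considers racks):]

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class RackAwarePlacementSketch {
    static class Node {
        final BigInteger token;
        final String dc;
        Node(BigInteger token, String dc) { this.token = token; this.dc = dc; }
    }

    // ring must be sorted by token. Primary = first node at/after keyToken;
    // replica 2 = first node along the ring in a *different* DC; remaining
    // replicas = next nodes in the *same* DC as the primary.
    public static List<Node> replicas(List<Node> ring, BigInteger keyToken, int rf) {
        int start = 0;
        while (start < ring.size() && ring.get(start).token.compareTo(keyToken) < 0) {
            start++;
        }
        start = start % ring.size(); // wrap around the ring
        Node primary = ring.get(start);
        List<Node> chosen = new ArrayList<Node>();
        chosen.add(primary);
        for (int i = 1; i < ring.size() && chosen.size() < 2; i++) {
            Node n = ring.get((start + i) % ring.size());
            if (!n.dc.equals(primary.dc)) chosen.add(n); // replica 2: other DC
        }
        for (int i = 1; i < ring.size() && chosen.size() < rf; i++) {
            Node n = ring.get((start + i) % ring.size());
            if (n.dc.equals(primary.dc) && !chosen.contains(n)) chosen.add(n);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Node> ring = new ArrayList<Node>();
        ring.add(new Node(BigInteger.valueOf(10), "dc1"));
        ring.add(new Node(BigInteger.valueOf(20), "dc1"));
        ring.add(new Node(BigInteger.valueOf(30), "dc2"));
        ring.add(new Node(BigInteger.valueOf(40), "dc2"));
        List<Node> r = replicas(ring, BigInteger.valueOf(5), 2);
        System.out.println(r.get(0).token + " -> " + r.get(1).token);
        // -> 10 -> 30 (primary in dc1; replica 2 is the first dc2 node)
    }
}
```

Walking the ring this way also shows why a node one token after a same-DC neighbor (like .99 after .124 in the thread) still receives replicas for ranges whose primaries sit elsewhere in its DC.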
Re: Monitoring compaction
Sure, patching CM stats into nodetool is fine. On Tue, Jun 1, 2010 at 9:50 AM, Ian Soboroff wrote: > Regarding compaction thresholds... the BMT example says to set the threshold > to 0 during an import. Is this advisable during any bulk import (say using > batch mutations or just lots and lots of thrift inserts)? > > Also, when I asked "are folks open to..." I meant that I'm happy to code a > patch if anyone's interested. > Ian > > On Tue, Jun 1, 2010 at 12:41 PM, Ian Soboroff wrote: >> >> Thanks. Are folks open to exposing this via nodetool? I've been trying >> to figure out a decent way to aggregate and expose all this information that >> is easier than nodetool and less noisy than nagios... suggestions >> appreciated. >> >> (My cluster only exposes a master node and everything else is private, so >> running a pile of jconsoles is not even possible...) >> >> Ian >> >> On Tue, Jun 1, 2010 at 12:33 PM, Dylan Egan / WildfireApp.com >> wrote: >>> >>> Hi Ian, >>> >>> On Tue, Jun 1, 2010 at 9:27 AM, Ian Soboroff wrote: >>> > Are stats exposed over JMX for compaction? >>> >>> You can view them via the >>> org.apache.cassandra.db:type=CompactionManager MBean. The PendingTasks >>> attribute might suit you best. >>> >>> Cheers, >>> >>> Dylan. >> > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Effective cache size
If I go to fetch some row given the rack-unaware placement strategy, the default snitch and CL==ONE, the node that is asked is the first node in the ring with the datum that is currently up, then a checksum is sent to the replicas to trigger read repair as appropriate. So with the row cache, that first node (the primary replica) is the one that has that row cached, yes? So if i have six nodes, CL==ONE, RF==3, row cache of 3 million on each node. Do I have an effective 6 million row cache (3m*6/3)? Or 18m? And is that changed by doing CL==QUORUM reads?
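[Editor's note: the arithmetic in this question has two bounding cases, depending on a caching assumption. The reply elsewhere in this digest points out the cached copy lives on whichever replica actually serves the read, so reality lands between these bounds depending on how requests are routed. A back-of-envelope sketch:]

```java
public class EffectiveCacheEstimate {
    // Distinct rows cached cluster-wide, under two bounding assumptions:
    //  - every replica of a hot row caches it: perNode * nodes / rf
    //  - each hot row is cached on exactly one replica: perNode * nodes
    public static long distinctIfAllReplicasCache(long perNode, int nodes, int rf) {
        return perNode * nodes / rf;
    }

    public static long distinctIfOneReplicaCaches(long perNode, int nodes) {
        return perNode * nodes;
    }

    public static void main(String[] args) {
        long perNode = 3_000_000L; // the question's 3M-row cache per node
        System.out.println(distinctIfAllReplicasCache(perNode, 6, 3)); // -> 6000000
        System.out.println(distinctIfOneReplicaCaches(perNode, 6));    // -> 18000000
    }
}
```

QUORUM reads touch more replicas per request, which pushes behavior toward the "all replicas cache the same hot rows" bound.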
Re: Continuously increasing RAM usage
I've started seeing this issue as well. Running 0.6.2. One interesting thing I happened upon, I explicitly called the GC via jconsole and the heap dropped completely fixing the issue. When you explicitly call System.gc() it does a full sweep. I'm wondering if this issue is to do with the GC flags used. -Jake On Wed, Jun 2, 2010 at 3:09 PM, Torsten Curdt wrote: > We've also seen something like this. Will soon investigate and try > again with 0.6.2 > > On Wed, Jun 2, 2010 at 20:27, Paul Brown wrote: > > > > FWIW, I'm seeing similar issues on a cluster. Three nodes, Cassandra > 0.6.1, SUN JDK 1.6.0_b20. I will try to get some heap dumps to see what's > building up. > > > > I've seen this sort of issue in systems that make heavy use of > java.util.concurrent queues/executors, e.g.: > > > > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6236036 > > > > That bug is long fixed, but it is an instance of how it can be harder to > do nothing than something. > > > > -- Paul > > > > > > On May 26, 2010, at 11:32 PM, James Golick wrote: > > > >> We're seeing RAM usage continually climb until eventually, cassandra > becomes unresponsive. > >> > >> The JVM isn't OOM'ing. It has only committed 14/24GB of memory. So, I am > assuming that the memory usage is related to mmap'd IO. Fair assumption? > >> > >> I tried setting the IO mode to standard, but it seemed to be a little > slower and couldn't get the machine to come back online with adequate read > performance, so I set it back. I'll have to write a solid cache warming > script if I'm going to try that again. > >> > >> Any other ideas for what might be causing the issue? Is there something > I should monitor or look at next time it happens? > >> > >> Thanks > > > > >
Re: Read operation with CL.ALL, not yet supported?
Gary, Thanks for reply. I've opened an issue at https://issues.apache.org/jira/browse/CASSANDRA-1152 Yuki 2010/6/3 Gary Dusbabek : > Yuki, > > Can you file a jira ticket for this > (https://issues.apache.org/jira/browse/CASSANDRA)? The wiki indicates > that this should be allowed: http://wiki.apache.org/cassandra/API > > Regards, > > Gary. > > > On Tue, Jun 1, 2010 at 21:50, Yuki Morishita wrote: >> Hi, >> >> I'm testing several read operations(get, get_slice, get_count, etc.) with >> various ConsistencyLevel and noticed that ConsistencyLevel.ALL is >> "not yet supported" in most of read ops (other than get_range_slice). >> >> I've looked up code in StorageProxy#readProtocol and it seems >> to be able to handle CL.ALL, but in thrift.CassandraServer#readColumnFamily, >> there is code that just throws exception when consistency_level == ALL. >> Is there any reason that CL.ALL is "not yet supported"? >> >> >> Yuki Morishita >> t:yukim (http://twitter.com/yukim) >> > -- Yuki Morishita t:yukim (http://twitter.com/yukim)
ColumnFamilyInputFormat with super columns
I have a super column along the lines of => { => { att: value }} Now I would like to process a set of rows [from_time..until_time] with Hadoop. I've set up the hadoop job like this job.setInputFormatClass(ColumnFamilyInputFormat.class); ConfigHelper.setColumnFamily(job.getConfiguration(), "family", "events"); SlicePredicate predicate = new SlicePredicate(); predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 1000)); ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate); but I don't see how I could say what rows the job should process. Any pointers? cheers -- Torsten
Re: Changing replication factor from 2 to 3
On 6/2/10 12:49 PM, Eric Halpern wrote: We'd like to double our cluster size from 4 to 8 and increase our replication factor from 2 to 3. Is there any special procedure we need to follow to increase replication? Is it sufficient to just start the new nodes with the replication factor of 3 and then reconfigure the existing nodes to the replication factor one at a time? http://wiki.apache.org/cassandra/Operations " Replication factor is not really intended to be changed in a live cluster either, but increasing it may be done if you (a) use ConsistencyLevel.QUORUM or ALL (depending on your existing replication factor) to make sure that a replica that actually has the data is consulted, (b) are willing to accept downtime while anti-entropy repair runs (see below), or (c) are willing to live with some clients potentially being told no data exists if they read from the new replica location(s) until repair is done. " Please feel free to update this wiki page if the above information is incomplete in any way. :) =Rob
Changing replication factor from 2 to 3
We'd like to double our cluster size from 4 to 8 and increase our replication factor from 2 to 3. Is there any special procedure we need to follow to increase replication? Is it sufficient to just start the new nodes with the replication factor of 3 and then reconfigure the existing nodes to the replication factor one at a time?
Re: Nodes dropping out of cluster due to GC
Ryan King wrote: > > Why run with so few nodes? > > -ryan > > On Tue, Jun 1, 2010 at 4:20 PM, Eric Halpern wrote: >> >> Hello, >> >> We're running a 4 node cluster on beefy EC2 virtual instances (8 core, 32 >> GB) using EBS storage with 8 GB of heap allocated to the JVM. >> >> Every couple of hours, each of the nodes does a concurrent mark/sweep >> that >> takes around 30 seconds to complete. During that GC, the node >> temporarily >> drops out of the cluster, usually for about 15 seconds. >> >> The frequency of the concurrent mark sweeps seems reasonable, but the >> fact >> that the node drops out of the cluster temporarily is a major problem >> since >> this has significant impact on the performance and stability of our >> service. >> >> Has anyone experienced this sort of problem? It would be great to hear >> from >> anyone who has had experience with this sort of issue and/or suggestions >> for >> how to deal with it. >> >> Thanks, Eric >> -- > > We wanted to start with a small number of nodes to test things out before going big. Is there some reason that a small cluster would cause more problems in this regard. The actual request load is actually pretty light for the cluster. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Nodes-dropping-out-of-cluster-due-to-GC-tp5128481p5132279.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Nodes dropping out of cluster due to GC
Oleg Anastasjev wrote: > >> >> Has anyone experienced this sort of problem? It would be great to hear >> from >> anyone who has had experience with this sort of issue and/or suggestions >> for >> how to deal with it. >> >> Thanks, Eric > > Yes, i did. Symptoms you described point to concurrent GC FAILURE. During > this > failure concurrent GC completely stops java program (i.e. cassandra) and > does a > GC cycle. Other cassandra nodes discover, that node is not responding and > considering it dead. > If concurrent GC is properly tuned, it should never do stop-the-world and > GC ( > thats why it is called concurrent ;-) ). > Reasons for concurrent GC failures can be several: > 1. Not enought java heap - try to raise max java heap limit > 2. Improperly sized java heap regions. > > To help you to narrow the problem, pass -XX:+PrintGCDetails option to JVM > launching cassandra node. This will log information about internal GC > activities. Let it run till it will be thrown out of cluster again and > search > for "concurrent mode failure" or "promotion failed" strings. > We did indeed have a problem with our GC settings. The survivor ratio was too low. After changing that things are better but we are still seeing GC that takes 5-10 seconds, which is enough for the node to drop out of the cluster briefly. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Nodes-dropping-out-of-cluster-due-to-GC-tp5128481p5132267.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
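[Editor's note: the diagnostics and tuning knobs mentioned in this thread, collected as a config fragment in the style of the JVM_OPTS settings in bin/cassandra.in.sh. The values are illustrative examples only, not recommendations; the right survivor ratio and heap size depend on the workload:]

```shell
# Illustrative CMS-related JVM options discussed in this thread.
JVM_OPTS="$JVM_OPTS -Xms8G -Xmx8G"            # heap size (the thread uses 8 GB)
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"  # concurrent mark/sweep collector
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"      # survivor spaces; "too low" was the problem above
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"      # log GC activity; grep the log for
                                              # "concurrent mode failure" / "promotion failed"
echo "$JVM_OPTS"
```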
Re: Continuously increasing RAM usage
We've also seen something like this. Will soon investigate and try again with 0.6.2 On Wed, Jun 2, 2010 at 20:27, Paul Brown wrote: > > FWIW, I'm seeing similar issues on a cluster. Three nodes, Cassandra 0.6.1, > SUN JDK 1.6.0_b20. I will try to get some heap dumps to see what's building > up. > > I've seen this sort of issue in systems that make heavy use of > java.util.concurrent queues/executors, e.g.: > > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6236036 > > That bug is long fixed, but it is an instance of how it can be harder to do > nothing than something. > > -- Paul > > > On May 26, 2010, at 11:32 PM, James Golick wrote: > >> We're seeing RAM usage continually climb until eventually, cassandra becomes >> unresponsive. >> >> The JVM isn't OOM'ing. It has only committed 14/24GB of memory. So, I am >> assuming that the memory usage is related to mmap'd IO. Fair assumption? >> >> I tried setting the IO mode to standard, but it seemed to be a little slower >> and couldn't get the machine to come back online with adequate read >> performance, so I set it back. I'll have to write a solid cache warming >> script if I'm going to try that again. >> >> Any other ideas for what might be causing the issue? Is there something I >> should monitor or look at next time it happens? >> >> Thanks > >
Re: Giant sets of ordered data
Insert "if you want to use long values for keys and column names" above paragraph 2. I forgot that part. On Wed, Jun 2, 2010 at 1:29 PM, Jonathan Shook wrote: > If you want to do range queries on the keys, you can use OPP to do this: > (example using UTF-8 lexicographic keys, with bursts split across rows > according to row size limits) > > Events: { > "20100601.05.30.003": { > "20100601.05.30.003": > "20100601.05.30.007": > ... > } > } > > With a future version of Cassandra, you may be able to use the same > basic datatype for both key and column name, as keys will be binary > like the rest, I believe. > > I'm not aware of specific performance improvements when using OPP > range queries on keys vs iterating over known keys. I suspect (hope) > that round-tripping to the server should be reduced, which may be > significant. Does anybody have decent benchmarks that tell the > difference? > > > On Wed, Jun 2, 2010 at 11:53 AM, Ben Browning wrote: >> With a traffic pattern like that, you may be better off storing the >> events of each burst (I'll call them group) in one or more keys and >> then storing these keys in the day key. >> >> EventGroupsPerDay: { >> "20100601": { >> 123456789: "group123", // column name is timestamp group was >> received, column value is key >> 123456790: "group124" >> } >> } >> >> EventGroups: { >> "group123": { >> 123456789: "value1", >> 123456799: "value2" >> } >> } >> >> If you think of Cassandra as a toolkit for building scalable indexes >> it seems to make the modeling a bit easier. In this case, you're >> building an index by day to lookup events that come in as groups. So, >> first you'd fetch the slice of columns for the day you're interested >> in to figure out which groups to look at then you'd fetch the events >> in those groups. >> >> There are plenty of alternate ways to divide up the data among rows >> also - you could use hour keys instead of days as an example. 
>> >> On Wed, Jun 2, 2010 at 11:57 AM, David Boxenhorn wrote: >>> Let's say you're logging events, and you have billions of events. What if >>> the events come in bursts, so within a day there are millions of events, but >>> they all come within microseconds of each other a few times a day? How do >>> you find the events that happened on a particular day if you can't store >>> them all in one row? >>> >>> On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook wrote: Either OPP by key, or within a row by column name. I'd suggest the latter. If you have structured data to stick under a column (named by the timestamp), then you can serialize and unserialize it yourself, or you can use a supercolumn. It's effectively the same thing. Cassandra only provides the super column support as a convenience layer as it is currently implemented. That may change in the future. You didn't make clear in your question why a standard column would be less suitable. I presumed you had layered structure within the timestamp, hence my response. How would you logically partition your dataset according to natural application boundaries? This will answer most of your question. If you have a dataset which can't be partitioned into a reasonable size row, then you may want to use OPP and key concatenation. What do you mean by giant? On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn wrote: > How do I handle giant sets of ordered data, e.g. by timestamps, which I > want > to access by range? > > I can't put all the data into a supercolumn, because it's loaded into > memory > at once, and it's too much data. > > Am I forced to use an order-preserving partitioner? I don't want the > headache. Is there any other way? > >>> >>> >> >
Re: Giant sets of ordered data
If you want to do range queries on the keys, you can use OPP to do this: (example using UTF-8 lexicographic keys, with bursts split across rows according to row size limits) Events: { "20100601.05.30.003": { "20100601.05.30.003": "20100601.05.30.007": ... } } With a future version of Cassandra, you may be able to use the same basic datatype for both key and column name, as keys will be binary like the rest, I believe. I'm not aware of specific performance improvements when using OPP range queries on keys vs iterating over known keys. I suspect (hope) that round-tripping to the server should be reduced, which may be significant. Does anybody have decent benchmarks that tell the difference? On Wed, Jun 2, 2010 at 11:53 AM, Ben Browning wrote: > With a traffic pattern like that, you may be better off storing the > events of each burst (I'll call them group) in one or more keys and > then storing these keys in the day key. > > EventGroupsPerDay: { > "20100601": { > 123456789: "group123", // column name is timestamp group was > received, column value is key > 123456790: "group124" > } > } > > EventGroups: { > "group123": { > 123456789: "value1", > 123456799: "value2" > } > } > > If you think of Cassandra as a toolkit for building scalable indexes > it seems to make the modeling a bit easier. In this case, you're > building an index by day to lookup events that come in as groups. So, > first you'd fetch the slice of columns for the day you're interested > in to figure out which groups to look at then you'd fetch the events > in those groups. > > There are plenty of alternate ways to divide up the data among rows > also - you could use hour keys instead of days as an example. > > On Wed, Jun 2, 2010 at 11:57 AM, David Boxenhorn wrote: >> Let's say you're logging events, and you have billions of events. What if >> the events come in bursts, so within a day there are millions of events, but >> they all come within microseconds of each other a few times a day? 
How do >> you find the events that happened on a particular day if you can't store >> them all in one row? >> >> On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook wrote: >>> >>> Either OPP by key, or within a row by column name. I'd suggest the latter. >>> If you have structured data to stick under a column (named by the >>> timestamp), then you can serialize and unserialize it yourself, or you >>> can use a supercolumn. It's effectively the same thing. Cassandra >>> only provides the super column support as a convenience layer as it is >>> currently implemented. That may change in the future. >>> >>> You didn't make clear in your question why a standard column would be >>> less suitable. I presumed you had layered structure within the >>> timestamp, hence my response. >>> How would you logically partition your dataset according to natural >>> application boundaries? This will answer most of your question. >>> If you have a dataset which can't be partitioned into a reasonable >>> size row, then you may want to use OPP and key concatenation. >>> >>> What do you mean by giant? >>> >>> On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn >>> wrote: >>> > How do I handle giant sets of ordered data, e.g. by timestamps, which I >>> > want >>> > to access by range? >>> > >>> > I can't put all the data into a supercolumn, because it's loaded into >>> > memory >>> > at once, and it's too much data. >>> > >>> > Am I forced to use an order-preserving partitioner? I don't want the >>> > headache. Is there any other way? >>> > >> >> >
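[Editor's note] Jonathan's burst-splitting scheme above (lexicographic timestamp keys under OPP, with a burst split across rows once a row-size limit is hit) can be mimicked with plain Python dicts to make the key layout concrete; the `.NNN` sequence suffix and the 200-column row cap are illustrative assumptions, not Cassandra values:

```python
from datetime import datetime

MAX_COLUMNS_PER_ROW = 200  # illustrative row-size cap, not a Cassandra constant


def make_row_key(ts: datetime, seq: int) -> str:
    # Lexicographically sortable UTF-8 key plus a zero-padded sequence
    # number, so a burst split across several rows stays in order under OPP.
    return f"{ts:%Y%m%d.%H.%M}.{seq:03d}"


def store_burst(events, ts):
    """Split one burst of (column_name, value) pairs across ordered rows."""
    rows = {}
    for seq, start in enumerate(range(0, len(events), MAX_COLUMNS_PER_ROW)):
        rows[make_row_key(ts, seq)] = dict(events[start:start + MAX_COLUMNS_PER_ROW])
    return rows


events = [(f"evt{i:04d}", i) for i in range(450)]
rows = store_burst(events, datetime(2010, 6, 1, 5, 30))
print(sorted(rows))  # three consecutive row keys holding one burst
```

Because the keys sort lexicographically, an OPP range query from the first key of a burst to the last returns the whole burst in order.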
Re: Continuously increasing RAM usage
FWIW, I'm seeing similar issues on a cluster. Three nodes, Cassandra 0.6.1, SUN JDK 1.6.0_b20. I will try to get some heap dumps to see what's building up. I've seen this sort of issue in systems that make heavy use of java.util.concurrent queues/executors, e.g.: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6236036 That bug is long fixed, but it is an instance of how it can be harder to do nothing than something. -- Paul On May 26, 2010, at 11:32 PM, James Golick wrote: > We're seeing RAM usage continually climb until eventually, cassandra becomes > unresponsive. > > The JVM isn't OOM'ing. It has only committed 14/24GB of memory. So, I am > assuming that the memory usage is related to mmap'd IO. Fair assumption? > > I tried setting the IO mode to standard, but it seemed to be a little slower > and couldn't get the machine to come back online with adequate read > performance, so I set it back. I'll have to write a solid cache warming > script if I'm going to try that again. > > Any other ideas for what might be causing the issue? Is there something I > should monitor or look at next time it happens? > > Thanks
Capacity planning and Re: Handling disk-full scenarios
Reading some more (someone break in when I lose my clue ;-) Reading the streams page in the wiki about anticompaction, I think the best approach to take when a node's disks get overfull is to set the compaction thresholds to 0 on all nodes, decommission the overfull node, wait for the data to get redistributed, and then clean off the decommissioned node and bootstrap it. Since the disks are too full for an anticompaction, you can't move the token on that node. Given this, I wonder about the right approach to capacity planning. If I want to store, say, 500M rows, and I know based on current cfstats that the mean compacted row size is 27k, how much overhead is there on top of the 13.5 TB of raw data? Trying to compute from what I have: in cfstats I have a total "Space used (total)" of around 1.6TB (this is only a subset of the data loaded so far), but when I total the data directories using du(1) I get around 23TB already used. On Wed, Jun 2, 2010 at 11:29 AM, Ian Soboroff wrote: > Ok, answered part of this myself. You can stop a node, move files around > on the data disks, as long as they stay in the right keyspace directories, > and all is fine. > > Now, I have a single Data.db file which is 900GB and is compacted. The > drive it's on is only 1.5TB, so it can't anticompact at all. Is there > anything I can do? The replication factor is 3, so one idea is to take down > the node, blow away the huge file, adjust the token, and restart the node. > At that point I'm not sure what to tell the new node or other nodes to do... > do I need to run a repair, or a cleanup, or a loadbalance, or ... what? > > It would be great to be able to fix a storage quota on a per-data-directory > basis, to ensure that enough capacity is retained for anticompaction. > Default 45% quota, adjustable for the brave. > > Ian > > > On Tue, Jun 1, 2010 at 4:08 PM, Ian Soboroff wrote: > >> My nodes have 5 disks and are using them separately as data disks.
The >> usage on the disks is not uniform, and one is nearly full. Is there some >> way to manually balance the files across the disks? Pretty much anything >> done via nodetool incurs an anticompaction, which obviously fails. system/ is >> not the problem, it's in my data's keyspace. >> >> Ian >> >> >
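[Editor's note] Ian's back-of-the-envelope numbers (500M rows at a mean compacted row size of ~27k) can be extended into a rough capacity sketch; the replication factor comes from his post, but the 2x compaction headroom is an assumption for illustration (compaction and anticompaction can temporarily need free space on the order of the data being rewritten), not settled guidance:

```python
rows = 500_000_000
mean_row_bytes = 27_000        # ~27k mean compacted row size, from cfstats
replication_factor = 3         # RF=3, as mentioned in the thread
compaction_headroom = 2.0      # assumption: (anti)compaction can need ~2x space

raw_tb = rows * mean_row_bytes / 1e12
cluster_tb = raw_tb * replication_factor
provisioned_tb = cluster_tb * compaction_headroom

print(f"raw: {raw_tb:.1f} TB, replicated: {cluster_tb:.1f} TB, "
      f"provisioned: {provisioned_tb:.1f} TB")
```

This also suggests why du(1) across the cluster can report far more than the cfstats "Space used (total)" on any one node: replication and not-yet-compacted SSTables both multiply the on-disk footprint.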
Re: Read operation with CL.ALL, not yet supported?
Yuki, Can you file a jira ticket for this (https://issues.apache.org/jira/browse/CASSANDRA)? The wiki indicates that this should be allowed: http://wiki.apache.org/cassandra/API Regards, Gary. On Tue, Jun 1, 2010 at 21:50, Yuki Morishita wrote: > Hi, > > I'm testing several read operations (get, get_slice, get_count, etc.) with > various ConsistencyLevels and noticed that ConsistencyLevel.ALL is > "not yet supported" in most read ops (other than get_range_slice). > > I've looked at the code in StorageProxy#readProtocol and it seems > to be able to handle CL.ALL, but in thrift.CassandraServer#readColumnFamily > there is code that just throws an exception when consistency_level == ALL. > Is there any reason that CL.ALL is "not yet supported"? > > > Yuki Morishita > t:yukim (http://twitter.com/yukim) >
Re: Error during startup
I was able to reproduce the error by starting up a node using RandomPartitioner, kill it, switch to OrderPreservingPartitioner, restart, kill, switch back to RandomPartitioner, BANG! So it looks like you tinkered with the partitioner at some point. This has the unfortunate effect of corrupting your system table. I'm trying to figure out a way to detect this and abort before data is overwritten. Gary. On Sun, May 30, 2010 at 06:49, David Boxenhorn wrote: > I deleted the system/LocationInfo files, and now everything works. > > Yay! (...what happened?) > > On Sun, May 30, 2010 at 4:18 PM, David Boxenhorn wrote: >> >> I'm getting an "Expected both token and generation columns; found >> ColumnFamily" error during startup; can anyone tell me what it is? Details >> below. >> >> Starting Cassandra Server >> Listening for transport dt_socket at address: >> INFO 16:14:33,459 Auto DiskAccessMode determined to be standard >> INFO 16:14:33,615 Sampling index for >> C:\var\lib\cassandra\data\system\LocationInfo-1-Data.db >> INFO 16:14:33,631 Removing orphan >> C:\var\lib\cassandra\data\Lookin2\Users-tmp-27-Index.db >> INFO 16:14:33,631 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Users-19-Data.db >> INFO 16:14:33,662 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Users-18-Data.db >> INFO 16:14:33,818 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Users-20-Data.db >> INFO 16:14:33,850 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Users-21-Data.db >> INFO 16:14:33,865 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Users-22-Data.db >> INFO 16:14:33,881 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-580-Data.db >> INFO 16:14:33,896 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-672-Data.db >> INFO 16:14:33,912 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-681-Data.db >> INFO 16:14:33,912 Sampling index for >> 
C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-691-Data.db >> INFO 16:14:33,928 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-696-Data.db >> INFO 16:14:33,943 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Attractions-17-Data.db >> INFO 16:14:34,006 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestTrendsetterIdx-5-Data.db >> INFO 16:14:34,006 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestTrendsetterIdx-6-Data.db >> INFO 16:14:34,021 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-29-Data.db >> INFO 16:14:34,350 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-51-Data.db >> INFO 16:14:34,693 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-72-Data.db >> INFO 16:14:35,021 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-77-Data.db >> INFO 16:14:35,225 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-78-Data.db >> INFO 16:14:35,350 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-79-Data.db >> INFO 16:14:35,459 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-80-Data.db >> INFO 16:14:35,459 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Taxonomy-1-Data.db >> INFO 16:14:35,475 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Taxonomy-2-Data.db >> INFO 16:14:35,475 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Content-30-Data.db >> INFO 16:14:35,631 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Content-35-Data.db >> INFO 16:14:35,771 Sampling index for >> C:\var\lib\cassandra\data\Lookin2\Content-40-Data.db >> INFO 16:14:35,959 Compacting >> 
[org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-19-Data.db'),org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-20-Data.db'),org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-21-Data.db'),org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-22-Data.db')] >> ERROR 16:14:35,975 Exception encountered during startup. >> java.lang.RuntimeException: Expected both token and generation columns; >> found ColumnFamily(LocationInfo [Generation:false:4...@4,]) >> at >> org.apache.cassandra.db.SystemTable.initMetadata(SystemTable.java:159) >> at >> org.apache.cassandra.service.StorageService.initServer(StorageService.java:305) >> at >> org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:99) >> at >> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:177) >> Exception encountered during startup. >> > >
Re: Heterogeneous Cassandra Cluster
Our replication factor was 1, so that wasn't the problem. (We tried other replication factors too, just in case, but it didn't help.) On Wed, Jun 2, 2010 at 7:51 PM, Nahor > wrote: > On 2010-06-02 3:18, David Boxenhorn wrote: > >> Is it possible to make a heterogeneous Cassandra cluster, with both Linux >> and Windows nodes? I tried doing it and got >> >> Error in ThreadPoolExecutor >> java.lang.NullPointerException >> >> Not sure if this is due to the Linux/Windows mix or something else. >> >> >> Details below: >> > [...] > > >> INFO 21:42:08,091 Bootstrapping >> >> ERROR 21:49:03,526 Error in ThreadPoolExecutor >> >> java.lang.NullPointerException >> >>at >> org.apache.cassandra.streaming.StreamInitiateVerbHandler.getNewNames(StreamInitiateVerbHandler.java:154) >> >>at >> org.apache.cassandra.streaming.StreamInitiateVerbHandler.doVerb(StreamInitiateVerbHandler.java:76) >> >>at >> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40) >> >>at >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >> >>at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >> >>at java.lang.Thread.run(Thread.java:619) >> >> ERROR 21:49:03,527 Fatal exception in thread >> Thread[MESSAGE-DESERIALIZER-POOL:1,5,main] >> >> java.lang.NullPointerException >> >>at >> org.apache.cassandra.streaming.StreamInitiateVerbHandler.getNewNames(StreamInitiateVerbHandler.java:154) >> >>at >> org.apache.cassandra.streaming.StreamInitiateVerbHandler.doVerb(StreamInitiateVerbHandler.java:76) >> >>at >> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40) >> >>at >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >> >>at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >> >>at java.lang.Thread.run(Thread.java:619) >> >> >> > Looks like https://issues.apache.org/jira/browse/CASSANDRA-1136 > > Make sure you have enough nodes in 
your cluster to satisfy your replication > factor before you add any data. This is what seems to be the source of the > problem in my case. > > That said, I was also using a heterogeneous system (Linux + Windows), but I > think I tested it with only Linux nodes too. > > >
Re: Giant sets of ordered data
With a traffic pattern like that, you may be better off storing the events of each burst (I'll call them group) in one or more keys and then storing these keys in the day key. EventGroupsPerDay: { "20100601": { 123456789: "group123", // column name is timestamp group was received, column value is key 123456790: "group124" } } EventGroups: { "group123": { 123456789: "value1", 123456799: "value2" } } If you think of Cassandra as a toolkit for building scalable indexes it seems to make the modeling a bit easier. In this case, you're building an index by day to lookup events that come in as groups. So, first you'd fetch the slice of columns for the day you're interested in to figure out which groups to look at then you'd fetch the events in those groups. There are plenty of alternate ways to divide up the data among rows also - you could use hour keys instead of days as an example. On Wed, Jun 2, 2010 at 11:57 AM, David Boxenhorn wrote: > Let's say you're logging events, and you have billions of events. What if > the events come in bursts, so within a day there are millions of events, but > they all come within microseconds of each other a few times a day? How do > you find the events that happened on a particular day if you can't store > them all in one row? > > On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook wrote: >> >> Either OPP by key, or within a row by column name. I'd suggest the latter. >> If you have structured data to stick under a column (named by the >> timestamp), then you can serialize and unserialize it yourself, or you >> can use a supercolumn. It's effectively the same thing. Cassandra >> only provides the super column support as a convenience layer as it is >> currently implemented. That may change in the future. >> >> You didn't make clear in your question why a standard column would be >> less suitable. I presumed you had layered structure within the >> timestamp, hence my response. 
>> How would you logically partition your dataset according to natural >> application boundaries? This will answer most of your question. >> If you have a dataset which can't be partitioned into a reasonable >> size row, then you may want to use OPP and key concatenation. >> >> What do you mean by giant? >> >> On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn >> wrote: >> > How do I handle giant sets of ordered data, e.g. by timestamps, which I >> > want >> > to access by range? >> > >> > I can't put all the data into a supercolumn, because it's loaded into >> > memory >> > at once, and it's too much data. >> > >> > Am I forced to use an order-preserving partitioner? I don't want the >> > headache. Is there any other way? >> > > >
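[Editor's note] Ben's two-level layout (a per-day index row pointing at one row per burst) can be simulated with plain dictionaries to show the two-step read path; the names mirror his example, and group124's contents are invented here to complete it:

```python
# Index CF: day key -> {arrival_timestamp: group_key}
event_groups_per_day = {
    "20100601": {123456789: "group123", 123456790: "group124"},
}

# Data CF: group key -> {event_timestamp: value}
event_groups = {
    "group123": {123456789: "value1", 123456799: "value2"},
    "group124": {123456800: "value3"},  # invented contents for group124
}


def events_for_day(day):
    """Read path: slice the day's index row, then fetch each group row."""
    merged = {}
    for group_key in event_groups_per_day.get(day, {}).values():
        merged.update(event_groups.get(group_key, {}))
    return merged


print(events_for_day("20100601"))
```

The point of the indirection is that each burst lives in its own modestly sized row, while the day row stays small because it holds only group keys, not millions of events.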
Re: Heterogeneous Cassandra Cluster
On 2010-06-02 3:18, David Boxenhorn wrote: Is it possible to make a heterogeneous Cassandra cluster, with both Linux and Windows nodes? I tried doing it and got Error in ThreadPoolExecutor java.lang.NullPointerException Not sure if this is due to the Linux/Windows mix or something else. Details below: [...] INFO 21:42:08,091 Bootstrapping ERROR 21:49:03,526 Error in ThreadPoolExecutor java.lang.NullPointerException at org.apache.cassandra.streaming.StreamInitiateVerbHandler.getNewNames(StreamInitiateVerbHandler.java:154) at org.apache.cassandra.streaming.StreamInitiateVerbHandler.doVerb(StreamInitiateVerbHandler.java:76) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) ERROR 21:49:03,527 Fatal exception in thread Thread[MESSAGE-DESERIALIZER-POOL:1,5,main] java.lang.NullPointerException at org.apache.cassandra.streaming.StreamInitiateVerbHandler.getNewNames(StreamInitiateVerbHandler.java:154) at org.apache.cassandra.streaming.StreamInitiateVerbHandler.doVerb(StreamInitiateVerbHandler.java:76) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Looks like https://issues.apache.org/jira/browse/CASSANDRA-1136 Make sure you have enough nodes in your cluster to satisfy your replication factor before you add any data. This is what seems to be source of the problem in my case. That said, I was also using an heterogeneous system (Linux + Windows) but I think I tested it with only Linux nodes too.
Re: Nodes dropping out of cluster due to GC
Why run with so few nodes? -ryan On Tue, Jun 1, 2010 at 4:20 PM, Eric Halpern wrote: > > Hello, > > We're running a 4 node cluster on beefy EC2 virtual instances (8 core, 32 > GB) using EBS storage with 8 GB of heap allocated to the JVM. > > Every couple of hours, each of the nodes does a concurrent mark/sweep that > takes around 30 seconds to complete. During that GC, the node temporarily > drops out of the cluster, usually for about 15 seconds. > > The frequency of the concurrent mark sweeps seems reasonable, but the fact > that the node drops out of the cluster temporarily is a major problem since > this has significant impact on the performance and stability of our service. > > Has anyone experienced this sort of problem? It would be great to hear from > anyone who has had experience with this sort of issue and/or suggestions for > how to deal with it. > > Thanks, Eric > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Nodes-dropping-out-of-cluster-due-to-GC-tp5128481p5128481.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com. >
Re: Giant sets of ordered data
Let's say you're logging events, and you have billions of events. What if the events come in bursts, so within a day there are millions of events, but they all come within microseconds of each other a few times a day? How do you find the events that happened on a particular day if you can't store them all in one row? On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook wrote: > Either OPP by key, or within a row by column name. I'd suggest the latter. > If you have structured data to stick under a column (named by the > timestamp), then you can serialize and unserialize it yourself, or you > can use a supercolumn. It's effectively the same thing. Cassandra > only provides the super column support as a convenience layer as it is > currently implemented. That may change in the future. > > You didn't make clear in your question why a standard column would be > less suitable. I presumed you had layered structure within the > timestamp, hence my response. > How would you logically partition your dataset according to natural > application boundaries? This will answer most of your question. > If you have a dataset which can't be partitioned into a reasonable > size row, then you may want to use OPP and key concatenation. > > What do you mean by giant? > > On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn > wrote: > > How do I handle giant sets of ordered data, e.g. by timestamps, which I > want > > to access by range? > > > > I can't put all the data into a supercolumn, because it's loaded into > memory > > at once, and it's too much data. > > > > Am I forced to use an order-preserving partitioner? I don't want the > > headache. Is there any other way? > > >
Re: Giant sets of ordered data
Either OPP by key, or within a row by column name. I'd suggest the latter. If you have structured data to stick under a column (named by the timestamp), then you can serialize and unserialize it yourself, or you can use a supercolumn. It's effectively the same thing. Cassandra only provides the super column support as a convenience layer as it is currently implemented. That may change in the future. You didn't make clear in your question why a standard column would be less suitable. I presumed you had layered structure within the timestamp, hence my response. How would you logically partition your dataset according to natural application boundaries? This will answer most of your question. If you have a dataset which can't be partitioned into a reasonable size row, then you may want to use OPP and key concatenation. What do you mean by giant? On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn wrote: > How do I handle giant sets of ordered data, e.g. by timestamps, which I want > to access by range? > > I can't put all the data into a supercolumn, because it's loaded into memory > at once, and it's too much data. > > Am I forced to use an order-preserving partitioner? I don't want the > headache. Is there any other way? >
Re: Giant sets of ordered data
I like to model this kind of data as columns, where the timestamps are the column name (either longs, TimeUUIDs, or strings depending on your usage). If you have too much data for a single row, you'd need to have multiple rows of these. For time-series data, it makes sense to use one row per minute/hour/day/year depending on the volume of your data. Something like the following: SomeTimeData: { // columnfamily "20100601": { // key, yyyymmdd 123456789: "value1", // column name is milliseconds since epoch 123456799: "value2" }, "20100602": { 12345889: "value3" } } Now you can use column slices to retrieve all values between two time periods on a given day. If you need to support larger ranges you'll either have to slice columns from multiple keys or change the keys from yyyymmdd to yyyymm, yyyy, etc. There's a tradeoff here between row width and read speed. Reading 1000 columns as a continuous slice from a single row will be very fast but reading 1000 columns as slices from 10 keys won't be as fast. Ben On Wed, Jun 2, 2010 at 11:32 AM, David Boxenhorn wrote: > How do I handle giant sets of ordered data, e.g. by timestamps, which I want > to access by range? > > I can't put all the data into a supercolumn, because it's loaded into memory > at once, and it's too much data. > > Am I forced to use an order-preserving partitioner? I don't want the > headache. Is there any other way? >
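[Editor's note] Ben's day-bucket model can be simulated with a sorted dict and bisect (stdlib only) to show what a column slice between two timestamps returns; `slice_columns` is a hypothetical stand-in for a Thrift get_slice with a start/finish SliceRange, not a real client call:

```python
import bisect

# One row per day; column names are millisecond timestamps, kept sorted
# the way Cassandra keeps columns ordered within a row.
some_time_data = {
    "20100601": {123456789: "value1", 123456799: "value2"},
    "20100602": {12345889: "value3"},
}


def slice_columns(row_key, start_ts, end_ts):
    """Columns in [start_ts, end_ts] from one day row, in column order."""
    cols = sorted(some_time_data.get(row_key, {}).items())
    names = [name for name, _ in cols]
    lo = bisect.bisect_left(names, start_ts)
    hi = bisect.bisect_right(names, end_ts)
    return cols[lo:hi]


print(slice_columns("20100601", 123456790, 123456799))
```

A range spanning several days would call this once per day key and concatenate the results, which is the multi-key read-speed tradeoff Ben describes.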
Giant sets of ordered data
How do I handle giant sets of ordered data, e.g. by timestamps, which I want to access by range? I can't put all the data into a supercolumn, because it's loaded into memory at once, and it's too much data. Am I forced to use an order-preserving partitioner? I don't want the headache. Is there any other way?
Re: Handling disk-full scenarios
Ok, answered part of this myself. You can stop a node, move files around on the data disks, as long as they stay in the right keyspace directories, and all is fine. Now, I have a single Data.db file which is 900GB and is compacted. The drive it's on is only 1.5TB, so it can't anticompact at all. Is there anything I can do? The replication factor is 3, so one idea is to take down the node, blow away the huge file, adjust the token, and restart the node. At that point I'm not sure what to tell the new node or other nodes to do... do I need to run a repair, or a cleanup, or a loadbalance, or ... what? It would be great to be able to fix a storage quota on a per-data-directory basis, to ensure that enough capacity is retained for anticompaction. Default 45% quota, adjustable for the brave. Ian On Tue, Jun 1, 2010 at 4:08 PM, Ian Soboroff wrote: > My nodes have 5 disks and are using them separately as data disks. The > usage on the disks is not uniform, and one is nearly full. Is there some > way to manually balance the files across the disks? Pretty much anything > done via nodetool incurs an anticompaction, which obviously fails. system/ is > not the problem, it's in my data's keyspace. > > Ian > >
Re: Range search on keys not working?
Can you clarify what you mean by 'random between nodes'? On Wed, Jun 2, 2010 at 8:15 AM, David Boxenhorn wrote: > I see. But we could make this work if the random partitioner was random only > between nodes, but was still ordered within each node. (Or if there were > another partitioner that did this.) That way we could get everything we need > from each node separately. The results would not be ordered, but they would > be correct. > > On Wed, Jun 2, 2010 at 4:09 PM, Sylvain Lebresne wrote: >> >> > So why do the "start" and "finish" range parameters exist? >> >> Because especially if you want to iterate over all your keys (which as >> stated by Ben above >> is the only meaningful way to use get_range_slices() with the random >> partitioner), you'll >> want to paginate that. And that's where the 'start' and 'finish' are >> useful (to be fair, >> the 'finish' part is not so useful in practice with the random >> partitioner). >> >> -- >> Sylvain >> >> > >> > On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote: >> >> >> >> Martin, >> >> >> >> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller >> >> wrote: >> >> > I think you can specify an end key, but it should be a key which does >> >> > exist >> >> > in your column family. >> >> >> >> >> >> Logically, it doesn't make sense to ever specify an end key with >> >> random partitioner. If you specified a start key of "aaa" and an end >> >> key of "aac" you might get back as results "aaa", "zfc", "hik", etc. >> >> And, even if you have a key of "aab" it might not show up. Key ranges >> >> only make sense with order-preserving partitioner. The only time to >> >> ever use a key range with random partitioner is when you want to >> >> iterate over all keys in the CF. >> >> >> >> Ben >> >> >> >> >> >> > But maybe I'm off the track here and someone else here knows more >> >> > about >> >> > this >> >> > key range stuff. 
>> >> > >> >> > Martin >> >> > >> >> > >> >> > From: David Boxenhorn [mailto:da...@lookin2.com] >> >> > Sent: Wednesday, June 02, 2010 2:30 PM >> >> > To: user@cassandra.apache.org >> >> > Subject: Re: Range search on keys not working? >> >> > >> >> > In other words, I should check the values as I iterate, and stop >> >> > iterating >> >> > when I get out of range? >> >> > >> >> > I'll try that! >> >> > >> >> > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller >> >> > wrote: >> >> >> >> >> >> When not using OOP, you should not use something like 'CATEGORY/' as >> >> >> the >> >> >> end key. >> >> >> Use the empty string as the end key and limit the number of returned >> >> >> keys, >> >> >> as you did with >> >> >> the 'max' value. >> >> >> >> >> >> If I understand correctly, the end key is used to generate an end >> >> >> token >> >> >> by >> >> >> hashing it, and >> >> >> there is not the same correspondence between 'CATEGORY' and >> >> >> 'CATEGORY/' >> >> >> as >> >> >> for >> >> >> hash('CATEGORY') and hash('CATEGORY/'). >> >> >> >> >> >> At least, this was the explanation I gave myself when I had the same >> >> >> problem. >> >> >> >> >> >> The solution is to iterate through the keys by always using the last >> >> >> key >> >> >> returned as the >> >> >> start key for the next call to get_range_slices, and the to drop the >> >> >> first >> >> >> element from >> >> >> the result. >> >> >> >> >> >> HTH, >> >> >> Martin >> >> >> >> >> >> >> >> >> From: David Boxenhorn [mailto:da...@lookin2.com] >> >> >> Sent: Wednesday, June 02, 2010 2:01 PM >> >> >> To: user@cassandra.apache.org >> >> >> Subject: Re: Range search on keys not working? >> >> >> >> >> >> The previous thread where we discussed this is called, "key is >> >> >> sorted?" >> >> >> >> >> >> >> >> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn >> >> >> wrote: >> >> >>> >> >> >>> I'm not using OPP. 
But I was assured on earlier threads (I asked >> >> >>> several >> >> >>> times to be sure) that it would work as stated below: the results >> >> >>> would not >> >> >>> be ordered, but they would be correct. >> >> >>> >> >> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt >> >> >>> wrote: >> >> >> >> Sounds like you are not using an order preserving partitioner? >> >> >> >> On Wed, Jun 2, 2010 at 13:48, David Boxenhorn >> >> wrote: >> >> > Range search on keys is not working for me. I was assured in >> >> > earlier >> >> > threads >> >> > that range search would work, but the results would not be >> >> > ordered. >> >> > >> >> > I'm trying to get all the rows that start with "CATEGORY." >> >> > >> >> > I'm doing: >> >> > >> >> > String start = "CATEGORY."; >> >> > . >> >> > . >> >> > . >> >> > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, >> >> > "CATEGORY/", max) >> >> > . >> >> > . >> >> > . >> >> > >> >> > in a loop, setting start to the last key each time - but I'm getting >> >> > rows >> >> > that don't start with "CATEGORY."!! >> >> > >> >> > How do I get all rows that start with "CATEGORY."?
Re: Start key must sort before (or equal to) finish key in your partitioner
Would it be better to use an SQL-style timestamp ("YYYY-MM-DD HH:MM:SS.MMM") + unique id, then? They sort lexically the same as they sort chronologically. On Wed, Jun 2, 2010 at 4:37 PM, Leslie Viljoen wrote: > On Mon, May 31, 2010 at 8:52 PM, Jonathan Ellis wrote: > > OPP uses lexical ordering on the keys, which isn't going to be the > > same as the natural order for a time-based uuid. > > *palmface* >
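That holds as long as every field is zero-padded to a fixed width. A quick self-contained check (plain Python, nothing Cassandra-specific):

```python
import random
from datetime import datetime, timedelta

def to_key(dt):
    # "YYYY-MM-DD HH:MM:SS.mmm" -- every field zero-padded, fixed width
    return dt.strftime("%Y-%m-%d %H:%M:%S.") + f"{dt.microsecond // 1000:03d}"

random.seed(1)
base = datetime(2010, 6, 2)
times = [base + timedelta(milliseconds=random.randrange(10**10))
         for _ in range(1000)]

# Lexical order of the keys matches chronological order of the instants,
# so OPP's lexical key ordering gives time-ordered rows.
assert sorted(to_key(t) for t in times) == [to_key(t) for t in sorted(times)]
```

This is why a fixed-width timestamp string works under OPP where a time-based UUID's string form does not.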
Re: Start key must sort before (or equal to) finish key in your partitioner
On Mon, May 31, 2010 at 8:52 PM, Jonathan Ellis wrote: > OPP uses lexical ordering on the keys, which isn't going to be the > same as the natural order for a time-based uuid. *palmface*
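Jonathan's point can be demonstrated directly: a version-1 (time-based) UUID's string form starts with time_low, the low 32 bits of the 60-bit timestamp, so lexical order diverges from time order whenever those low bits wrap. A sketch building v1 UUIDs from explicit timestamps (the node and clock-sequence values below are arbitrary placeholders):

```python
import uuid

def uuid1_at(ts_100ns):
    # Assemble a version-1 UUID from a 60-bit timestamp; the string form
    # begins with time_low, NOT the high-order timestamp bits.
    time_low = ts_100ns & 0xFFFFFFFF
    time_mid = (ts_100ns >> 32) & 0xFFFF
    time_hi_version = ((ts_100ns >> 48) & 0x0FFF) | 0x1000  # version 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             0x80, 0x00, 0x123456789ABC))

t0 = 0x1_0000_0000  # the moment time_low wraps around
ids = [str(uuid1_at(t)) for t in (t0 - 2, t0 - 1, t0, t0 + 1)]

# Chronological order is not lexical order.
assert ids != sorted(ids)
```

The UUIDs just before the wrap start with "ffffffff...", the ones just after with "00000000...", so lexical sorting reverses their time order.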
Re: Range search on keys not working?
I see. But we could make this work if the random partitioner was random only between nodes, but was still ordered within each node. (Or if there were another partitioner that did this.) That way we could get everything we need from each node separately. The results would not be ordered, but they would be correct. On Wed, Jun 2, 2010 at 4:09 PM, Sylvain Lebresne wrote: > > So why do the "start" and "finish" range parameters exist? > > Because especially if you want to iterate over all your key (which as > stated by Ben above > is the only meaningful way to use get_range_slices() with the random > partitionner), you'll > want to paginate that. And that's where the 'start' and 'finish' are > useful (to be fair, > the 'finish' part is not so useful in practice with the random > partitioner). > > -- > Sylvain > > > > > On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote: > >> > >> Martin, > >> > >> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller > >> wrote: > >> > I think you can specify an end key, but it should be a key which does > >> > exist > >> > in your column family. > >> > >> > >> Logically, it doesn't make sense to ever specify an end key with > >> random partitioner. If you specified a start key of "aaa" and and end > >> key of "aac" you might get back as results "aaa", "zfc", "hik", etc. > >> And, even if you have a key of "aab" it might not show up. Key ranges > >> only make sense with order-preserving partitioner. The only time to > >> ever use a key range with random partitioner is when you want to > >> iterate over all keys in the CF. > >> > >> Ben > >> > >> > >> > But maybe I'm off the track here and someone else here knows more > about > >> > this > >> > key range stuff. > >> > > >> > Martin > >> > > >> > > >> > From: David Boxenhorn [mailto:da...@lookin2.com] > >> > Sent: Wednesday, June 02, 2010 2:30 PM > >> > To: user@cassandra.apache.org > >> > Subject: Re: Range search on keys not working? 
> >> > > >> > In other words, I should check the values as I iterate, and stop > >> > iterating > >> > when I get out of range? > >> > > >> > I'll try that! > >> > > >> > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller > >> > wrote: > >> >> > >> >> When not using OOP, you should not use something like 'CATEGORY/' as > >> >> the > >> >> end key. > >> >> Use the empty string as the end key and limit the number of returned > >> >> keys, > >> >> as you did with > >> >> the 'max' value. > >> >> > >> >> If I understand correctly, the end key is used to generate an end > token > >> >> by > >> >> hashing it, and > >> >> there is not the same correspondence between 'CATEGORY' and > 'CATEGORY/' > >> >> as > >> >> for > >> >> hash('CATEGORY') and hash('CATEGORY/'). > >> >> > >> >> At least, this was the explanation I gave myself when I had the same > >> >> problem. > >> >> > >> >> The solution is to iterate through the keys by always using the last > >> >> key > >> >> returned as the > >> >> start key for the next call to get_range_slices, and the to drop the > >> >> first > >> >> element from > >> >> the result. > >> >> > >> >> HTH, > >> >> Martin > >> >> > >> >> > >> >> From: David Boxenhorn [mailto:da...@lookin2.com] > >> >> Sent: Wednesday, June 02, 2010 2:01 PM > >> >> To: user@cassandra.apache.org > >> >> Subject: Re: Range search on keys not working? > >> >> > >> >> The previous thread where we discussed this is called, "key is > sorted?" > >> >> > >> >> > >> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn > >> >> wrote: > >> >>> > >> >>> I'm not using OPP. But I was assured on earlier threads (I asked > >> >>> several > >> >>> times to be sure) that it would work as stated below: the results > >> >>> would not > >> >>> be ordered, but they would be correct. > >> >>> > >> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt > >> >>> wrote: > >> > >> Sounds like you are not using an order preserving partitioner? 
> >> > >> On Wed, Jun 2, 2010 at 13:48, David Boxenhorn > >> wrote: > >> > Range search on keys is not working for me. I was assured in > >> > earlier > >> > threads > >> > that range search would work, but the results would not be > ordered. > >> > > >> > I'm trying to get all the rows that start with "CATEGORY." > >> > > >> > I'm doing: > >> > > >> > String start = "CATEGORY."; > >> > . > >> > . > >> > . > >> > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > >> > "CATEGORY/", max) > >> > . > >> > . > >> > . > >> > > >> > in a loop, setting start to the last key each time - but I'm > >> > getting > >> > rows > >> > that don't start with "CATEGORY."!! > >> > > >> > How do I get all rows that start with "CATEGORY."? > >> >>> > >> >> > >> > > >> > > > > > >
Re: Range search on keys not working?
> So why do the "start" and "finish" range parameters exist? Because especially if you want to iterate over all your keys (which as stated by Ben above is the only meaningful way to use get_range_slices() with the random partitioner), you'll want to paginate that. And that's where the 'start' and 'finish' are useful (to be fair, the 'finish' part is not so useful in practice with the random partitioner). -- Sylvain > > On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote: >> >> Martin, >> >> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller >> wrote: >> > I think you can specify an end key, but it should be a key which does >> > exist >> > in your column family. >> >> >> Logically, it doesn't make sense to ever specify an end key with >> random partitioner. If you specified a start key of "aaa" and an end >> key of "aac" you might get back as results "aaa", "zfc", "hik", etc. >> And, even if you have a key of "aab" it might not show up. Key ranges >> only make sense with order-preserving partitioner. The only time to >> ever use a key range with random partitioner is when you want to >> iterate over all keys in the CF. >> >> Ben >> >> >> > But maybe I'm off the track here and someone else here knows more about >> > this >> > key range stuff. >> > >> > Martin >> > >> > >> > From: David Boxenhorn [mailto:da...@lookin2.com] >> > Sent: Wednesday, June 02, 2010 2:30 PM >> > To: user@cassandra.apache.org >> > Subject: Re: Range search on keys not working? >> > >> > In other words, I should check the values as I iterate, and stop >> > iterating >> > when I get out of range? >> > >> > I'll try that! >> > >> > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller >> > wrote: >> >> >> >> When not using OPP, you should not use something like 'CATEGORY/' as >> >> the >> >> end key. >> >> Use the empty string as the end key and limit the number of returned >> >> keys, >> >> as you did with >> >> the 'max' value. 
>> >> >> >> If I understand correctly, the end key is used to generate an end token >> >> by >> >> hashing it, and >> >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' >> >> as >> >> for >> >> hash('CATEGORY') and hash('CATEGORY/'). >> >> >> >> At least, this was the explanation I gave myself when I had the same >> >> problem. >> >> >> >> The solution is to iterate through the keys by always using the last >> >> key >> >> returned as the >> >> start key for the next call to get_range_slices, and the to drop the >> >> first >> >> element from >> >> the result. >> >> >> >> HTH, >> >> Martin >> >> >> >> >> >> From: David Boxenhorn [mailto:da...@lookin2.com] >> >> Sent: Wednesday, June 02, 2010 2:01 PM >> >> To: user@cassandra.apache.org >> >> Subject: Re: Range search on keys not working? >> >> >> >> The previous thread where we discussed this is called, "key is sorted?" >> >> >> >> >> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn >> >> wrote: >> >>> >> >>> I'm not using OPP. But I was assured on earlier threads (I asked >> >>> several >> >>> times to be sure) that it would work as stated below: the results >> >>> would not >> >>> be ordered, but they would be correct. >> >>> >> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt >> >>> wrote: >> >> Sounds like you are not using an order preserving partitioner? >> >> On Wed, Jun 2, 2010 at 13:48, David Boxenhorn >> wrote: >> > Range search on keys is not working for me. I was assured in >> > earlier >> > threads >> > that range search would work, but the results would not be ordered. >> > >> > I'm trying to get all the rows that start with "CATEGORY." >> > >> > I'm doing: >> > >> > String start = "CATEGORY."; >> > . >> > . >> > . >> > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, >> > "CATEGORY/", max) >> > . >> > . >> > . >> > >> > in a loop, setting start to the last key each time - but I'm >> > getting >> > rows >> > that don't start with "CATEGORY."!! 
>> > >> > How do I get all rows that start with "CATEGORY."? >> >>> >> >> >> > >> > > >
Re: Range search on keys not working?
They exist because when using OPP they are useful and make sense. On Wed, Jun 2, 2010 at 8:59 AM, David Boxenhorn wrote: > So why do the "start" and "finish" range parameters exist? > > On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote: >> >> Martin, >> >> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller >> wrote: >> > I think you can specify an end key, but it should be a key which does >> > exist >> > in your column family. >> >> >> Logically, it doesn't make sense to ever specify an end key with >> random partitioner. If you specified a start key of "aaa" and and end >> key of "aac" you might get back as results "aaa", "zfc", "hik", etc. >> And, even if you have a key of "aab" it might not show up. Key ranges >> only make sense with order-preserving partitioner. The only time to >> ever use a key range with random partitioner is when you want to >> iterate over all keys in the CF. >> >> Ben >> >> >> > But maybe I'm off the track here and someone else here knows more about >> > this >> > key range stuff. >> > >> > Martin >> > >> > >> > From: David Boxenhorn [mailto:da...@lookin2.com] >> > Sent: Wednesday, June 02, 2010 2:30 PM >> > To: user@cassandra.apache.org >> > Subject: Re: Range search on keys not working? >> > >> > In other words, I should check the values as I iterate, and stop >> > iterating >> > when I get out of range? >> > >> > I'll try that! >> > >> > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller >> > wrote: >> >> >> >> When not using OOP, you should not use something like 'CATEGORY/' as >> >> the >> >> end key. >> >> Use the empty string as the end key and limit the number of returned >> >> keys, >> >> as you did with >> >> the 'max' value. >> >> >> >> If I understand correctly, the end key is used to generate an end token >> >> by >> >> hashing it, and >> >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' >> >> as >> >> for >> >> hash('CATEGORY') and hash('CATEGORY/'). 
>> >> >> >> At least, this was the explanation I gave myself when I had the same >> >> problem. >> >> >> >> The solution is to iterate through the keys by always using the last >> >> key >> >> returned as the >> >> start key for the next call to get_range_slices, and the to drop the >> >> first >> >> element from >> >> the result. >> >> >> >> HTH, >> >> Martin >> >> >> >> >> >> From: David Boxenhorn [mailto:da...@lookin2.com] >> >> Sent: Wednesday, June 02, 2010 2:01 PM >> >> To: user@cassandra.apache.org >> >> Subject: Re: Range search on keys not working? >> >> >> >> The previous thread where we discussed this is called, "key is sorted?" >> >> >> >> >> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn >> >> wrote: >> >>> >> >>> I'm not using OPP. But I was assured on earlier threads (I asked >> >>> several >> >>> times to be sure) that it would work as stated below: the results >> >>> would not >> >>> be ordered, but they would be correct. >> >>> >> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt >> >>> wrote: >> >> Sounds like you are not using an order preserving partitioner? >> >> On Wed, Jun 2, 2010 at 13:48, David Boxenhorn >> wrote: >> > Range search on keys is not working for me. I was assured in >> > earlier >> > threads >> > that range search would work, but the results would not be ordered. >> > >> > I'm trying to get all the rows that start with "CATEGORY." >> > >> > I'm doing: >> > >> > String start = "CATEGORY."; >> > . >> > . >> > . >> > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, >> > "CATEGORY/", max) >> > . >> > . >> > . >> > >> > in a loop, setting start to the last key each time - but I'm >> > getting >> > rows >> > that don't start with "CATEGORY."!! >> > >> > How do I get all rows that start with "CATEGORY."? >> >>> >> >> >> > >> > > >
Re: Range search on keys not working?
So why do the "start" and "finish" range parameters exist? On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote: > Martin, > > On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller > wrote: > > I think you can specify an end key, but it should be a key which does > exist > > in your column family. > > > Logically, it doesn't make sense to ever specify an end key with > random partitioner. If you specified a start key of "aaa" and and end > key of "aac" you might get back as results "aaa", "zfc", "hik", etc. > And, even if you have a key of "aab" it might not show up. Key ranges > only make sense with order-preserving partitioner. The only time to > ever use a key range with random partitioner is when you want to > iterate over all keys in the CF. > > Ben > > > > But maybe I'm off the track here and someone else here knows more about > this > > key range stuff. > > > > Martin > > > > > > From: David Boxenhorn [mailto:da...@lookin2.com] > > Sent: Wednesday, June 02, 2010 2:30 PM > > To: user@cassandra.apache.org > > Subject: Re: Range search on keys not working? > > > > In other words, I should check the values as I iterate, and stop > iterating > > when I get out of range? > > > > I'll try that! > > > > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller > > wrote: > >> > >> When not using OOP, you should not use something like 'CATEGORY/' as the > >> end key. > >> Use the empty string as the end key and limit the number of returned > keys, > >> as you did with > >> the 'max' value. > >> > >> If I understand correctly, the end key is used to generate an end token > by > >> hashing it, and > >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' > as > >> for > >> hash('CATEGORY') and hash('CATEGORY/'). > >> > >> At least, this was the explanation I gave myself when I had the same > >> problem. 
> >> > >> The solution is to iterate through the keys by always using the last key > >> returned as the > >> start key for the next call to get_range_slices, and the to drop the > first > >> element from > >> the result. > >> > >> HTH, > >> Martin > >> > >> > >> From: David Boxenhorn [mailto:da...@lookin2.com] > >> Sent: Wednesday, June 02, 2010 2:01 PM > >> To: user@cassandra.apache.org > >> Subject: Re: Range search on keys not working? > >> > >> The previous thread where we discussed this is called, "key is sorted?" > >> > >> > >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn > wrote: > >>> > >>> I'm not using OPP. But I was assured on earlier threads (I asked > several > >>> times to be sure) that it would work as stated below: the results would > not > >>> be ordered, but they would be correct. > >>> > >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt > wrote: > > Sounds like you are not using an order preserving partitioner? > > On Wed, Jun 2, 2010 at 13:48, David Boxenhorn > wrote: > > Range search on keys is not working for me. I was assured in earlier > > threads > > that range search would work, but the results would not be ordered. > > > > I'm trying to get all the rows that start with "CATEGORY." > > > > I'm doing: > > > > String start = "CATEGORY."; > > . > > . > > . > > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > > "CATEGORY/", max) > > . > > . > > . > > > > in a loop, setting start to the last key each time - but I'm getting > > rows > > that don't start with "CATEGORY."!! > > > > How do I get all rows that start with "CATEGORY."? > >>> > >> > > > > >
Re: Range search on keys not working?
Martin, On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller wrote: > I think you can specify an end key, but it should be a key which does exist > in your column family. Logically, it doesn't make sense to ever specify an end key with random partitioner. If you specified a start key of "aaa" and an end key of "aac" you might get back as results "aaa", "zfc", "hik", etc. And, even if you have a key of "aab" it might not show up. Key ranges only make sense with order-preserving partitioner. The only time to ever use a key range with random partitioner is when you want to iterate over all keys in the CF. Ben > But maybe I'm off the track here and someone else here knows more about this > key range stuff. > > Martin > > > From: David Boxenhorn [mailto:da...@lookin2.com] > Sent: Wednesday, June 02, 2010 2:30 PM > To: user@cassandra.apache.org > Subject: Re: Range search on keys not working? > > In other words, I should check the values as I iterate, and stop iterating > when I get out of range? > > I'll try that! > > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller > wrote: >> >> When not using OPP, you should not use something like 'CATEGORY/' as the >> end key. >> Use the empty string as the end key and limit the number of returned keys, >> as you did with >> the 'max' value. >> >> If I understand correctly, the end key is used to generate an end token by >> hashing it, and >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' as >> for >> hash('CATEGORY') and hash('CATEGORY/'). >> >> At least, this was the explanation I gave myself when I had the same >> problem. >> >> The solution is to iterate through the keys by always using the last key >> returned as the >> start key for the next call to get_range_slices, and then to drop the first >> element from >> the result. 
>> >> HTH, >> Martin >> >> >> From: David Boxenhorn [mailto:da...@lookin2.com] >> Sent: Wednesday, June 02, 2010 2:01 PM >> To: user@cassandra.apache.org >> Subject: Re: Range search on keys not working? >> >> The previous thread where we discussed this is called, "key is sorted?" >> >> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn wrote: >>> >>> I'm not using OPP. But I was assured on earlier threads (I asked several >>> times to be sure) that it would work as stated below: the results would not >>> be ordered, but they would be correct. >>> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt wrote: Sounds like you are not using an order preserving partitioner? On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: > Range search on keys is not working for me. I was assured in earlier > threads > that range search would work, but the results would not be ordered. > > I'm trying to get all the rows that start with "CATEGORY." > > I'm doing: > > String start = "CATEGORY."; > . > . > . > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > "CATEGORY/", max) > . > . > . > > in a loop, setting start to the last key each time - but I'm getting > rows > that don't start with "CATEGORY."!! > > How do I get all rows that start with "CATEGORY."? >>> >> > >
Re: Range search on keys not working?
Here is the relevant part of the previous thread: Thank you. That is very good news. I can sort the results myself - what is important is that I get them! On Thu, May 13, 2010 at 2:42 AM, Vijay wrote: If you use Random partitioner, You will *NOT* get RowKey's sorted. (Columns are sorted always). Answer: If used Random partitioner True True Regards, On Wed, May 12, 2010 at 1:25 AM, David Boxenhorn wrote: You do any kind of range slice, e.g. keys beginning with "abc"? But the results will not be ordered? Please answer one of the following: True True True False False False Explain? Thanks! On Sun, May 9, 2010 at 8:27 PM, Vijay wrote: True, The Range slice support was enabled in Random Partitioner for Hadoop support. The random partitioner actually hashes the key, and those hashes are sorted, so we cannot have the actual keys in order (hope this doesn't confuse you)... Regards, On Wed, Jun 2, 2010 at 3:40 PM, Ben Browning wrote: > The keys will not be in any specific order when not using OPP, so, you > will never "get out of range" - you have to iterate over every single > key to find all keys that start with "CATEGORY". If you don't iterate > over every single key you run a chance of missing some. Obviously, > this kind of key range scan is not something that will scale well > as the number of keys goes up. If your app needs this kind of behavior > you'd be much better off with OPP. > > Ben > > On Wed, Jun 2, 2010 at 8:29 AM, David Boxenhorn wrote: > > In other words, I should check the values as I iterate, and stop > iterating > > when I get out of range? > > > > I'll try that! > > > > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller > > wrote: > >> > >> When not using OPP, you should not use something like 'CATEGORY/' as the > >> end key. > >> Use the empty string as the end key and limit the number of returned > keys, > >> as you did with > >> the 'max' value. 
> >> > >> If I understand correctly, the end key is used to generate an end token > by > >> hashing it, and > >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' > as > >> for > >> hash('CATEGORY') and hash('CATEGORY/'). > >> > >> At least, this was the explanation I gave myself when I had the same > >> problem. > >> > >> The solution is to iterate through the keys by always using the last key > >> returned as the > >> start key for the next call to get_range_slices, and the to drop the > first > >> element from > >> the result. > >> > >> HTH, > >> Martin > >> > >> > >> From: David Boxenhorn [mailto:da...@lookin2.com] > >> Sent: Wednesday, June 02, 2010 2:01 PM > >> To: user@cassandra.apache.org > >> Subject: Re: Range search on keys not working? > >> > >> The previous thread where we discussed this is called, "key is sorted?" > >> > >> > >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn > wrote: > >>> > >>> I'm not using OPP. But I was assured on earlier threads (I asked > several > >>> times to be sure) that it would work as stated below: the results would > not > >>> be ordered, but they would be correct. > >>> > >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt > wrote: > > Sounds like you are not using an order preserving partitioner? > > On Wed, Jun 2, 2010 at 13:48, David Boxenhorn > wrote: > > Range search on keys is not working for me. I was assured in earlier > > threads > > that range search would work, but the results would not be ordered. > > > > I'm trying to get all the rows that start with "CATEGORY." > > > > I'm doing: > > > > String start = "CATEGORY."; > > . > > . > > . > > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > > "CATEGORY/", max) > > . > > . > > . > > > > in a loop, setting start to the last key each time - but I'm getting > > rows > > that don't start with "CATEGORY."!! > > > > How do I get all rows that start with "CATEGORY."? > >>> > >> > > > > >
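Vijay's explanation above — the key is hashed, and the hashes are what get sorted — can be seen in a few lines of plain Python. MD5 stands in for RandomPartitioner's token computation here (the real token arithmetic differs in detail; this is just an illustration):

```python
import hashlib

def token(key: str) -> int:
    # RandomPartitioner derives each row's token by hashing the key;
    # rows are stored in token order, not key order.
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

keys = [f"CATEGORY.{i:03d}" for i in range(100)]   # already in lexical order
by_token = sorted(keys, key=token)

assert by_token != keys               # token order scrambles key order...
assert set(by_token) == set(keys)     # ...but no key is lost
```

That is exactly why a range slice under RandomPartitioner returns every key exactly once but in apparently arbitrary order.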
Re: Range search on keys not working?
That's crazy! I could artificially insert a key with just the prefix, as a placeholder, but why can't Cassandra do that virtually? On Wed, Jun 2, 2010 at 3:34 PM, Dr. Martin Grabmüller < martin.grabmuel...@eleven.de> wrote: > I think you can specify an end key, but it should be a key which does > exist in your column family. > > But maybe I'm off the track here and someone else here knows more about > this key range stuff. > > Martin > > -- > *From:* David Boxenhorn [mailto:da...@lookin2.com] > *Sent:* Wednesday, June 02, 2010 2:30 PM > > *To:* user@cassandra.apache.org > *Subject:* Re: Range search on keys not working? > > In other words, I should check the values as I iterate, and stop > iterating when I get out of range? > > I'll try that! > > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller < > martin.grabmuel...@eleven.de> wrote: > >> When not using OOP, you should not use something like 'CATEGORY/' as the >> end key. >> Use the empty string as the end key and limit the number of returned keys, >> as you did with >> the 'max' value. >> >> If I understand correctly, the end key is used to generate an end token by >> hashing it, and >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' as >> for >> hash('CATEGORY') and hash('CATEGORY/'). >> >> At least, this was the explanation I gave myself when I had the same >> problem. >> >> The solution is to iterate through the keys by always using the last key >> returned as the >> start key for the next call to get_range_slices, and the to drop the first >> element from >> the result. >> >> HTH, >> Martin >> >> -- >> *From:* David Boxenhorn [mailto:da...@lookin2.com] >> *Sent:* Wednesday, June 02, 2010 2:01 PM >> *To:* user@cassandra.apache.org >> *Subject:* Re: Range search on keys not working? >> >> The previous thread where we discussed this is called, "key is sorted?" >> >> >> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn wrote: >> >>> I'm not using OPP. 
But I was assured on earlier threads (I asked several >>> times to be sure) that it would work as stated below: the results would not >>> be ordered, but they would be correct. >>> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt wrote: >>> Sounds like you are not using an order preserving partitioner? On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: > Range search on keys is not working for me. I was assured in earlier threads > that range search would work, but the results would not be ordered. > > I'm trying to get all the rows that start with "CATEGORY." > > I'm doing: > > String start = "CATEGORY."; > . > . > . > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > "CATEGORY/", max) > . > . > . > > in a loop, setting start to the last key each time - but I'm getting rows > that don't start with "CATEGORY."!! > > How do I get all rows that start with "CATEGORY."? >>> >>> >> >
RE: Range search on keys not working?
I think you can specify an end key, but it should be a key which does exist in your column family. But maybe I'm off the track here and someone else here knows more about this key range stuff. Martin From: David Boxenhorn [mailto:da...@lookin2.com] Sent: Wednesday, June 02, 2010 2:30 PM To: user@cassandra.apache.org Subject: Re: Range search on keys not working? In other words, I should check the values as I iterate, and stop iterating when I get out of range? I'll try that! On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller wrote: When not using OPP, you should not use something like 'CATEGORY/' as the end key. Use the empty string as the end key and limit the number of returned keys, as you did with the 'max' value. If I understand correctly, the end key is used to generate an end token by hashing it, and there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' as for hash('CATEGORY') and hash('CATEGORY/'). At least, this was the explanation I gave myself when I had the same problem. The solution is to iterate through the keys by always using the last key returned as the start key for the next call to get_range_slices, and then to drop the first element from the result. HTH, Martin From: David Boxenhorn [mailto:da...@lookin2.com] Sent: Wednesday, June 02, 2010 2:01 PM To: user@cassandra.apache.org Subject: Re: Range search on keys not working? The previous thread where we discussed this is called, "key is sorted?" On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn wrote: I'm not using OPP. But I was assured on earlier threads (I asked several times to be sure) that it would work as stated below: the results would not be ordered, but they would be correct. On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt wrote: Sounds like you are not using an order preserving partitioner? On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: > Range search on keys is not working for me. 
I was assured in earlier threads > that range search would work, but the results would not be ordered. > > I'm trying to get all the rows that start with "CATEGORY." > > I'm doing: > > String start = "CATEGORY."; > . > . > . > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > "CATEGORY/", max) > . > . > . > > in a loop, setting start to the last key each time - but I'm getting rows > that don't start with "CATEGORY."!! > > How do I get all rows that start with "CATEGORY."?
Re: Range search on keys not working?
The keys will not be in any specific order when not using OPP, so you will never "get out of range" - you have to iterate over every single key to find all keys that start with "CATEGORY". If you don't iterate over every single key you run a chance of missing some. Obviously, this kind of key range scan is not something that will scale well as the number of keys goes up. If your app needs this kind of behavior you'd be much better off with OPP. Ben On Wed, Jun 2, 2010 at 8:29 AM, David Boxenhorn wrote: > In other words, I should check the values as I iterate, and stop iterating > when I get out of range? > > I'll try that! > > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller > wrote: >> >> When not using OPP, you should not use something like 'CATEGORY/' as the >> end key. >> Use the empty string as the end key and limit the number of returned keys, >> as you did with >> the 'max' value.
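A minimal sketch of the iterate-and-filter pattern Ben describes: page over the whole key space with the last key of each page as the next start key, drop the repeated first element, and filter client-side. The fetch function below is a local stand-in for a real Thrift get_range_slices call, and all names are illustrative.

```python
def get_range_slices(start_key, max_keys, rows):
    """Stand-in for Thrift's get_range_slices: returns up to max_keys
    (key, value) pairs from start_key onward, in an arbitrary but
    stable order, the way a random partitioner's token order behaves."""
    ordered = sorted(rows)  # pretend this is token order, not key order
    i = ordered.index(start_key) if start_key in ordered else 0
    return [(k, rows[k]) for k in ordered[i:i + max_keys]]

def rows_with_prefix(rows, prefix, page_size=2):
    """Page over *every* key, filtering client-side; with a random
    partitioner there is no point at which it is safe to stop early."""
    matches, start, first = {}, "", True
    while True:
        page = get_range_slices(start, page_size, rows)
        if not first:
            page = page[1:]  # drop the repeated start key
        if not page:
            break
        for key, value in page:
            if key.startswith(prefix):
                matches[key] = value
        start = page[-1][0]
        first = False
    return matches
```

With an order-preserving partitioner the loop could instead stop as soon as a returned key sorts past the prefix, which is why OPP is the better fit for this access pattern.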
Re: Range search on keys not working?
In other words, I should check the values as I iterate, and stop iterating when I get out of range? I'll try that! On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller < martin.grabmuel...@eleven.de> wrote: > When not using OPP, you should not use something like 'CATEGORY/' as the > end key. > Use the empty string as the end key and limit the number of returned keys, > as you did with > the 'max' value. > > If I understand correctly, the end key is used to generate an end token by > hashing it, and > there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' as > there is between > hash('CATEGORY') and hash('CATEGORY/'). > > At least, this was the explanation I gave myself when I had the same > problem. > > The solution is to iterate through the keys by always using the last key > returned as the > start key for the next call to get_range_slices, and then to drop the first > element from > the result. > > HTH, > Martin
RE: Range search on keys not working?
When not using OPP, you should not use something like 'CATEGORY/' as the end key. Use the empty string as the end key and limit the number of returned keys, as you did with the 'max' value. If I understand correctly, the end key is used to generate an end token by hashing it, and there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' as there is between hash('CATEGORY') and hash('CATEGORY/'). At least, this was the explanation I gave myself when I had the same problem. The solution is to iterate through the keys by always using the last key returned as the start key for the next call to get_range_slices, and then to drop the first element from the result. HTH, Martin From: David Boxenhorn [mailto:da...@lookin2.com] Sent: Wednesday, June 02, 2010 2:01 PM To: user@cassandra.apache.org Subject: Re: Range search on keys not working? The previous thread where we discussed this is called, "key is sorted?" On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn wrote: I'm not using OPP. But I was assured on earlier threads (I asked several times to be sure) that it would work as stated below: the results would not be ordered, but they would be correct.
Re: Range search on keys not working?
The previous thread where we discussed this is called, "key is sorted?" On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn wrote: > I'm not using OPP. But I was assured on earlier threads (I asked several > times to be sure) that it would work as stated below: the results would not > be ordered, but they would be correct. > > On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt wrote: > >> Sounds like you are not using an order preserving partitioner? >> >> On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: >> > Range search on keys is not working for me. I was assured in earlier >> threads >> > that range search would work, but the results would not be ordered. >> > >> > I'm trying to get all the rows that start with "CATEGORY." >> > >> > I'm doing: >> > >> > String start = "CATEGORY."; >> > . >> > . >> > . >> > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, >> > "CATEGORY/", max) >> > . >> > . >> > . >> > >> > in a loop, setting start to the last key each time - but I'm getting >> rows >> > that don't start with "CATEGORY."!! >> > >> > How do I get all rows that start with "CATEGORY."? >> > >
Re: Range search on keys not working?
I'm not using OPP. But I was assured on earlier threads (I asked several times to be sure) that it would work as stated below: the results would not be ordered, but they would be correct. On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt wrote: > Sounds like you are not using an order preserving partitioner? > > On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: > > Range search on keys is not working for me. I was assured in earlier > threads > > that range search would work, but the results would not be ordered. > > > > I'm trying to get all the rows that start with "CATEGORY." > > > > I'm doing: > > > > String start = "CATEGORY."; > > . > > . > > . > > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > > "CATEGORY/", max) > > . > > . > > . > > > > in a loop, setting start to the last key each time - but I'm getting rows > > that don't start with "CATEGORY."!! > > > > How do I get all rows that start with "CATEGORY."? >
Re: Range search on keys not working?
Sounds like you are not using an order preserving partitioner? On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: > Range search on keys is not working for me. I was assured in earlier threads > that range search would work, but the results would not be ordered. > > I'm trying to get all the rows that start with "CATEGORY." > > I'm doing: > > String start = "CATEGORY."; > . > . > . > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, > "CATEGORY/", max) > . > . > . > > in a loop, setting start to the last key each time - but I'm getting rows > that don't start with "CATEGORY."!! > > How do I get all rows that start with "CATEGORY."?
Range search on keys not working?
Range search on keys is not working for me. I was assured in earlier threads that range search would work, but the results would not be ordered. I'm trying to get all the rows that start with "CATEGORY." I'm doing: String start = "CATEGORY."; . . . keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, "CATEGORY/", max) . . . in a loop, setting start to the last key each time - but I'm getting rows that don't start with "CATEGORY."!! How do I get all rows that start with "CATEGORY."?
Heterogeneous Cassandra Cluster
Is it possible to make a heterogeneous Cassandra cluster, with both Linux and Windows nodes? I tried doing it and got Error in ThreadPoolExecutor java.lang.NullPointerException Not sure if this is due to the Linux/Windows mix or something else. Details below: [r...@iqdev01 cassandra]# bin/cassandra -f INFO 20:32:26,431 Auto DiskAccessMode determined to be mmap INFO 20:32:27,085 Sampling index for /var/lib/cassandra/data/system/LocationInfo-1-Data.db INFO 20:32:27,095 Sampling index for /var/lib/cassandra/data/system/LocationInfo-2-Data.db INFO 20:32:27,104 Replaying /var/lib/cassandra/commitlog/CommitLog-1275412410865.log INFO 20:32:27,129 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1275413547129.log INFO 20:32:27,138 LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1275413547129.log', position=173) INFO 20:32:27,138 Enqueuing flush of Memtable(LocationInfo)@1491010616 INFO 20:32:27,139 Writing Memtable(LocationInfo)@1491010616 INFO 20:32:27,187 Completed flushing /var/lib/cassandra/data/system/LocationInfo-3-Data.db INFO 20:32:27,207 Log replay complete INFO 20:32:27,239 Saved Token found: 25870423804996813139937576731363583348 INFO 20:32:27,239 Saved ClusterName found: Lookin2 INFO 20:32:27,247 Starting up server gossip INFO 20:32:27,266 Joining: getting load information INFO 20:32:27,267 Sleeping 9 ms to wait for load information... 
INFO 20:32:27,327 Node /192.168.80.12 is now part of the cluster INFO 20:32:27,332 Node /192.168.80.234 is now part of the cluster INFO 20:32:27,864 InetAddress /192.168.80.12 is now UP INFO 20:32:27,872 InetAddress /192.168.80.234 is now UP INFO 20:33:57,269 Joining: getting bootstrap token INFO 20:33:57,278 New token will be 25870423804996813139937576731363583348 to assume load from /192.168.80.12 INFO 20:33:57,279 Joining: sleeping 3 for pending range setup INFO 20:34:27,280 Bootstrapping INFO 21:32:27,867 Compacting [] INFO 21:38:27,118 LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1275413547129.log', position=824) INFO 21:38:27,118 Enqueuing flush of Memtable(LocationInfo)@993374707 INFO 21:38:27,118 Writing Memtable(LocationInfo)@993374707 INFO 21:38:27,158 Completed flushing /var/lib/cassandra/data/system/LocationInfo-4-Data.db INFO 21:38:27,160 Compacting [org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-1-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-2-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-3-Data.db'),org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-4-Data.db')] INFO 21:38:27,217 Compacted to /var/lib/cassandra/data/system/LocationInfo-5-Data.db. 1294/358 bytes for 1 keys. Time: 56ms. 
[r...@iqdev01 cassandra]# bin/cassandra -f INFO 21:40:07,519 Auto DiskAccessMode determined to be mmap INFO 21:40:07,972 Deleted /var/lib/cassandra/data/system/LocationInfo-1-Data.db INFO 21:40:07,973 Deleted /var/lib/cassandra/data/system/LocationInfo-2-Data.db INFO 21:40:07,974 Deleted /var/lib/cassandra/data/system/LocationInfo-3-Data.db INFO 21:40:07,982 Sampling index for /var/lib/cassandra/data/system/LocationInfo-5-Data.db INFO 21:40:07,991 Deleted /var/lib/cassandra/data/system/LocationInfo-4-Data.db INFO 21:40:08,000 Replaying /var/lib/cassandra/commitlog/CommitLog-1275413547129.log INFO 21:40:08,001 Log replay complete INFO 21:40:08,038 Saved Token found: 25870423804996813139937576731363583348 INFO 21:40:08,040 Saved ClusterName found: Lookin2 INFO 21:40:08,042 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1275417608042.log INFO 21:40:08,059 Starting up server gossip INFO 21:40:08,071 Joining: getting load information INFO 21:40:08,071 Sleeping 9 ms to wait for load information... INFO 21:40:10,372 Node /192.168.80.12 is now part of the cluster INFO 21:40:10,374 Node /192.168.80.234 is now part of the cluster INFO 21:40:11,091 InetAddress /192.168.80.234 is now UP INFO 21:40:12,078 InetAddress /192.168.80.12 is now UP INFO 21:41:38,072 Joining: getting bootstrap token INFO 21:41:38,088 New token will be 25870423804996813139937576731363583348 to assume load from /192.168.80.12 INFO 21:41:38,089 Joining: sleeping 3 for pending range setup INFO 21:42:08,091 Bootstrapping ERROR 21:49:03,526 Error in ThreadPoolExecutor java.lang.NullPointerException at org.apache.cassandra.streaming.StreamInitiateVerbHandler.getNewNames(StreamInitiateVerbHandler.java:154) at org.apache.cassandra.streaming.StreamInitiateVerbHandler.doVerb(StreamInitiateVerbHandler.java:76) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDel
Re: [***SPAM*** ] Re: writing speed test
Thanks Peter! In my test application, for each record: rowkey -> rand() * 4, about 64B; column * 20 -> rand() * 20, about 320B. I use batch_insert(rowkey, col*20) in thrift. Kevin Yuan From: Peter Schüller To: user@cassandra.apache.org Subject: [***SPAM*** ] Re: writing speed test Date: Wed, 2 Jun 2010 10:44:52 +0200 Since this thread has now gone on for a while... As far as I can tell you never specified the characteristics of your writes. Evaluating expected write throughput in terms of "MB/s to disk" is pretty impossible if one does not know anything about the nature of the writes. If you're expecting 50 MB/s, is that reasonable? I don't know; if you're writing a gazillion one-byte values with shortish keys, 50 MB/s translates to a *huge* amount of writes per second and you're likely to be CPU bound even in the most efficient implementation reasonably possible. If on the other hand you're writing large values (say slabs of 128k) you might more reasonably be expecting higher disk throughput. I don't have enough hands-on experience with cassandra to have a feel for CPU vs. disk in terms of bottlenecking, and when we expect to bottleneck on what, but I can say that it's definitely going to matter quite a lot what *kind* of writes you're doing. This tends to be the case regardless of the database system.
Re: writing speed test
Since this thread has now gone on for a while... As far as I can tell you never specified the characteristics of your writes. Evaluating expected write throughput in terms of "MB/s to disk" is pretty impossible if one does not know anything about the nature of the writes. If you're expecting 50 MB/s, is that reasonable? I don't know; if you're writing a gazillion one-byte values with shortish keys, 50 MB/s translates to a *huge* amount of writes per second and you're likely to be CPU bound even in the most efficient implementation reasonably possible. If on the other hand you're writing large values (say slabs of 128k) you might more reasonably be expecting higher disk throughput. I don't have enough hands-on experience with cassandra to have a feel for CPU vs. disk in terms of bottlenecking, and when we expect to bottleneck on what, but I can say that it's definitely going to matter quite a lot what *kind* of writes you're doing. This tends to be the case regardless of the database system. -- / Peter Schuller aka scode
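Peter's point - that an "MB/s" target says nothing without the record size - can be made concrete with a back-of-the-envelope calculation. The 16-byte per-record key/overhead figure below is an illustrative assumption, not a measured Cassandra number:

```python
def writes_per_second(mb_per_sec, value_bytes, key_bytes=16):
    """Operations per second implied by a raw throughput target and a
    per-record size (value plus an assumed key/overhead of key_bytes)."""
    record_bytes = value_bytes + key_bytes
    return int(mb_per_sec * 1024 * 1024 / record_bytes)

# The same 50 MB/s target implies wildly different operation rates:
tiny = writes_per_second(50, 1)           # one-byte values: ~3 million ops/s (CPU-bound territory)
slab = writes_per_second(50, 128 * 1024)  # 128 KB slabs: ~400 ops/s (disk-bound territory)
```

So before comparing a measured disk write rate against the drive's raw sequential benchmark, it is worth checking whether the per-operation CPU cost at your record size could even sustain that rate.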
Re: [***SPAM*** ] Re: [***SPAM*** ] Re: [***SPAM*** ] Re: [***SPAM*** ] Re: writing speed test
Still seems to be MEM. However, it's hard to accept that constant writing (even of a great amount of data) needs so much MEM (16GB). The process is quite simple: input_data -> memtable -> flush to disk, right? What does cassandra need so much MEM for? Thanks! On 2010-06-02 16:24 +0800, lwl wrote: > No. > But I did some capacity tests on another distributed system. > Your former test cost too much MEM; it was the bottleneck. > Caches and the JVM cost MEM, so I suggested decreasing them. > > What is the bottleneck of your current test now? > > On Jun 2, 2010 at 4:13, Shuai Yuan wrote: > > Hi, > > I tried: > > 1 - consistency level ZERO > > 2 - JVM heap 4GB > > 3 - normal Memtable cache > > and now I have about 30% improvement. > > However I want to know if you have also done a w/r benchmark and > > what the result was.
Re: Nodes dropping out of cluster due to GC
> > Has anyone experienced this sort of problem? It would be great to hear from > anyone who has had experience with this sort of issue and/or suggestions for > how to deal with it. > > Thanks, Eric Yes, I did. The symptoms you described point to a concurrent GC failure. During such a failure the concurrent GC completely stops the java program (i.e. cassandra) and does a full GC cycle. Other cassandra nodes discover that the node is not responding and consider it dead. If the concurrent GC is properly tuned, it should never do a stop-the-world GC (that's why it is called concurrent ;-) ). There can be several reasons for concurrent GC failures: 1. Not enough java heap - try to raise the max java heap limit 2. Improperly sized java heap regions. To help you narrow down the problem, pass the -XX:+PrintGCDetails option to the JVM launching the cassandra node. This will log information about internal GC activities. Let it run until the node is thrown out of the cluster again and search for the "concurrent mode failure" or "promotion failed" strings.
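As a starting point, the diagnostic flag above plus standard HotSpot CMS options can be added to the JVM_OPTS variable in the node's startup script (bin/cassandra.in.sh in Cassandra of this era). This is a sketch, not a tuned configuration - the heap size and log path are placeholders to adjust for your hardware:

```shell
# Pin the heap size, use the concurrent (CMS) collector, start CMS
# early enough that it finishes before the heap fills (which is what
# triggers "concurrent mode failure"), and log GC details for diagnosis.
JVM_OPTS="$JVM_OPTS -Xms4G -Xmx4G"
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -Xloggc:/var/log/cassandra/gc.log"
```

If the GC log still shows "promotion failed" with these settings, that supports reason 1 above (the heap is simply too small for the workload) rather than a tuning problem.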
Re: [***SPAM*** ] Re: [***SPAM*** ] Re: [***SPAM*** ] Re: writing speed test
Hi, I tried: 1 - consistency level ZERO; 2 - JVM heap 4GB; 3 - normal Memtable cache; and now I have about 30% improvement. However I want to know if you have also done a w/r benchmark and what the result was. On 2010-06-02 11:35 +0800, lwl wrote: > and, why did you set "JVM has 8G heap"? > 8g seems too big. > > On Jun 2, 2010 at 11:20, lwl wrote: > 3. 32 concurrent read & 128 write in storage-conf.xml, other > caches enlarged as well. > > Maybe you can try to decrease the size of the caches. > > On Jun 2, 2010 at 11:14, Shuai Yuan wrote: > > On 2010-06-02 10:37 +0800, lwl wrote: > > is all 4 servers' MEM almost 100%? > > Yes > > On Jun 2, 2010 at 10:12, Shuai Yuan wrote: > > Thanks lwl. > > Then is there any way of tuning this - faster flush to disk or > > something else? > > Cheers, > > Kevin > > On 2010-06-02 09:57 +0800, lwl wrote: > > > MEM: almost 100% (16GB) > > > - maybe this is the bottleneck. > > > Writing concerns the Memtable and SSTable in memory. > > > On Jun 2, 2010 at 9:48, Shuai Yuan wrote: > > > On 2010-06-01 15:00 -0500, Jonathan Shook wrote: > > > > Also, what do you mean specifically by 'slow'? Which measurements > > > > are you looking at? What are your baseline constraints for your test > > > > system? > > > Actually, the problem is the utilization of resources (for a single machine): > > > CPU: 700% / 1600% (16 cores) > > > MEM: almost 100% (16GB) > > > Swap: almost 0% > > > Disk IO (write): 20~30MB / 200MB (7.2k raid5, benchmarked previously) > > > NET: up to 100Mbps / 950Mbps (1Gbps, tuned and benchmarked previously) > > > So the speed of generating load, about 15M/s as reported before, seems quite slow to me. I assume the system should get at least about 50MB/s of Disk IO speed. > > > MEM? I don't think it plays a major role in this writing game. What's the bottleneck of the system? > > > P.S. About Consistency Level, I've tried ONE/DCQUORUM and found ONE is about 10-15% faster. However that's not a promising result either. > > > Thanks! > > > Kevin > > > 2010/6/1 ??: > > > > Hi, it would be better if we knew which Consistency Level you chose, > > > > and what the schema of the test is