Stable/unstable packages?
Are there separate repos/packages for stable/unstable releases of Cassandra? I was a bit surprised to find the official debian repo pushing out 0.8b2 as a normal update to the cassandra package. Would it not be better to have a cassandra-unstable package for bleeding edge and plain cassandra for stable? Maybe even cassandra-0.7 and cassandra-0.8 with a cassandra virtual package pointing at the current stable release? Marcus
Re: Stable/unstable packages?
On 27 May 2011, at 10:10, Marcus Bointon wrote: [original message quoted in full] Ahem. I just found http://wiki.apache.org/cassandra/DebianPackaging so don't answer that one! Marcus
Re: Corrupted Counter Columns
On Thu, May 26, 2011 at 2:21 PM, Utku Can Topçu u...@topcu.gen.tr wrote: Hello, I'm using the 0.8.0-rc1, with RF=2 and 4 nodes. Strangely, counters are corrupted. Say, the actual value should be 51664, and the value that cassandra sometimes outputs is either 51664 or 18651001. What does "sometimes" mean in that context? Is it that some queries return the former and others the latter? Does the returned value alternate even though no writes are coming in, or does it at least stabilize to one of those values? Could you give more details on how this manifests itself: does it depend on which node you connect to for the request, for instance, and does querying at QUORUM solve it? And I have no idea on how to diagnose the problem or reproduce it. Can you help me in fixing this issue? Regards, Utku
Fwd: Mixing different OS in a cassandra cluster
Hi, I tried to mix Windows and Linux in a Cassandra cluster (version 0.7.4) and got an exception on a Linux node bootstrapping from a Windows node:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(Unknown Source)
        at org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:117)
        at org.apache.cassandra.streaming.PendingFile$PendingFileSerializer.deserialize(PendingFile.java:126)
        at org.apache.cassandra.streaming.StreamHeader$StreamHeaderSerializer.deserialize(StreamHeader.java:90)
        at org.apache.cassandra.streaming.StreamHeader$StreamHeaderSerializer.deserialize(StreamHeader.java:72)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:90)

The problem is that the file name separator differs on Windows and Linux. Are there any plans to fix support for clusters with mixed nodes? A fix for the file name issue would be quite simple. Thanks and regards, Mikael Wikblom
Re: Fwd: Mixing different OS in a cassandra cluster
Right. This is not supported. On May 27, 2011 7:25 AM, Mikael Wikblom mikael.wikb...@sitevision.se wrote: [original message and stack trace quoted in full]
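For illustration only, here is how a separator-agnostic filename parse might look. This is a hypothetical Python sketch, not the actual Descriptor.fromFilename logic; the 0.7-era naming scheme cfname-version-generation-Component is assumed.

```python
# Hypothetical sketch: normalize the separator before splitting, so a
# filename built on Windows ('\') parses the same as one built on
# Linux ('/'). The real fix would live in Descriptor.fromFilename.
def parse_sstable_name(path):
    # Strip the directory part regardless of which OS produced the path.
    name = path.replace("\\", "/").rsplit("/", 1)[-1]
    cf, version, generation, component = name.split("-", 3)
    return cf, version, int(generation), component

print(parse_sstable_name(r"C:\data\DFS\main-f-1541-Data.db"))
# -> ('main', 'f', 1541, 'Data.db')
print(parse_sstable_name("/var/lib/cassandra/data/DFS/main-f-1455-Data.db"))
# -> ('main', 'f', 1455, 'Data.db')
```

With normalization done once at the parsing boundary, the rest of the streaming code never needs to know which OS built the file name.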
average repair/bootstrap durations
Hi - Operations like repair and bootstrap on nodes in our cluster (average load 150 GB each) take a very long time. By long I mean 1-2 days. With nodetool netstats I can see the progress % advancing very slowly. I guess there are some throttling mechanisms built into Cassandra, and yes, there is also production load on these nodes, so it is somewhat understandable. Also, some of our compacted data files are as large as 50-60 GB each. I was just wondering if these times are similar to what other people are experiencing, or if there is a serious configuration problem with our setup. So what have you guys seen with operations like loadbalance, repair, cleanup, and bootstrap on nodes with large amounts of data? I'm not seeing too many full garbage collections, and minor GCs are well under a second.

Setup info:
0.7.4
5 GB heap
8 GB RAM
64-bit Linux OS
AMD quad-core HP blades
CMS garbage collector with default cassandra settings
1 TB RAID 0 SATA disks
across 2 datacenters, but operations within the same DC take very long too.
This is a netstats output of a bootstrap that has been going on for 3+ hours:

Mode: Normal
Streaming to: /10.47.108.103
 /var/lib/cassandra/data/DFS/main-f-1541-Data.db/(0,32842490722),(32842490722,139556639427),(139556639427,161075890783)
         progress=94624588642/161075890783 - 58%
 /var/lib/cassandra/data/DFS/main-f-1455-Data.db/(0,660743002)
         progress=0/660743002 - 0%
 /var/lib/cassandra/data/DFS/main-f-1444-Data.db/(0,32816130132),(32816130132,71465138397),(71465138397,90968640033)
         progress=0/90968640033 - 0%
 /var/lib/cassandra/data/DFS/main-f-1540-Data.db/(0,931632934),(931632934,2621052149),(2621052149,3236107041)
         progress=0/3236107041 - 0%
 /var/lib/cassandra/data/DFS/main-f-1488-Data.db/(0,33428780851),(33428780851,110546591227),(110546591227,110851587206)
         progress=0/110851587206 - 0%
 /var/lib/cassandra/data/DFS/main-f-1542-Data.db/(0,24091168),(24091168,97485080),(97485080,108233211)
         progress=0/108233211 - 0%
 /var/lib/cassandra/data/DFS/main-f-1544-Data.db/(0,3646406),(3646406,18065308),(18065308,25776551)
         progress=0/25776551 - 0%
 /var/lib/cassandra/data/DFS/main-f-1452-Data.db/(0,676616940)
         progress=0/676616940 - 0%
 /var/lib/cassandra/data/DFS/main-f-1548-Data.db/(0,6957269),(6957269,48966550),(48966550,51499779)
         progress=0/51499779 - 0%
 /var/lib/cassandra/data/DFS/main-f-1552-Data.db/(0,237153399),(237153399,750466875),(750466875,898056853)
         progress=0/898056853 - 0%
 /var/lib/cassandra/data/DFS/main-f-1554-Data.db/(0,45155582),(45155582,195640768),(195640768,247592141)
         progress=0/247592141 - 0%
 /var/lib/cassandra/data/DFS/main-f-1449-Data.db/(0,2812483216)
         progress=0/2812483216 - 0%
 /var/lib/cassandra/data/DFS/main-f-1545-Data.db/(0,107648943),(107648943,434575065),(434575065,436667186)
         progress=0/436667186 - 0%
Not receiving any streams.
Pool Name      Active   Pending   Completed
Commands          n/a         0      134283
Responses         n/a         0      192438
Re: python cql driver select count(*) failed
(and if it did, it would be the SQL row count, which is different from the column count in pycassa.) On Fri, May 27, 2011 at 10:13 AM, Jonathan Ellis jbel...@gmail.com wrote: CQL does not support count(). On Fri, May 27, 2011 at 4:18 AM, Donal Zang zan...@ihep.ac.cn wrote: [original message quoted in full] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
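The row-count vs. column-count distinction can be shown with a toy model, using plain Python dicts standing in for a column family (the key and column names here are made up for illustration; no client library is involved):

```python
# Toy model: one row key mapping to many columns, as in the
# t_container example above.
rows = {"2011041210": {"col%d" % i: "value" for i in range(250)}}

row_count = len(rows)                   # what an SQL-style COUNT(*) would mean: 1
column_count = len(rows["2011041210"])  # what pycassa's get_count() counts: 250
print(row_count, column_count)
```

So even if count() were supported, counting rows matching `KEY = '2011041210'` would give 1, not the per-row column count that get_count() reports.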
Re: average repair/bootstrap durations
On Fri, May 27, 2011 at 9:08 AM, Jonathan Colby jonathan.co...@gmail.com wrote: [original message and netstats output quoted in full]

That is a little long, but every case is different.
With low request load and some heavy server iron (RAID, RAM) you can see a compaction move really fast: 300 GB in 4-6 hours. With enough load, one of these operations (compact, cleanup, join) can get really bogged down, to the point where it almost does not move. Sometimes that is just the way it is, based on how fragmented your rows are and how fast your gear is. Not pushing your Cassandra caches up to your JVM limit can help: if your heap is often near full you can get JVM memory fragmentation, which slows things down. 0.8 has some more tuning options for compaction: multi-threading and knobs for the effective rate. I notice you are using a 5 GB heap with 8 GB RAM, so your RAM/data ratio is on the lower side. I think unless you have a good use case for the row cache, less Xmx is more, but that is a minor tweak.
Re: python cql driver select count(*) failed
CQL does not support count(). On Fri, May 27, 2011 at 4:18 AM, Donal Zang zan...@ihep.ac.cn wrote: Hi, I'm using the jar built from the trunk source code. I tried the following select CQL, but it gets the wrong result. (I can get the right result using pycassa's get_count().)
cqlsh> select count(1) from t_container where KEY = '2011041210';
(0,)
cqlsh> select count(*) from t_container where KEY = '2011041210';
(0,)
Any ideas? Should the KEY be converted to bytes? Thanks! Donal -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Cluster not recovering when a single node dies
We have a 4 node cluster with a replication factor of 2. When one node dies, the other nodes throw UnavailableExceptions for quorum reads (as expected initially). They never get out of that state. Is there something we can do in nodetool to make the remaining nodes function? Thanks. -- - Paul Loy p...@keteracel.com http://uk.linkedin.com/in/paulloy
Re: Cluster not recovering when a single node dies
Quorum of 2 is 2. You need at least RF=3 for quorum to tolerate losing a node indefinitely. On Fri, May 27, 2011 at 10:37 AM, Paul Loy ketera...@gmail.com wrote: We have a 4 node cluster with a replication factor of 2. When one node dies, the other nodes throw UnavailableExceptions for quorum reads (as expected initially). They never get out of that state. Is there something we can do in nodetool to make the remaining nodes function? Thanks. -- - Paul Loy p...@keteracel.com http://uk.linkedin.com/in/paulloy -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
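The arithmetic behind that answer, as a small sketch (a quorum is a strict majority of the replication factor):

```python
def quorum(rf):
    # A quorum is a strict majority of the RF replicas.
    return rf // 2 + 1

for rf in (2, 3, 5):
    # Replicas that can be down while quorum reads/writes still succeed.
    tolerable = rf - quorum(rf)
    print("RF=%d quorum=%d tolerates %d node(s) down" % (rf, quorum(rf), tolerable))
```

With RF=2 the quorum is 2 and zero replicas may be down, which is exactly the permanent UnavailableException described above; RF=3 is the smallest setting where quorum survives one dead node.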
Re: Cluster not recovering when a single node dies
ahh, thanks. On Fri, May 27, 2011 at 4:43 PM, Jonathan Ellis jbel...@gmail.com wrote: Quorum of 2 is 2. You need at least RF=3 for quorum to tolerate losing a node indefinitely. On Fri, May 27, 2011 at 10:37 AM, Paul Loy ketera...@gmail.com wrote: We have a 4 node cluster with a replication factor of 2. When one node dies, the other nodes throw UnavailableExceptions for quorum reads (as expected initially). They never get out of that state. Is there something we can do in nodetool to make the remaining nodes function? Thanks. -- - Paul Loy p...@keteracel.com http://uk.linkedin.com/in/paulloy -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- - Paul Loy p...@keteracel.com http://uk.linkedin.com/in/paulloy
Re: Cluster not recovering when a single node dies
I guess my next question is: the data should be complete somewhere in the ring with RF = 2. Does cassandra not redistribute the replication ring without a nodetool decommission call? On Fri, May 27, 2011 at 4:45 PM, Paul Loy ketera...@gmail.com wrote: [earlier messages in the thread quoted in full] -- - Paul Loy p...@keteracel.com http://uk.linkedin.com/in/paulloy
Re: average repair/bootstrap durations
Thanks Ed! I was thinking about surrendering more memory to mmap operations. I'm going to try bringing the Xmx down to 4G. On Fri, May 27, 2011 at 5:19 PM, Edward Capriolo edlinuxg...@gmail.com wrote: [earlier messages and netstats output quoted in full]
pb deletion
I use a Cassandra database replicated across two servers. When I want to delete a record I use this line:

client.remove(keyspace, sKey, new ColumnPath(columnFamily), timestamp, ConsistencyLevel.ONE);

but when I check, I see that the record still exists! Any idea? BR
Re: pb deletion
What is the ConsistencyLevel of your reads? A ConsistencyLevel.ONE remove returns when it has deleted the record from at least 1 replica (and any other replicas will be deleted when they can be). It could be the case that you are deleting the record off of one node and then reading it off of the other one (which has not had the delete propagated to it yet). Try removing with ConsistencyLevel.QUORUM or ConsistencyLevel.ALL (the same thing in your case). - Original Message - From: karim abbouh karim_...@yahoo.fr To: user@cassandra.apache.org Sent: Friday, May 27, 2011 5:09:08 PM Subject: pb deletion [original message quoted in full]
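The general rule behind this advice can be sketched as follows (R and W are the replica counts implied by the read and write consistency levels):

```python
# A read is guaranteed to overlap the most recent write (or delete)
# when the two replica sets must intersect, i.e. when R + W > RF.
def read_sees_write(r, w, rf):
    return r + w > rf

RF = 2  # the poster's setup: two replicas
print(read_sees_write(1, 1, RF))  # ONE write, ONE read -> False (stale reads possible)
print(read_sees_write(1, 2, RF))  # ONE remove, QUORUM/ALL read -> True
```

A delete is just a write of a tombstone, so the same overlap rule decides whether a subsequent read is guaranteed to observe it.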
Re: Cluster not recovering when a single node dies
It does not. (Most failures are transient, so Cassandra doesn't inflict the non-negligible performance impact of re-replicating a full node's worth of data until you tell it that guy's not coming back this time.) On Fri, May 27, 2011 at 10:47 AM, Paul Loy ketera...@gmail.com wrote: [earlier messages in the thread quoted in full] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Cluster not recovering when a single node dies
Sounds reasonable. Thanks. On Fri, May 27, 2011 at 7:12 PM, Jonathan Ellis jbel...@gmail.com wrote: [earlier messages in the thread quoted in full] -- - Paul Loy p...@keteracel.com http://uk.linkedin.com/in/paulloy
Re: Re: nodetool move trying to stream data to node no longer in cluster
Glad to report I fixed this problem. 1. I added the load_ring_state=false flag. 2. I was able to arrange a time where I could take down the whole cluster and bring it back up. After that, the phantom node disappeared. On Fri, May 27, 2011 at 12:48 AM, jonathan.co...@gmail.com wrote: Hi Aaron - Thanks a lot for the great feedback. I'll try your suggestion on removing it as an endpoint with JMX. On , aaron morton aa...@thelastpickle.com wrote: Off the top of my head, the simple way to stop invalid endpoint state being passed around is a full cluster stop. Obviously that's not an option. The problem is that if one node has the IP, it will share it around with the others. Out of interest, take a look at the o.a.c.db.FailureDetector MBean getAllEndpointStates() function. That returns the endpoint state held by the Gossiper. I think you should see the phantom IP listed in there. If it's only on some nodes, *perhaps* restarting the node with the JVM option -Dcassandra.load_ring_state=false *may* help. That will stop the node from loading its saved ring state and force it to get it via gossip. Again, if there are other nodes with the phantom IP it may just get it again. I'll do some digging and try to get back to you. This pops up from time to time, and thinking out loud, I wonder if it would be possible to add a new application state that purges an IP from the ring, e.g. VersionedValue.STATUS_PURGED, that works with a TTL so it goes through X number of gossip rounds and then disappears. Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 26 May 2011, at 19:58, Jonathan Colby wrote: @Aaron - Unfortunately I'm still seeing messages like "is down, removing from gossip", although not with the same frequency. And repair/move jobs don't seem to try to stream data to the removed node anymore. Anyone know how to totally purge any stored gossip/endpoint data on nodes that were removed from the cluster?
Or what might be happening here otherwise? On May 26, 2011, at 9:10 AM, aaron morton wrote: cool. I was going to suggest that but as you already had the move running I thought it may be a little drastic. Did it show any progress ? If the IP address is not responding there should have been some sort of error. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 26 May 2011, at 15:28, jonathan.co...@gmail.com wrote: Seems like it had something to do with stale endpoint information. I did a rolling restart of the whole cluster and that seemed to trigger the nodes to remove the node that was decommissioned. On , aaron morton aa...@thelastpickle.com wrote: Is it showing progress ? It may just be a problem with the information printed out. Can you check from the other nodes in the cluster to see if they are receiving the stream ? cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 26 May 2011, at 00:42, Jonathan Colby wrote: I recently removed a node (with decommission) from our cluster. I added a couple new nodes and am now trying to rebalance the cluster using nodetool move. However, netstats shows that the node being moved is trying to stream data to the node that I already decommissioned yesterday. The removed node was powered-off, taken out of dns, its IP is not even pingable. It was never a seed neither. This is cassandra 0.7.5 on 64bit linux. How do I tell the cluster that this node is gone? Gossip should have detected this. The ring commands shows the correct cluster IPs. Here is a portion of netstats. 10.46.108.102 is the node which was removed. 
Mode: Leaving: streaming data to other nodes Streaming to: /10.46.108.102 /var/lib/cassandra/data/DFS/main-f-1064-Data.db/(4681027,5195491),(5195491,15308570),(15308570,15891710),(16336750,20558705),(20558705,29112203),(29112203,36279329),(36465942,36623223),(36740457,37227058),(37227058,42206994),(42206994,47380294),(47635053,47709813),(47709813,48353944),(48621287,49406499),(53330048,53571312),(53571312,54153922),(54153922,59857615),(59857615,61029910),(61029910,61871509),(62190800,62498605),(62824281,62964830),(63511604,64353114),(64353114,64760400),(65174702,65919771),(65919771,66435630),(81440029,81725949),(81725949,83313847),(83313847,83908709),(88983863,89237303),(89237303,89934199),(89934199,97 ... 5693491,14795861666),(14795861666,14796105318),(14796105318,14796366886),(14796699825,14803874941),(14803874941,14808898331),(14808898331,14811670699),(14811670699,14815125177),(14815125177,14819765003),(14820229433,14820858266)
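The workaround that resolved this thread comes down to one JVM system property. A sketch of wiring it in via cassandra-env.sh (the file name and JVM_OPTS variable are assumptions based on standard packaging; typically you remove the line again after the clean restart):

```shell
# Assumed cassandra-env.sh fragment: make the node discard its saved
# ring state at startup and rebuild its view of the ring from gossip.
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
```

As noted above, a single node restarted this way may simply re-learn the phantom IP from its peers, which is why the full-cluster bounce was what finally cleared it.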
expiring + counter column?
Is this combination of features available, or on track? Thanks, Yang
Re: expiring + counter column?
No. See comments to https://issues.apache.org/jira/browse/CASSANDRA-2103 On Fri, May 27, 2011 at 7:29 PM, Yang tedd...@gmail.com wrote: is this combination feature available , or on track ? thanks Yang -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com