Re: exception when adding a node replication factor (3) exceeds number of endpoints (1) - SOLVED
OK, it seems a phantom node (one that was removed from the cluster) kept being passed around in gossip as a down endpoint and was messing up the gossip algorithm. I had the luxury of being able to stop the entire cluster and bring the nodes up one by one. That purged the bad node from gossip. Not sure if there was a more elegant way to do that.

On Fri, May 27, 2011 at 9:28 AM, jonathan.co...@gmail.com wrote:

> Anyone have any idea what this could mean? This is a cluster of 7 nodes; I'm trying to add the 8th node.
>
>  INFO [FlushWriter:1] 2011-05-27 09:22:40,495 Memtable.java (line 164) Completed flushing /var/lib/cassandra/data/system/Migrations-f-1-Data.db (6358 bytes)
>  INFO [FlushWriter:1] 2011-05-27 09:22:40,496 Memtable.java (line 157) Writing Memtable-Schema@60230368(2363 bytes, 3 operations)
>  INFO [FlushWriter:1] 2011-05-27 09:22:40,562 Memtable.java (line 164) Completed flushing /var/lib/cassandra/data/system/Schema-f-1-Data.db (2513 bytes)
>  INFO [GossipStage:1] 2011-05-27 09:22:40,829 Gossiper.java (line 610) Node /10.46.108.104 is now part of the cluster
> ERROR [GossipStage:1] 2011-05-27 09:22:40,845 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (1)
>     at org.apache.cassandra.locator.OldNetworkTopologyStrategy.calculateNaturalEndpoints(OldNetworkTopologyStrategy.java:100)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getAddressRanges(AbstractReplicationStrategy.java:196)
>     at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:945)
>     at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
>     at org.apache.cassandra.service.StorageService.handleStateBootstrap(StorageService.java:707)
>     at org.apache.cassandra.service.StorageService.onChange(StorageService.java:648)
>     at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1124)
>     at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:643)
>     at org.apache.cassandra.gms.Gossiper.handleNewJoin(Gossiper.java:611)
>     at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:690)
>     at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:60)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> ERROR [GossipStage:1] 2011-05-27 09:22:40,847 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[GossipStage:1,5,main]
> java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (1)
>     at org.apache.cassandra.locator.OldNetworkTopologyStrategy.calculateNaturalEndpoints(OldNetworkTopologyStrategy.java:100)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getAddressRanges(AbstractReplicationStrategy.java:196)
>     at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:945)
>     at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
>     at org.apache.cassandra.service.StorageService.handleStateBootstrap(StorageService.java:707)
>     at org.apache.cassandra.service.StorageService.onChange(StorageService.java:648)
>     at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1124)
>     at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:643)
>     at org.apache.cassandra.gms.Gossiper.handleNewJoin(Gossiper.java:611)
>     at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:690)
>     at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:60)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
new thing going on with repair in 0.7.6??
It might just not have occurred to me in the previous 0.7.4 version, but when I do a repair on a node in v0.7.6, it seems like data is also synced with neighboring nodes. My understanding of repair is that the data is reconciled on the node being repaired, i.e., data is removed from or added to that node based on reading the data on other nodes.

I read another thread about a bug which results in the entire data being streamed over when you don't specify a CF. But in my case, we only have one CF - we're using cassandra as a simple key/value store - so I don't think it applies to my setup.

This is a netstats on the node being repaired. Note how everything is streaming out to other nodes. Is this a bug or an improvement?

Mode: Normal
Streaming to: /10.47.108.103
   /var/lib/cassandra/data/DFS/main-f-1833-Data.db sections=2542 progress=6243767484/48128279825 - 12%
   /var/lib/cassandra/data/DFS/main-f-1886-Data.db sections=2146 progress=0/748205318 - 0%
   /var/lib/cassandra/data/DFS/main-f-1854-Data.db sections=2542 progress=0/47640938847 - 0%
   /var/lib/cassandra/data/DFS/main-f-1851-Data.db sections=2502 progress=0/1587416504 - 0%
   /var/lib/cassandra/data/DFS/main-f-1892-Data.db sections=1409 progress=0/175226826 - 0%
   /var/lib/cassandra/data/DFS/main-f-1850-Data.db sections=1108 progress=0/107442430 - 0%
   /var/lib/cassandra/data/DFS/main-f-1859-Data.db sections=2542 progress=0/81697265819 - 0%
Streaming to: /10.46.108.103
   /var/lib/cassandra/data/DFS/main-f-1854-Data.db sections=72 progress=0/303912581 - 0%
   /var/lib/cassandra/data/DFS/main-f-1851-Data.db sections=71 progress=0/24604460 - 0%
   /var/lib/cassandra/data/DFS/main-f-1892-Data.db sections=26 progress=0/30900263 - 0%
   /var/lib/cassandra/data/DFS/main-f-1850-Data.db sections=19 progress=0/150012 - 0%
   /var/lib/cassandra/data/DFS/main-f-1859-Data.db sections=72 progress=0/436200262 - 0%
Streaming to: /10.46.108.101
   /var/lib/cassandra/data/DFS/main-f-1892-Data.db sections=193 progress=0/54332711 - 0%
   /var/lib/cassandra/data/DFS/main-f-1851-Data.db sections=693 progress=0/52937963 - 0%
   /var/lib/cassandra/data/DFS/main-f-1850-Data.db sections=135 progress=0/1323107 - 0%
   /var/lib/cassandra/data/DFS/main-f-1859-Data.db sections=702 progress=0/4220897850 - 0%
Nothing streaming from /10.47.108.103
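For what it's worth, outbound streams on the repaired node are consistent with repair being a two-way sync conceptually: each node hashes its token ranges (Merkle trees), the hashes are compared, and any range that differs is streamed in both directions so both replicas converge. A toy sketch of that comparison step, with made-up names purely for illustration (this is not Cassandra's actual code):

```python
import hashlib

def range_hashes(data, ranges):
    """Hash the keys/values falling in each token range (toy Merkle-leaf stand-in)."""
    hashes = {}
    for lo, hi in ranges:
        h = hashlib.md5()
        for key in sorted(k for k in data if lo <= k < hi):
            h.update(f"{key}={data[key]}".encode())
        hashes[(lo, hi)] = h.hexdigest()
    return hashes

def ranges_to_sync(local, remote, ranges):
    """Ranges whose contents differ between two replicas. These are streamed
    in BOTH directions, which is why netstats on the repaired node shows
    outbound streams as well as inbound ones."""
    lh, rh = range_hashes(local, ranges), range_hashes(remote, ranges)
    return [r for r in ranges if lh[r] != rh[r]]

ranges = [(0, 50), (50, 100)]
local = {10: "a", 60: "b"}
remote = {10: "a", 60: "c"}   # differs only in range (50, 100)
print(ranges_to_sync(local, remote, ranges))  # -> [(50, 100)]
```

The point of the sketch is just that the diff is symmetric: nothing in the comparison says which side is "being repaired", so streaming out is expected rather than a bug.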
Re: Corrupted Counter Columns
Hello,

Actually I did not have the patience to discover more of what's going on. I had to drop the CF and start from scratch. Even though there were no writes to those particular columns, while reading at CL.ONE there was a 50% chance that:
- The query returned the correct value (51664)
- The query returned a nonsense value (18651001)
(I say this is nonsense because there were no more than 52K increment requests, and all increments are actually +1 increments.)

After starting from scratch, I'm writing with CL.ONE and reading with CL.QUORUM. Things seem to work fine.

On Fri, May 27, 2011 at 1:59 PM, Sylvain Lebresne sylv...@datastax.com wrote:

> On Thu, May 26, 2011 at 2:21 PM, Utku Can Topçu u...@topcu.gen.tr wrote:
>> Hello, I'm using the 0.8.0-rc1, with RF=2 and 4 nodes. Strangely, counters
>> are corrupted. Say, the actual value should be 51664, and the value that
>> cassandra sometimes outputs is either 51664 or 18651001.
>
> What does "sometimes" mean in that context? Is it like some queries return
> the former and some others the latter? Does it alternate in the value
> returned despite no writes coming in, or does it at least stabilize to one
> of those values? Could you give more details on how this manifests itself.
> Does it depend on which node you connect to for the request, for instance?
> Does querying at QUORUM solve it?
>
>> And I have no idea how to diagnose the problem or reproduce it. Can you
>> help me in fixing this issue?
>>
>> Regards,
>> Utku
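That QUORUM reads line up with correct results is consistent with the usual replica-overlap rule for tunable consistency: reads are guaranteed to see the latest write only when read replicas + write replicas > RF. With RF=2, CL.ONE writes plus CL.ONE reads (1 + 1 = 2) don't force the read to touch a replica that saw the write, while CL.ONE writes plus CL.QUORUM reads (1 + 2 = 3) do. A small sketch of the arithmetic (generic consistency math, not counter-specific internals - the 0.8.0-rc1 counter behavior above may also involve a bug):

```python
def quorum(rf):
    """Replica count for CL.QUORUM: a majority of the RF replicas."""
    return rf // 2 + 1

def overlap_guaranteed(write_replicas, read_replicas, rf):
    """Reads must intersect the written replicas only if W + R > RF."""
    return write_replicas + read_replicas > rf

RF = 2
# CL.ONE writes + CL.ONE reads: 1 + 1 = 2, not > 2 -> stale reads possible
print(overlap_guaranteed(1, 1, RF))            # -> False
# CL.ONE writes + CL.QUORUM reads: 1 + 2 = 3 > 2 -> overlap guaranteed
print(overlap_guaranteed(1, quorum(RF), RF))   # -> True
```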
Re: expiring + counter column?
How about implementing a freezing mechanism on counter columns? If there are no more increments within freeze seconds after the last increment (it would be on the order of a day or so), the column would lock itself and won't accept further increments. And after this freeze period, the ttl should work fine. The column will be gone forever after freeze + ttl seconds.

On Sat, May 28, 2011 at 2:57 AM, Jonathan Ellis jbel...@gmail.com wrote:

> No. See comments to https://issues.apache.org/jira/browse/CASSANDRA-2103
>
> On Fri, May 27, 2011 at 7:29 PM, Yang tedd...@gmail.com wrote:
>> is this combination feature available, or on track? thanks
>> Yang
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
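To make the freeze-then-expire proposal concrete, here's a toy model of the suggested behavior. This is purely illustrative - it is not an existing Cassandra feature (the JIRA comments linked above explain why TTL'd counters are problematic), and all names here are invented:

```python
import time

class FreezingCounter:
    """Toy model of the proposed behavior: increments are accepted only
    within `freeze` seconds of the last increment; the column expires
    `freeze + ttl` seconds after the last increment."""

    def __init__(self, freeze, ttl, clock=time.time):
        self.freeze, self.ttl, self.clock = freeze, ttl, clock
        self.value = 0
        self.last_increment = clock()

    def increment(self, delta=1):
        now = self.clock()
        if now - self.last_increment > self.freeze:
            return False  # frozen: the freeze window since the last increment has passed
        self.value += delta
        self.last_increment = now
        return True

    def expired(self):
        # Gone forever freeze + ttl seconds after the last increment.
        return self.clock() - self.last_increment > self.freeze + self.ttl

# Demo with a fake clock (last increment lands at t=5):
t = [0.0]
c = FreezingCounter(freeze=10, ttl=5, clock=lambda: t[0])
t[0] = 5.0
print(c.increment())   # -> True  (within the freeze window)
t[0] = 20.0
print(c.increment())   # -> False (frozen; last increment was 15s ago)
t[0] = 21.0
print(c.expired())     # -> True  (21 - 5 = 16 > freeze + ttl = 15)
```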
automating cleanup tasks
Hi Everyone,

Other than cron, is anyone using anything fancy to automate and manage the execution of some funtastic tasks, like 'nodetool repair' on all the nodes in their ring?

--
Sasha Dolgy
sasha.do...@gmail.com
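Nothing fancy here either, but one step up from a bare crontab is a small wrapper that serializes repairs around the ring, since repairing every node at once is expensive. A sketch, assuming `nodetool` is on the PATH and the host list is filled in for your ring (hosts below are placeholders):

```python
import subprocess

HOSTS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # placeholder: your ring's nodes

def repair_command(host, keyspace=None):
    """Build the nodetool invocation: nodetool -h <host> repair [keyspace]."""
    cmd = ["nodetool", "-h", host, "repair"]
    if keyspace:
        cmd.append(keyspace)
    return cmd

def repair_ring(hosts, keyspace=None):
    """Repair one node at a time so the whole ring isn't repairing at once;
    check=True aborts the loop on the first failure."""
    for host in hosts:
        subprocess.run(repair_command(host, keyspace), check=True)

# repair_ring(HOSTS)  # uncomment to run against a live ring
```

Drive it from a single cron entry on one box (or a job scheduler) rather than a crontab per node, so repairs never overlap.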
Re: PHP CQL Driver
On Thu, 2011-05-26 at 20:51 +0200, Kwasi Gyasi - Agyei wrote:

> CREATE COLUMNFAMILY magic (KEY text PRIMARY KEY, monkey )
>   WITH comparator = text AND default_validation = text

That's not a valid query. If monkey is a column definition, then it needs a type. If you're trying to name the key, don't do that (at least not yet). Try instead:

CREATE COLUMNFAMILY magic (KEY text PRIMARY KEY, monkey text)
  WITH comparator = text AND default_validation = text

--
Eric Evans
eev...@rackspace.com