Re: exception when adding a node replication factor (3) exceeds number of endpoints (1) - SOLVED

2011-05-28 Thread Jonathan Colby
OK, it seems a phantom node (one that was removed from the cluster)
kept being passed around in gossip as a down endpoint and was messing
up the gossip algorithm.  I had the luxury of being able to stop the
entire cluster and bring the nodes up one by one.  That purged the bad
node from gossip.  Not sure if there was a more elegant way to do
that.

On Fri, May 27, 2011 at 9:28 AM,  jonathan.co...@gmail.com wrote:
 Anyone have any idea what this could mean?
 This is a cluster of 7 nodes, I'm trying to add the 8th node.

 INFO [FlushWriter:1] 2011-05-27 09:22:40,495 Memtable.java (line 164)
 Completed flushing /var/lib/cassandra/data/system/Migrations-f-1-Data.db
 (6358 bytes)
 INFO [FlushWriter:1] 2011-05-27 09:22:40,496 Memtable.java (line 157)
 Writing Memtable-Schema@60230368(2363 bytes, 3 operations)
 INFO [FlushWriter:1] 2011-05-27 09:22:40,562 Memtable.java (line 164)
 Completed flushing /var/lib/cassandra/data/system/Schema-f-1-Data.db (2513
 bytes)
 INFO [GossipStage:1] 2011-05-27 09:22:40,829 Gossiper.java (line 610) Node
 /10.46.108.104 is now part of the cluster
 ERROR [GossipStage:1] 2011-05-27 09:22:40,845
 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
 java.lang.IllegalStateException: replication factor (3) exceeds number of
 endpoints (1)
     at org.apache.cassandra.locator.OldNetworkTopologyStrategy.calculateNaturalEndpoints(OldNetworkTopologyStrategy.java:100)
     at org.apache.cassandra.locator.AbstractReplicationStrategy.getAddressRanges(AbstractReplicationStrategy.java:196)
     at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:945)
     at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
     at org.apache.cassandra.service.StorageService.handleStateBootstrap(StorageService.java:707)
     at org.apache.cassandra.service.StorageService.onChange(StorageService.java:648)
     at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1124)
     at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:643)
     at org.apache.cassandra.gms.Gossiper.handleNewJoin(Gossiper.java:611)
     at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:690)
     at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:60)
     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)
 ERROR [GossipStage:1] 2011-05-27 09:22:40,847 AbstractCassandraDaemon.java
 (line 112) Fatal exception in thread Thread[GossipStage:1,5,main]
 java.lang.IllegalStateException: replication factor (3) exceeds number of
 endpoints (1)
     at org.apache.cassandra.locator.OldNetworkTopologyStrategy.calculateNaturalEndpoints(OldNetworkTopologyStrategy.java:100)
     at org.apache.cassandra.locator.AbstractReplicationStrategy.getAddressRanges(AbstractReplicationStrategy.java:196)
     at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:945)
     at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
     at org.apache.cassandra.service.StorageService.handleStateBootstrap(StorageService.java:707)
     at org.apache.cassandra.service.StorageService.onChange(StorageService.java:648)
     at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1124)
     at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:643)
     at org.apache.cassandra.gms.Gossiper.handleNewJoin(Gossiper.java:611)
     at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:690)
     at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:60)
     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)


new thing going on with repair in 0.7.6??

2011-05-28 Thread Jonathan Colby
It might just not have occurred to me in the previous 0.7.4 version,
but when I do a repair on a node in v0.7.6, it seems like data is also
synced with neighboring nodes.

My understanding of repair is that the data is reconciled on the node
being repaired, i.e., data is removed from or added to that node based
on reading the data on the other nodes.

I read another thread about a bug which results in the entire data set
being streamed over when you don't specify a CF.  But in my case we
only have one CF - we're using Cassandra as a simple key/value store -
so I don't think it applies to my setup.

This is netstats output on the node being repaired. Note how everything
is streaming out to other nodes.  Is this a bug or an improvement?

Mode: Normal
Streaming to: /10.47.108.103
   /var/lib/cassandra/data/DFS/main-f-1833-Data.db sections=2542
progress=6243767484/48128279825 - 12%
   /var/lib/cassandra/data/DFS/main-f-1886-Data.db sections=2146
progress=0/748205318 - 0%
   /var/lib/cassandra/data/DFS/main-f-1854-Data.db sections=2542
progress=0/47640938847 - 0%
   /var/lib/cassandra/data/DFS/main-f-1851-Data.db sections=2502
progress=0/1587416504 - 0%
   /var/lib/cassandra/data/DFS/main-f-1892-Data.db sections=1409
progress=0/175226826 - 0%
   /var/lib/cassandra/data/DFS/main-f-1850-Data.db sections=1108
progress=0/107442430 - 0%
   /var/lib/cassandra/data/DFS/main-f-1859-Data.db sections=2542
progress=0/81697265819 - 0%
Streaming to: /10.46.108.103
   /var/lib/cassandra/data/DFS/main-f-1854-Data.db sections=72
progress=0/303912581 - 0%
   /var/lib/cassandra/data/DFS/main-f-1851-Data.db sections=71
progress=0/24604460 - 0%
   /var/lib/cassandra/data/DFS/main-f-1892-Data.db sections=26
progress=0/30900263 - 0%
   /var/lib/cassandra/data/DFS/main-f-1850-Data.db sections=19
progress=0/150012 - 0%
   /var/lib/cassandra/data/DFS/main-f-1859-Data.db sections=72
progress=0/436200262 - 0%
Streaming to: /10.46.108.101
   /var/lib/cassandra/data/DFS/main-f-1892-Data.db sections=193
progress=0/54332711 - 0%
   /var/lib/cassandra/data/DFS/main-f-1851-Data.db sections=693
progress=0/52937963 - 0%
   /var/lib/cassandra/data/DFS/main-f-1850-Data.db sections=135
progress=0/1323107 - 0%
   /var/lib/cassandra/data/DFS/main-f-1859-Data.db sections=702
progress=0/4220897850 - 0%
 Nothing streaming from /10.47.108.103


Re: Corrupted Counter Columns

2011-05-28 Thread Utku Can Topçu
Hello,

Actually I did not have the patience to discover more on what's going on. I
had to drop the CF and start from scratch.

Even though there were no writes to those particular columns, while reading
at CL.ONE there was a 50% chance that
- the query returned the correct value (51664)
- the query returned a nonsense value (18651001) (I say this is nonsense
because there were no more than 52K increment requests, and all
increments are actually +1 increments)

After starting from scratch, I'm writing with CL.ONE and reading with
CL.QUORUM. Things seem to work fine.
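That combination makes sense arithmetically: with RF=2 as in this thread, a
CL.ONE write touches 1 replica and a CL.QUORUM read touches 2, so R + W > RF
and every read set must overlap the latest successful write set. A quick
sanity check of that rule (plain Python, nothing Cassandra-specific):

```python
def quorum(rf):
    # Quorum is a majority of the replicas.
    return rf // 2 + 1

def overlap_guaranteed(rf, w, r):
    # A read is guaranteed to include the most recent successful write
    # when the write and read replica sets must intersect: R + W > RF.
    return r + w > rf

# RF=2 as in this thread: CL.ONE writes (W=1), CL.QUORUM reads (R=2)
print(overlap_guaranteed(2, 1, quorum(2)))  # True: reads overlap the write
print(overlap_guaranteed(2, 1, 1))          # False: ONE/ONE can miss it
```

Note this only speaks to reads after a *successful* write; it doesn't by
itself explain the corrupted counter values seen earlier in the thread.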


On Fri, May 27, 2011 at 1:59 PM, Sylvain Lebresne sylv...@datastax.comwrote:

 On Thu, May 26, 2011 at 2:21 PM, Utku Can Topçu u...@topcu.gen.tr wrote:
  Hello,
 
  I'm using the the 0.8.0-rc1, with RF=2 and 4 nodes.
 
  Strangely counters are corrupted. Say, the actual value should be : 51664
  and the value that cassandra sometimes outputs is: either 51664 or
 18651001.

 What does sometimes mean in that context?  Is it that some queries
 return the former and some others the latter?  Does the value returned
 alternate despite no writes coming in, or does it at least stabilize
 to one of those values?  Could you give more details on how this
 manifests itself.  Does it depend on which node you connect to for the
 request, for instance, and does querying at QUORUM solve it?

 
  And I have no idea on how to diagnose the problem or reproduce it.
 
  Can you help me in fixing this issue?
 
  Regards,
  Utku
 



Re: expiring + counter column?

2011-05-28 Thread Utku Can Topçu
How about implementing a freezing mechanism on counter columns?

If there are no more increments within freeze seconds after the last
increment (freeze could be on the order of a day or so), the column would
lock itself and refuse further increments.

And after this freeze period, the TTL would work as expected: the column
would be gone forever freeze + ttl seconds after the last increment.
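To make the proposal concrete, here is a toy sketch of the suggested
semantics in plain Python (purely hypothetical; nothing like this exists
in Cassandra, and the names FreezingCounter/freeze/ttl are made up for
illustration):

```python
import time

class FreezingCounter:
    """Toy model of the proposal: increments are rejected once the
    counter has been idle for `freeze` seconds, and the value expires
    `ttl` seconds after that, i.e. freeze + ttl after the last increment."""

    def __init__(self, freeze, ttl, clock=time.time):
        self.freeze = freeze
        self.ttl = ttl
        self.clock = clock          # injectable for testing
        self.value = 0
        self.last_increment = clock()

    def increment(self, delta=1):
        now = self.clock()
        if now >= self.last_increment + self.freeze:
            # Idle longer than the freeze window: locked forever.
            raise ValueError("counter is frozen; increment rejected")
        self.value += delta
        self.last_increment = now

    def is_expired(self):
        # Gone forever freeze + ttl seconds after the last increment.
        return self.clock() >= self.last_increment + self.freeze + self.ttl
```

A real implementation would have to resolve how freezing interacts with
replicated, commutative increments, which is the hard part the proposal
glosses over.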

On Sat, May 28, 2011 at 2:57 AM, Jonathan Ellis jbel...@gmail.com wrote:

 No. See comments to https://issues.apache.org/jira/browse/CASSANDRA-2103

 On Fri, May 27, 2011 at 7:29 PM, Yang tedd...@gmail.com wrote:
  is this combination feature available , or on track ?
 
  thanks
  Yang
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



automating cleanup tasks

2011-05-28 Thread Sasha Dolgy
Hi Everyone,

Other than cron, is anyone using anything fancy to automate and manage
the execution of some funtastic tasks, like 'nodetool repair' on all
the nodes in their ring?
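One step up from cron is a small driver script that walks the ring and
repairs one node at a time, since overlapping repairs multiply streaming
load. A minimal sketch (host list is hypothetical; with dry_run=False it
shells out to nodetool, which of course needs a live cluster):

```python
import subprocess

# Hypothetical host list -- substitute the nodes in your own ring.
HOSTS = ["10.46.108.101", "10.46.108.103", "10.47.108.103"]

def repair_ring(hosts, dry_run=True):
    """Run 'nodetool repair' against each node in turn, serialized.

    Returns the command lines, which with dry_run=True are built but
    not executed.
    """
    cmds = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "repair"]
        cmds.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if a repair fails
    return cmds

for line in repair_ring(HOSTS):
    print(line)
```

Scheduling the script itself is still a cron job, but the serialization
and failure handling live in one place instead of per-node crontabs.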

-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: PHP CQL Driver

2011-05-28 Thread Eric Evans
On Thu, 2011-05-26 at 20:51 +0200, Kwasi Gyasi - Agyei wrote:
 CREATE COLUMNFAMILY magic (KEY text PRIMARY KEY, monkey ) WITH
 comparator = text AND default_validation = text

That's not a valid query.  If monkey is a column definition, then it
needs a type.  If you're trying to name the key, don't do that (at least
not yet).  Try instead:

CREATE COLUMNFAMILY magic (KEY text PRIMARY KEY, monkey text) WITH
comparator = text AND default_validation = text


-- 
Eric Evans
eev...@rackspace.com