date:20110527

Stable/unstable packages?

2011-05-27 Thread Marcus Bointon

Are there separate repos/packages for stable/unstable releases of Cassandra? I 
was a bit surprised to find the official debian repo pushing out 0.8b2 as a 
normal update to the cassandra package. Would it not be better to have a 
cassandra-unstable package for bleeding edge and plain cassandra for stable? 
Maybe even cassandra-0.7 and cassandra-0.8 with a cassandra virtual package 
pointing at the current stable release?

Marcus

Re: Stable/unstable packages?

2011-05-27 Thread Marcus Bointon

On 27 May 2011, at 10:10, Marcus Bointon wrote:

 Are there separate repos/packages for stable/unstable releases of Cassandra? 
 I was a bit surprised to find the official debian repo pushing out 0.8b2 as a 
 normal update to the cassandra package. Would it not be better to have a 
 cassandra-unstable package for bleeding edge and plain cassandra for stable? 
 Maybe even cassandra-0.7 and cassandra-0.8 with a cassandra virtual package 
 pointing at the current stable release?

Ahem. I just found http://wiki.apache.org/cassandra/DebianPackaging so don't 
answer that one!

Marcus

Re: Corrupted Counter Columns

2011-05-27 Thread Sylvain Lebresne

On Thu, May 26, 2011 at 2:21 PM, Utku Can Topçu u...@topcu.gen.tr wrote:
 Hello,

 I'm using the 0.8.0-rc1, with RF=2 and 4 nodes.

 Strangely counters are corrupted. Say, the actual value should be : 51664
 and the value that cassandra sometimes outputs is: either 51664 or 18651001.

What does "sometimes" mean in that context? Do some queries return the
former and others the latter? Does the returned value alternate even
though no writes are coming in, or does it at least stabilize on one of
those values? Could you give more details on how this manifests itself?
For instance, does it depend on which node you connect to for the
request, and does querying at QUORUM solve it?


 And I have no idea on how to diagnose the problem or reproduce it.

 Can you help me in fixing this issue?

 Regards,
 Utku

Fwd: Mixing different OS in a cassandra cluster

2011-05-27 Thread Mikael Wikblom


Hi,

I tried to mix Windows and Linux nodes in a Cassandra 0.7.4 cluster
and got an exception on a Linux node bootstrapping from a Windows node.


java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(Unknown Source)
    at org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:117)
    at org.apache.cassandra.streaming.PendingFile$PendingFileSerializer.deserialize(PendingFile.java:126)
    at org.apache.cassandra.streaming.StreamHeader$StreamHeaderSerializer.deserialize(StreamHeader.java:90)
    at org.apache.cassandra.streaming.StreamHeader$StreamHeaderSerializer.deserialize(StreamHeader.java:72)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:90)


The problem is that the file name separator differs between Windows and
Linux. Are there any plans to support clusters with mixed nodes?
A fix for the file name issue would be quite simple.
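
To illustrate the kind of fix meant here, the following is a minimal sketch in Python, not the actual Descriptor.fromFilename code. The helper name split_sstable_path and the Windows sample path are hypothetical. The idea is to split a streamed file path on either separator instead of relying on the receiving node's local one:

```python
import re


def split_sstable_path(path):
    """Split a path into (directory, file name), accepting '/' or '\\'."""
    # Split on both separator styles so a path produced on Windows can be
    # parsed on Linux and vice versa.
    parts = re.split(r"[\\/]", path)
    name = parts[-1]
    # Everything before the file name, minus the trailing separator.
    directory = path[: len(path) - len(name)].rstrip("\\/")
    return directory, name


# Hypothetical Windows path and the Linux path style seen on this list.
print(split_sstable_path(r"C:\cassandra\data\DFS\main-f-1541-Data.db"))
print(split_sstable_path("/var/lib/cassandra/data/DFS/main-f-1541-Data.db"))
```

Sending only the bare file name over the wire would be another option; which approach suits the streaming code is a judgment for the developers.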


Thanks and regards
Mikael Wikblom

Re: Fwd: Mixing different OS in a cassandra cluster

2011-05-27 Thread Jonathan Ellis

Right. This is not supported.
On May 27, 2011 7:25 AM, Mikael Wikblom mikael.wikb...@sitevision.se
wrote:
 Hi,

 I tried to mix windows and linux in a cassandra cluster version 0.7.4
 and got an exception on a linux node bootstrapping from a windows node.

 java.lang.StringIndexOutOfBoundsException: String index out of range: -1
 at java.lang.String.substring(Unknown Source)
 at org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:117)
 at org.apache.cassandra.streaming.PendingFile$PendingFileSerializer.deserialize(PendingFile.java:126)
 at org.apache.cassandra.streaming.StreamHeader$StreamHeaderSerializer.deserialize(StreamHeader.java:90)
 at org.apache.cassandra.streaming.StreamHeader$StreamHeaderSerializer.deserialize(StreamHeader.java:72)
 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:90)

 the problem is that the file name separator differs on windows and
 linux. Are there any plans to fix support for clusters with mixed nodes?
 A fix for the file name issue would be quite simple.

 Thanks and regards
 Mikael Wikblom

average repair/bootstrap durations

2011-05-27 Thread Jonathan Colby

Hi -

Operations  like repair and bootstrap on nodes in our cluster (average
load 150GB each) take a very long time.

By long I mean 1-2 days.   With nodetool netstats I can see the
progress % very slowly progressing.

I guess there are some throttling mechanisms built into cassandra.
And yes there is also production load on these nodes so it is somewhat
understandable. Also, some of our compacted data files are as large as
50-60 GB each.

I was just wondering if these times are similar to what other people
are experiencing or if there is a serious configuration problem with
our setup.

So what have you guys seen with operations like loadbalance, repair,
cleanup, bootstrap on nodes with large amounts of data?

I'm not seeing too many full garbage collections.  Other minor GCs are
well under a second.

Setup info:
0.7.4
5 GB heap
8 GB  ram
64 bit linux os
AMD quad core HP blades
CMS Garbage collector with default cassandra settings
1 TB raid 0 sata disks
across 2 datacenters, but operations within the same dc take very long too.


This is a netstat output of a bootstrap that has been going on for 3+ hours:

Mode: Normal
Streaming to: /10.47.108.103
   /var/lib/cassandra/data/DFS/main-f-1541-Data.db/(0,32842490722),(32842490722,139556639427),(139556639427,161075890783)
         progress=94624588642/161075890783 - 58%
   /var/lib/cassandra/data/DFS/main-f-1455-Data.db/(0,660743002)
         progress=0/660743002 - 0%
   /var/lib/cassandra/data/DFS/main-f-1444-Data.db/(0,32816130132),(32816130132,71465138397),(71465138397,90968640033)
         progress=0/90968640033 - 0%
   /var/lib/cassandra/data/DFS/main-f-1540-Data.db/(0,931632934),(931632934,2621052149),(2621052149,3236107041)
         progress=0/3236107041 - 0%
   /var/lib/cassandra/data/DFS/main-f-1488-Data.db/(0,33428780851),(33428780851,110546591227),(110546591227,110851587206)
         progress=0/110851587206 - 0%
   /var/lib/cassandra/data/DFS/main-f-1542-Data.db/(0,24091168),(24091168,97485080),(97485080,108233211)
         progress=0/108233211 - 0%
   /var/lib/cassandra/data/DFS/main-f-1544-Data.db/(0,3646406),(3646406,18065308),(18065308,25776551)
         progress=0/25776551 - 0%
   /var/lib/cassandra/data/DFS/main-f-1452-Data.db/(0,676616940)
         progress=0/676616940 - 0%
   /var/lib/cassandra/data/DFS/main-f-1548-Data.db/(0,6957269),(6957269,48966550),(48966550,51499779)
         progress=0/51499779 - 0%
   /var/lib/cassandra/data/DFS/main-f-1552-Data.db/(0,237153399),(237153399,750466875),(750466875,898056853)
         progress=0/898056853 - 0%
   /var/lib/cassandra/data/DFS/main-f-1554-Data.db/(0,45155582),(45155582,195640768),(195640768,247592141)
         progress=0/247592141 - 0%
   /var/lib/cassandra/data/DFS/main-f-1449-Data.db/(0,2812483216)
         progress=0/2812483216 - 0%
   /var/lib/cassandra/data/DFS/main-f-1545-Data.db/(0,107648943),(107648943,434575065),(434575065,436667186)
         progress=0/436667186 - 0%
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0         134283
Responses                       n/a         0         192438
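
For anyone wanting a single overall figure from output like the above, here is a rough Python sketch. It assumes only the progress=sent/total format shown in this message; the function name streaming_totals is made up for illustration and this is not part of nodetool. It sums the per-file counters:

```python
import re

# Matches the "progress=<bytes sent>/<bytes total>" part of each netstats line.
PROGRESS = re.compile(r"progress=(\d+)/(\d+)")


def streaming_totals(netstats_text):
    """Return (bytes_sent, bytes_total) summed over all progress= lines."""
    sent = total = 0
    for done, size in PROGRESS.findall(netstats_text):
        sent += int(done)
        total += int(size)
    return sent, total


# Two of the entries from the output above, used as sample input.
sample = """\
   /var/lib/cassandra/data/DFS/main-f-1541-Data.db/(0,32842490722),(32842490722,139556639427),(139556639427,161075890783)
         progress=94624588642/161075890783 - 58%
   /var/lib/cassandra/data/DFS/main-f-1455-Data.db/(0,660743002)
         progress=0/660743002 - 0%
"""

sent, total = streaming_totals(sample)
if total:
    print(f"{sent}/{total} bytes streamed ({100 * sent // total}%)")
```

Feeding it the whole netstats output gives the progress across all pending files rather than only the one currently streaming.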

Re: python cql driver select count(*) failed

2011-05-27 Thread Jonathan Ellis

(and if it did, it would be the SQL row count, which is different than
the column count from pycassa.)

On Fri, May 27, 2011 at 10:13 AM, Jonathan Ellis jbel...@gmail.com wrote:
 CQL does not support count().

 On Fri, May 27, 2011 at 4:18 AM, Donal Zang zan...@ihep.ac.cn wrote:
 Hi,
  I'm using the jar from the trunk source code.
 I tried the following select CQL, but it gets the wrong result. (I can get the
 right result using pycassa's get_count())
 cqlsh> select count(1) from t_container where KEY = '2011041210' ;
 (0,)
 cqlsh> select count(*) from t_container where KEY = '2011041210' ;
 (0,)
 Any ideas? Should the KEY be converted to bytes?

 Thanks!
 Donal




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: average repair/bootstrap durations

2011-05-27 Thread Edward Capriolo

On Fri, May 27, 2011 at 9:08 AM, Jonathan Colby jonathan.co...@gmail.com wrote:

 Hi -

 Operations  like repair and bootstrap on nodes in our cluster (average
 load 150GB each) take a very long time.

 By long I mean 1-2 days.   With nodetool netstats I can see the
 progress % very slowly progressing.

 I guess there are some throttling mechanisms built into cassandra.
 And yes there is also production load on these nodes so it is somewhat
 understandable. Also, some of our compacted data files are as large as
 50-60 GB each.

 I was just wondering if these times are similar to what other people
 are experiencing or if there is a serious configuration problem with
 our setup.

 So what have you guys seen with operations like loadbalance,repair,
 cleanup, bootstrap on nodes with large amounts of data??

 I'm not seeing too many full garbage collections.  Other minor GCs are
 well under a second.

 Setup info:
 0.7.4
 5 GB heap
 8 GB  ram
 64 bit linux os
 AMD quad core HP blades
 CMS Garbage collector with default cassandra settings
 1 TB raid 0 sata disks
 across 2 datacenters, but operations within the same dc take very long too.


 This is a netstat output of a bootstrap that has been going on for 3+
 hours:

 Mode: Normal
 Streaming to: /10.47.108.103

 /var/lib/cassandra/data/DFS/main-f-1541-Data.db/(0,32842490722),(32842490722,139556639427),(139556639427,161075890783)
 progress=94624588642/161075890783 - 58%
   /var/lib/cassandra/data/DFS/main-f-1455-Data.db/(0,660743002)
 progress=0/660743002 - 0%

 /var/lib/cassandra/data/DFS/main-f-1444-Data.db/(0,32816130132),(32816130132,71465138397),(71465138397,90968640033)
 progress=0/90968640033 - 0%

 /var/lib/cassandra/data/DFS/main-f-1540-Data.db/(0,931632934),(931632934,2621052149),(2621052149,3236107041)
 progress=0/3236107041 - 0%

 /var/lib/cassandra/data/DFS/main-f-1488-Data.db/(0,33428780851),(33428780851,110546591227),(110546591227,110851587206)
 progress=0/110851587206 - 0%

 /var/lib/cassandra/data/DFS/main-f-1542-Data.db/(0,24091168),(24091168,97485080),(97485080,108233211)
 progress=0/108233211 - 0%

 /var/lib/cassandra/data/DFS/main-f-1544-Data.db/(0,3646406),(3646406,18065308),(18065308,25776551)
 progress=0/25776551 - 0%
   /var/lib/cassandra/data/DFS/main-f-1452-Data.db/(0,676616940)
 progress=0/676616940 - 0%

 /var/lib/cassandra/data/DFS/main-f-1548-Data.db/(0,6957269),(6957269,48966550),(48966550,51499779)
 progress=0/51499779 - 0%

 /var/lib/cassandra/data/DFS/main-f-1552-Data.db/(0,237153399),(237153399,750466875),(750466875,898056853)
 progress=0/898056853 - 0%

 /var/lib/cassandra/data/DFS/main-f-1554-Data.db/(0,45155582),(45155582,195640768),(195640768,247592141)
 progress=0/247592141 - 0%
   /var/lib/cassandra/data/DFS/main-f-1449-Data.db/(0,2812483216)
 progress=0/2812483216 - 0%

 /var/lib/cassandra/data/DFS/main-f-1545-Data.db/(0,107648943),(107648943,434575065),(434575065,436667186)
 progress=0/436667186 - 0%
 Not receiving any streams.
 Pool Name                    Active   Pending      Completed
 Commands                        n/a         0         134283
 Responses                       n/a         0         192438


That is a little long, but every case is different. With low request load
and some heavy server iron (RAID, RAM) you can see a compaction move really
fast: 300 GB in 4-6 hours. With enough load, one of these operations
(compact, cleanup, join) can get really bogged down, to the point where it
almost does not move. Sometimes that is just the way it is, based on how
fragmented your rows are and how fast your gear is. Not pushing your
Cassandra caches up to your JVM limit can help. If your heap is often near
full you can have JVM memory fragmentation, which slows things down.

0.8 has some more tuning options for compaction: multi-threading and knobs
for the effective rate.

I notice you are using:
5 GB heap
8 GB  ram

So your RAM/data ratio is on the lower side. I think unless you have a good
use case for row cache, less Xmx is more, but that is a minor tweak.

Re: python cql driver select count(*) failed

2011-05-27 Thread Jonathan Ellis

CQL does not support count().

On Fri, May 27, 2011 at 4:18 AM, Donal Zang zan...@ihep.ac.cn wrote:
 Hi,
  I'm using the jar from the trunk source code.
 I tried the following select CQL, but it gets the wrong result. (I can get the
 right result using pycassa's get_count())
 cqlsh> select count(1) from t_container where KEY = '2011041210' ;
 (0,)
 cqlsh> select count(*) from t_container where KEY = '2011041210' ;
 (0,)
 Any ideas? Should the KEY be converted to bytes?

 Thanks!
 Donal




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Cluster not recovering when a single node dies

2011-05-27 Thread Paul Loy

We have a 4 node cluster with a replication factor of 2. When one node dies,
the other nodes throw UnavailableExceptions for quorum reads (as expected
initially). They never get out of that state.

Is there something we can do in nodetool to make the remaining nodes
function?

Thanks.

-- 
-
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy

Re: Cluster not recovering when a single node dies

2011-05-27 Thread Jonathan Ellis

Quorum of 2 is 2. You need at least RF=3 for quorum to tolerate losing
a node indefinitely.
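
The arithmetic behind that can be sketched in a few lines of Python. This assumes the usual majority rule, quorum = floor(RF / 2) + 1; the function names are illustrative only and not a Cassandra API:

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a QUORUM operation."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """How many replicas of a key can be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"replicas that may be down={tolerated_failures(rf)}")
```

With RF=2 the quorum is 2, so any key with a replica on the dead node cannot reach quorum. Keys whose two replicas both live on surviving nodes are unaffected, which is why only some requests fail in a 4 node cluster.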

On Fri, May 27, 2011 at 10:37 AM, Paul Loy ketera...@gmail.com wrote:
 We have a 4 node cluster with a replication factor of 2. When one node dies,
 the other nodes throw UnavailableExceptions for quorum reads (as expected
 initially). They never get out of that state.

 Is there something we can do in nodetool to make the remaining nodes
 function?

 Thanks.

 --
 -
 Paul Loy
 p...@keteracel.com
 http://uk.linkedin.com/in/paulloy




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Cluster not recovering when a single node dies

2011-05-27 Thread Paul Loy

ahh, thanks.

On Fri, May 27, 2011 at 4:43 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Quorum of 2 is 2. You need at least RF=3 for quorum to tolerate losing
 a node indefinitely.

 On Fri, May 27, 2011 at 10:37 AM, Paul Loy ketera...@gmail.com wrote:
  We have a 4 node cluster with a replication factor of 2. When one node
 dies,
  the other nodes throw UnavailableExceptions for quorum reads (as expected
  initially). They never get out of that state.
 
  Is there something we can do in nodetool to make the remaining nodes
  function?
 
  Thanks.
 
  --
  -
  Paul Loy
  p...@keteracel.com
  http://uk.linkedin.com/in/paulloy
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




-- 
-
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy

Re: Cluster not recovering when a single node dies

2011-05-27 Thread Paul Loy

I guess my next question is: the data should be complete somewhere in the
ring with RF = 2. Does cassandra not redistribute the replication ring
without a nodetool decommission call?

On Fri, May 27, 2011 at 4:45 PM, Paul Loy ketera...@gmail.com wrote:

 ahh, thanks.

 On Fri, May 27, 2011 at 4:43 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Quorum of 2 is 2. You need at least RF=3 for quorum to tolerate losing
 a node indefinitely.

 On Fri, May 27, 2011 at 10:37 AM, Paul Loy ketera...@gmail.com wrote:
  We have a 4 node cluster with a replication factor of 2. When one node
 dies,
  the other nodes throw UnavailableExceptions for quorum reads (as
 expected
  initially). They never get out of that state.
 
  Is there something we can do in nodetool to make the remaining nodes
  function?
 
  Thanks.
 
  --
  -
  Paul Loy
  p...@keteracel.com
  http://uk.linkedin.com/in/paulloy
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




 --
 -
 Paul Loy
 p...@keteracel.com
 http://uk.linkedin.com/in/paulloy




-- 
-
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy

Re: average repair/bootstrap durations

2011-05-27 Thread Jonathan Colby

Thanks Ed!   I was thinking about surrendering more memory to mmap
operations.  I'm going to try bringing the Xmx down to 4G

On Fri, May 27, 2011 at 5:19 PM, Edward Capriolo edlinuxg...@gmail.com wrote:


 On Fri, May 27, 2011 at 9:08 AM, Jonathan Colby jonathan.co...@gmail.com
 wrote:

 Hi -

 Operations  like repair and bootstrap on nodes in our cluster (average
 load 150GB each) take a very long time.

 By long I mean 1-2 days.   With nodetool netstats I can see the
 progress % very slowly progressing.

 I guess there are some throttling mechanisms built into cassandra.
 And yes there is also production load on these nodes so it is somewhat
 understandable. Also, some of our compacted data files are as large as
 50-60 GB each.

 I was just wondering if these times are similar to what other people
 are experiencing or if there is a serious configuration problem with
 our setup.

 So what have you guys seen with operations like loadbalance,repair,
 cleanup, bootstrap on nodes with large amounts of data??

 I'm not seeing too many full garbage collections.  Other minor GCs are
 well under a second.

 Setup info:
 0.7.4
 5 GB heap
 8 GB  ram
 64 bit linux os
 AMD quad core HP blades
 CMS Garbage collector with default cassandra settings
 1 TB raid 0 sata disks
 across 2 datacenters, but operations within the same dc take very long
 too.


 This is a netstat output of a bootstrap that has been going on for 3+
 hours:

 Mode: Normal
 Streaming to: /10.47.108.103

 /var/lib/cassandra/data/DFS/main-f-1541-Data.db/(0,32842490722),(32842490722,139556639427),(139556639427,161075890783)
         progress=94624588642/161075890783 - 58%
   /var/lib/cassandra/data/DFS/main-f-1455-Data.db/(0,660743002)
         progress=0/660743002 - 0%

 /var/lib/cassandra/data/DFS/main-f-1444-Data.db/(0,32816130132),(32816130132,71465138397),(71465138397,90968640033)
         progress=0/90968640033 - 0%

 /var/lib/cassandra/data/DFS/main-f-1540-Data.db/(0,931632934),(931632934,2621052149),(2621052149,3236107041)
         progress=0/3236107041 - 0%

 /var/lib/cassandra/data/DFS/main-f-1488-Data.db/(0,33428780851),(33428780851,110546591227),(110546591227,110851587206)
         progress=0/110851587206 - 0%

 /var/lib/cassandra/data/DFS/main-f-1542-Data.db/(0,24091168),(24091168,97485080),(97485080,108233211)
         progress=0/108233211 - 0%

 /var/lib/cassandra/data/DFS/main-f-1544-Data.db/(0,3646406),(3646406,18065308),(18065308,25776551)
         progress=0/25776551 - 0%
   /var/lib/cassandra/data/DFS/main-f-1452-Data.db/(0,676616940)
         progress=0/676616940 - 0%

 /var/lib/cassandra/data/DFS/main-f-1548-Data.db/(0,6957269),(6957269,48966550),(48966550,51499779)
         progress=0/51499779 - 0%

 /var/lib/cassandra/data/DFS/main-f-1552-Data.db/(0,237153399),(237153399,750466875),(750466875,898056853)
         progress=0/898056853 - 0%

 /var/lib/cassandra/data/DFS/main-f-1554-Data.db/(0,45155582),(45155582,195640768),(195640768,247592141)
         progress=0/247592141 - 0%
   /var/lib/cassandra/data/DFS/main-f-1449-Data.db/(0,2812483216)
         progress=0/2812483216 - 0%

 /var/lib/cassandra/data/DFS/main-f-1545-Data.db/(0,107648943),(107648943,434575065),(434575065,436667186)
         progress=0/436667186 - 0%
 Not receiving any streams.
 Pool Name                    Active   Pending      Completed
 Commands                        n/a         0         134283
 Responses                       n/a         0         192438

 That is a little long, but every case is different. With low request load
 and some heavy server iron (RAID, RAM) you can see a compaction move really
 fast: 300 GB in 4-6 hours. With enough load, one of these operations
 (compact, cleanup, join) can get really bogged down, to the point where it
 almost does not move. Sometimes that is just the way it is, based on how
 fragmented your rows are and how fast your gear is. Not pushing your
 Cassandra caches up to your JVM limit can help. If your heap is often near
 full you can have JVM memory fragmentation, which slows things down.

 0.8 has some more tuning options for compaction: multi-threading and knobs
 for the effective rate.

 I notice you are using:
 5 GB heap
 8 GB  ram

 So your RAM/data ratio is on the lower side. I think unless you have a good
 use case for row cache, less Xmx is more, but that is a minor tweak.

pb deletion

2011-05-27 Thread karim abbouh

I use a Cassandra database replicated across two servers. When I want to
delete a record, I use this line:
client.remove(keyspace, sKey, new ColumnPath(columnFamily), timestamp, 
ConsistencyLevel.ONE);

But when I check, I see that the record still exists!
Any ideas?

BR

Re: pb deletion

2011-05-27 Thread Konstantin Naryshkin

What is the ConsistencyLevel of your reads? A ConsistencyLevel.ONE remove 
returns once it has deleted the record from at least one replica (the other 
replicas will apply the delete when they can). It could be that you are 
deleting the record on one node and then reading it from the other one, which 
has not yet had the delete propagated to it. 

Try removing with ConsistencyLevel.QUORUM or ConsistencyLevel.ALL (the same 
thing in your case, since a quorum of two replicas is both of them). 
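
A small Python sketch of the rule of thumb behind this advice: a read is guaranteed to see a completed write (or delete) only when the replicas written plus the replicas read exceed the replication factor. The function names are made up for illustration and only the ONE, QUORUM and ALL levels are modelled:

```python
def replicas_required(level, replication_factor):
    """Replicas that must acknowledge an operation at a consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_sees_write(write_level, read_level, replication_factor):
    """True when read and write replica sets must overlap (W + R > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return w + r > replication_factor


rf = 2  # two servers, each holding a replica, as in the question
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "ONE"), ("ALL", "ONE")]:
    print(write_level, read_level, read_sees_write(write_level, read_level, rf))
```

With two replicas, a ONE remove followed by a ONE read may hit different nodes, while a QUORUM or ALL remove touches both replicas, so any later read sees the delete.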

- Original Message -
From: karim abbouh karim_...@yahoo.fr 
To: user@cassandra.apache.org 
Sent: Friday, May 27, 2011 5:09:08 PM 
Subject: pb deletion 

i use cassandra database replicated in two servers,when want to delete a record 
using this line : 
client.remove(keyspace, sKey, new ColumnPath(columnFamily), timestamp, 
ConsistencyLevel.ONE); 

but when i check,i see that the record still exist! 
any idea 

BR

Re: Cluster not recovering when a single node dies

2011-05-27 Thread Jonathan Ellis

It does not. (Most failures are transient, so Cassandra doesn't
inflict the non-negligible performance impact of re-replicating a full
node's worth of data until you tell it that guy's not coming back
this time.)

On Fri, May 27, 2011 at 10:47 AM, Paul Loy ketera...@gmail.com wrote:
 I guess my next question is: the data should be complete somewhere in the
 ring with RF = 2. Does cassandra not redistribute the replication ring
 without a nodetool decommission call?

 On Fri, May 27, 2011 at 4:45 PM, Paul Loy ketera...@gmail.com wrote:

 ahh, thanks.

 On Fri, May 27, 2011 at 4:43 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Quorum of 2 is 2. You need at least RF=3 for quorum to tolerate losing
 a node indefinitely.

 On Fri, May 27, 2011 at 10:37 AM, Paul Loy ketera...@gmail.com wrote:
  We have a 4 node cluster with a replication factor of 2. When one node
  dies,
  the other nodes throw UnavailableExceptions for quorum reads (as
  expected
  initially). They never get out of that state.
 
  Is there something we can do in nodetool to make the remaining nodes
  function?
 
  Thanks.
 
  --
  -
  Paul Loy
  p...@keteracel.com
  http://uk.linkedin.com/in/paulloy
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



 --
 -
 Paul Loy
 p...@keteracel.com
 http://uk.linkedin.com/in/paulloy



 --
 -
 Paul Loy
 p...@keteracel.com
 http://uk.linkedin.com/in/paulloy




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Cluster not recovering when a single node dies

2011-05-27 Thread Paul Loy

Sounds reasonable.

Thanks.

On Fri, May 27, 2011 at 7:12 PM, Jonathan Ellis jbel...@gmail.com wrote:

 It does not. (Most failures are transient, so Cassandra doesn't
 inflict the non-negligible performance impact of re-replicating a full
 node's worth of data until you tell it that guy's not coming back
 this time.)

 On Fri, May 27, 2011 at 10:47 AM, Paul Loy ketera...@gmail.com wrote:
  I guess my next question is: the data should be complete somewhere in the
  ring with RF = 2. Does cassandra not redistribute the replication ring
  without a nodetool decommission call?
 
  On Fri, May 27, 2011 at 4:45 PM, Paul Loy ketera...@gmail.com wrote:
 
  ahh, thanks.
 
  On Fri, May 27, 2011 at 4:43 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  Quorum of 2 is 2. You need at least RF=3 for quorum to tolerate losing
  a node indefinitely.
 
  On Fri, May 27, 2011 at 10:37 AM, Paul Loy ketera...@gmail.com
 wrote:
   We have a 4 node cluster with a replication factor of 2. When one
 node
   dies,
   the other nodes throw UnavailableExceptions for quorum reads (as
   expected
   initially). They never get out of that state.
  
   Is there something we can do in nodetool to make the remaining nodes
   function?
  
   Thanks.
  
   --
   -
   Paul Loy
   p...@keteracel.com
   http://uk.linkedin.com/in/paulloy
  
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra support
  http://www.datastax.com
 
 
 
  --
  -
  Paul Loy
  p...@keteracel.com
  http://uk.linkedin.com/in/paulloy
 
 
 
  --
  -
  Paul Loy
  p...@keteracel.com
  http://uk.linkedin.com/in/paulloy
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




-- 
-
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy

Re: Re: nodetool move trying to stream data to node no longer in cluster

2011-05-27 Thread Jonathan Colby

Glad to report I fixed this problem.
1. I added the load_ring_state=false flag
2. I was able to arrange a time where I could take down the whole
cluster and bring it back up.

After that the phantom node disappeared.

On Fri, May 27, 2011 at 12:48 AM,  jonathan.co...@gmail.com wrote:
 Hi Aaron - Thanks a lot for the great feedback. I'll try your suggestion on
 removing it as an endpoint with jmx.

 On , aaron morton aa...@thelastpickle.com wrote:
 Off the top of my head the simple way to stop invalid end point state being
 passed around is a full cluster stop. Obviously that's not an option. The
 problem is if one node has the IP it will share it around with the others.

 Out of interest take a look at the o.a.c.db.FailureDetector MBean
 getAllEndpointStates() function. That returns the end point state held by
 the Gossiper. I think you should see the Phantom IP listed in there.

 If it's only on some nodes *perhaps* restarting the node with the JVM
 option -Dcassandra.load_ring_state=false *may* help. That will stop the node
 from loading its saved ring state and force it to get it via gossip. Again,
 if there are other nodes with the phantom IP it may just get it again.

 I'll do some digging and try to get back to you. This pops up from time to
 time and thinking out loud I wonder if it would be possible to add a new
 application state that purges an IP from the ring. e.g.
 VersionedValue.STATUS_PURGED that works with a ttl so it goes through X
 number of gossip rounds and then disappears.

 Hope that helps.

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com



 On 26 May 2011, at 19:58, Jonathan Colby wrote:

  @Aaron -

  Unfortunately I'm still seeing messages like:   is down, removing from
  gossip, although not with the same frequency.

  And repair/move jobs don't seem to try to stream data to the removed
  node anymore.

  Anyone know how to totally purge any stored gossip/endpoint data on
  nodes that were removed from the cluster?  Or what might be happening here
  otherwise?

  On May 26, 2011, at 9:10 AM, aaron morton wrote:

  cool. I was going to suggest that but as you already had the move
  running I thought it may be a little drastic.

  Did it show any progress ? If the IP address is not responding there
  should have been some sort of error.

  Cheers

  -
  Aaron Morton
  Freelance Cassandra Developer
  @aaronmorton
  http://www.thelastpickle.com

  On 26 May 2011, at 15:28, jonathan.co...@gmail.com wrote:

  Seems like it had something to do with stale endpoint information. I
  did a rolling restart of the whole cluster and that seemed to trigger the
  nodes to remove the node that was decommissioned.

  On , aaron morton aa...@thelastpickle.com wrote:

  Is it showing progress ? It may just be a problem with the
  information printed out.

  Can you check from the other nodes in the cluster to see if they are
  receiving the stream ?

  cheers

  -
  Aaron Morton
  Freelance Cassandra Developer
  @aaronmorton
  http://www.thelastpickle.com

  On 26 May 2011, at 00:42, Jonathan Colby wrote:

  I recently removed a node (with decommission) from our cluster.

  I added a couple new nodes and am now trying to rebalance the
  cluster using nodetool move.

  However,  netstats shows that the node being moved is trying to
  stream data to the node that I already decommissioned yesterday.

  The removed node was powered-off, taken out of dns, its IP is not
  even pingable.   It was never a seed either.

  This is cassandra 0.7.5 on 64bit linux.   How do I tell the cluster
  that this node is gone?  Gossip should have detected this.  The ring
  command shows the correct cluster IPs.

  Here is a portion of netstats. 10.46.108.102 is the node which was
  removed.

  Mode: Leaving: streaming data to other nodes
  Streaming to: /10.46.108.102

 

 
  /var/lib/cassandra/data/DFS/main-f-1064-Data.db/(4681027,5195491),(5195491,15308570),(15308570,15891710),(16336750,20558705),(20558705,29112203),(29112203,36279329),(36465942,36623223),(36740457,37227058),(37227058,42206994),(42206994,47380294),(47635053,47709813),(47709813,48353944),(48621287,49406499),(53330048,53571312),(53571312,54153922),(54153922,59857615),(59857615,61029910),(61029910,61871509),(62190800,62498605),(62824281,62964830),(63511604,64353114),(64353114,64760400),(65174702,65919771),(65919771,66435630),(81440029,81725949),(81725949,83313847),(83313847,83908709),(88983863,89237303),(89237303,89934199),(89934199,97

 

  ...

 

 
  5693491,14795861666),(14795861666,14796105318),(14796105318,14796366886),(14796699825,14803874941),(14803874941,14808898331),(14808898331,14811670699),(14811670699,14815125177),(14815125177,14819765003),(14820229433,14820858266)

expiring + counter column?

2011-05-27 Thread Yang

is this combination feature available , or on track ?

thanks
Yang

Re: expiring + counter column?

2011-05-27 Thread Jonathan Ellis

No. See comments to https://issues.apache.org/jira/browse/CASSANDRA-2103

On Fri, May 27, 2011 at 7:29 PM, Yang tedd...@gmail.com wrote:
 is this combination feature available , or on track ?

 thanks
 Yang




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
