Upgrade 1.2.11 to 2.0.6: some errors

2014-03-19 Thread Nicolas Lalevée
Hi,

On our test cluster, we tried an upgrade of Cassandra from 1.2.11 to 2.0.6. It 
was not straightforward, so I would like to know whether this is expected, so 
that I can do it safely on prod.

The first time we tried, the first node we upgraded refused to start with this 
error:

ERROR [main] 2014-03-19 10:50:31,363 CassandraDaemon.java (line 488) Exception encountered during startup
java.lang.RuntimeException: Incompatible SSTable found.  Current version jb is unable to read file: /var/lib/cassandra/data/system/NodeIdInfo/system-NodeIdInfo-hf-4.  Please run upgradesstables.
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:415)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:392)
at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:309)
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:266)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:514)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:237)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:471)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:560)

I've re-read NEWS.txt [1], and as far as I understand, upgradesstables is only 
required when coming from a version < 1.2.9. But maybe I am misreading this 
paragraph:
- Upgrading is ONLY supported from Cassandra 1.2.9 or later. This
  goes for sstable compatibility as well as network.  When
  upgrading from an earlier release, upgrade to 1.2.9 first and
  run upgradesstables before proceeding to 2.0.

So we did the required upgradesstables. The node started successfully.

I have checked on our prod cluster: there are also some hf files, on all nodes, 
all of them being /var/lib/cassandra/data/system/Versions/system-Versions-hf-*
I have tried several upgradesstables commands, but they are still lying there:
# nodetool upgradesstables system Versions
Exception in thread "main" java.lang.IllegalArgumentException: Unknown table/cf pair (system.Versions)
# nodetool upgradesstables system
# nodetool upgradesstables
# nodetool upgradesstables -a system
# ls /var/lib/cassandra/data/system/Versions/*-hf-* | wc -l
15

I did not try "nodetool upgradesstables -a" since we have a lot of data.
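
For what it's worth, a rough way to list which keyspace/CF still has old-format 
files is to grep for the "hf" marker in the file names (a sketch only, assuming 
the default data directory layout):

# find /var/lib/cassandra/data -type f -name '*-hf-*' | awk -F/ '{print $(NF-2)"."$(NF-1)}' | sort | uniq -c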

I guess this will cause me trouble if I try to upgrade in prod? Is there a bug 
I should report?

Continuing on our test cluster, we upgraded the second node. During the time we 
were running with 2 different versions of Cassandra, there were errors in the 
logs:

ERROR [WRITE-/10.10.0.41] 2014-03-19 11:23:27,523 OutboundTcpConnection.java (line 234) error writing to /10.10.0.41
java.lang.RuntimeException: Cannot convert filter to old super column format. Update all nodes to Cassandra 2.0 first.
at org.apache.cassandra.db.SuperColumns.sliceFilterToSC(SuperColumns.java:357)
at org.apache.cassandra.db.SuperColumns.filterToSC(SuperColumns.java:258)
at org.apache.cassandra.db.ReadCommandSerializer.serializedSize(ReadCommand.java:192)
at org.apache.cassandra.db.ReadCommandSerializer.serializedSize(ReadCommand.java:134)
at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:116)
at org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:251)
at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:203)
at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:151)

I confirm that we do have old-style super columns, which were designed back when 
Cassandra was 1.0.x. Since the replication factor on our test cluster is 1, I do 
see errors on the client side anyway, because 1 node out of 2 was down. So I 
don't know for sure whether this error in Cassandra affected the client; the 
time frame is too short to tell from the logs. In prod we have a replication 
factor of 3. If we do such an upgrade in prod, node by node to avoid any 
downtime, will the clients still see write errors while there are mixed versions 
of Cassandra?

Nicolas

[1] 
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-2.0.6



Re: Cassandra timeout whereas it is not much busy

2013-01-29 Thread Nicolas Lalevée

On 29 Jan 2013, at 08:08, aaron morton wrote:

>> From what I could read there seems to be a contention issue around the 
>> flushing (the "switchlock" ?). Cassandra would then be slow, but not using 
>> the entire cpu. I would be in the strange situation I was where I reported 
>> my issue in this thread.
>> Does my theory makes sense ?
> If you are seeing contention around the switch lock you will see a pattern in 
> the logs where a "Writing…" message is immediately followed by an "Enqueing…" 
> message. This happens when the flush_queue is full and the thread flushing 
> (either because of memory, commit log or snapshot etc) is waiting. 
> 
> See the comments for memtable_flush_queue_size in the yaml file. 
> 
> If you increase the value you will flush more frequently, as C* leaves room in 
> memory to handle the case where the queue is full. 
> 
> If you have spare IO you could consider increasing memtable_flush_writers

ok. I see.

I think that the RAM upgrade will fix most of my issues. But if I come to see 
that situation again, I'll definitely look into tuning memtable_flush_writers.
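
For the record, the two settings Aaron mentions live in cassandra.yaml; a quick 
way to check them (a sketch, path assuming the Debian/Ubuntu package layout):

# grep -E 'memtable_flush_(writers|queue_size)' /etc/cassandra/cassandra.yaml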

Thanks for your help.

Nicolas

> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 29/01/2013, at 4:19 AM, Nicolas Lalevée  wrote:
> 
>> I did some testing, I have a theory.
>> 
>> First, we have it seems "a lot" of CF. And two are particularly every hungry 
>> in RAM, consuming a quite big amount of RAM for the bloom filters. Cassandra 
>> do not force the flush of the memtables if it has more than 6G of Xmx 
>> (luckily for us, this is the maximum reasonable we can give).
>> Since our machines have 8G, this gives quite a little room for the disk 
>> cache. Thanks to this systemtap script [1], I have seen that the hit ratio 
>> is about 10%.
>> 
>> Then I have tested with an Xmx at 4G. So %wa drops down. The disk cache 
>> ratio raises to 80%. On the other hand, flushing is happening very often. I 
>> cannot say how much, since I have too many CF to graph them all. But the 
>> ones I graph, none of their memtable goes above 10M, whereas they usually go 
>> up to 200M.
>> 
>> I have not tested further. Since it is quite obvious that the machines needs 
>> more RAM. And they're about to receive more.
>> 
>> But I guess that if I had to put more write and read pressure, with still an 
>> xmx at 4G, the %wa would still be quite low, but the flushing would be even 
>> more intensive. And I guess that it would go wrong. From what I could read 
>> there seems to be a contention issue around the flushing (the "switchlock" 
>> ?). Cassandra would then be slow, but not using the entire cpu. I would be 
>> in the strange situation I was where I reported my issue in this thread.
>> Does my theory makes sense ?
>> 
>> Nicolas
>> 
>> [1] http://sourceware.org/systemtap/wiki/WSCacheHitRate
>> 
>> On 23 Jan 2013, at 18:35, Nicolas Lalevée wrote:
>> 
>>> On 22 Jan 2013, at 21:50, Rob Coli wrote:
>>> 
>>>> On Wed, Jan 16, 2013 at 1:30 PM, Nicolas Lalevée
>>>>  wrote:
>>>>> Here is the long story.
>>>>> After some long useless staring at the monitoring graphs, I gave a try to
>>>>> using the openjdk 6b24 rather than openjdk 7u9
>>>> 
>>>> OpenJDK 6 and 7 are both counter-recommended with regards to
>>>> Cassandra. I've heard reports of mysterious behavior like the behavior
>>>> you describe, when using OpenJDK 7.
>>>> 
>>>> Try using the Sun/Oracle JVM? Is your JNA working?
>>> 
>>> JNA is working.
>>> I tried both oracle-jdk6 and oracle-jdk7, no difference with openjdk6. And 
>>> since ubuntu is only maintaining openjdk, we'll stick with it until 
>>> oracle's one proven better.
>>> oracle vs openjdk, I tested for now under "normal" pressure though.
>>> 
>>> What amaze me is whatever how much I google it and ask around, I still 
>>> don't know for sure the difference between the openjdk and oracle's jdk…
>>> 
>>> Nicolas
>>> 
>> 
> 



Re: Cassandra timeout whereas it is not much busy

2013-01-28 Thread Nicolas Lalevée
I did some testing, I have a theory.

First, it seems we have "a lot" of CFs, and two of them are particularly hungry 
in RAM, consuming quite a big amount of it for their bloom filters. Cassandra 
does not force the flush of the memtables if it has more than 6G of Xmx (luckily 
for us, this is the maximum reasonable we can give).
Since our machines have 8G, this leaves quite little room for the disk cache. 
Thanks to this systemtap script [1], I have seen that the hit ratio is about 
10%.
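
As a side note, the per-CF bloom filter footprint can also be checked without 
graphing; a sketch (the exact label may vary a bit between versions):

# nodetool -h localhost cfstats | grep -E 'Column Family:|Bloom Filter Space Used'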

Then I tested with an Xmx of 4G. The %wa drops down and the disk cache hit ratio 
rises to 80%. On the other hand, flushing happens very often. I cannot say 
exactly how often, since I have too many CFs to graph them all, but for the ones 
I do graph, none of their memtables goes above 10M, whereas they usually go up 
to 200M.

I have not tested further, since it is quite obvious that the machines need more 
RAM. And they're about to receive more.

But I guess that if I put more write and read pressure on, still with an Xmx of 
4G, the %wa would stay quite low but the flushing would be even more intensive, 
and I guess it would go wrong. From what I could read, there seems to be a 
contention issue around flushing (the "switchlock"?). Cassandra would then be 
slow, but not using the entire CPU. I would be back in the strange situation in 
which I reported my issue in this thread.
Does my theory make sense?

Nicolas

[1] http://sourceware.org/systemtap/wiki/WSCacheHitRate

On 23 Jan 2013, at 18:35, Nicolas Lalevée wrote:

> On 22 Jan 2013, at 21:50, Rob Coli wrote:
> 
>> On Wed, Jan 16, 2013 at 1:30 PM, Nicolas Lalevée
>>  wrote:
>>> Here is the long story.
>>> After some long useless staring at the monitoring graphs, I gave a try to
>>> using the openjdk 6b24 rather than openjdk 7u9
>> 
>> OpenJDK 6 and 7 are both counter-recommended with regards to
>> Cassandra. I've heard reports of mysterious behavior like the behavior
>> you describe, when using OpenJDK 7.
>> 
>> Try using the Sun/Oracle JVM? Is your JNA working?
> 
> JNA is working.
> I tried both oracle-jdk6 and oracle-jdk7, no difference with openjdk6. And 
> since ubuntu is only maintaining openjdk, we'll stick with it until oracle's 
> one proven better.
> oracle vs openjdk, I tested for now under "normal" pressure though.
> 
> What amaze me is whatever how much I google it and ask around, I still don't 
> know for sure the difference between the openjdk and oracle's jdk…
> 
> Nicolas
> 



Re: JMX CF Beans

2013-01-26 Thread Nicolas Lalevée
Thanks, both of you.

Nicolas

On 25 Jan 2013, at 19:05, Tyler Hobbs wrote:

> 
> On Fri, Jan 25, 2013 at 8:07 AM, Nicolas Lalevée  
> wrote:
> Just a quick question about the attributes exposed via JMX. I have some doc 
> [1] but it doesn't help about CF beans.
> 
> The "BloomFilterFalseRatio", is that the ratio of found vs missed, or the 
> ratio of false positive vs the number of tests, or something else ?
> 
> False positives.
> 
> You should be aware of this bug, though: 
> https://issues.apache.org/jira/browse/CASSANDRA-4043
>  
> 
> The "ReadCount" and "WriteCount", how do they count regarding the replication 
> factor ? As far as I understand, the read and write on the StorageProxy is 
> the actual number of requests coming from clients. So judging that the sum on 
> all cf of the read and write is near equal to the replication factor multiply 
> by the number of read and write on the StorageProxy, I am guessing that the 
> read and write per cf are the replicas one. Am I right ?
> 
> 
> StorageProxy read/write counts should equal the number of client requests.
> ColumnFamily read/write counts correspond to actual, local data reads, so the 
> sum of this number across all nodes will be approximately RF * the 
> StorageProxy counts.
> 
> 
> -- 
> Tyler Hobbs
> DataStax



JMX CF Beans

2013-01-25 Thread Nicolas Lalevée
Just a quick question about the attributes exposed via JMX. I have some doc [1] 
but it doesn't help much with the CF beans.

The "BloomFilterFalseRatio": is that the ratio of found vs missed, or the ratio 
of false positives vs the number of tests, or something else?

The "ReadCount" and "WriteCount": how do they count with regard to the 
replication factor? As far as I understand, the reads and writes on the 
StorageProxy are the actual number of requests coming from clients. Seeing that 
the sum of the reads and writes over all CFs is nearly equal to the replication 
factor multiplied by the number of reads and writes on the StorageProxy, I am 
guessing that the per-CF reads and writes are the replica ones. Am I right?
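
For context, the raw values can be read over JMX, e.g. with jmxterm; a sketch 
with placeholder keyspace/CF names, bean names written from memory, so please 
double-check them against your version:

# echo "get -b org.apache.cassandra.db:type=StorageProxy ReadOperations WriteOperations" | java -jar jmxterm.jar -l localhost:7199 -n
# echo "get -b org.apache.cassandra.db:type=ColumnFamilies,keyspace=MyKeyspace,columnfamily=MyCF ReadCount WriteCount" | java -jar jmxterm.jar -l localhost:7199 -n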

Nicolas

[1] http://wiki.apache.org/cassandra/JmxInterface



Re: Cassandra timeout whereas it is not much busy

2013-01-23 Thread Nicolas Lalevée
On 22 Jan 2013, at 21:50, Rob Coli wrote:

> On Wed, Jan 16, 2013 at 1:30 PM, Nicolas Lalevée
>  wrote:
>> Here is the long story.
>> After some long useless staring at the monitoring graphs, I gave a try to
>> using the openjdk 6b24 rather than openjdk 7u9
> 
> OpenJDK 6 and 7 are both counter-recommended with regards to
> Cassandra. I've heard reports of mysterious behavior like the behavior
> you describe, when using OpenJDK 7.
> 
> Try using the Sun/Oracle JVM? Is your JNA working?

JNA is working.
I tried both oracle-jdk6 and oracle-jdk7, no difference from openjdk6. And since 
Ubuntu only maintains openjdk, we'll stick with it until Oracle's is proven 
better.
Oracle vs openjdk, I have only tested under "normal" pressure for now, though.
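
For what it's worth, one quick way to confirm whether JNA is picked up is to 
grep the startup log, since Cassandra logs whether the native library could be 
linked (log path assuming the packaged layout):

# grep -i jna /var/log/cassandra/system.log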

What amazes me is that however much I google it and ask around, I still don't 
know for sure what the difference is between openjdk and Oracle's jdk…

Nicolas



Re: Cassandra timeout whereas it is not much busy

2013-01-21 Thread Nicolas Lalevée
On 17 Jan 2013, at 05:00, aaron morton wrote:

> Check the disk utilisation using iostat -x 5
> If you are on a VM / in the cloud check for CPU steal. 
> Check the logs for messages from the GCInspector, the ParNew events are times 
> the JVM is paused. 

I have seen logs about that. I didn't worry much, since the GC of the JVM was 
not under pressure. As far as I understand, unless a CF is "continuously" 
flushed, it should not be a major issue, should it?
I don't know for sure whether there was a lot of flushing though, since my nodes 
were not properly monitored.

> Look at the times dropped messages are logged and try to correlate them with 
> other server events.

I tried that without much success. I only have graphs in cacti though, so it is 
quite hard to see when things happen simultaneously across several graphs.
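
One cheap way to do the correlation outside of cacti is to pull the relevant log 
lines with their timestamps side by side; a rough sketch, log path assuming the 
packaged layout:

# grep -E 'GCInspector|dropped' /var/log/cassandra/system.log | tail -n 50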

> If you have a lot secondary indexes, or a lot of memtables flushing at the 
> some time you may be blocking behind the global Switch Lock. If you use 
> secondary indexes make sure the memtable_flush_queue_size is set correctly, 
> see the comments in the yaml file.

I have no secondary indexes.

> If you have a lot of CF's flushing at the same time, and there are not 
> messages from the "MeteredFlusher", it may be the log segment is too big for 
> the number of CF's you have. When the segment needs to be recycled all dirty 
> CF's are flushed, if you have a lot of cf's this can result in blocking 
> around the switch lock. Trying reducing the commitlog_segment_size_in_mb so 
> that less CF's are flushed.

What is "a lot"? We have 26 CFs. 9 are barely used. 15 contain time series data 
(Cassandra rocks with them), and only 3 of those get from 1 to 10 reads or 
writes per second. 1 is quite hot (200 reads/s) and is mainly used for its bloom 
filter (whose "disk size" is about 1G). And 1 more, also hot, is used only for 
writes (it has the same big bloom filter, which I am about to remove since it is 
useless there).

BTW, thanks for the pointers. I have not yet tried to put our nodes under 
pressure, but when I do, I'll look at those pointers closely.

Nicolas

> 
> Hope that helps
>  
> -----
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 17/01/2013, at 10:30 AM, Nicolas Lalevée  
> wrote:
> 
>> Hi,
>> 
>> I have a strange behavior I am not able to understand.
>> 
>> I have 6 nodes with cassandra-1.0.12. Each nodes have 8G of RAM. I have a 
>> replication factor of 3.
>> 
>> ---
>> my story is maybe too long, trying shorter here, while saving what I wrote 
>> in case someone has patience to read my bad english ;)
>> 
>> I got under a situation where my cluster was generating a lot of timeouts on 
>> our frontend, whereas I could not see any major trouble on the internal 
>> stats. Actually cpu, read & write counts on the column families were quite 
>> low. A mess until I switched from java7 to java6 and forced the used of 
>> jamm. After the switch, cpu, read & write counts, were going up again, 
>> timeouts gone. I have seen this behavior while reducing the xmx too.
>> 
>> What could be blocking cassandra from utilizing the while resources of the 
>> machine ? Is there is metrics I didn't saw which could explain this ?
>> 
>> ---
>> Here is the long story.
>> 
>> When I first set my cluster up, I gave blindly 6G of heap to the cassandra 
>> nodes, thinking that more a java process has, the smoother it runs, while 
>> keeping some RAM to the disk cache. We got some new feature deployed, and 
>> things were going into hell, some machine up to 60% of wa. I give credit to 
>> cassandra because there was not that much timeout received on the web 
>> frontend, it was kind of slow but is was kind of working. With some 
>> optimizations, we reduced the pressure of the new feature, but it was still 
>> at 40%wa.
>> 
>> At that time I didn't have much monitoring, just heap and cpu. I read some 
>> article how to tune, and I learned that the disk cache is quite important 
>> because cassandra relies on it to be the read cache. So I have tried many 
>> xmx, and 3G seems of kind the lowest possible. So on 2 among 6 nodes, I have 
>> set 3,3G to xmx. Amazingly, I saw the wa down to 10%. Quite happy with that, 
>> I changed the xmx 3,3G on each node. But then things really went to hell, a 
>> lot of timeouts on the frontend. It was not working at all. So I rolled back.
>> 
>> After some time, probably because of the growing data of the new feature to 
>> a n

Cassandra timeout whereas it is not much busy

2013-01-16 Thread Nicolas Lalevée
Hi,

I have a strange behavior I am not able to understand.

I have 6 nodes with cassandra-1.0.12. Each node has 8G of RAM. I have a 
replication factor of 3.

---
my story is maybe too long, so I am trying a shorter version here, while keeping 
what I wrote below in case someone has the patience to read my bad English ;)

I got into a situation where my cluster was generating a lot of timeouts on our 
frontend, whereas I could not see any major trouble in the internal stats. 
Actually CPU and read & write counts on the column families were quite low. It 
was a mess until I switched from java7 to java6 and forced the use of jamm. 
After the switch, CPU and read & write counts went up again and the timeouts 
were gone. I have seen this behavior while reducing the Xmx too.

What could be blocking Cassandra from using the whole resources of the machine? 
Are there metrics I didn't look at which could explain this?

---
Here is the long story.

When I first set my cluster up, I blindly gave 6G of heap to the Cassandra 
nodes, thinking that the more a Java process has, the smoother it runs, while 
still keeping some RAM for the disk cache. Then a new feature got deployed, and 
things went to hell, with some machines up to 60% wa. I give credit to Cassandra 
because there were not that many timeouts received on the web frontend; it was 
kind of slow, but it was kind of working. With some optimizations, we reduced 
the pressure of the new feature, but it was still at 40% wa.

At that time I didn't have much monitoring, just heap and CPU. I read some 
articles on how to tune, and I learned that the disk cache is quite important 
because Cassandra relies on it as its read cache. So I tried many Xmx values, 
and 3G seemed about the lowest possible. On 2 of the 6 nodes, I set the Xmx to 
3.3G. Amazingly, I saw the wa go down to 10%. Quite happy with that, I changed 
the Xmx to 3.3G on every node. But then things really went to hell, with a lot 
of timeouts on the frontend. It was not working at all. So I rolled back.

After some time, probably because the data of the new feature had grown to its 
nominal size, things went back to a very high %wa, and Cassandra was not able to 
keep up. So we kind of reverted the feature; the column family is still used, 
but only by one thread on the frontend. The wa was reduced to 20%, but things 
continued to not work properly: from time to time, a bunch of timeouts were 
raised on our frontend.

In the meantime, I took the time to do some proper monitoring of Cassandra: 
column family read & write counts, latency, memtable size, but also the dropped 
messages, the pending tasks, and the timeouts between nodes. It's just a start 
but it gives me a first nice view of what is actually going on.

I tried again reducing the Xmx on one node. Cassandra is not complaining about 
not having enough heap, memtables are not flushed insanely every second, the 
number of reads and writes is reduced compared to the other nodes, the CPU is 
lower too, there are not many pending tasks, and no more than 1 or 2 messages 
are dropped from time to time. Everything indicates that there is probably room 
for more work, but the node doesn't take it. Even its read and write latencies 
are lower than on the other nodes. But if I keep running long enough with this 
Xmx, timeouts start to rise on the frontends.
After some individual node experiments, the cluster was starting to be quite 
"sick". Even with 6G, the %wa was going down, read and write counts too, on 
pretty much every node. And more and more timeouts were raised on the frontend.
The only worrying thing I could see was the heap climbing slowly above the 75% 
threshold and from time to time suddenly dropping from 95% to 70%. I looked at 
the full GC counter: not much pressure.
Another thing was some "Timed out replaying hints to /10.0.0.56; aborting 
further deliveries" in the log. But it is logged as info, so I guess it is not 
very important.
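
To watch the old-gen occupancy and the full GC count live, jstat is handy; a 
sketch, assuming the JDK tools are installed and the daemon is found by pgrep:

# jstat -gcutil $(pgrep -f CassandraDaemon) 5000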

After some long, useless staring at the monitoring graphs, I gave a try to using 
openjdk 6b24 rather than openjdk 7u9, and forced Cassandra to load jamm, since 
in 1.0 the init script blacklists openjdk. Node after node, I saw the heap 
behaving more like I am used to seeing on jamm-based apps, with some nice ups 
and downs rather than a long and slow climb. But read and write counts were 
still low on every node, and timeouts were still bursting on our frontend.
It was a continuing mess until I restarted the "first" node of the cluster. 
There was still one node left to switch to java6 + jamm, but as soon as I 
restarted my "first" node, every node started working more, %wa climbing, read & 
write counts climbing, no more timeouts on the frontend, and the frontend then 
being fast as hell.
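
A quick way to confirm that jamm is actually loaded is to check the running 
command line for the -javaagent flag; a sketch:

# ps -ef | grep -o 'javaagent:[^ ]*jamm[^ ]*'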

I understand that my cluster is probably under capacity. But I don't understand 
how, since there seems to be something within Cassandra which blocks the full 
use of the machine's resources. It seems kind of related to the heap, but I 
don't know how. Any idea?
I intend to start monitoring more metrics, but do you 

Re: Dead node still being pinged

2012-06-14 Thread Nicolas Lalevée

On 13 June 2012, at 20:52, aaron morton wrote:

>> You meant -Dcassandra.load_ring_state=false right ?
> yes, sorry. 
> 
>> Maybe I could open a jira about my issue ? Maybe there was a config mess on 
>> my part at some point, ie the unsynchronized date on my machines, but I 
>> think it would be nice if cassandra could resolve itself of that 
>> inconsistent state.
> The old nodes are not listed in the ring are they ?
> 
> You can try calling unsafeAssassinateEndpoint() on the Gossip MBean. 

unsafe, assassinate, hum :)
I had read the source code of that function to reassure myself, but then I did 
call it.
And it worked, I no longer see any packets from the new nodes to the old nodes.
The gossip info changed. I now have some 'LEFT' statuses instead of 'removed' 
ones:
/10.10.0.24
  REMOVAL_COORDINATOR:REMOVER,0
  STATUS:LEFT,141713094015402114482637574571109123934,1339920978684
/10.10.0.22
  REMOVAL_COORDINATOR:REMOVER,113427455640312814857969558651062452224
  STATUS:LEFT,141713094015402114482637574571109123934,1339920834956
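
For anyone else hitting this, the operation can also be invoked from the command 
line with jmxterm; a sketch, MBean name and syntax written from memory, so 
double-check them against your version, and replace the IP with the dead 
endpoint:

# echo "run -b org.apache.cassandra.net:type=Gossiper unsafeAssassinateEndpoint 10.10.0.22" | java -jar jmxterm.jar -l localhost:7199 -n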

Thank you very much Aaron.

Nicolas


> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 14/06/2012, at 12:06 AM, Nicolas Lalevée wrote:
> 
>> 
>> On 13 June 2012, at 10:30, aaron morton wrote:
>> 
>>> Here is what I *think* is going on, if Brandon is around he may be able to 
>>> help out. 
>>> 
>>> 
>>> The old nodes are being included in the Gossip rounds, because 
>>> Gossiper.doGossipToUnreachableMember() just looks at the nodes that are 
>>> unreachable. It does not check if they have been removed from the cluster. 
>>> 
>>> Information about the removed nodes is kept by gossip so that if a node is 
>>> removed while it is down it will shut down when restarted. This information 
>>> *should* stay in gossip for 3 days. 
>>> 
>>> In your gossip info, the last long on the STATUS lines is the expiry time 
>>> for this info…
>>> 
>>> /10.10.0.24
>>> STATUS:removed,127605887595351923798765477786913079296,1336530323263
>>> REMOVAL_COORDINATOR:REMOVER,0
>>> /10.10.0.22
>>> STATUS:removed,42535295865117307932921825928971026432,1336529659203
>>> REMOVAL_COORDINATOR:REMOVER,113427455640312814857969558651062452224
>>> 
>>> For the first line it's 
>>> In [48]: datetime.datetime.fromtimestamp(1336530323263/1000)
>>> Out[48]: datetime.datetime(2012, 5, 9, 14, 25, 23)
>>> 
>>> So that's good. 
>>> 
>>> The Gossip round will remove the 0.24 and 0.22 nodes from the local state 
>>> if the expiry time has passed, and the node is marked as dead and it's not 
>>> in the token ring. 
>>> 
>>> You can see if the node thinks 0.24 and 0.22 are up by looking 
>>> getSimpleStates() on the FailureDetectorMBean. (I use jmxterm to do this 
>>> sort of thing)
>> 
>> The two old nodes are still seen as down:
>> SimpleStates:[/10.10.0.22:DOWN, /10.10.0.24:DOWN, /10.10.0.26:UP, 
>> /10.10.0.25:UP, /10.10.0.27:UP]
>> 
>>> 
>>> The other thing that can confuse things is the gossip generation. If your 
>>> old nodes were started with a datetime in the future that can muck things 
>>> up. 
>> 
>> I have just checked, my old nodes machines are nicely synchronized. My new 
>> nodes have some lag of few seconds, some in the future, some in the past. I 
>> definitively need to fix that.
>> 
>>> The simple to try is starting the server with the 
>>> -Dcassandra.join_ring=false JVM option. This will force the node to get the 
>>> ring info from othernodes. Check things with nodetool gossip info to see if 
>>> the other nodes tell it about the old ones again.
>> 
>> You meant -Dcassandra.load_ring_state=false right ?
>> 
>> Then nothing changed.
>> 
>>> Sorry, gossip can be tricky to diagnose over email. 
>> 
>> No worry, I really appreciate that you take time looking into my issues.
>> 
>> Maybe I could open a jira about my issue ? Maybe there was a config mess on 
>> my part at some point, ie the unsynchronized date on my machines, but I 
>> think it would be nice if cassandra could resolve itself of that 
>> inconsistent state.
>> 
>> Nicolas
>> 
>>> 
>>> 
>>> 
>>> 
>>> -
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>

Re: Dead node still being pinged

2012-06-13 Thread Nicolas Lalevée

On 13 June 2012, at 10:30, aaron morton wrote:

> Here is what I *think* is going on, if Brandon is around he may be able to 
> help out. 
> 
> 
> The old nodes are being included in the Gossip rounds, because 
> Gossiper.doGossipToUnreachableMember() just looks at the nodes that are 
> unreachable. It does not check if they have been removed from the cluster. 
> 
> Information about the removed nodes is kept by gossip so that if a node is 
> removed while it is down it will shut down when restarted. This information 
> *should* stay in gossip for 3 days. 
> 
> In your gossip info, the last long on the STATUS lines is the expiry time for 
> this info…
> 
> /10.10.0.24
> STATUS:removed,127605887595351923798765477786913079296,1336530323263
> REMOVAL_COORDINATOR:REMOVER,0
> /10.10.0.22
> STATUS:removed,42535295865117307932921825928971026432,1336529659203
> REMOVAL_COORDINATOR:REMOVER,113427455640312814857969558651062452224
> 
> For the first line it's 
> In [48]: datetime.datetime.fromtimestamp(1336530323263/1000)
> Out[48]: datetime.datetime(2012, 5, 9, 14, 25, 23)
> 
> So that's good. 
> 
> The Gossip round will remove the 0.24 and 0.22 nodes from the local state if 
> the expiry time has passed, and the node is marked as dead and it's not in 
> the token ring. 
> 
> You can see if the node thinks 0.24 and 0.22 are up by looking 
> getSimpleStates() on the FailureDetectorMBean. (I use jmxterm to do this sort 
> of thing)

The two old nodes are still seen as down:
SimpleStates:[/10.10.0.22:DOWN, /10.10.0.24:DOWN, /10.10.0.26:UP, 
/10.10.0.25:UP, /10.10.0.27:UP]

> 
> The other thing that can confuse things is the gossip generation. If your old 
> nodes were started with a datetime in the future that can muck things up. 

I have just checked: my old node machines are nicely synchronized. My new nodes 
have a lag of a few seconds, some in the future, some in the past. I definitely 
need to fix that.

> The simple to try is starting the server with the -Dcassandra.join_ring=false 
> JVM option. This will force the node to get the ring info from othernodes. 
> Check things with nodetool gossip info to see if the other nodes tell it 
> about the old ones again.

You meant -Dcassandra.load_ring_state=false, right?

I tried that; nothing changed.

> Sorry, gossip can be tricky to diagnose over email. 

No worries, I really appreciate that you take the time to look into my issues.

Maybe I could open a JIRA about my issue? Maybe there was a config mess on my 
part at some point, i.e. the unsynchronized dates on my machines, but I think it 
would be nice if Cassandra could get itself out of that inconsistent state.

Nicolas

> 
> 
> 
> 
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 12/06/2012, at 10:33 PM, Nicolas Lalevée wrote:
> 
>> I have one dirty solution to try: bring data-2 and data-4 back up and down 
>> again. Is there any way I can tell cassandra to not get any data, so when I 
>> would get my old node up, no streaming would start ?
>> 
>> cheers,
>> Nicolas
>> 
>> On 12 June 2012, at 12:25, Nicolas Lalevée wrote:
>> 
>>> On 12 June 2012, at 11:03, aaron morton wrote:
>>> 
>>>> Try purging the hints for 10.10.0.24 using the HintedHandOffManager MBean.
>>> 
>>> As far as I could tell, there were no hinted hand off to be delivered. 
>>> Nevertheless I have called "deleteHintsForEndpoint" on every node for the 
>>> two expected to be out nodes.
>>> Nothing changed, I still see packet being send to these old nodes.
>>> 
>>> I looked closer to ResponsePendingTasks of MessagingService. Actually the 
>>> numbers change, between 0 and about 4. So tasks are ending but new ones 
>>> come just after.
>>> 
>>> Nicolas
>>> 
>>>> 
>>>> Cheers
>>>> 
>>>> -
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>> 
>>>> On 12/06/2012, at 3:33 AM, Nicolas Lalevée wrote:
>>>> 
>>>>> finally, thanks to the groovy jmx builder, it was not that hard.
>>>>> 
>>>>> 
>>>>> On 11 June 2012, at 12:12, Samuel CARRIERE wrote:
>>>>> 
>>>>>> If I were you, I would connect (through JMX, with jconsole) to one of 
>>>>>> the nodes that is sending messages to an old node, and would have a look 
>>>>>> at these MBean : 
>>>>>> - org.apache.net.FailureDetector : does SimpleStates 

Re: Dead node still being pinged

2012-06-12 Thread Nicolas Lalevée
I have one dirty solution to try: bring data-2 and data-4 back up and then down 
again. Is there any way I can tell Cassandra not to transfer any data, so that 
when I bring my old nodes back up, no streaming starts?

cheers,
Nicolas

On 12 June 2012, at 12:25, Nicolas Lalevée wrote:

> On 12 June 2012, at 11:03, aaron morton wrote:
> 
>> Try purging the hints for 10.10.0.24 using the HintedHandOffManager MBean.
> 
> As far as I could tell, there were no hinted hand off to be delivered. 
> Nevertheless I have called "deleteHintsForEndpoint" on every node for the two 
> expected to be out nodes.
> Nothing changed, I still see packet being send to these old nodes.
> 
> I looked closer to ResponsePendingTasks of MessagingService. Actually the 
> numbers change, between 0 and about 4. So tasks are ending but new ones come 
> just after.
> 
> Nicolas
> 
>> 
>> Cheers
>> 
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 12/06/2012, at 3:33 AM, Nicolas Lalevée wrote:
>> 
>>> finally, thanks to the groovy jmx builder, it was not that hard.
>>> 
>>> 
>>> On 11 June 2012, at 12:12, Samuel CARRIERE wrote:
>>> 
>>>> If I were you, I would connect (through JMX, with jconsole) to one of the 
>>>> nodes that is sending messages to an old node, and would have a look at 
>>>> these MBean : 
>>>>  - org.apache.net.FailureDetector : does SimpleStates looks good ? (or do 
>>>> you see an IP of an old node)
>>> 
>>> SimpleStates:[/10.10.0.22:DOWN, /10.10.0.24:DOWN, /10.10.0.26:UP, 
>>> /10.10.0.25:UP, /10.10.0.27:UP]
>>> 
>>>>  - org.apache.net.MessagingService : do you see one of the old IP in one 
>>>> of the attributes ?
>>> 
>>> data-5:
>>> CommandCompletedTasks:
>>> [10.10.0.22:2, 10.10.0.26:6147307, 10.10.0.27:6084684, 10.10.0.24:2]
>>> CommandPendingTasks:
>>> [10.10.0.22:0, 10.10.0.26:0, 10.10.0.27:0, 10.10.0.24:0]
>>> ResponseCompletedTasks:
>>> [10.10.0.22:1487, 10.10.0.26:6187204, 10.10.0.27:6062890, 10.10.0.24:1495]
>>> ResponsePendingTasks:
>>> [10.10.0.22:0, 10.10.0.26:0, 10.10.0.27:0, 10.10.0.24:0]
>>> 
>>> data-6:
>>> CommandCompletedTasks:
>>> [10.10.0.22:2, 10.10.0.27:6064992, 10.10.0.24:2, 10.10.0.25:6308102]
>>> CommandPendingTasks:
>>> [10.10.0.22:0, 10.10.0.27:0, 10.10.0.24:0, 10.10.0.25:0]
>>> ResponseCompletedTasks:
>>> [10.10.0.22:1463, 10.10.0.27:6067943, 10.10.0.24:1474, 10.10.0.25:6367692]
>>> ResponsePendingTasks:
>>> [10.10.0.22:0, 10.10.0.27:0, 10.10.0.24:2, 10.10.0.25:0]
>>> 
>>> data-7:
>>> CommandCompletedTasks:
>>> [10.10.0.22:2, 10.10.0.26:6043653, 10.10.0.24:2, 10.10.0.25:5964168]
>>> CommandPendingTasks:
>>> [10.10.0.22:0, 10.10.0.26:0, 10.10.0.24:0, 10.10.0.25:0]
>>> ResponseCompletedTasks:
>>> [10.10.0.22:1424, 10.10.0.26:6090251, 10.10.0.24:1431, 10.10.0.25:6094954]
>>> ResponsePendingTasks:
>>> [10.10.0.22:4, 10.10.0.26:0, 10.10.0.24:1, 10.10.0.25:0]
>>> 
>>>>  - org.apache.net.StreamingService : do you see an old IP in StreamSources 
>>>> or StreamDestinations ?
>>> 
>>> nothing streaming on the 3 nodes.
>>> nodetool netstats confirmed that.
>>> 
>>>>  - org.apache.internal.HintedHandoff : are there non-zero ActiveCount, 
>>>> CurrentlyBlockedTasks, PendingTasks, TotalBlockedTask ?
>>> 
>>> On the 3 nodes, all at 0.
>>> 
>>> I don't know much what I'm looking at, but it seems that some 
>>> ResponsePendingTasks needs to end.
>>> 
>>> Nicolas
>>> 
>>>> 
>>>> Samuel 
>>>> 
>>>> 
>>>> 
>>>> Nicolas Lalevée 
>>>> 08/06/2012 21:03
>>>> Please reply to
>>>> user@cassandra.apache.org
>>>> 
>>>> To
>>>> user@cassandra.apache.org
>>>> cc
>>>> Subject
>>>> Re: Dead node still being pinged
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 8 June 2012, at 20:02, Samuel CARRIERE wrote:
>>>> 
>>>>> I'm in the train but just a guess : maybe it's hinted handoff. A look in 
>>>>> the logs of the new nodes could confirm that : look for the IP of an old 
>>>>> node and maybe you'll find hinte

Re: Dead node still being pinged

2012-06-12 Thread Nicolas Lalevée
On 12 June 2012, at 11:03, aaron morton wrote:

> Try purging the hints for 10.10.0.24 using the HintedHandOffManager MBean.

As far as I could tell, there were no hinted handoffs to be delivered. 
Nevertheless, I called "deleteHintsForEndpoint" on every node for the two nodes 
expected to be out.
Nothing changed, I still see packets being sent to these old nodes.

I looked more closely at the ResponsePendingTasks of MessagingService. Actually 
the numbers change, between 0 and about 4. So tasks are ending, but new ones 
come just after.

Nicolas

> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 12/06/2012, at 3:33 AM, Nicolas Lalevée wrote:
> 
>> finally, thanks to the groovy jmx builder, it was not that hard.
>> 
>> 
>> On 11 June 2012, at 12:12, Samuel CARRIERE wrote:
>> 
>>> If I were you, I would connect (through JMX, with jconsole) to one of the 
>>> nodes that is sending messages to an old node, and would have a look at 
>>> these MBean : 
>>>   - org.apache.net.FailureDetector : does SimpleStates looks good ? (or do 
>>> you see an IP of an old node)
>> 
>> SimpleStates:[/10.10.0.22:DOWN, /10.10.0.24:DOWN, /10.10.0.26:UP, 
>> /10.10.0.25:UP, /10.10.0.27:UP]
>> 
>>>   - org.apache.net.MessagingService : do you see one of the old IP in one 
>>> of the attributes ?
>> 
>> data-5:
>> CommandCompletedTasks:
>> [10.10.0.22:2, 10.10.0.26:6147307, 10.10.0.27:6084684, 10.10.0.24:2]
>> CommandPendingTasks:
>> [10.10.0.22:0, 10.10.0.26:0, 10.10.0.27:0, 10.10.0.24:0]
>> ResponseCompletedTasks:
>> [10.10.0.22:1487, 10.10.0.26:6187204, 10.10.0.27:6062890, 10.10.0.24:1495]
>> ResponsePendingTasks:
>> [10.10.0.22:0, 10.10.0.26:0, 10.10.0.27:0, 10.10.0.24:0]
>> 
>> data-6:
>> CommandCompletedTasks:
>> [10.10.0.22:2, 10.10.0.27:6064992, 10.10.0.24:2, 10.10.0.25:6308102]
>> CommandPendingTasks:
>> [10.10.0.22:0, 10.10.0.27:0, 10.10.0.24:0, 10.10.0.25:0]
>> ResponseCompletedTasks:
>> [10.10.0.22:1463, 10.10.0.27:6067943, 10.10.0.24:1474, 10.10.0.25:6367692]
>> ResponsePendingTasks:
>> [10.10.0.22:0, 10.10.0.27:0, 10.10.0.24:2, 10.10.0.25:0]
>> 
>> data-7:
>> CommandCompletedTasks:
>> [10.10.0.22:2, 10.10.0.26:6043653, 10.10.0.24:2, 10.10.0.25:5964168]
>> CommandPendingTasks:
>> [10.10.0.22:0, 10.10.0.26:0, 10.10.0.24:0, 10.10.0.25:0]
>> ResponseCompletedTasks:
>> [10.10.0.22:1424, 10.10.0.26:6090251, 10.10.0.24:1431, 10.10.0.25:6094954]
>> ResponsePendingTasks:
>> [10.10.0.22:4, 10.10.0.26:0, 10.10.0.24:1, 10.10.0.25:0]
>> 
>>>   - org.apache.net.StreamingService : do you see an old IP in StreamSources 
>>> or StreamDestinations ?
>> 
>> nothing streaming on the 3 nodes.
>> nodetool netstats confirmed that.
>> 
>>>   - org.apache.internal.HintedHandoff : are there non-zero ActiveCount, 
>>> CurrentlyBlockedTasks, PendingTasks, TotalBlockedTask ?
>> 
>> On the 3 nodes, all at 0.
>> 
>> I don't know much what I'm looking at, but it seems that some 
>> ResponsePendingTasks needs to end.
>> 
>> Nicolas
>> 
>>> 
>>> Samuel 
>>> 
>>> 
>>> 
>>> Nicolas Lalevée 
>>> 08/06/2012 21:03
>>> Please reply to
>>> user@cassandra.apache.org
>>> 
>>> To
>>> user@cassandra.apache.org
>>> cc
>>> Subject
>>> Re: Dead node still being pinged
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 8 June 2012, at 20:02, Samuel CARRIERE wrote:
>>> 
>>>> I'm in the train but just a guess : maybe it's hinted handoff. A look in 
>>>> the logs of the new nodes could confirm that : look for the IP of an old 
>>>> node and maybe you'll find hinted handoff related messages.
>>> 
>>> I grepped on every node about every old node, I got nothing since the 
>>> "crash".
>>> 
>>> If it can be of some help, here is some grepped log of the crash:
>>> 
>>> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 
>>> 00:39:30,241 StorageService.java (line 2417) Endpoint /10.10.0.24 is down 
>>> and will not receive data for re-replication of /10.10.0.22
>>> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 
>>> 00:39:30,242 StorageService.java (line 2417) Endpoint /10.10.0.24 is down 
>>> and will not receive data for re-replication of /10.10.0.

Re: Dead node still being pinged

2012-06-11 Thread Nicolas Lalevée
Finally, thanks to the Groovy JMX builder, it was not that hard.


On 11 June 2012, at 12:12, Samuel CARRIERE wrote:

> If I were you, I would connect (through JMX, with jconsole) to one of the 
> nodes that is sending messages to an old node, and would have a look at these 
> MBean : 
>- org.apache.net.FailureDetector : does SimpleStates looks good ? (or do 
> you see an IP of an old node)

SimpleStates:[/10.10.0.22:DOWN, /10.10.0.24:DOWN, /10.10.0.26:UP, 
/10.10.0.25:UP, /10.10.0.27:UP]

>- org.apache.net.MessagingService : do you see one of the old IP in one of 
> the attributes ?

data-5:
CommandCompletedTasks:
[10.10.0.22:2, 10.10.0.26:6147307, 10.10.0.27:6084684, 10.10.0.24:2]
CommandPendingTasks:
[10.10.0.22:0, 10.10.0.26:0, 10.10.0.27:0, 10.10.0.24:0]
ResponseCompletedTasks:
[10.10.0.22:1487, 10.10.0.26:6187204, 10.10.0.27:6062890, 10.10.0.24:1495]
ResponsePendingTasks:
[10.10.0.22:0, 10.10.0.26:0, 10.10.0.27:0, 10.10.0.24:0]

data-6:
CommandCompletedTasks:
[10.10.0.22:2, 10.10.0.27:6064992, 10.10.0.24:2, 10.10.0.25:6308102]
CommandPendingTasks:
[10.10.0.22:0, 10.10.0.27:0, 10.10.0.24:0, 10.10.0.25:0]
ResponseCompletedTasks:
[10.10.0.22:1463, 10.10.0.27:6067943, 10.10.0.24:1474, 10.10.0.25:6367692]
ResponsePendingTasks:
[10.10.0.22:0, 10.10.0.27:0, 10.10.0.24:2, 10.10.0.25:0]

data-7:
CommandCompletedTasks:
[10.10.0.22:2, 10.10.0.26:6043653, 10.10.0.24:2, 10.10.0.25:5964168]
CommandPendingTasks:
[10.10.0.22:0, 10.10.0.26:0, 10.10.0.24:0, 10.10.0.25:0]
ResponseCompletedTasks:
[10.10.0.22:1424, 10.10.0.26:6090251, 10.10.0.24:1431, 10.10.0.25:6094954]
ResponsePendingTasks:
[10.10.0.22:4, 10.10.0.26:0, 10.10.0.24:1, 10.10.0.25:0]

>- org.apache.net.StreamingService : do you see an old IP in StreamSources 
> or StreamDestinations ?

Nothing is streaming on the 3 nodes.
nodetool netstats confirmed that.

>- org.apache.internal.HintedHandoff : are there non-zero ActiveCount, 
> CurrentlyBlockedTasks, PendingTasks, TotalBlockedTask ?

On the 3 nodes, all at 0.

I don't really know what I'm looking at, but it seems that some 
ResponsePendingTasks need to end.
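
The same lookups can also be scripted with jmxterm instead of the Groovy JMX 
builder; a sketch, with the shorthand bean name above expanded to its full form, 
written from memory so double-check it against your version:

# echo "get -b org.apache.cassandra.net:type=MessagingService ResponsePendingTasks" | java -jar jmxterm.jar -l localhost:7199 -n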

Nicolas

> 
> Samuel 
> 
> 
> 
> Nicolas Lalevée 
> 08/06/2012 21:03
> Please reply to
> user@cassandra.apache.org
> 
> To
> user@cassandra.apache.org
> cc
> Subject
> Re: Dead node still being pinged
> 
> 
> 
> 
> 
> 
> On 8 June 2012, at 20:02, Samuel CARRIERE wrote:
> 
> > I'm in the train but just a guess : maybe it's hinted handoff. A look in 
> > the logs of the new nodes could confirm that : look for the IP of an old 
> > node and maybe you'll find hinted handoff related messages.
> 
> I grepped on every node about every old node, I got nothing since the "crash".
> 
> If it can be of some help, here is some grepped log of the crash:
> 
> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 
> 00:39:30,241 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and 
> will not receive data for re-replication of /10.10.0.22
> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 
> 00:39:30,242 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and 
> will not receive data for re-replication of /10.10.0.22
> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 
> 00:39:30,242 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and 
> will not receive data for re-replication of /10.10.0.22
> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 
> 00:39:30,243 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and 
> will not receive data for re-replication of /10.10.0.22
> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 
> 00:39:30,243 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and 
> will not receive data for re-replication of /10.10.0.22
> system.log.1: INFO [GossipStage:1] 2012-05-06 00:44:33,822 Gossiper.java 
> (line 818) InetAddress /10.10.0.24 is now dead.
> system.log.1: INFO [GossipStage:1] 2012-05-06 04:25:23,894 Gossiper.java 
> (line 818) InetAddress /10.10.0.24 is now dead.
> system.log.1: INFO [OptionalTasks:1] 2012-05-06 04:25:23,895 
> HintedHandOffManager.java (line 179) Deleting any stored hints for /10.10.0.24
> system.log.1: INFO [GossipStage:1] 2012-05-06 04:25:23,895 
> StorageService.java (line 1157) Removing token 
> 127605887595351923798765477786913079296 for /10.10.0.24
> system.log.1: INFO [GossipStage:1] 2012-05-09 04:26:25,015 Gossiper.java 
> (line 818) InetAddress /10.10.0.24 is now dead.
> 
> 
> Maybe its the way I have removed nodes ? AFAIR I didn't used the decommission 
> command. For each node I got the node down and then issue a remove token 
> command.
> Here is what I can find in the log abo

Re: Dead node still being pinged

2012-06-11 Thread Nicolas Lalevée

On 11 June 2012, at 12:12, Samuel CARRIERE wrote:

> 
> Well, I don't see anything special in the logs. "Remove token" seems to have 
> done its job : accorging to the logs, old stored hints have been deleted. 
> 
> If I were you, I would connect (through JMX, with jconsole) to one of the 
> nodes that is sending messages to an old node, and would have a look at these 
> MBean : 
>- org.apache.net.FailureDetector : does SimpleStates looks good ? (or do 
> you see an IP of an old node) 
>- org.apache.net.MessagingService : do you see one of the old IP in one of 
> the attributes ? 
>- org.apache.net.StreamingService : do you see an old IP in StreamSources 
> or StreamDestinations ? 
>- org.apache.internal.HintedHandoff : are there non-zero ActiveCount, 
> CurrentlyBlockedTasks, PendingTasks, TotalBlockedTask ? 

I feared I would have to do such lookups... JMX sucks when there is some ssh 
tunneling to do. I'll find time to look into those. Thanks.

By the way, here is maybe an interesting piece of info (same on every node):
root@data-5 ~ # nodetool -h data-local gossipinfo
/10.10.0.27
  LOAD:2.34205351889E11
  SCHEMA:21099fc0-978c-11e1--bc70eee231ef
  RPC_ADDRESS:10.10.0.27
  STATUS:NORMAL,113427455640312814857969558651062452224
  RELEASE_VERSION:1.0.9
/10.10.0.26
  LOAD:2.64617657147E11
  SCHEMA:21099fc0-978c-11e1--bc70eee231ef
  RPC_ADDRESS:10.10.0.26
  STATUS:NORMAL,56713727820156407428984779325531226112
  RELEASE_VERSION:1.0.9
/10.10.0.25
  LOAD:2.34154095981E11
  SCHEMA:21099fc0-978c-11e1--bc70eee231ef
  RPC_ADDRESS:10.10.0.25
  STATUS:NORMAL,0
  RELEASE_VERSION:1.0.9
/10.10.0.24
  STATUS:removed,127605887595351923798765477786913079296,1336530323263
  REMOVAL_COORDINATOR:REMOVER,0
/10.10.0.22
  STATUS:removed,42535295865117307932921825928971026432,1336529659203
  REMOVAL_COORDINATOR:REMOVER,113427455640312814857969558651062452224
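
Side note: the long value at the end of the 'removed' STATUS lines is the gossip 
expiry timestamp in milliseconds, as explained elsewhere in this thread; on a 
GNU system it can be decoded with date, for example:

# date -d @$((1336530323263/1000))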


Nicolas


> 
> Samuel 
> 
> 
> 
> Nicolas Lalevée 
> 08/06/2012 21:03
> Please reply to
> user@cassandra.apache.org
> 
> To
> user@cassandra.apache.org
> cc
> Subject
> Re: Dead node still being pinged
> 
> 
> 
> 
> 
> 
> On 8 June 2012, at 20:02, Samuel CARRIERE wrote:
> 
> > I'm in the train but just a guess : maybe it's hinted handoff. A look in 
> > the logs of the new nodes could confirm that : look for the IP of an old 
> > node and maybe you'll find hinted handoff related messages.
> 
> I grepped on every node about every old node, I got nothing since the "crash".
> 
> If it can be of some help, here is some grepped log of the crash:
> 
> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 
> 00:39:30,241 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and 
> will not receive data for re-replication of /10.10.0.22
> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 
> 00:39:30,242 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and 
> will not receive data for re-replication of /10.10.0.22
> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 
> 00:39:30,242 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and 
> will not receive data for re-replication of /10.10.0.22
> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 
> 00:39:30,243 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and 
> will not receive data for re-replication of /10.10.0.22
> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 
> 00:39:30,243 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and 
> will not receive data for re-replication of /10.10.0.22
> system.log.1: INFO [GossipStage:1] 2012-05-06 00:44:33,822 Gossiper.java 
> (line 818) InetAddress /10.10.0.24 is now dead.
> system.log.1: INFO [GossipStage:1] 2012-05-06 04:25:23,894 Gossiper.java 
> (line 818) InetAddress /10.10.0.24 is now dead.
> system.log.1: INFO [OptionalTasks:1] 2012-05-06 04:25:23,895 
> HintedHandOffManager.java (line 179) Deleting any stored hints for /10.10.0.24
> system.log.1: INFO [GossipStage:1] 2012-05-06 04:25:23,895 
> StorageService.java (line 1157) Removing token 
> 127605887595351923798765477786913079296 for /10.10.0.24
> system.log.1: INFO [GossipStage:1] 2012-05-09 04:26:25,015 Gossiper.java 
> (line 818) InetAddress /10.10.0.24 is now dead.
> 
> 
> Maybe its the way I have removed nodes ? AFAIR I didn't used the decommission 
> command. For each node I got the node down and then issue a remove token 
> command.
> Here is what I can find in the log about when I removed one of them:
> 
> system.log.1: INFO [GossipTasks:1] 2012-05-02 17:21:10,281 Gossiper.java 
> (line 818) InetAddress /10.10.0.24 is now dead.
> system.log.1: INFO [HintedHandoff:1] 2012-05

Re: Dead node still being pinged

2012-06-08 Thread Nicolas Lalevée
On 8 June 2012, at 20:50, aaron morton wrote:

> Are the old machines listed in the seed list on the new ones ?

No, they aren't.

The first of my old nodes was, while I was "migrating". But not anymore.

Nicolas


> Cheers
> 
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 9/06/2012, at 12:10 AM, Nicolas Lalevée wrote:
> 
>> I had a configuration where I had 4 nodes, data-1,4. We then bought 3 bigger 
>> machines, data-5,7. And we moved all data from data-1,4 to data-5,7.
>> To move all the data without interruption of service, I added one new node 
>> at a time. And then I removed one by one the old machines via a "remove 
>> token".
>> 
>> Everything was working fine. Until there was an expected load on our 
>> cluster, the machine started to swap and become unresponsive. We fixed the 
>> unexpected load and the three new machines were restarted. After that the 
>> new cassandra machines were stating that some old token were not assigned, 
>> namely from data-2 and data-4. To fix this I issued again some "remove 
>> token" commands.
>> 
>> Everything seems to be back to normal, but on the network I still see some 
>> packet from the new cluster to the old machines. On the port 7000.
>> How I can tell cassandra to completely forget about the old machines ?
>> 
>> Nicolas
>> 
> 



Re: Dead node still being pinged

2012-06-08 Thread Nicolas Lalevée
ervice.java 
(line 1157) Removing token 145835300108973619103103718265651724288 for 
/10.10.0.24


Nicolas


> 
> 
> - Original message -
> From: Nicolas Lalevée [nicolas.lale...@hibnet.org]
> Sent: 08/06/2012 19:26 ZE2
> To: user@cassandra.apache.org
> Subject: Re: Dead node still being pinged
> 
> 
> 
> On 8 June 2012, at 15:17, Samuel CARRIERE wrote:
> 
>> What does nodetool ring says ? (Ask every node)
> 
> currently, each of new node see only the tokens of the new nodes.
> 
>> Have you checked that the list of seeds in every yaml is correct ?
> 
> yes, it is correct, every of my new node point to the first of my new node
> 
>> What version of cassandra are you using ?
> 
> Sorry I should have wrote this in my first mail.
> I use the 1.0.9
> 
> Nicolas
> 
>> 
>> Samuel
>> 
>> 
>> 
>> Nicolas Lalevée 
>> 08/06/2012 14:10
>> Please reply to
>> user@cassandra.apache.org
>> 
>> To
>> user@cassandra.apache.org
>> cc
>> Subject
>> Dead node still being pinged
>> 
>> 
>> 
>> 
>> 
>> I had a configuration where I had 4 nodes, data-1,4. We then bought 3 bigger 
>> machines, data-5,7. And we moved all data from data-1,4 to data-5,7.
>> To move all the data without interruption of service, I added one new node 
>> at a time. And then I removed one by one the old machines via a "remove 
>> token".
>> 
>> Everything was working fine. Until there was an expected load on our 
>> cluster, the machine started to swap and become unresponsive. We fixed the 
>> unexpected load and the three new machines were restarted. After that the 
>> new cassandra machines were stating that some old token were not assigned, 
>> namely from data-2 and data-4. To fix this I issued again some "remove 
>> token" commands.
>> 
>> Everything seems to be back to normal, but on the network I still see some 
>> packet from the new cluster to the old machines. On the port 7000.
>> How I can tell cassandra to completely forget about the old machines ?
>> 
>> Nicolas
>> 
>> 
> 



Re: Dead node still being pinged

2012-06-08 Thread Nicolas Lalevée
On 8 June 2012, at 15:17, Samuel CARRIERE wrote:

> What does nodetool ring says ? (Ask every node) 

Currently, each of the new nodes sees only the tokens of the new nodes.

> Have you checked that the list of seeds in every yaml is correct ? 

Yes, it is correct; every one of my new nodes points to the first of my new nodes.

> What version of cassandra are you using ?

Sorry, I should have written this in my first mail.
I use 1.0.9.

Nicolas

> 
> Samuel 
> 
> 
> 
> Nicolas Lalevée 
> 08/06/2012 14:10
> Please reply to
> user@cassandra.apache.org
> 
> To
> user@cassandra.apache.org
> cc
> Subject
> Dead node still being pinged
> 
> 
> 
> 
> 
> I had a configuration where I had 4 nodes, data-1,4. We then bought 3 bigger 
> machines, data-5,7. And we moved all data from data-1,4 to data-5,7.
> To move all the data without interruption of service, I added one new node at 
> a time. And then I removed one by one the old machines via a "remove token".
> 
> Everything was working fine. Until there was an expected load on our cluster, 
> the machine started to swap and become unresponsive. We fixed the unexpected 
> load and the three new machines were restarted. After that the new cassandra 
> machines were stating that some old token were not assigned, namely from 
> data-2 and data-4. To fix this I issued again some "remove token" commands.
> 
> Everything seems to be back to normal, but on the network I still see some 
> packet from the new cluster to the old machines. On the port 7000.
> How I can tell cassandra to completely forget about the old machines ?
> 
> Nicolas
> 
> 



Dead node still being pinged

2012-06-08 Thread Nicolas Lalevée
I had a configuration with 4 nodes, data-1 to data-4. We then bought 3 bigger 
machines, data-5 to data-7, and we moved all the data from data-1..4 to 
data-5..7.
To move all the data without interruption of service, I added one new node at a 
time, and then I removed the old machines one by one via a "remove token".

Everything was working fine, until there was an unexpected load on our cluster: 
the machines started to swap and became unresponsive. We fixed the unexpected 
load and the three new machines were restarted. After that, the new Cassandra 
machines were stating that some old tokens were not assigned, namely those from 
data-2 and data-4. To fix this I issued some "remove token" commands again.

Everything seems to be back to normal, but on the network I still see some 
packets going from the new cluster to the old machines, on port 7000.
How can I tell Cassandra to completely forget about the old machines?

Nicolas