Re: replace_token code?

2012-09-11 Thread Yang
Replied inline below (originally in blue), thanks
Yang


I thought the very first log line already showed ownership being acquired,
rather than later in the sequence?


 WARN [main] 2012-09-10 08:00:21,855 TokenMetadata.java (line 160) Token
166594924822352415786406422619018814804 changing ownership from
/10.72.201.80 to /10.190.221.204



On Tue, Sep 11, 2012 at 1:55 PM, aaron morton wrote:

> This looks correct…
>
>   INFO [GossipStage:1] 2012-09-10 08:01:23,036 Gossiper.java (line 850)
>> Node /10.72.201.80 is now part of the cluster
>>
>  INFO [GossipStage:1] 2012-09-10 08:01:23,037 Gossiper.java (line 816)
>> InetAddress /10.72.201.80 is now UP
>>
>
>>
> 80 joined the ring because it was in the stored ring state.
>


This is where I have a doubt: instead of being allowed to come back from
the "stored ring state", 80 should be purged from ring membership
immediately after that first log line, which claims to have taken over
ownership. It's true that token ownership and ring membership are
orthogonal things, but here an explicit "taking over token" operation
implies that the old owner must be dead and should be kicked out of the
ring. Granted, the later detection of duplicate ownership does kick the
old node out, but that leaves a window of uncertainty before the
duplication is detected.
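
To make the suggestion concrete, here is a toy sketch in plain Java of the
semantics I have in mind. This is not Cassandra's actual code; the class and
method names are made up for illustration only:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, not Cassandra's real classes.
public class TokenTakeover
{
    private final Map<String, String> tokenToEndpoint = new HashMap<String, String>();
    private final Set<String> liveMembers = new HashSet<String>();

    public void join(String endpoint, String token)
    {
        tokenToEndpoint.put(token, endpoint);
        liveMembers.add(endpoint);
    }

    // Proposed semantics: taking over a token implies the old owner is
    // dead, so evict it from ring membership in the same step instead of
    // waiting for duplicate-token detection or the FatClient timeout.
    public void replaceToken(String newEndpoint, String token)
    {
        String oldEndpoint = tokenToEndpoint.put(token, newEndpoint);
        liveMembers.add(newEndpoint);
        if (oldEndpoint != null && !oldEndpoint.equals(newEndpoint))
        {
            liveMembers.remove(oldEndpoint); // the "assertive"/"NOW" removal
            System.out.println("Token " + token + " changing ownership from "
                               + oldEndpoint + " to " + newEndpoint
                               + "; evicted " + oldEndpoint);
        }
    }

    public static void main(String[] args)
    {
        TokenTakeover ring = new TokenTakeover();
        ring.join("/10.72.201.80", "166594924822352415786406422619018814804");
        ring.replaceToken("/10.190.221.204", "166594924822352415786406422619018814804");
        System.out.println("Live members: " + ring.liveMembers);
    }
}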


>
> INFO [GossipStage:1] 2012-09-10 08:01:23,038 StorageService.java (line
>> 1126) Nodes /10.72.201.80 and /10.190.221.204 have the same token
>> 166594924822352415786406422619018814804.  Ignoring /10.72.201.80
>>
> New node took ownership
>
>  INFO [GossipTasks:1] 2012-09-10 08:01:32,967 Gossiper.java (line 830)
>> InetAddress /10.72.201.80 is now dead.
>>  INFO [GossipTasks:1] 2012-09-10 08:01:53,976 Gossiper.java (line 644)
>> FatClient /10.72.201.80 has been silent for 3ms, removing from gossip
>>
> Old node marked as dead and the process to remove is started.
>
> Has the 80 node reappeared in the logs?
>
No.

>
> If it does, can you include the output from nodetool gossipinfo?
>

>
> Cheers
>
>
>   -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/09/2012, at 5:59 AM, Yang  wrote:
>
> Thanks Jim, looks like I'll have to read into the code to understand what's
> happening under the hood
>
> yang
>
> On Mon, Sep 10, 2012 at 9:45 AM, Jim Cistaro  wrote:
>
>>  We have seen various issues from these replaced nodes hanging around.
>>  For clusters where a lot of nodes have been replaced, we see these
>> replaced nodes having an impact on heap/GC and a lot of tcp
>> timeouts/retransmits (because the old nodes no longer exist).  As a result,
>> we have begun cleaning these up using unsafeAssassinateEndpoint via JMX.
>>  We have only started using it recently.  So far no bad side effects.  This
>> also helps because those replaced nodes can appear as "unreachable nodes"
>> wrt schema and sometimes prevent things like CF truncation.
>>
>>  Using unsafeAssassinateEndpoint will clean these from unreachable nodes
>> and will mark them as LEFT in gossip info.  There is a ttl for them in
>> gossipinfo and they should go away after 3 days.  Once they are marked
>> LEFT, you should stop seeing those up/same/dead messages.
>>
>>  unsafeAssassinateEndpoint is "unsafe" in that, if you specify the IP of a
>> real node in the cluster, that node will be assassinated.  Otherwise, if you
>> specify nodes that have been replaced, it is supposed to work correctly.
>>
>>  Hope this helps,
>> jc
>>
>>
>>   From: Yang 
>> Reply-To: 
>> Date: Mon, 10 Sep 2012 01:10:56 -0700
>> To: 
>> Subject: replace_token code?
>>
>>  It looks like, by specifying replace_token, the old owner is not removed
>> from gossip (which I had thought it would do).
>> Then it's understandable that the old owner would resurface later and we
>> get some warning saying that the same token is owned by both.
>>
>>
>>  I ran an example with a 2-node cluster, with RF=2.  Host 10.72.201.80
>> was running for a while and had some data; then I shut it down and
>> booted up 10.190.221.204 with replace_token of the old token owned by the
>> previous host.
>> The following log sequence shows that the new host does acquire the
>> token, but it does not at the same time remove 80 forcefully from gossip.
>> Instead, a few seconds later, it believed that .80 became live again.
>> I don't have much understanding of the Gossip protocol, but I roughly know
>> that it's probability-based; it looks like we need an "assertive"/"NOW"
>> membership control message for replace_token.
>>
>>
>>
>>
>>  thanks
>> yang
>>
>>
>>   WARN [main] 2012-09-10 08:00:21,855 TokenMetadata.java (line 160)
>> Token 166594924822352415786406422619018814804 changing ownership from
>> /10.72.201.80 to /10.190.221.204
>>  INFO [main] 2012-09-10 08:00:21,855 StorageService.java (line 753)
>> JOINING: Starting to bootstrap...
>>  INFO [CompactionExecutor:2] 2012-09-10 08:00:21,875 CompactionTask.java
>> (line 109) Compacting
>> [SSTableReader(path='/mnt/cassandra/data/system

Re: replace_token code?

2012-09-11 Thread aaron morton
This looks correct…

>  INFO [GossipStage:1] 2012-09-10 08:01:23,036 Gossiper.java (line 850) Node 
> /10.72.201.80 is now part of the cluster

>  INFO [GossipStage:1] 2012-09-10 08:01:23,037 Gossiper.java (line 816) 
> InetAddress /10.72.201.80 is now UP
>  
80 joined the ring because it was in the stored ring state. 

> INFO [GossipStage:1] 2012-09-10 08:01:23,038 StorageService.java (line 1126) 
> Nodes /10.72.201.80 and /10.190.221.204 have the same token 
> 166594924822352415786406422619018814804.  Ignoring /10.72.201.80
New node took ownership

>  INFO [GossipTasks:1] 2012-09-10 08:01:32,967 Gossiper.java (line 830) 
> InetAddress /10.72.201.80 is now dead.
>  INFO [GossipTasks:1] 2012-09-10 08:01:53,976 Gossiper.java (line 644) 
> FatClient /10.72.201.80 has been silent for 3ms, removing from gossip
Old node marked as dead and the process to remove is started. 

Has the 80 node reappeared in the logs?

If it does, can you include the output from nodetool gossipinfo?

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 11/09/2012, at 5:59 AM, Yang  wrote:

> Thanks Jim, looks like I'll have to read into the code to understand what's
> happening under the hood
> 
> yang
> 
> On Mon, Sep 10, 2012 at 9:45 AM, Jim Cistaro  wrote:
> We have seen various issues from these replaced nodes hanging around.  For 
> clusters where a lot of nodes have been replaced, we see these replaced nodes 
> having an impact on heap/GC and a lot of tcp timeouts/retransmits (because 
> the old nodes no longer exist).  As a result, we have begun cleaning these up 
> using unsafeAssassinateEndpoint via JMX.  We have only started using it
> recently.  So far no bad side effects.  This also helps because those
> replaced nodes can appear as "unreachable nodes" wrt schema and sometimes 
> prevent things like CF truncation.
> 
> Using unsafeAssassinateEndpoint will clean these from unreachable nodes and 
> will mark them as LEFT in gossip info.  There is a ttl for them in gossipinfo 
> and they should go away after 3 days.  Once they are marked LEFT, you should 
> stop seeing those up/same/dead messages.
> 
> unsafeAssassinateEndpoint is "unsafe" in that, if you specify the IP of a real
> node in the cluster, that node will be assassinated.  Otherwise, if you specify
> nodes that have been replaced, it is supposed to work correctly.
> 
> Hope this helps,
> jc
> 
> 
> From: Yang 
> Reply-To: 
> Date: Mon, 10 Sep 2012 01:10:56 -0700
> To: 
> Subject: replace_token code?
> 
> It looks like, by specifying replace_token, the old owner is not removed from
> gossip (which I had thought it would do).
> Then it's understandable that the old owner would resurface later and we get 
> some warning saying that the same token is owned by both.
> 
> 
> I ran an example with a 2-node cluster, with RF=2.  Host 10.72.201.80 was
> running for a while and had some data; then I shut it down and
> booted up 10.190.221.204 with replace_token of the old token owned by the
> previous host.
> The following log sequence shows that the new host does acquire the token,
> but it does not at the same time remove 80 forcefully from gossip.
> Instead, a few seconds later, it believed that .80 became live again.
> I don't have much understanding of the Gossip protocol, but I roughly know
> that it's probability-based; it looks like we need an "assertive"/"NOW"
> membership control message for replace_token.
> 
> 
> 
> 
> thanks
> yang
> 
> 
>  WARN [main] 2012-09-10 08:00:21,855 TokenMetadata.java (line 160) Token 
> 166594924822352415786406422619018814804 changing ownership from /10.72.201.80 
> to /10.190.221.204
>  INFO [main] 2012-09-10 08:00:21,855 StorageService.java (line 753) JOINING: 
> Starting to bootstrap...
>  INFO [CompactionExecutor:2] 2012-09-10 08:00:21,875 CompactionTask.java 
> (line 109) Compacting 
> [SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-1-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-3-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-4-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-2-Data.db')]
>  INFO [CompactionExecutor:2] 2012-09-10 08:00:21,979 CompactionTask.java 
> (line 221) Compacted to 
> [/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-5-Data.db,].  
> 499 to 394 (~78% of original) bytes for 3 keys at 0.003997MB/s.  Time: 94ms.
>  INFO [Thread-4] 2012-09-10 08:00:22,070 StreamInSession.java (line 214) 
> Finished streaming session 1 from /10.72.102.61
>  INFO [main] 2012-09-10 08:00:22,073 ColumnFamilyStore.java (line 643) 
> Enqueuing flush of Memtable-LocationInfo@30624226(77/96 serialized/live 
> bytes, 2 ops)
>  INFO [FlushWriter:2] 2012-09-10 08:00:22,074 Memtable.java (line 266) 
> Writing Memtable-LocationInfo@30624226(77/96 serialized/live bytes, 2 ops)
>  INFO [F

Re: replace_token code?

2012-09-10 Thread Yang
Thanks Jim, looks like I'll have to read into the code to understand what's
happening under the hood

yang

On Mon, Sep 10, 2012 at 9:45 AM, Jim Cistaro  wrote:

>  We have seen various issues from these replaced nodes hanging around.
>  For clusters where a lot of nodes have been replaced, we see these
> replaced nodes having an impact on heap/GC and a lot of tcp
> timeouts/retransmits (because the old nodes no longer exist).  As a result,
> we have begun cleaning these up using unsafeAssassinateEndpoint via JMX.
>  We have only started using it recently.  So far no bad side effects.  This
> also helps because those replaced nodes can appear as "unreachable nodes"
> wrt schema and sometimes prevent things like CF truncation.
>
>  Using unsafeAssassinateEndpoint will clean these from unreachable nodes
> and will mark them as LEFT in gossip info.  There is a ttl for them in
> gossipinfo and they should go away after 3 days.  Once they are marked
> LEFT, you should stop seeing those up/same/dead messages.
>
> unsafeAssassinateEndpoint is "unsafe" in that, if you specify the IP of a
> real node in the cluster, that node will be assassinated.  Otherwise, if you
> specify nodes that have been replaced, it is supposed to work correctly.
>
>  Hope this helps,
> jc
>
>
>   From: Yang 
> Reply-To: 
> Date: Mon, 10 Sep 2012 01:10:56 -0700
> To: 
> Subject: replace_token code?
>
>  It looks like, by specifying replace_token, the old owner is not removed
> from gossip (which I had thought it would do).
> Then it's understandable that the old owner would resurface later and we
> get some warning saying that the same token is owned by both.
>
>
>  I ran an example with a 2-node cluster, with RF=2.  Host 10.72.201.80
> was running for a while and had some data; then I shut it down and
> booted up 10.190.221.204 with replace_token of the old token owned by the
> previous host.
> The following log sequence shows that the new host does acquire the token,
> but it does not at the same time remove 80 forcefully from gossip.
> Instead, a few seconds later, it believed that .80 became live again.
> I don't have much understanding of the Gossip protocol, but I roughly know
> that it's probability-based; it looks like we need an "assertive"/"NOW"
> membership control message for replace_token.
>
>
>
>
>  thanks
> yang
>
>
>   WARN [main] 2012-09-10 08:00:21,855 TokenMetadata.java (line 160) Token
> 166594924822352415786406422619018814804 changing ownership from
> /10.72.201.80 to /10.190.221.204
>  INFO [main] 2012-09-10 08:00:21,855 StorageService.java (line 753)
> JOINING: Starting to bootstrap...
>  INFO [CompactionExecutor:2] 2012-09-10 08:00:21,875 CompactionTask.java
> (line 109) Compacting
> [SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-1-Data.db'),
> SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-3-Data.db'),
> SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-4-Data.db'),
> SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-2-Data.db')]
>  INFO [CompactionExecutor:2] 2012-09-10 08:00:21,979 CompactionTask.java
> (line 221) Compacted to
> [/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-5-Data.db,].
>  499 to 394 (~78% of original) bytes for 3 keys at 0.003997MB/s.  Time:
> 94ms.
>  INFO [Thread-4] 2012-09-10 08:00:22,070 StreamInSession.java (line 214)
> Finished streaming session 1 from /10.72.102.61
>  INFO [main] 2012-09-10 08:00:22,073 ColumnFamilyStore.java (line 643)
> Enqueuing flush of Memtable-LocationInfo@30624226(77/96 serialized/live
> bytes, 2 ops)
>  INFO [FlushWriter:2] 2012-09-10 08:00:22,074 Memtable.java (line 266)
> Writing Memtable-LocationInfo@30624226(77/96 serialized/live bytes, 2 ops)
>  INFO [FlushWriter:2] 2012-09-10 08:00:22,082 Memtable.java (line 307)
> Completed flushing
> /mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-6-Data.db
> (163 bytes) for commitlog position ReplayPosition(segmentId=6360937126265,
> position=3854)
>  INFO [main] 2012-09-10 08:00:22,087 StorageService.java (line 1098) Node
> /10.190.221.204 state jump to normal
>  INFO [main] 2012-09-10 08:00:22,088 StorageService.java (line 666)
> Bootstrap/Replace/Move completed! Now serving reads.
>  INFO [main] 2012-09-10 08:00:22,089 Mx4jTool.java (line 72) Will not load
> MX4J, mx4j-tools.jar is not in the classpath
>  INFO [main] 2012-09-10 08:00:22,119 CassandraDaemon.java (line 124)
> Binding thrift service to /10.190.221.204:9160
>  INFO [main] 2012-09-10 08:00:22,122 CassandraDaemon.java (line 133) Using
> TFastFramedTransport with a max frame size of 15728640 bytes.
>  INFO [main] 2012-09-10 08:00:22,125 CassandraDaemon.java (line 160) Using
> synchronous/threadpool thrift server on /10.190.221.204 : 9160
>  INFO [Thread-5] 2012-09-10 08:00:22,125 CassandraDaemon.java (line 212)
> Listening for thrift clients...
>  INFO [RMI TCP Connection(4)-127.0.0.1] 2012-09-10 08:00:59,014
>

Re: replace_token code?

2012-09-10 Thread Jim Cistaro
We have seen various issues from these replaced nodes hanging around.  For 
clusters where a lot of nodes have been replaced, we see these replaced nodes 
having an impact on heap/GC and a lot of tcp timeouts/retransmits (because the 
old nodes no longer exist).  As a result, we have begun cleaning these up using 
unsafeAssassinateEndpoint via JMX.  We have only started using it recently.  So 
far no bad side effects.  This also helps because those replaced nodes can 
appear as "unreachable nodes" wrt schema and sometimes prevent things like CF 
truncation.

Using unsafeAssassinateEndpoint will clean these from unreachable nodes and 
will mark them as LEFT in gossip info.  There is a ttl for them in gossipinfo 
and they should go away after 3 days.  Once they are marked LEFT, you should 
stop seeing those up/same/dead messages.

unsafeAssassinateEndpoint is "unsafe" in that, if you specify the IP of a real
node in the cluster, that node will be assassinated.  Otherwise, if you specify nodes 
that have been replaced, it is supposed to work correctly.
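
For anyone who wants to script this rather than clicking through jconsole,
here is a minimal sketch of a JMX client that invokes
unsafeAssassinateEndpoint. It assumes JMX is listening on localhost:7199
with no authentication, and that the Gossiper MBean is registered as
org.apache.cassandra.net:type=Gossiper; verify both against your own node
(e.g. with jconsole) before relying on it.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Minimal JMX client for unsafeAssassinateEndpoint.  Assumptions: JMX on
// localhost:7199 with no auth, and the Gossiper MBean registered as
// org.apache.cassandra.net:type=Gossiper (check with jconsole first).
public class Assassinate
{
    public static void main(String[] args) throws Exception
    {
        String deadEndpoint = args.length > 0 ? args[0] : "10.72.201.80";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName gossiper =
                new ObjectName("org.apache.cassandra.net:type=Gossiper");
            // The operation takes the IP address of the endpoint to remove.
            mbs.invoke(gossiper, "unsafeAssassinateEndpoint",
                       new Object[] { deadEndpoint },
                       new String[] { String.class.getName() });
            System.out.println("Assassinated " + deadEndpoint);
        }
        finally
        {
            connector.close();
        }
    }
}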

Hope this helps,
jc


From: Yang <tedd...@gmail.com>
Reply-To: <user@cassandra.apache.org>
Date: Mon, 10 Sep 2012 01:10:56 -0700
To: <user@cassandra.apache.org>
Subject: replace_token code?

It looks like, by specifying replace_token, the old owner is not removed from 
gossip (which I had thought it would do).
Then it's understandable that the old owner would resurface later and we get 
some warning saying that the same token is owned by both.


I ran an example with a 2-node cluster, with RF=2.  Host 10.72.201.80 was 
running for a while and had some data; then I shut it down and
booted up 10.190.221.204 with replace_token of the old token owned by the 
previous host.
The following log sequence shows that the new host does acquire the token, but 
it does not at the same time remove 80 forcefully from gossip.
Instead, a few seconds later, it believed that .80 became live again.
I don't have much understanding of the Gossip protocol, but I roughly know that 
it's probability-based; it looks like we need an "assertive"/"NOW"
membership control message for replace_token.




thanks
yang


 WARN [main] 2012-09-10 08:00:21,855 TokenMetadata.java (line 160) Token 
166594924822352415786406422619018814804 changing ownership from 
/10.72.201.80 to /10.190.221.204
 INFO [main] 2012-09-10 08:00:21,855 StorageService.java (line 753) JOINING: 
Starting to bootstrap...
 INFO [CompactionExecutor:2] 2012-09-10 08:00:21,875 CompactionTask.java (line 
109) Compacting 
[SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-1-Data.db'),
 
SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-3-Data.db'),
 
SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-4-Data.db'),
 
SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-2-Data.db')]
 INFO [CompactionExecutor:2] 2012-09-10 08:00:21,979 CompactionTask.java (line 
221) Compacted to 
[/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-5-Data.db,].  
499 to 394 (~78% of original) bytes for 3 keys at 0.003997MB/s.  Time: 94ms.
 INFO [Thread-4] 2012-09-10 08:00:22,070 StreamInSession.java (line 214) 
Finished streaming session 1 from /10.72.102.61
 INFO [main] 2012-09-10 08:00:22,073 ColumnFamilyStore.java (line 643) 
Enqueuing flush of Memtable-LocationInfo@30624226(77/96 serialized/live bytes, 
2 ops)
 INFO [FlushWriter:2] 2012-09-10 08:00:22,074 Memtable.java (line 266) Writing 
Memtable-LocationInfo@30624226(77/96 serialized/live bytes, 2 ops)
 INFO [FlushWriter:2] 2012-09-10 08:00:22,082 Memtable.java (line 307) 
Completed flushing 
/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-6-Data.db (163 
bytes) for commitlog position ReplayPosition(segmentId=6360937126265, 
position=3854)
 INFO [main] 2012-09-10 08:00:22,087 StorageService.java (line 1098) Node 
/10.190.221.204 state jump to normal
 INFO [main] 2012-09-10 08:00:22,088 StorageService.java (line 666) 
Bootstrap/Replace/Move completed! Now serving reads.
 INFO [main] 2012-09-10 08:00:22,089 Mx4jTool.java (line 72) Will not load 
MX4J, mx4j-tools.jar is not in the classpath
 INFO [main] 2012-09-10 08:00:22,119 CassandraDaemon.java (line 124) Binding 
thrift service to /10.190.221.204:9160
 INFO [main] 2012-09-10 08:00:22,122 CassandraDaemon.java (line 133) Using 
TFastFramedTransport with a max frame size of 15728640 bytes.
 INFO [main] 2012-09-10 08:00:22,125 CassandraDaemon.java (line 160) Using 
synchronous/threadpool thrift server on /10.190.221.204 : 9160
 INFO [Thread-5] 2012-09-10 08:00:22,125 CassandraDaemon.java (line 212) 
Listening for thrift clients...
 INFO [RMI TCP Connection(4)-127.0.0.1] 2012-09-10 08:00:59,014 
ColumnFamilyStore.java (line 643) Enqueuing flush of 
Memtabl