Re: nodetool cleanup isn't cleaning up?
getRangeToEndpointMap is very useful, thanks, I didn't know about it... however, I've reconfigured my cluster since (moved some nodes and tokens), so now the problem is gone. I guess I'll use getRangeToEndpointMap next time I see something like this...

On Thu, Jun 3, 2010 at 9:15 AM, Jonathan Ellis wrote:
> Then the next step is to check StorageService.getRangeToEndpointMap via jmx
>
> On Tue, Jun 1, 2010 at 11:56 AM, Ran Tavory wrote:
> > I'm using RackAwareStrategy. But it still doesn't make sense I think...
> > let's see what did I miss... According to
> > http://wiki.apache.org/cassandra/Operations
> >
> > RackAwareStrategy: replica 2 is placed in the first node along the ring
> > that belongs in another data center than the first; the remaining N-2
> > replicas, if any, are placed on the first nodes along the ring in the
> > same rack as the first
> >
> > 192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242  |<--|
> > 192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243  |  ^
> > 192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863  v  |
> > 192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485 |  ^
> > 192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106 v  |
> > 192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727 |-->|
> >
> > Alright, so I made a mistake and didn't use the alternate-datacenter
> > suggestion on the page, so the first node of every DC is overloaded with
> > replicas. However, the current situation still doesn't make sense to me.
> > .252.124 will be overloaded b/c it has the first token in the .252 DC.
> > .254.57 will also be overloaded since it has the first token in the .254 DC.
> > But for which node does .252.99 serve as a replica? It's not the first in
> > the DC and it's just one single token more than its predecessor (which is
> > in the same DC).
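The RackAware placement rule quoted above can be simulated in a few lines. A hedged sketch, assuming RF=2 and taking the ring from the nodetool output; the DC labels are inferred from the subnets and the code is an illustrative approximation, not Cassandra's actual implementation:

```python
from bisect import bisect_left

# (token, ip, datacenter) -- taken from the nodetool ring output above;
# DC labels are made up, inferred from the 192.168.252.x / 192.168.254.x subnets.
RING = [
    (56713727820156410577229101238628035242, "192.168.252.124", "dc252"),
    (56713727820156410577229101238628035243, "192.168.252.99", "dc252"),
    (85070591730234615865843651857942052863, "192.168.252.125", "dc252"),
    (113427455640312821154458202477256070485, "192.168.254.57", "dc254"),
    (141784319550391026443072753096570088106, "192.168.254.58", "dc254"),
    (170141183460469231731687303715884105727, "192.168.254.59", "dc254"),
]
TOKENS = [t for t, _, _ in RING]

def endpoints(key_token, rf=2):
    """Primary = first node whose token is >= the key (wrapping around);
    replica 2 = the next node along the ring in a *different* DC."""
    i = bisect_left(TOKENS, key_token) % len(RING)
    result = [RING[i][1]]
    for j in range(1, len(RING)):
        cand = RING[(i + j) % len(RING)]
        if cand[2] != RING[i][2]:  # first node in another data center
            result.append(cand[1])
            break
    return result[:rf]
```

Walking every range under this toy rule, the second replica always lands on .252.124 or .254.57 (the first node of each DC going around the ring), and .252.99 is never chosen as a replica, which is exactly why its 352 MB of leftover data looks anomalous.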
Re: nodetool cleanup isn't cleaning up?
Then the next step is to check StorageService.getRangeToEndpointMap via jmx

On Tue, Jun 1, 2010 at 11:56 AM, Ran Tavory wrote:
> I'm using RackAwareStrategy. But it still doesn't make sense I think...
> let's see what did I miss... According to
> http://wiki.apache.org/cassandra/Operations
>
> RackAwareStrategy: replica 2 is placed in the first node along the ring
> that belongs in another data center than the first; the remaining N-2
> replicas, if any, are placed on the first nodes along the ring in the
> same rack as the first
>
> 192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242  |<--|
> 192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243  |  ^
> 192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863  v  |
> 192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485 |  ^
> 192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106 v  |
> 192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727 |-->|
>
> Alright, so I made a mistake and didn't use the alternate-datacenter
> suggestion on the page, so the first node of every DC is overloaded with
> replicas. However, the current situation still doesn't make sense to me.
> .252.124 will be overloaded b/c it has the first token in the .252 DC.
> .254.57 will also be overloaded since it has the first token in the .254 DC.
> But for which node does .252.99 serve as a replica? It's not the first in
> the DC and it's just one single token more than its predecessor (which is
> in the same DC).
>
> On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis wrote:
>> I'm saying that .99 is getting a copy of all the data for which .124
>> is the primary. (If you are using RackUnawarePartitioner. If you are
>> using RackAware it is some other node.)
Re: nodetool cleanup isn't cleaning up?
I'm using RackAwareStrategy. But it still doesn't make sense I think... let's see what did I miss... According to http://wiki.apache.org/cassandra/Operations

- RackAwareStrategy: replica 2 is placed in the first node along the ring that belongs in *another* data center than the first; the remaining N-2 replicas, if any, are placed on the first nodes along the ring in the *same* rack as the first

192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242  |<--|
192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243  |  ^
192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863  v  |
192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485 |  ^
192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106 v  |
192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727 |-->|

Alright, so I made a mistake and didn't use the alternate-datacenter suggestion on the page, so the first node of every DC is overloaded with replicas. However, the current situation still doesn't make sense to me. .252.124 will be overloaded b/c it has the first token in the .252 DC. .254.57 will also be overloaded since it has the first token in the .254 DC. But for which node does .252.99 serve as a replica? It's not the first in the DC and it's just one single token more than its predecessor (which is in the same DC).

On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis wrote:
> I'm saying that .99 is getting a copy of all the data for which .124
> is the primary. (If you are using RackUnawarePartitioner. If you are
> using RackAware it is some other node.)
>
> On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory wrote:
> > ok, let me try and translate your answer ;)
> > Are you saying that the data that was left on the node is
> > non-primary-replicas of rows from the time before the move?
Re: nodetool cleanup isn't cleaning up?
I'm saying that .99 is getting a copy of all the data for which .124 is the primary. (If you are using RackUnawarePartitioner. If you are using RackAware it is some other node.) On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory wrote: > ok, let me try and translate your answer ;) > Are you saying that the data that was left on the node is > non-primary-replicas of rows from the time before the move? > So this implies that when a node moves in the ring, it will affect > distribution of: > - new keys > - old keys primary node > -- but will not affect distribution of old keys non-primary replicas. > If so, still I don't understand something... I would expect even the > non-primary replicas of keys to be moved since if they don't, how would they > be found? I mean upon reads the serving node should not care about whether > the row is new or old, it should have a consistent and global mapping of > tokens. So I guess this ruins my theory... > What did you mean then? Is this deletions of non-primary replicated data? > How does the replication factor affect the load on the moved host then? > > On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis wrote: >> >> well, there you are then. >> >> On Mon, May 31, 2010 at 2:34 PM, Ran Tavory wrote: >> > yes, replication factor = 2 >> > >> > On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis >> > wrote: >> >> >> >> you have replication factor > 1 ? >> >> >> >> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory wrote: >> >> > I hope I understand nodetool cleanup correctly - it should clean up >> >> > all >> >> > data >> >> > that does not (currently) belong to this node. If so, I think it >> >> > might >> >> > not >> >> > be working correctly. 
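Jonathan's explanation implies a simple model of what cleanup has to do: re-derive the current replica set for every row and drop the rows this node no longer serves, primary or not. A minimal sketch with made-up names and a toy rack-unaware RF=2 placement (not Cassandra's actual implementation):

```python
def replica_set(token, ring, rf=2):
    """Toy RF=2 placement: primary = first node whose token is >= the key
    (falling back to the first node on wrap-around); the replica is simply
    the next node along the ring, i.e. rack-unaware."""
    owners = sorted(ring)  # [(node_token, ip), ...]
    i = next((k for k, (t, _) in enumerate(owners) if t >= token), 0)
    return [owners[(i + j) % len(owners)][1] for j in range(rf)]

def cleanup(node_ip, rows, ring):
    """What the thread expects 'nodetool cleanup' to do: keep only rows whose
    current replica set still includes this node, report how many it dropped."""
    kept = {t: r for t, r in rows.items() if node_ip in replica_set(t, ring)}
    return kept, len(rows) - len(kept)
```

After a move changes the ring, running this per node would discard both primary and non-primary copies that no longer belong, which is the behavior Ran expected and did not observe.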
Re: nodetool cleanup isn't cleaning up?
ok, let me try and translate your answer ;) Are you saying that the data that was left on the node is non-primary-replicas of rows from the time before the move? So this implies that when a node moves in the ring, it will affect distribution of: - new keys - old keys primary node -- but will not affect distribution of old keys non-primary replicas. If so, still I don't understand something... I would expect even the non-primary replicas of keys to be moved since if they don't, how would they be found? I mean upon reads the serving node should not care about whether the row is new or old, it should have a consistent and global mapping of tokens. So I guess this ruins my theory... What did you mean then? Is this deletions of non-primary replicated data? How does the replication factor affect the load on the moved host then? On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis wrote: > well, there you are then. > > On Mon, May 31, 2010 at 2:34 PM, Ran Tavory wrote: > > yes, replication factor = 2 > > > > On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis > wrote: > >> > >> you have replication factor > 1 ? > >> > >> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory wrote: > >> > I hope I understand nodetool cleanup correctly - it should clean up > all > >> > data > >> > that does not (currently) belong to this node. If so, I think it might > >> > not > >> > be working correctly. 
Re: nodetool cleanup isn't cleaning up?
well, there you are then.

On Mon, May 31, 2010 at 2:34 PM, Ran Tavory wrote:
> yes, replication factor = 2
>
> On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis wrote:
>> you have replication factor > 1 ?
>>
>> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory wrote:
>> > I hope I understand nodetool cleanup correctly - it should clean up all
>> > data that does not (currently) belong to this node. If so, I think it
>> > might not be working correctly.
>> > Look at nodes 192.168.252.124 and 192.168.252.99 below:
>> >
>> > 192.168.252.99   Up  279.35 MB  3544607988759775661076818827414252202   |<--|
>> > 192.168.252.124  Up  167.23 MB  56713727820156410577229101238628035242  |  ^
>> > 192.168.252.125  Up  82.91 MB   85070591730234615865843651857942052863  v  |
>> > 192.168.254.57   Up  366.6 MB   113427455640312821154458202477256070485 |  ^
>> > 192.168.254.58   Up  88.44 MB   141784319550391026443072753096570088106 v  |
>> > 192.168.254.59   Up  88.45 MB   170141183460469231731687303715884105727 |-->|
>> >
>> > I wanted 124 to take all the load from 99. So I issued a move command:
>> > $ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243
>> >
>> > This command tells 99 to take the space b/w
>> > (56713727820156410577229101238628035242, 56713727820156410577229101238628035243],
>> > which is basically just one item in the token space, almost nothing... I
>> > wanted it to be very slim (just playing around).
Re: nodetool cleanup isn't cleaning up?
yes, replication factor = 2

On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis wrote:
> you have replication factor > 1 ?
>
> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory wrote:
> > I hope I understand nodetool cleanup correctly - it should clean up all
> > data that does not (currently) belong to this node. If so, I think it
> > might not be working correctly.
> > Look at nodes 192.168.252.124 and 192.168.252.99 below:
> >
> > 192.168.252.99   Up  279.35 MB  3544607988759775661076818827414252202   |<--|
> > 192.168.252.124  Up  167.23 MB  56713727820156410577229101238628035242  |  ^
> > 192.168.252.125  Up  82.91 MB   85070591730234615865843651857942052863  v  |
> > 192.168.254.57   Up  366.6 MB   113427455640312821154458202477256070485 |  ^
> > 192.168.254.58   Up  88.44 MB   141784319550391026443072753096570088106 v  |
> > 192.168.254.59   Up  88.45 MB   170141183460469231731687303715884105727 |-->|
> >
> > I wanted 124 to take all the load from 99. So I issued a move command:
> > $ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243
> >
> > This command tells 99 to take the space b/w
> > (56713727820156410577229101238628035242, 56713727820156410577229101238628035243],
> > which is basically just one item in the token space, almost nothing... I
> > wanted it to be very slim (just playing around).
> > So, next I get this:
> >
> > 192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242  |<--|
> > 192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243  |  ^
> > 192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863  v  |
> > 192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485 |  ^
> > 192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106 v  |
> > 192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727 |-->|
> >
> > The tokens are correct, but it seems that 99 still has a lot of data. Why?
> > OK, that might be b/c it didn't delete its moved data.
Re: nodetool cleanup isn't cleaning up?
you have replication factor > 1 ?

On Mon, May 31, 2010 at 7:23 AM, Ran Tavory wrote:
> I hope I understand nodetool cleanup correctly - it should clean up all
> data that does not (currently) belong to this node. If so, I think it
> might not be working correctly.
> Look at nodes 192.168.252.124 and 192.168.252.99 below:
>
> 192.168.252.99   Up  279.35 MB  3544607988759775661076818827414252202   |<--|
> 192.168.252.124  Up  167.23 MB  56713727820156410577229101238628035242  |  ^
> 192.168.252.125  Up  82.91 MB   85070591730234615865843651857942052863  v  |
> 192.168.254.57   Up  366.6 MB   113427455640312821154458202477256070485 |  ^
> 192.168.254.58   Up  88.44 MB   141784319550391026443072753096570088106 v  |
> 192.168.254.59   Up  88.45 MB   170141183460469231731687303715884105727 |-->|
>
> I wanted 124 to take all the load from 99. So I issued a move command:
> $ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243
>
> This command tells 99 to take the space b/w
> (56713727820156410577229101238628035242, 56713727820156410577229101238628035243],
> which is basically just one item in the token space, almost nothing... I
> wanted it to be very slim (just playing around).
> So, next I get this:
>
> 192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242  |<--|
> 192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243  |  ^
> 192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863  v  |
> 192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485 |  ^
> 192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106 v  |
> 192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727 |-->|
>
> The tokens are correct, but it seems that 99 still has a lot of data. Why?
> OK, that might be b/c it didn't delete its moved data.
> So next I issued a nodetool cleanup, which should have taken care of that.
> Only that it didn't, the node 99 still has 352 MB of data. Why?
> So, you know what, I waited for 1h. Still no good, data wasn't cleaned up.
> I restarted the server. Still, data wasn't cleaned up... I issued a cleanup
> again... still no good... what's up with this node?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
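As a sanity check on the range arithmetic in the original message: a move to token b assigns the node the half-open range (a, b], and with b = a + 1 that range really does contain exactly one value of the token space. The helper below is illustrative and ignores ring wrap-around:

```python
# The two tokens from the move in the original message.
a = 56713727820156410577229101238628035242
b = 56713727820156410577229101238628035243

def in_range(token, lo, hi):
    """Membership in a (lo, hi] ring range (no wrap-around needed here):
    the left endpoint is excluded, the right endpoint included."""
    return lo < token <= hi

# (a, b] holds exactly one token: b itself, not a.
assert in_range(b, a, b)
assert not in_range(a, a, b)
assert b - a == 1
```

So after the move, node 99's primary range is as slim as intended; the 352 MB it still holds must be replica data or data that cleanup failed to discard, which is what the rest of the thread chases.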
Re: nodetool cleanup isn't cleaning up?
Hello! I think (but I'm not sure, please correct me if required) that after you change the token, nodes just receive the new data but don't immediately delete the old data. It seems like "cleanup" will mark it as tombstones, and it will be deleted when you run "compact" after GCGraceSeconds seconds.

On 31.05.2010 17:00, Ran Tavory wrote:
> Do you think it's the tombstones that take up the disk space? Shouldn't the
> tombstones be moved along with the data?
>
> On Mon, May 31, 2010 at 3:29 PM, Maxim Kramarenko wrote:
>> Hello!
>>
>> You likely need to wait for GCGraceSeconds seconds or modify this param.
>>
>> http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html
>> ===
>> Thus, a delete operation can't just wipe out all traces of the data being
>> removed immediately: if we did, and a replica did not receive the delete
>> operation, when it becomes available again it will treat the replicas that
>> did receive the delete as having missed a write update, and repair them!
>> So, instead of wiping out data on delete, Cassandra replaces it with a
>> special value called a tombstone. The tombstone can then be propagated to
>> replicas that missed the initial remove request.
>> ...
>> Here, we defined a constant, GCGraceSeconds, and had each node track
>> tombstone age locally. Once it has aged past the constant, it can be GC'd.
>> ===
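The lifecycle Maxim describes (per the linked blog post) can be sketched as: a delete writes a tombstone, and compaction may only purge it once it has aged past GCGraceSeconds, so replicas that missed the delete still have time to repair. The structures below are illustrative stand-ins; 10 days matches Cassandra's historical default for GCGraceSeconds:

```python
import time

GC_GRACE_SECONDS = 10 * 24 * 3600  # Cassandra's historical default: 10 days

def can_gc(tombstone_written_at, now=None):
    """A tombstone is purgeable only after it has aged past GCGraceSeconds."""
    now = time.time() if now is None else now
    return now - tombstone_written_at > GC_GRACE_SECONDS

def compact(cells, now):
    """Drop only tombstones old enough to GC; keep live cells and any
    tombstone still inside its grace period."""
    return [c for c in cells
            if not (c.get("tombstone") and can_gc(c["ts"], now))]
```

Which also explains the on-disk symptom in the thread: until the grace period passes and a compaction runs, deleted or marked data continues to count against the node's load.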
Re: nodetool cleanup isn't cleaning up?
Do you think it's the tombstones that take up the disk space? Shouldn't the tombstones be moved along with the data?

On Mon, May 31, 2010 at 3:29 PM, Maxim Kramarenko wrote:
> Hello!
>
> You likely need to wait for GCGraceSeconds seconds or modify this param.
>
> http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html
> ===
> Thus, a delete operation can't just wipe out all traces of the data being
> removed immediately: if we did, and a replica did not receive the delete
> operation, when it becomes available again it will treat the replicas that
> did receive the delete as having missed a write update, and repair them!
> So, instead of wiping out data on delete, Cassandra replaces it with a
> special value called a tombstone. The tombstone can then be propagated to
> replicas that missed the initial remove request.
> ...
> Here, we defined a constant, GCGraceSeconds, and had each node track
> tombstone age locally. Once it has aged past the constant, it can be GC'd.
> ===
>
> On 31.05.2010 16:23, Ran Tavory wrote:
>> I hope I understand nodetool cleanup correctly - it should clean up all
>> data that does not (currently) belong to this node. If so, I think it
>> might not be working correctly.
>> Look at nodes 192.168.252.124 and 192.168.252.99 below:
>>
>> 192.168.252.99   Up  279.35 MB  3544607988759775661076818827414252202   |<--|
>> 192.168.252.124  Up  167.23 MB  56713727820156410577229101238628035242  |  ^
>> 192.168.252.125  Up  82.91 MB   85070591730234615865843651857942052863  v  |
>> 192.168.254.57   Up  366.6 MB   113427455640312821154458202477256070485 |  ^
>> 192.168.254.58   Up  88.44 MB   141784319550391026443072753096570088106 v  |
>> 192.168.254.59   Up  88.45 MB   170141183460469231731687303715884105727 |-->|
>>
>> I wanted 124 to take all the load from 99. So I issued a move command.
>> >> $ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243 >> >> This command tells 99 to take the space b/w >> (56713727820156410577229101238628035242, >> 56713727820156410577229101238628035243] >> which is basically just one item in the token space, almost nothing... I >> wanted it to be very slim (just playing around). >> >> So, next I get this: >> >> 192.168.252.124Up 803.33 MB >> 56713727820156410577229101238628035242 |<--| >> 192.168.252.99Up 352.85 MB >> 56713727820156410577229101238628035243 | ^ >> 192.168.252.125Up 134.24 MB >> 85070591730234615865843651857942052863 v | >> 192.168.254.57Up 676.41 MB >> 113427455640312821154458202477256070485| ^ >> 192.168.254.58Up 99.74 MB >> 141784319550391026443072753096570088106v | >> 192.168.254.59Up 99.94 MB >> 170141183460469231731687303715884105727|-->| >> >> The tokens are correct, but it seems that 99 still has a lot of data. >> Why? OK, that might be b/c it didn't delete its moved data. >> So next I issued a nodetool cleanup, which should have taken care of >> that. Only that it didn't, the node 99 still has 352 MB of data. Why? >> So, you know what, I waited for 1h. Still no good, data wasn't cleaned up. >> I restarted the server. Still, data wasn't cleaned up... I issued a >> cleanup again... still no good... what's up with this node? >> >> >> > -- > Best regards, > Maximmailto:maxi...@trackstudio.com > > LinkedIn Profile: http://www.linkedin.com/in/maximkr > Google Talk/Jabber: maxi...@gmail.com > ICQ number: 307863079 > Skype Chat: maxim.kramarenko > Yahoo! Messenger: maxim_kramarenko >
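[Editor's note: to see why that move gives .99 an almost-empty *primary* range: a node's primary range is the half-open interval (predecessor_token, own_token]. A quick sketch with the two tokens from the ring output above, ignoring ring wrap-around:]

```python
# Tokens from the ring output above (RandomPartitioner token space).
T_124 = 56713727820156410577229101238628035242
T_99  = 56713727820156410577229101238628035243  # T_124 + 1 after the move


def primary_range_size(predecessor_token: int, own_token: int) -> int:
    """Number of tokens in the half-open primary range (predecessor, own].
    Assumes own_token > predecessor_token, i.e. no ring wrap-around."""
    return own_token - predecessor_token


# .99's primary range now covers exactly one token out of a 2^127 space:
print(primary_range_size(T_124, T_99))  # 1
```

So after the move, .99 is primary for essentially nothing, which makes its remaining 352 MB all the more suspicious until replication is taken into account.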
Re: nodetool cleanup isn't cleaning up?
Hello!

You likely need to wait for GCGraceSeconds seconds, or modify this param.

http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html
===
Thus, a delete operation can't just wipe out all traces of the data being removed immediately: if we did, and a replica did not receive the delete operation, when it becomes available again it will treat the replicas that did receive the delete as having missed a write update, and repair them! So, instead of wiping out data on delete, Cassandra replaces it with a special value called a tombstone. The tombstone can then be propagated to replicas that missed the initial remove request.
...
Here, we defined a constant, GCGraceSeconds, and had each node track tombstone age locally. Once it has aged past the constant, it can be GC'd.
===

On 31.05.2010 16:23, Ran Tavory wrote:
> I hope I understand nodetool cleanup correctly - it should clean up all
> data that does not (currently) belong to this node. If so, I think it
> might not be working correctly.
>
> Look at nodes 192.168.252.124 and 192.168.252.99 below
>
> 192.168.252.99    Up   279.35 MB  3544607988759775661076818827414252202   |<--|
> 192.168.252.124   Up   167.23 MB  56713727820156410577229101238628035242  |   ^
> 192.168.252.125   Up   82.91 MB   85070591730234615865843651857942052863  v   |
> 192.168.254.57    Up   366.6 MB   113427455640312821154458202477256070485 |   ^
> 192.168.254.58    Up   88.44 MB   141784319550391026443072753096570088106 v   |
> 192.168.254.59    Up   88.45 MB   170141183460469231731687303715884105727 |-->|
>
> I wanted 124 to take all the load from 99. So I issued a move command.
>
> $ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243
>
> This command tells 99 to take the space b/w
> (56713727820156410577229101238628035242, 56713727820156410577229101238628035243],
> which is basically just one item in the token space, almost nothing... I
> wanted it to be very slim (just playing around).
>
> So, next I get this:
>
> 192.168.252.124   Up   803.33 MB  56713727820156410577229101238628035242  |<--|
> 192.168.252.99    Up   352.85 MB  56713727820156410577229101238628035243  |   ^
> 192.168.252.125   Up   134.24 MB  85070591730234615865843651857942052863  v   |
> 192.168.254.57    Up   676.41 MB  113427455640312821154458202477256070485 |   ^
> 192.168.254.58    Up   99.74 MB   141784319550391026443072753096570088106 v   |
> 192.168.254.59    Up   99.94 MB   170141183460469231731687303715884105727 |-->|
>
> The tokens are correct, but it seems that 99 still has a lot of data.
> Why? OK, that might be b/c it didn't delete its moved data.
> So next I issued a nodetool cleanup, which should have taken care of
> that. Only it didn't - node 99 still has 352 MB of data. Why?
> So, you know what, I waited for 1h. Still no good, data wasn't cleaned up.
> I restarted the server. Still, data wasn't cleaned up... I issued a
> cleanup again... still no good... what's up with this node?

--
Best regards,
Maxim                          mailto:maxi...@trackstudio.com

LinkedIn Profile: http://www.linkedin.com/in/maximkr
Google Talk/Jabber: maxi...@gmail.com
ICQ number: 307863079
Skype Chat: maxim.kramarenko
Yahoo! Messenger: maxim_kramarenko
nodetool cleanup isn't cleaning up?
I hope I understand nodetool cleanup correctly - it should clean up all data that does not (currently) belong to this node. If so, I think it might not be working correctly.

Look at nodes 192.168.252.124 and 192.168.252.99 below

192.168.252.99    Up   279.35 MB  3544607988759775661076818827414252202   |<--|
192.168.252.124   Up   167.23 MB  56713727820156410577229101238628035242  |   ^
192.168.252.125   Up   82.91 MB   85070591730234615865843651857942052863  v   |
192.168.254.57    Up   366.6 MB   113427455640312821154458202477256070485 |   ^
192.168.254.58    Up   88.44 MB   141784319550391026443072753096570088106 v   |
192.168.254.59    Up   88.45 MB   170141183460469231731687303715884105727 |-->|

I wanted 124 to take all the load from 99. So I issued a move command.

$ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243

This command tells 99 to take the space b/w (56713727820156410577229101238628035242, 56713727820156410577229101238628035243], which is basically just one item in the token space, almost nothing... I wanted it to be very slim (just playing around).

So, next I get this:

192.168.252.124   Up   803.33 MB  56713727820156410577229101238628035242  |<--|
192.168.252.99    Up   352.85 MB  56713727820156410577229101238628035243  |   ^
192.168.252.125   Up   134.24 MB  85070591730234615865843651857942052863  v   |
192.168.254.57    Up   676.41 MB  113427455640312821154458202477256070485 |   ^
192.168.254.58    Up   99.74 MB   141784319550391026443072753096570088106 v   |
192.168.254.59    Up   99.94 MB   170141183460469231731687303715884105727 |-->|

The tokens are correct, but it seems that 99 still has a lot of data. Why? OK, that might be b/c it didn't delete its moved data. So next I issued a nodetool cleanup, which should have taken care of that. Only it didn't - node 99 still has 352 MB of data. Why? So, you know what, I waited for 1h. Still no good, data wasn't cleaned up. I restarted the server. Still, data wasn't cleaned up... I issued a cleanup again... still no good... what's up with this node?
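[Editor's note: as the replies upthread explain, with replication factor 2 the data on .99 is legitimate - it holds a replica of everything .124 is primary for, so cleanup correctly leaves it alone. A toy replica-placement sketch illustrates this. It assumes SimpleStrategy-style placement (token owner plus the next node on the ring), not the RackAwareStrategy actually in use here, but the conclusion is the same.]

```python
from bisect import bisect_left

# Ring after the move (token, node), from the output above, sorted by token.
RING = sorted([
    (56713727820156410577229101238628035242, "192.168.252.124"),
    (56713727820156410577229101238628035243, "192.168.252.99"),
    (85070591730234615865843651857942052863, "192.168.252.125"),
    (113427455640312821154458202477256070485, "192.168.254.57"),
    (141784319550391026443072753096570088106, "192.168.254.58"),
    (170141183460469231731687303715884105727, "192.168.254.59"),
])
TOKENS = [t for t, _ in RING]


def replicas(key_token: int, rf: int = 2) -> list[str]:
    """Nodes holding a key: the token owner plus the next rf-1 nodes
    along the ring (SimpleStrategy-style placement, with wrap-around)."""
    i = bisect_left(TOKENS, key_token) % len(RING)
    return [RING[(i + k) % len(RING)][1] for k in range(rf)]


# A key deep inside .124's huge primary range is still replicated on .99,
# so cleanup on .99 must keep it:
print(replicas(1234))  # ['192.168.252.124', '192.168.252.99']
```

Under this model, .99's disk usage after the move reflects its replica of .124's range rather than leftover garbage - consistent with cleanup "doing nothing" no matter how many times it runs.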