Re: Bootstrapping taking long

2011-01-05 Thread Ran Tavory
I see. Thanks for clarifying, Jonathan.

On Wednesday, January 5, 2011, Jonathan Ellis  wrote:
> 1676 says "Avoid dropping messages off the client request path."
> Bootstrap messages are "off the client request path."  So, if some of
> the nodes involved were loaded enough that they were dropping messages
> older than RPC_TIMEOUT to cope, it could lose part of the bootstrap
> communication permanently.
>
> On Wed, Jan 5, 2011 at 10:01 AM, Ran Tavory  wrote:
>> OK, thanks, so I see we had the same problem (I too had multiple keyspaces,
>> not that I know why it matters to the problem at hand) and I see that by
>> upgrading to 0.6.7 you solved your problem (I didn't try it, had a different
>> workaround) but frankly, I don't understand
>> how https://issues.apache.org/jira/browse/CASSANDRA-1676 would relate to
>> the "stuck bootstrap" problem (I'm not saying that it isn't, I'd just like
>> to understand why...)
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

-- 
/Ran


Re: Bootstrapping taking long

2011-01-05 Thread Jonathan Ellis
1676 says "Avoid dropping messages off the client request path."
Bootstrap messages are "off the client request path."  So, if some of
the nodes involved were loaded enough that they were dropping messages
older than RPC_TIMEOUT to cope, it could lose part of the bootstrap
communication permanently.
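
A toy model of that failure mode, in Python rather than Cassandra's own Java. The `Message` class, the verb names, and the 10-second timeout are assumptions made for illustration only; this is not the actual 1676 patch. It only shows why dropping every stale message hurts bootstrap: a client can retry a dropped read, but nothing resends a dropped bootstrap message.

```python
from dataclasses import dataclass

RPC_TIMEOUT_SECONDS = 10.0  # assumed value, for illustration only


@dataclass
class Message:
    verb: str             # hypothetical verb name, e.g. "READ" or "BOOTSTRAP"
    created_at: float     # time the message was queued, in seconds
    client_request: bool  # True if it is on the client request path


def should_drop_before_fix(msg: Message, now: float) -> bool:
    """Overloaded node sheds every message older than the timeout."""
    return now - msg.created_at > RPC_TIMEOUT_SECONDS


def should_drop_after_fix(msg: Message, now: float) -> bool:
    """Only stale client-path messages are shed; the rest are still processed."""
    return msg.client_request and now - msg.created_at > RPC_TIMEOUT_SECONDS


if __name__ == "__main__":
    now = 100.0
    stale_read = Message("READ", created_at=80.0, client_request=True)
    stale_bootstrap = Message("BOOTSTRAP", created_at=80.0, client_request=False)
    for msg in (stale_read, stale_bootstrap):
        print(msg.verb,
              "dropped before fix:", should_drop_before_fix(msg, now),
              "| dropped after fix:", should_drop_after_fix(msg, now))
```

With the second rule the stale read is still shed (the client has timed out anyway), while the bootstrap message survives the overload.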

On Wed, Jan 5, 2011 at 10:01 AM, Ran Tavory  wrote:
> OK, thanks, so I see we had the same problem (I too had multiple keyspaces,
> not that I know why it matters to the problem at hand) and I see that by
> upgrading to 0.6.7 you solved your problem (I didn't try it, had a different
> workaround) but frankly, I don't understand
> how https://issues.apache.org/jira/browse/CASSANDRA-1676 would relate to
> the "stuck bootstrap" problem (I'm not saying that it isn't, I'd just like
> to understand why...)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Bootstrapping taking long

2011-01-05 Thread Ran Tavory
OK, thanks, so I see we had the same problem (I too had multiple keyspaces,
not that I know why it matters to the problem at hand) and I see that by
upgrading to 0.6.7 you solved your problem (I didn't try it, had a different
workaround) but frankly, I don't understand how
https://issues.apache.org/jira/browse/CASSANDRA-1676 would relate to the
"stuck bootstrap" problem (I'm not saying that it isn't, I'd just like to
understand why...)


On Wed, Jan 5, 2011 at 5:42 PM, Thibaut Britz  wrote:

> Had the same problem a while ago. Upgrading solved the problem (don't know
> if you have to redeploy your cluster, though)
>
> http://www.mail-archive.com/user@cassandra.apache.org/msg07106.html
>
>
>
> On Wed, Jan 5, 2011 at 4:29 PM, Ran Tavory  wrote:
>
>> @Thibaut wrong email? Or how's "Avoid dropping messages off the client
>> request path" (CASSANDRA-1676) related to the bootstrap questions I had?
>>
>>
>> On Wed, Jan 5, 2011 at 5:23 PM, Thibaut Britz <
>> thibaut.br...@trendiction.com> wrote:
>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-1676
>>>
>>> you have to use at least 0.6.7
>>>
>>>
>>>
>>> On Wed, Jan 5, 2011 at 4:19 PM, Edward Capriolo 
>>> wrote:
>>>
 On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory  wrote:
 > In storage-conf I see this comment [1] from which I understand that
 the
 > recommended way to bootstrap a new node is to set AutoBootstrap=true
 and
 > remove itself from the seeds list.
 > Moreover, I did try to set AutoBootstrap=true and have the node in its
 own
 > seeds list, but it would not bootstrap. I don't recall the exact
 message but
 > it was something like "I found myself in the seeds list therefore I'm
 not
 > going to bootstrap even though AutoBootstrap is true".
 >
 > [1]
 >   
 >   false
 > On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn 
 wrote:
 >>
 >> If "seed list should be the same across the cluster" that means that
 nodes
 >> *should* have themselves as a seed. If that doesn't work for Ran,
 then that
 >> is the first problem, no?
 >>
 >>
 >> On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani 
 wrote:
 >>>
 >>> Well your ring issues don't make sense to me, seed list should be
 the
 >>> same across the cluster.
 >>> I'm just thinking of other things to try, non-bootstrapped nodes
 should
 >>> join the ring instantly but reads will fail if you aren't using
 quorum.
 >>>
 >>> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory 
 wrote:
 
  I haven't tried repair.  Should I?
 
  On Jan 5, 2011 3:48 PM, "Jake Luciani"  wrote:
  > Have you tried not bootstrapping but setting the token and
 manually
  > calling
  > repair?
  >
  > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory 
 wrote:
  >
  >> My conclusion is lame: I tried this on several hosts and saw the
 same
  >> behavior, the only way I was able to join new nodes was to first
  >> start them
  >> when they are *not in* their own seeds list and after they
  >> finish transferring the data, then restart them with themselves
 *in*
  >> their
  >> own seeds list. After doing that the node would join the ring.
  >> This is either my misunderstanding or a bug, but the only place
 I
  >> found it
  >> documented stated that the new node should not be in its own
 seeds
  >> list.
  >> Version 0.6.6.
  >>
  >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn
  >> wrote:
  >>
  >>> My nodes all have themselves in their list of seeds - always
 did -
  >>> and
  >>> everything works. (You may ask why I did this. I don't know, I
 must
  >>> have
  >>> copied it from an example somewhere.)
  >>>
  >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory 
 wrote:
  >>>
   I was able to make the node join the ring but I'm confused.
   What I did is, first when adding the node, this node was not
 in the
   seeds
   list of itself. AFAIK this is how it's supposed to be. So it
 was
   able to
   transfer all data to itself from other nodes but then it
 stayed in
   the
   bootstrapping state.
   So what I did (and I don't know why it works), is add this
 node to
   the
   seeds list in its own storage-conf.xml file. Then restart the
   server and
   then I finally see it in the ring...
   If I had added the node to the seeds list of itself when first
   joining
   it, it would not join the ring but if I do it in two phases it
 did
   work.
   So it's either my misunderstanding

Re: Bootstrapping taking long

2011-01-05 Thread Thibaut Britz
Had the same problem a while ago. Upgrading solved the problem (don't know
if you have to redeploy your cluster, though)

http://www.mail-archive.com/user@cassandra.apache.org/msg07106.html


On Wed, Jan 5, 2011 at 4:29 PM, Ran Tavory  wrote:

> @Thibaut wrong email? Or how's "Avoid dropping messages off the client
> request path" (CASSANDRA-1676) related to the bootstrap questions I had?
>
>
> On Wed, Jan 5, 2011 at 5:23 PM, Thibaut Britz <
> thibaut.br...@trendiction.com> wrote:
>
>> https://issues.apache.org/jira/browse/CASSANDRA-1676
>>
>> you have to use at least 0.6.7
>>
>>
>>
>> On Wed, Jan 5, 2011 at 4:19 PM, Edward Capriolo wrote:
>>
>>> On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory  wrote:
>>> > In storage-conf I see this comment [1] from which I understand that the
>>> > recommended way to bootstrap a new node is to set AutoBootstrap=true
>>> and
>>> > remove itself from the seeds list.
>>> > Moreover, I did try to set AutoBootstrap=true and have the node in its
>>> own
>>> > seeds list, but it would not bootstrap. I don't recall the exact
>>> message but
>>> > it was something like "I found myself in the seeds list therefore I'm
>>> not
>>> > going to bootstrap even though AutoBootstrap is true".
>>> >
>>> > [1]
>>> >   
>>> >   false
>>> > On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn 
>>> wrote:
>>> >>
>>> >> If "seed list should be the same across the cluster" that means that
>>> nodes
>>> >> *should* have themselves as a seed. If that doesn't work for Ran, then
>>> that
>>> >> is the first problem, no?
>>> >>
>>> >>
>>> >> On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani 
>>> wrote:
>>> >>>
>>> >>> Well your ring issues don't make sense to me, seed list should be the
>>> >>> same across the cluster.
>>> >>> I'm just thinking of other things to try, non-bootstrapped nodes
>>> should
>>> >>> join the ring instantly but reads will fail if you aren't using
>>> quorum.
>>> >>>
>>> >>> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory  wrote:
>>> 
>>>  I haven't tried repair.  Should I?
>>> 
>>>  On Jan 5, 2011 3:48 PM, "Jake Luciani"  wrote:
>>>  > Have you tried not bootstrapping but setting the token and
>>> manually
>>>  > calling
>>>  > repair?
>>>  >
>>>  > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory 
>>> wrote:
>>>  >
>>>  >> My conclusion is lame: I tried this on several hosts and saw the
>>> same
>>>  >> behavior, the only way I was able to join new nodes was to first
>>>  >> start them
>>>  >> when they are *not in* their own seeds list and after they
>>>  >> finish transferring the data, then restart them with themselves
>>> *in*
>>>  >> their
>>>  >> own seeds list. After doing that the node would join the ring.
>>>  >> This is either my misunderstanding or a bug, but the only place I
>>>  >> found it
>>>  >> documented stated that the new node should not be in its own
>>> seeds
>>>  >> list.
>>>  >> Version 0.6.6.
>>>  >>
>>>  >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn
>>>  >> wrote:
>>>  >>
>>>  >>> My nodes all have themselves in their list of seeds - always did
>>> -
>>>  >>> and
>>>  >>> everything works. (You may ask why I did this. I don't know, I
>>> must
>>>  >>> have
>>>  >>> copied it from an example somewhere.)
>>>  >>>
>>>  >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory 
>>> wrote:
>>>  >>>
>>>   I was able to make the node join the ring but I'm confused.
>>>   What I did is, first when adding the node, this node was not in
>>> the
>>>   seeds
>>>   list of itself. AFAIK this is how it's supposed to be. So it
>>> was
>>>   able to
>>>   transfer all data to itself from other nodes but then it stayed
>>> in
>>>   the
>>>   bootstrapping state.
>>>   So what I did (and I don't know why it works), is add this node
>>> to
>>>   the
>>>   seeds list in its own storage-conf.xml file. Then restart the
>>>   server and
>>>   then I finally see it in the ring...
>>>   If I had added the node to the seeds list of itself when first
>>>   joining
>>>   it, it would not join the ring but if I do it in two phases it
>>> did
>>>   work.
>>>   So it's either my misunderstanding or a bug...
>>>  
>>>  
>>>   On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory 
>>>   wrote:
>>>  
>>>  > The new node does not see itself as part of the ring, it sees
>>> all
>>>  > others
>>>  > but itself, so from that perspective the view is consistent.
>>>  > The only problem is that the node never finishes to bootstrap.
>>> It
>>>  > stays
>>>  > in this state for hours (It's been 20 hours now...)
>>>  >
>>>  >
>>>  > $ bin/nodetool -p 9004 -h localhost streams
>>>  >> Mode: Bootstrapping
>>>  >> Not sending any stream

Re: Bootstrapping taking long

2011-01-05 Thread Ran Tavory
@Thibaut wrong email? Or how's "Avoid dropping messages off the client
request path" (CASSANDRA-1676) related to the bootstrap questions I had?

On Wed, Jan 5, 2011 at 5:23 PM, Thibaut Britz  wrote:

> https://issues.apache.org/jira/browse/CASSANDRA-1676
>
> you have to use at least 0.6.7
>
>
>
> On Wed, Jan 5, 2011 at 4:19 PM, Edward Capriolo wrote:
>
>> On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory  wrote:
>> > In storage-conf I see this comment [1] from which I understand that the
>> > recommended way to bootstrap a new node is to set AutoBootstrap=true and
>> > remove itself from the seeds list.
>> > Moreover, I did try to set AutoBootstrap=true and have the node in its
>> own
>> > seeds list, but it would not bootstrap. I don't recall the exact message
>> but
>> > it was something like "I found myself in the seeds list therefore I'm
>> not
>> > going to bootstrap even though AutoBootstrap is true".
>> >
>> > [1]
>> >   
>> >   false
>> > On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn 
>> wrote:
>> >>
>> >> If "seed list should be the same across the cluster" that means that
>> nodes
>> >> *should* have themselves as a seed. If that doesn't work for Ran, then
>> that
>> >> is the first problem, no?
>> >>
>> >>
>> >> On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani  wrote:
>> >>>
>> >>> Well your ring issues don't make sense to me, seed list should be the
>> >>> same across the cluster.
>> >>> I'm just thinking of other things to try, non-bootstrapped nodes should
>> >>> join the ring instantly but reads will fail if you aren't using
>> quorum.
>> >>>
>> >>> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory  wrote:
>> 
>>  I haven't tried repair.  Should I?
>> 
>>  On Jan 5, 2011 3:48 PM, "Jake Luciani"  wrote:
>>  > Have you tried not bootstrapping but setting the token and manually
>>  > calling
>>  > repair?
>>  >
>>  > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory 
>> wrote:
>>  >
>>  >> My conclusion is lame: I tried this on several hosts and saw the
>> same
>>  >> behavior, the only way I was able to join new nodes was to first
>>  >> start them
>>  >> when they are *not in* their own seeds list and after they
>>  >> finish transferring the data, then restart them with themselves
>> *in*
>>  >> their
>>  >> own seeds list. After doing that the node would join the ring.
>>  >> This is either my misunderstanding or a bug, but the only place I
>>  >> found it
>>  >> documented stated that the new node should not be in its own seeds
>>  >> list.
>>  >> Version 0.6.6.
>>  >>
>>  >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn
>>  >> wrote:
>>  >>
>>  >>> My nodes all have themselves in their list of seeds - always did
>> -
>>  >>> and
>>  >>> everything works. (You may ask why I did this. I don't know, I
>> must
>>  >>> have
>>  >>> copied it from an example somewhere.)
>>  >>>
>>  >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory 
>> wrote:
>>  >>>
>>   I was able to make the node join the ring but I'm confused.
>>   What I did is, first when adding the node, this node was not in
>> the
>>   seeds
>>   list of itself. AFAIK this is how it's supposed to be. So it was
>>   able to
>>   transfer all data to itself from other nodes but then it stayed
>> in
>>   the
>>   bootstrapping state.
>>   So what I did (and I don't know why it works), is add this node
>> to
>>   the
>>   seeds list in its own storage-conf.xml file. Then restart the
>>   server and
>>   then I finally see it in the ring...
>>   If I had added the node to the seeds list of itself when first
>>   joining
>>   it, it would not join the ring but if I do it in two phases it
>> did
>>   work.
>>   So it's either my misunderstanding or a bug...
>>  
>>  
>>   On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory 
>>   wrote:
>>  
>>  > The new node does not see itself as part of the ring, it sees
>> all
>>  > others
>>  > but itself, so from that perspective the view is consistent.
>>  > The only problem is that the node never finishes to bootstrap.
>> It
>>  > stays
>>  > in this state for hours (It's been 20 hours now...)
>>  >
>>  >
>>  > $ bin/nodetool -p 9004 -h localhost streams
>>  >> Mode: Bootstrapping
>>  >> Not sending any streams.
>>  >> Not receiving any streams.
>>  >
>>  >
>>  > On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall 
>>  > wrote:
>>  >
>>  >> Does the new node have itself in the list of seeds per chance?
>>  >> This
>>  >> could cause some issues if so.
>>  >>
>>  >> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory 
>>  >> wrote:
>>  >> > I'm still

Re: Bootstrapping taking long

2011-01-05 Thread Thibaut Britz
https://issues.apache.org/jira/browse/CASSANDRA-1676

you have to use at least 0.6.7


On Wed, Jan 5, 2011 at 4:19 PM, Edward Capriolo wrote:

> On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory  wrote:
> > In storage-conf I see this comment [1] from which I understand that the
> > recommended way to bootstrap a new node is to set AutoBootstrap=true and
> > remove itself from the seeds list.
> > Moreover, I did try to set AutoBootstrap=true and have the node in its
> own
> > seeds list, but it would not bootstrap. I don't recall the exact message
> but
> > it was something like "I found myself in the seeds list therefore I'm not
> > going to bootstrap even though AutoBootstrap is true".
> >
> > [1]
> >   
> >   false
> > On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn 
> wrote:
> >>
> >> If "seed list should be the same across the cluster" that means that
> nodes
> >> *should* have themselves as a seed. If that doesn't work for Ran, then
> that
> >> is the first problem, no?
> >>
> >>
> >> On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani  wrote:
> >>>
> >>> Well your ring issues don't make sense to me, seed list should be the
> >>> same across the cluster.
> >>> I'm just thinking of other things to try, non-bootstrapped nodes should
> >>> join the ring instantly but reads will fail if you aren't using quorum.
> >>>
> >>> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory  wrote:
> 
>  I haven't tried repair.  Should I?
> 
>  On Jan 5, 2011 3:48 PM, "Jake Luciani"  wrote:
>  > Have you tried not bootstrapping but setting the token and manually
>  > calling
>  > repair?
>  >
>  > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory 
> wrote:
>  >
>  >> My conclusion is lame: I tried this on several hosts and saw the
> same
>  >> behavior, the only way I was able to join new nodes was to first
>  >> start them
>  >> when they are *not in* their own seeds list and after they
>  >> finish transferring the data, then restart them with themselves
> *in*
>  >> their
>  >> own seeds list. After doing that the node would join the ring.
>  >> This is either my misunderstanding or a bug, but the only place I
>  >> found it
>  >> documented stated that the new node should not be in its own seeds
>  >> list.
>  >> Version 0.6.6.
>  >>
>  >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn
>  >> wrote:
>  >>
>  >>> My nodes all have themselves in their list of seeds - always did -
>  >>> and
>  >>> everything works. (You may ask why I did this. I don't know, I
> must
>  >>> have
>  >>> copied it from an example somewhere.)
>  >>>
>  >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory 
> wrote:
>  >>>
>   I was able to make the node join the ring but I'm confused.
>   What I did is, first when adding the node, this node was not in
> the
>   seeds
>   list of itself. AFAIK this is how it's supposed to be. So it was
>   able to
>   transfer all data to itself from other nodes but then it stayed
> in
>   the
>   bootstrapping state.
>   So what I did (and I don't know why it works), is add this node
> to
>   the
>   seeds list in its own storage-conf.xml file. Then restart the
>   server and
>   then I finally see it in the ring...
>   If I had added the node to the seeds list of itself when first
>   joining
>   it, it would not join the ring but if I do it in two phases it
> did
>   work.
>   So it's either my misunderstanding or a bug...
>  
>  
>   On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory 
>   wrote:
>  
>  > The new node does not see itself as part of the ring, it sees
> all
>  > others
>  > but itself, so from that perspective the view is consistent.
>  > The only problem is that the node never finishes to bootstrap.
> It
>  > stays
>  > in this state for hours (It's been 20 hours now...)
>  >
>  >
>  > $ bin/nodetool -p 9004 -h localhost streams
>  >> Mode: Bootstrapping
>  >> Not sending any streams.
>  >> Not receiving any streams.
>  >
>  >
>  > On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall 
>  > wrote:
>  >
>  >> Does the new node have itself in the list of seeds per chance?
>  >> This
>  >> could cause some issues if so.
>  >>
>  >> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory 
>  >> wrote:
>  >> > I'm still at lost. I haven't been able to resolve this. I
> tried
>  >> > adding another node at a different location on the ring but
>  >> > this node
>  >> > too remains stuck in the bootstrapping state for many hours
>  >> > without
>  >> > any of the other nodes being busy with anti compaction or
>  >

Re: Bootstrapping taking long

2011-01-05 Thread Edward Capriolo
On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory  wrote:
> In storage-conf I see this comment [1] from which I understand that the
> recommended way to bootstrap a new node is to set AutoBootstrap=true and
> remove itself from the seeds list.
> Moreover, I did try to set AutoBootstrap=true and have the node in its own
> seeds list, but it would not bootstrap. I don't recall the exact message but
> it was something like "I found myself in the seeds list therefore I'm not
> going to bootstrap even though AutoBootstrap is true".
>
> [1]
>   
>   false
> On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn  wrote:
>>
>> If "seed list should be the same across the cluster" that means that nodes
>> *should* have themselves as a seed. If that doesn't work for Ran, then that
>> is the first problem, no?
>>
>>
>> On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani  wrote:
>>>
>>> Well your ring issues don't make sense to me, seed list should be the
>>> same across the cluster.
>>> I'm just thinking of other things to try, non-bootstrapped nodes should
>>> join the ring instantly but reads will fail if you aren't using quorum.
>>>
>>> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory  wrote:

 I haven't tried repair.  Should I?

 On Jan 5, 2011 3:48 PM, "Jake Luciani"  wrote:
 > Have you tried not bootstrapping but setting the token and manually
 > calling
 > repair?
 >
 > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory  wrote:
 >
 >> My conclusion is lame: I tried this on several hosts and saw the same
 >> behavior, the only way I was able to join new nodes was to first
 >> start them
 >> when they are *not in* their own seeds list and after they
 >> finish transferring the data, then restart them with themselves *in*
 >> their
 >> own seeds list. After doing that the node would join the ring.
 >> This is either my misunderstanding or a bug, but the only place I
 >> found it
 >> documented stated that the new node should not be in its own seeds
 >> list.
 >> Version 0.6.6.
 >>
 >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn
 >> wrote:
 >>
 >>> My nodes all have themselves in their list of seeds - always did -
 >>> and
 >>> everything works. (You may ask why I did this. I don't know, I must
 >>> have
 >>> copied it from an example somewhere.)
 >>>
 >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory  wrote:
 >>>
  I was able to make the node join the ring but I'm confused.
  What I did is, first when adding the node, this node was not in the
  seeds
  list of itself. AFAIK this is how it's supposed to be. So it was
  able to
  transfer all data to itself from other nodes but then it stayed in
  the
  bootstrapping state.
  So what I did (and I don't know why it works), is add this node to
  the
  seeds list in its own storage-conf.xml file. Then restart the
  server and
  then I finally see it in the ring...
  If I had added the node to the seeds list of itself when first
  joining
  it, it would not join the ring but if I do it in two phases it did
  work.
  So it's either my misunderstanding or a bug...
 
 
  On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory 
  wrote:
 
 > The new node does not see itself as part of the ring, it sees all
 > others
 > but itself, so from that perspective the view is consistent.
 > The only problem is that the node never finishes to bootstrap. It
 > stays
 > in this state for hours (It's been 20 hours now...)
 >
 >
 > $ bin/nodetool -p 9004 -h localhost streams
 >> Mode: Bootstrapping
 >> Not sending any streams.
 >> Not receiving any streams.
 >
 >
 > On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall 
 > wrote:
 >
 >> Does the new node have itself in the list of seeds per chance?
 >> This
 >> could cause some issues if so.
 >>
 >> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory 
 >> wrote:
 >> > I'm still at lost. I haven't been able to resolve this. I tried
 >> > adding another node at a different location on the ring but
 >> > this node
 >> > too remains stuck in the bootstrapping state for many hours
 >> > without
 >> > any of the other nodes being busy with anti compaction or
 >> > anything
 >> > else. I don't know what's keeping it from finishing the
 >> > bootstrap,no
 >> > CPU, no io, files were already streamed so what is it waiting
 >> > for?
 >> > I read the release notes of 0.6.7 and 0.6.8 and there didn't
 >> > seem to
 >> > be anything addressing a similar issue so I figured there was
 >> > no
 >

Re: Bootstrapping taking long

2011-01-05 Thread Ran Tavory
In storage-conf I see this comment [1] from which I understand that the
recommended way to bootstrap a new node is to set AutoBootstrap=true and
remove itself from the seeds list.
Moreover, I did try to set AutoBootstrap=true and have the node in its own
seeds list, but it would not bootstrap. I don't recall the exact message but
it was something like "I found myself in the seeds list therefore I'm not
going to bootstrap even though AutoBootstrap is true".

[1]
  
  <AutoBootstrap>false</AutoBootstrap>
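
To spell out the behavior I'm describing, here is a rough sketch in Python. It is not the actual Cassandra code; the function name `will_bootstrap` and the 10.0.0.x addresses are made up for illustration. It only encodes what I observed on 0.6.6: a node that finds its own address in its seeds list skips bootstrap, even with AutoBootstrap=true.

```python
from typing import Iterable


def will_bootstrap(auto_bootstrap: bool, own_address: str, seeds: Iterable[str]) -> bool:
    """Return True if a starting node would attempt to bootstrap.

    Mirrors the observed behavior only: AutoBootstrap must be on, and the
    node must not list itself as a seed.
    """
    if not auto_bootstrap:
        return False
    if own_address in set(seeds):
        # "I found myself in the seeds list therefore I'm not going to bootstrap"
        return False
    return True


if __name__ == "__main__":
    new_node = "10.0.0.4"  # hypothetical address of the joining node
    print(will_bootstrap(True, new_node, ["10.0.0.1", "10.0.0.2"]))  # not a seed
    print(will_bootstrap(True, new_node, ["10.0.0.1", new_node]))    # lists itself
```

Under this reading, the only configuration that triggers a bootstrap is AutoBootstrap=true with the new node absent from its own seeds list, which is how I started the node in the first phase of my workaround.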

On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn  wrote:

> If "seed list should be the same across the cluster" that means that nodes
> *should* have themselves as a seed. If that doesn't work for Ran, then that
> is the first problem, no?
>
>
> On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani  wrote:
>
>> Well your ring issues don't make sense to me, seed list should be the same
>> across the cluster.
>> I'm just thinking of other things to try, non-bootstrapped nodes should
>> join the ring instantly but reads will fail if you aren't using quorum.
>>
>>
>> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory  wrote:
>>
>>> I haven't tried repair.  Should I?
>>> On Jan 5, 2011 3:48 PM, "Jake Luciani"  wrote:
>>> > Have you tried not bootstrapping but setting the token and manually
>>> calling
>>> > repair?
>>> >
>>> > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory  wrote:
>>> >
>>> >> My conclusion is lame: I tried this on several hosts and saw the same
>>> >> behavior, the only way I was able to join new nodes was to first start
>>> them
>>> >> when they are *not in* their own seeds list and after they
>>> >> finish transferring the data, then restart them with themselves *in*
>>> their
>>> >> own seeds list. After doing that the node would join the ring.
>>> >> This is either my misunderstanding or a bug, but the only place I
>>> found it
>>> >> documented stated that the new node should not be in its own seeds
>>> list.
>>> >> Version 0.6.6.
>>> >>
>>> >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn wrote:
>>> >>
>>> >>> My nodes all have themselves in their list of seeds - always did -
>>> and
>>> >>> everything works. (You may ask why I did this. I don't know, I must
>>> have
>>> >>> copied it from an example somewhere.)
>>> >>>
>>> >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory  wrote:
>>> >>>
>>>  I was able to make the node join the ring but I'm confused.
>>>  What I did is, first when adding the node, this node was not in the
>>> seeds
>>>  list of itself. AFAIK this is how it's supposed to be. So it was
>>> able to
>>>  transfer all data to itself from other nodes but then it stayed in
>>> the
>>>  bootstrapping state.
>>>  So what I did (and I don't know why it works), is add this node to
>>> the
>>>  seeds list in its own storage-conf.xml file. Then restart the server
>>> and
>>>  then I finally see it in the ring...
>>>  If I had added the node to the seeds list of itself when first
>>> joining
>>>  it, it would not join the ring but if I do it in two phases it did
>>> work.
>>>  So it's either my misunderstanding or a bug...
>>> 
>>> 
>>>  On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory 
>>> wrote:
>>> 
>>> > The new node does not see itself as part of the ring, it sees all
>>> others
>>> > but itself, so from that perspective the view is consistent.
>>> > The only problem is that the node never finishes to bootstrap. It
>>> stays
>>> > in this state for hours (It's been 20 hours now...)
>>> >
>>> >
>>> > $ bin/nodetool -p 9004 -h localhost streams
>>> >> Mode: Bootstrapping
>>> >> Not sending any streams.
>>> >> Not receiving any streams.
>>> >
>>> >
>>> > On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall 
>>> wrote:
>>> >
>>> >> Does the new node have itself in the list of seeds per chance?
>>> This
>>> >> could cause some issues if so.
>>> >>
>>> >> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory 
>>> wrote:
>>> >> > I'm still at a loss. I haven't been able to resolve this. I tried
>>> >> > adding another node at a different location on the ring but this
>>> node
>>> >> > too remains stuck in the bootstrapping state for many hours
>>> without
>>> >> > any of the other nodes being busy with anti compaction or
>>> anything
>>> >> > else. I don't know what's keeping it from finishing the
>>> bootstrap,no
>>> >> > CPU, no io, files were already streamed so what is it waiting
>>> for?
>>> >> > I read the release notes of 0.6.7 and 0.6.8 and there didn't
>>> seem to
>>> >> > be anything addressing a similar issue so I figured there was no
>>> >> point
>>> >> > in upgrading. But let me know if you think there is.
>>> >> > Or any other advice...
>>> >> >
>>> >> > On Tuesday, January 4, 2011, Ran Tavory 
>>> wrote:
>>> >> >> Thanks Jake, but unfortunately the streams directory is empty
>>> so I
>>> >> don't think that any of the nodes is anti-compacting data right
>>> 

Re: Bootstrapping taking long

2011-01-05 Thread David Boxenhorn
If "seed list should be the same across the cluster" that means that nodes
*should* have themselves as a seed. If that doesn't work for Ran, then that
is the first problem, no?


On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani  wrote:

> Well your ring issues don't make sense to me, seed list should be the same
> across the cluster.
> I'm just thinking of other things to try, non-bootstrapped nodes should join
> the ring instantly but reads will fail if you aren't using quorum.
>
>
> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory  wrote:
>
>> I haven't tried repair.  Should I?
>> On Jan 5, 2011 3:48 PM, "Jake Luciani"  wrote:
>> > Have you tried not bootstrapping but setting the token and manually
>> calling
>> > repair?
>> >
>> > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory  wrote:
>> >
>> >> My conclusion is lame: I tried this on several hosts and saw the same
>> >> behavior, the only way I was able to join new nodes was to first start
>> them
>> >> when they are *not in* their own seeds list and after they
>> >> finish transferring the data, then restart them with themselves *in*
>> their
>> >> own seeds list. After doing that the node would join the ring.
>> >> This is either my misunderstanding or a bug, but the only place I found
>> it
>> >> documented stated that the new node should not be in its own seeds
>> list.
>> >> Version 0.6.6.
>> >>
>> >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn wrote:
>> >>
>> >>> My nodes all have themselves in their list of seeds - always did - and
>> >>> everything works. (You may ask why I did this. I don't know, I must
>> have
>> >>> copied it from an example somewhere.)
>> >>>
>> >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory  wrote:
>> >>>
>> >>>> I was able to make the node join the ring but I'm confused.
>> >>>> What I did is, first when adding the node, this node was not in the seeds
>> >>>> list of itself. AFAIK this is how it's supposed to be. So it was able to
>> >>>> transfer all data to itself from other nodes but then it stayed in the
>> >>>> bootstrapping state.
>> >>>> So what I did (and I don't know why it works), is add this node to the
>> >>>> seeds list in its own storage-conf.xml file. Then restart the server and
>> >>>> then I finally see it in the ring...
>> >>>> If I had added the node to the seeds list of itself when first joining
>> >>>> it, it would not join the ring but if I do it in two phases it did work.
>> >>>> So it's either my misunderstanding or a bug...
>> >>>>
>> >>>>
>> >>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory  wrote:
>> 
>> > The new node does not see itself as part of the ring, it sees all
>> others
>> > but itself, so from that perspective the view is consistent.
>> > The only problem is that the node never finishes to bootstrap. It
>> stays
>> > in this state for hours (It's been 20 hours now...)
>> >
>> >
>> > $ bin/nodetool -p 9004 -h localhost streams
>> >> Mode: Bootstrapping
>> >> Not sending any streams.
>> >> Not receiving any streams.
>> >
>> >
>> > On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall 
>> wrote:
>> >
>> >> Does the new node have itself in the list of seeds per chance? This
>> >> could cause some issues if so.
>> >>
>> >> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory 
>> wrote:
>> >> > I'm still at lost. I haven't been able to resolve this. I tried
>> >> > adding another node at a different location on the ring but this
>> node
>> >> > too remains stuck in the bootstrapping state for many hours
>> without
>> >> > any of the other nodes being busy with anti compaction or
>> anything
>> >> > else. I don't know what's keeping it from finishing the
>> bootstrap,no
>> >> > CPU, no io, files were already streamed so what is it waiting
>> for?
>> >> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem
>> to
>> >> > be anything addressing a similar issue so I figured there was no
>> >> point
>> >> > in upgrading. But let me know if you think there is.
>> >> > Or any other advice...
>> >> >
>> >> > On Tuesday, January 4, 2011, Ran Tavory 
>> wrote:
>> >> >> Thanks Jake, but unfortunately the streams directory is empty so
>> I
>> >> don't think that any of the nodes is anti-compacting data right now
>> or had
>> >> been in the past 5 hours. It seems that all the data was already
>> transferred
>> >> to the joining host but the joining node, after having received the
>> data
>> >> would still remain in bootstrapping mode and not join the cluster.
>> I'm not
>> >> sure that *all* data was transferred (perhaps other nodes need to
>> transfer
>> >> more data) but nothing is actually happening so I assume all has
>> been moved.
>> >> >> Perhaps it's a configuration error from my part. Should I use I
>> use
>> >> AutoBootstrap=true ? Anything else I should look out for in the
>> >> configuration file or something else?
>> >>>

Re: Bootstrapping taking long

2011-01-05 Thread Jake Luciani
Well, your ring issues don't make sense to me; the seed list should be the same
across the cluster.
I'm just thinking of other things to try. Non-bootstrapped nodes should join
the ring instantly, but reads will fail if you aren't using quorum.
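
To make the point about seed lists concrete, here is a minimal Python sketch that flags nodes whose seeds differ from the majority. The function name and the 10.0.0.x addresses are made up for illustration; in practice the lists would be collected from each node's storage-conf.xml.

```python
from collections import Counter


def seed_list_mismatches(seeds_by_node):
    """Return {node: seeds} for nodes whose seed set differs from the majority."""
    # Compare as frozensets so the order inside each seeds list does not matter.
    as_sets = {node: frozenset(seeds) for node, seeds in seeds_by_node.items()}
    if not as_sets:
        return {}
    majority, _count = Counter(as_sets.values()).most_common(1)[0]
    return {node: sorted(s) for node, s in as_sets.items() if s != majority}


# Hypothetical addresses, for illustration only.
cluster = {
    "10.0.0.1": ["10.0.0.1", "10.0.0.2"],
    "10.0.0.2": ["10.0.0.2", "10.0.0.1"],
    "10.0.0.3": ["10.0.0.1"],
}
print(seed_list_mismatches(cluster))  # {'10.0.0.3': ['10.0.0.1']}
```

An empty result would mean every node agrees on the seeds; it says nothing about whether those seeds are the right ones.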


On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory  wrote:

> I haven't tried repair.  Should I?
> On Jan 5, 2011 3:48 PM, "Jake Luciani"  wrote:
> > Have you tried not bootstrapping but setting the token and manually
> calling
> > repair?
> >
> > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory  wrote:
> >
> >> My conclusion is lame: I tried this on several hosts and saw the same
> >> behavior, the only way I was able to join new nodes was to first start
> them
> >> when they are *not in* their own seeds list and after they
> >> finish transferring the data, then restart them with themselves *in*
> their
> >> own seeds list. After doing that the node would join the ring.
> >> This is either my misunderstanding or a bug, but the only place I found
> it
> >> documented stated that the new node should not be in its own seeds list.
> >> Version 0.6.6.
> >>
> >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn  >wrote:
> >>
> >>> My nodes all have themselves in their list of seeds - always did - and
> >>> everything works. (You may ask why I did this. I don't know, I must
> have
> >>> copied it from an example somewhere.)
> >>>
> >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory  wrote:
> >>>
> >>>> I was able to make the node join the ring but I'm confused.
> >>>> What I did is, first when adding the node, this node was not in the seeds
> >>>> list of itself. AFAIK this is how it's supposed to be. So it was able to
> >>>> transfer all data to itself from other nodes but then it stayed in the
> >>>> bootstrapping state.
> >>>> So what I did (and I don't know why it works), is add this node to the
> >>>> seeds list in its own storage-conf.xml file. Then restart the server and
> >>>> then I finally see it in the ring...
> >>>> If I had added the node to the seeds list of itself when first joining
> >>>> it, it would not join the ring but if I do it in two phases it did work.
> >>>> So it's either my misunderstanding or a bug...
> >>>>
> >>>>
> >>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory  wrote:
> 
> > The new node does not see itself as part of the ring, it sees all
> others
> > but itself, so from that perspective the view is consistent.
> > The only problem is that the node never finishes to bootstrap. It
> stays
> > in this state for hours (It's been 20 hours now...)
> >
> >
> > $ bin/nodetool -p 9004 -h localhost streams
> >> Mode: Bootstrapping
> >> Not sending any streams.
> >> Not receiving any streams.
> >
> >
> > On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall 
> wrote:
> >
> >> Does the new node have itself in the list of seeds per chance? This
> >> could cause some issues if so.
> >>
> >> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory 
> wrote:
> >> > I'm still at lost. I haven't been able to resolve this. I tried
> >> > adding another node at a different location on the ring but this
> node
> >> > too remains stuck in the bootstrapping state for many hours
> without
> >> > any of the other nodes being busy with anti compaction or anything
> >> > else. I don't know what's keeping it from finishing the
> bootstrap,no
> >> > CPU, no io, files were already streamed so what is it waiting for?
> >> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem
> to
> >> > be anything addressing a similar issue so I figured there was no
> >> point
> >> > in upgrading. But let me know if you think there is.
> >> > Or any other advice...
> >> >
> >> > On Tuesday, January 4, 2011, Ran Tavory  wrote:
> >> >> Thanks Jake, but unfortunately the streams directory is empty so
> I
> >> don't think that any of the nodes is anti-compacting data right now
> or had
> >> been in the past 5 hours. It seems that all the data was already
> transferred
> >> to the joining host but the joining node, after having received the
> data
> >> would still remain in bootstrapping mode and not join the cluster.
> I'm not
> >> sure that *all* data was transferred (perhaps other nodes need to
> transfer
> >> more data) but nothing is actually happening so I assume all has
> been moved.
> >> >> Perhaps it's a configuration error from my part. Should I use I
> use
> >> AutoBootstrap=true ? Anything else I should look out for in the
> >> configuration file or something else?
> >> >>
> >> >>
> >> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani 
> >> wrote:
> >> >>
> >> >> In 0.6, locate the node doing anti-compaction and look in the
> >> "streams" subdirectory in the keyspace data dir to monitor the
> >> anti-compaction progress (it puts new SSTables for bootstrapping
> node in
> >> there)
> >> >>
> >> >>

Re: Bootstrapping taking long

2011-01-05 Thread Ran Tavory
I haven't tried repair.  Should I?
On Jan 5, 2011 3:48 PM, "Jake Luciani"  wrote:
> Have you tried not bootstrapping but setting the token and manually
calling
> repair?
>
> On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory  wrote:
>
>> My conclusion is lame: I tried this on several hosts and saw the same
>> behavior, the only way I was able to join new nodes was to first start
them
>> when they are *not in* their own seeds list and after they
>> finish transferring the data, then restart them with themselves *in*
their
>> own seeds list. After doing that the node would join the ring.
>> This is either my misunderstanding or a bug, but the only place I found
it
>> documented stated that the new node should not be in its own seeds list.
>> Version 0.6.6.
>>
>> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn wrote:
>>
>>> My nodes all have themselves in their list of seeds - always did - and
>>> everything works. (You may ask why I did this. I don't know, I must have
>>> copied it from an example somewhere.)
>>>
>>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory  wrote:
>>>
>>>> I was able to make the node join the ring but I'm confused.
>>>> What I did is, first when adding the node, this node was not in the seeds
>>>> list of itself. AFAIK this is how it's supposed to be. So it was able to
>>>> transfer all data to itself from other nodes but then it stayed in the
>>>> bootstrapping state.
>>>> So what I did (and I don't know why it works), is add this node to the
>>>> seeds list in its own storage-conf.xml file. Then restart the server and
>>>> then I finally see it in the ring...
>>>> If I had added the node to the seeds list of itself when first joining
>>>> it, it would not join the ring but if I do it in two phases it did work.
>>>> So it's either my misunderstanding or a bug...
>>>>
>>>>
>>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory  wrote:

> The new node does not see itself as part of the ring, it sees all
others
> but itself, so from that perspective the view is consistent.
> The only problem is that the node never finishes to bootstrap. It
stays
> in this state for hours (It's been 20 hours now...)
>
>
> $ bin/nodetool -p 9004 -h localhost streams
>> Mode: Bootstrapping
>> Not sending any streams.
>> Not receiving any streams.
>
>
> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall  wrote:
>
>> Does the new node have itself in the list of seeds per chance? This
>> could cause some issues if so.
>>
>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory  wrote:
>> > I'm still at lost. I haven't been able to resolve this. I tried
>> > adding another node at a different location on the ring but this
node
>> > too remains stuck in the bootstrapping state for many hours without
>> > any of the other nodes being busy with anti compaction or anything
>> > else. I don't know what's keeping it from finishing the
bootstrap,no
>> > CPU, no io, files were already streamed so what is it waiting for?
>> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem
to
>> > be anything addressing a similar issue so I figured there was no
>> point
>> > in upgrading. But let me know if you think there is.
>> > Or any other advice...
>> >
>> > On Tuesday, January 4, 2011, Ran Tavory  wrote:
>> >> Thanks Jake, but unfortunately the streams directory is empty so I
>> don't think that any of the nodes is anti-compacting data right now
or had
>> been in the past 5 hours. It seems that all the data was already
transferred
>> to the joining host but the joining node, after having received the
data
>> would still remain in bootstrapping mode and not join the cluster.
I'm not
>> sure that *all* data was transferred (perhaps other nodes need to
transfer
>> more data) but nothing is actually happening so I assume all has been
moved.
>> >> Perhaps it's a configuration error from my part. Should I use I
use
>> AutoBootstrap=true ? Anything else I should look out for in the
>> configuration file or something else?
>> >>
>> >>
>> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani 
>> wrote:
>> >>
>> >> In 0.6, locate the node doing anti-compaction and look in the
>> "streams" subdirectory in the keyspace data dir to monitor the
>> anti-compaction progress (it puts new SSTables for bootstrapping node
in
>> there)
>> >>
>> >>
>> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory 
>> wrote:
>> >>
>> >>
>> >> Running nodetool decommission didn't help. Actually the node
refused
>> to decommission itself (b/c it wasn't part of the ring). So I simply
stopped
>> the process, deleted all the data directories and started it again.
It
>> worked in the sense of the node bootstrapped again but as before,
after it
>> had finished moving the data nothing happened for a long time (I'm
still
>> waiting, but nothing seems to

Re: Bootstrapping taking long

2011-01-05 Thread Jake Luciani
Have you tried not bootstrapping but setting the token and manually calling
repair?
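
A rough Python sketch of the first half of that suggestion: rewrite a storage-conf.xml so the node starts with bootstrapping off and a fixed token. It assumes the 0.6 config keeps `AutoBootstrap` and `InitialToken` as elements directly under the root, which should be checked against the actual file. The helper name, the sample XML and the token value are hypothetical. Repair would then be invoked through nodetool once the node is up.

```python
import xml.etree.ElementTree as ET


def disable_bootstrap(conf_xml, initial_token):
    """Return config XML text with AutoBootstrap off and a fixed InitialToken."""
    root = ET.fromstring(conf_xml)
    settings = (("AutoBootstrap", "false"), ("InitialToken", str(initial_token)))
    for tag, value in settings:
        elem = root.find(tag)
        if elem is None:
            # Create the element if the config does not already contain it.
            elem = ET.SubElement(root, tag)
        elem.text = value
    return ET.tostring(root, encoding="unicode")


# Trimmed-down, hypothetical config; a real storage-conf.xml has many more elements.
sample = (
    "<Storage>"
    "<AutoBootstrap>true</AutoBootstrap>"
    "<InitialToken></InitialToken>"
    "</Storage>"
)
# The token below is only an example value (2**126), not a recommendation.
print(disable_bootstrap(sample, 2**126))
```

Note that round-tripping through ElementTree drops XML comments, so editing the real file by hand may be preferable.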

On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory  wrote:

> My conclusion is lame: I tried this on several hosts and saw the same
> behavior, the only way I was able to join new nodes was to first start them
> when they are *not in* their own seeds list and after they
> finish transferring the data, then restart them with themselves *in* their
> own seeds list. After doing that the node would join the ring.
> This is either my misunderstanding or a bug, but the only place I found it
> documented stated that the new node should not be in its own seeds list.
> Version 0.6.6.
>
> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn wrote:
>
>> My nodes all have themselves in their list of seeds - always did - and
>> everything works. (You may ask why I did this. I don't know, I must have
>> copied it from an example somewhere.)
>>
>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory  wrote:
>>
>>> I was able to make the node join the ring but I'm confused.
>>> What I did is, first when adding the node, this node was not in the seeds
>>> list of itself. AFAIK this is how it's supposed to be. So it was able to
>>> transfer all data to itself from other nodes but then it stayed in the
>>> bootstrapping state.
>>> So what I did (and I don't know why it works), is add this node to the
>>> seeds list in its own storage-conf.xml file. Then restart the server and
>>> then I finally see it in the ring...
>>> If I had added the node to the seeds list of itself when first joining
>>> it, it would not join the ring but if I do it in two phases it did work.
>>> So it's either my misunderstanding or a bug...
>>>
>>>
>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory  wrote:
>>>
>>>> The new node does not see itself as part of the ring, it sees all others
>>>> but itself, so from that perspective the view is consistent.
>>>> The only problem is that the node never finishes to bootstrap. It stays
>>>> in this state for hours (It's been 20 hours now...)
>>>>
>>>>
>>>> $ bin/nodetool -p 9004 -h localhost streams
>>>>> Mode: Bootstrapping
>>>>> Not sending any streams.
>>>>> Not receiving any streams.
>>>>
>>>>
>>>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall  wrote:

> Does the new node have itself in the list of seeds per chance? This
> could cause some issues if so.
>
> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory  wrote:
> > I'm still at lost.   I haven't been able to resolve this. I tried
> > adding another node at a different location on the ring but this node
> > too remains stuck in the bootstrapping state for many hours without
> > any of the other nodes being busy with anti compaction or anything
> > else. I don't know what's keeping it from finishing the bootstrap,no
> > CPU, no io, files were already streamed so what is it waiting for?
> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
> > be anything addressing a similar issue so I figured there was no
> point
> > in upgrading. But let me know if you think there is.
> > Or any other advice...
> >
> > On Tuesday, January 4, 2011, Ran Tavory  wrote:
> >> Thanks Jake, but unfortunately the streams directory is empty so I
> don't think that any of the nodes is anti-compacting data right now or had
> been in the past 5 hours. It seems that all the data was already 
> transferred
> to the joining host but the joining node, after having received the data
> would still remain in bootstrapping mode and not join the cluster. I'm not
> sure that *all* data was transferred (perhaps other nodes need to transfer
> more data) but nothing is actually happening so I assume all has been 
> moved.
> >> Perhaps it's a configuration error from my part. Should I use I use
> AutoBootstrap=true ? Anything else I should look out for in the
> configuration file or something else?
> >>
> >>
> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani 
> wrote:
> >>
> >> In 0.6, locate the node doing anti-compaction and look in the
> "streams" subdirectory in the keyspace data dir to monitor the
> anti-compaction progress (it puts new SSTables for bootstrapping node in
> there)
> >>
> >>
> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory 
> wrote:
> >>
> >>
> >> Running nodetool decommission didn't help. Actually the node refused
> to decommission itself (b/c it wasn't part of the ring). So I simply 
> stopped
> the process, deleted all the data directories and started it again. It
> worked in the sense of the node bootstrapped again but as before, after it
> had finished moving the data nothing happened for a long time (I'm still
> waiting, but nothing seems to be happening).
> >>
> >>
> >>
> >>
> >> Any hints how to analyze a "stuck" bootstrapping node??thanks
> >> On Tue, Jan 4, 2011 at 1:51 P

Re: Bootstrapping taking long

2011-01-05 Thread David Boxenhorn
I started all my nodes the first time with themselves in their own seeds
lists, and it worked. I think I started them in 0.6.1, but I'm not sure. (I'm
now using 0.6.8.)


On Wed, Jan 5, 2011 at 2:07 PM, Ran Tavory  wrote:

> My conclusion is lame: I tried this on several hosts and saw the same
> behavior, the only way I was able to join new nodes was to first start them
> when they are *not in* their own seeds list and after they
> finish transferring the data, then restart them with themselves *in* their
> own seeds list. After doing that the node would join the ring.
> This is either my misunderstanding or a bug, but the only place I found it
> documented stated that the new node should not be in its own seeds list.
> Version 0.6.6.
>
> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn wrote:
>
>> My nodes all have themselves in their list of seeds - always did - and
>> everything works. (You may ask why I did this. I don't know, I must have
>> copied it from an example somewhere.)
>>
>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory  wrote:
>>
>>> I was able to make the node join the ring but I'm confused.
>>> What I did is, first when adding the node, this node was not in the seeds
>>> list of itself. AFAIK this is how it's supposed to be. So it was able to
>>> transfer all data to itself from other nodes but then it stayed in the
>>> bootstrapping state.
>>> So what I did (and I don't know why it works), is add this node to the
>>> seeds list in its own storage-conf.xml file. Then restart the server and
>>> then I finally see it in the ring...
>>> If I had added the node to the seeds list of itself when first joining
>>> it, it would not join the ring but if I do it in two phases it did work.
>>> So it's either my misunderstanding or a bug...
>>>
>>>
>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory  wrote:
>>>
>>>> The new node does not see itself as part of the ring, it sees all others
>>>> but itself, so from that perspective the view is consistent.
>>>> The only problem is that the node never finishes to bootstrap. It stays
>>>> in this state for hours (It's been 20 hours now...)
>>>>
>>>>
>>>> $ bin/nodetool -p 9004 -h localhost streams
>>>>> Mode: Bootstrapping
>>>>> Not sending any streams.
>>>>> Not receiving any streams.
>>>>
>>>>
>>>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall  wrote:

> Does the new node have itself in the list of seeds per chance? This
> could cause some issues if so.
>
> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory  wrote:
> > I'm still at lost.   I haven't been able to resolve this. I tried
> > adding another node at a different location on the ring but this node
> > too remains stuck in the bootstrapping state for many hours without
> > any of the other nodes being busy with anti compaction or anything
> > else. I don't know what's keeping it from finishing the bootstrap,no
> > CPU, no io, files were already streamed so what is it waiting for?
> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
> > be anything addressing a similar issue so I figured there was no
> point
> > in upgrading. But let me know if you think there is.
> > Or any other advice...
> >
> > On Tuesday, January 4, 2011, Ran Tavory  wrote:
> >> Thanks Jake, but unfortunately the streams directory is empty so I
> don't think that any of the nodes is anti-compacting data right now or had
> been in the past 5 hours. It seems that all the data was already 
> transferred
> to the joining host but the joining node, after having received the data
> would still remain in bootstrapping mode and not join the cluster. I'm not
> sure that *all* data was transferred (perhaps other nodes need to transfer
> more data) but nothing is actually happening so I assume all has been 
> moved.
> >> Perhaps it's a configuration error from my part. Should I use I use
> AutoBootstrap=true ? Anything else I should look out for in the
> configuration file or something else?
> >>
> >>
> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani 
> wrote:
> >>
> >> In 0.6, locate the node doing anti-compaction and look in the
> "streams" subdirectory in the keyspace data dir to monitor the
> anti-compaction progress (it puts new SSTables for bootstrapping node in
> there)
> >>
> >>
> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory 
> wrote:
> >>
> >>
> >> Running nodetool decommission didn't help. Actually the node refused
> to decommission itself (b/c it wasn't part of the ring). So I simply 
> stopped
> the process, deleted all the data directories and started it again. It
> worked in the sense of the node bootstrapped again but as before, after it
> had finished moving the data nothing happened for a long time (I'm still
> waiting, but nothing seems to be happening).
> >>
> >>
> >>
> >>
> >> Any hints how to analyze 

Re: Bootstrapping taking long

2011-01-05 Thread Ran Tavory
My conclusion is lame: I tried this on several hosts and saw the same
behavior. The only way I was able to join new nodes was to first start them
when they are *not in* their own seeds list and, after they
finish transferring the data, restart them with themselves *in* their
own seeds list. After doing that, the node would join the ring.
This is either my misunderstanding or a bug, but the only place I found it
documented stated that the new node should not be in its own seeds list.
Version 0.6.6.
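
For anyone repeating the two-phase procedure, a small Python sketch for checking which phase a node's config is in, i.e. whether the node appears in its own seeds list. It assumes seeds are listed as `<Seed>` elements under `<Seeds>` in storage-conf.xml; the function name and the addresses are hypothetical.

```python
import xml.etree.ElementTree as ET


def node_is_own_seed(conf_xml, node_address):
    """True if node_address appears under <Seeds> in the given config XML."""
    root = ET.fromstring(conf_xml)
    seeds = [(seed.text or "").strip() for seed in root.findall("./Seeds/Seed")]
    return node_address in seeds


# Hypothetical config fragment and addresses, for illustration only.
conf = """<Storage>
  <Seeds>
    <Seed>10.0.0.1</Seed>
    <Seed>10.0.0.2</Seed>
  </Seeds>
</Storage>"""

print(node_is_own_seed(conf, "10.0.0.3"))  # False: joining node is not a seed
print(node_is_own_seed(conf, "10.0.0.1"))  # True: node lists itself as a seed
```

The comparison is a plain string match, so a hostname in the config will not match an IP address passed in.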

On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn  wrote:

> My nodes all have themselves in their list of seeds - always did - and
> everything works. (You may ask why I did this. I don't know, I must have
> copied it from an example somewhere.)
>
> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory  wrote:
>
>> I was able to make the node join the ring but I'm confused.
>> What I did is, first when adding the node, this node was not in the seeds
>> list of itself. AFAIK this is how it's supposed to be. So it was able to
>> transfer all data to itself from other nodes but then it stayed in the
>> bootstrapping state.
>> So what I did (and I don't know why it works), is add this node to the
>> seeds list in its own storage-conf.xml file. Then restart the server and
>> then I finally see it in the ring...
>> If I had added the node to the seeds list of itself when first joining it,
>> it would not join the ring but if I do it in two phases it did work.
>> So it's either my misunderstanding or a bug...
>>
>>
>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory  wrote:
>>
>>> The new node does not see itself as part of the ring, it sees all others
>>> but itself, so from that perspective the view is consistent.
>>> The only problem is that the node never finishes to bootstrap. It stays
>>> in this state for hours (It's been 20 hours now...)
>>>
>>>
>>> $ bin/nodetool -p 9004 -h localhost streams
>>>> Mode: Bootstrapping
>>>> Not sending any streams.
>>>> Not receiving any streams.
>>>
>>>
>>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall  wrote:
>>>
>>>> Does the new node have itself in the list of seeds per chance? This
>>>> could cause some issues if so.
>>>>
>>>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory  wrote:
 > I'm still at lost.   I haven't been able to resolve this. I tried
 > adding another node at a different location on the ring but this node
 > too remains stuck in the bootstrapping state for many hours without
 > any of the other nodes being busy with anti compaction or anything
 > else. I don't know what's keeping it from finishing the bootstrap,no
 > CPU, no io, files were already streamed so what is it waiting for?
 > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
 > be anything addressing a similar issue so I figured there was no point
 > in upgrading. But let me know if you think there is.
 > Or any other advice...
 >
 > On Tuesday, January 4, 2011, Ran Tavory  wrote:
 >> Thanks Jake, but unfortunately the streams directory is empty so I
 don't think that any of the nodes is anti-compacting data right now or had
 been in the past 5 hours. It seems that all the data was already 
 transferred
 to the joining host but the joining node, after having received the data
 would still remain in bootstrapping mode and not join the cluster. I'm not
 sure that *all* data was transferred (perhaps other nodes need to transfer
 more data) but nothing is actually happening so I assume all has been 
 moved.
 >> Perhaps it's a configuration error from my part. Should I use I use
 AutoBootstrap=true ? Anything else I should look out for in the
 configuration file or something else?
 >>
 >>
 >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani 
 wrote:
 >>
 >> In 0.6, locate the node doing anti-compaction and look in the
 "streams" subdirectory in the keyspace data dir to monitor the
 anti-compaction progress (it puts new SSTables for bootstrapping node in
 there)
 >>
 >>
 >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory  wrote:
 >>
 >>
 >> Running nodetool decommission didn't help. Actually the node refused
 to decommission itself (b/c it wasn't part of the ring). So I simply 
 stopped
 the process, deleted all the data directories and started it again. It
 worked in the sense of the node bootstrapped again but as before, after it
 had finished moving the data nothing happened for a long time (I'm still
 waiting, but nothing seems to be happening).
 >>
 >>
 >>
 >>
 >> Any hints how to analyze a "stuck" bootstrapping node??thanks
 >> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory  wrote:
 >> Thanks Shimi, so indeed anticompaction was run on one of the other
 nodes from the same DC but to my understanding it has already ended. A few
 hour ago...
 >>
 >>
 >>
 >> I plenty of log messages such as

Re: Bootstrapping taking long

2011-01-05 Thread David Boxenhorn
My nodes all have themselves in their list of seeds - always did - and
everything works. (You may ask why I did this. I don't know, I must have
copied it from an example somewhere.)

On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory  wrote:

> I was able to make the node join the ring but I'm confused.
> What I did is, first when adding the node, this node was not in the seeds
> list of itself. AFAIK this is how it's supposed to be. So it was able to
> transfer all data to itself from other nodes but then it stayed in the
> bootstrapping state.
> So what I did (and I don't know why it works), is add this node to the
> seeds list in its own storage-conf.xml file. Then restart the server and
> then I finally see it in the ring...
> If I had added the node to the seeds list of itself when first joining it,
> it would not join the ring but if I do it in two phases it did work.
> So it's either my misunderstanding or a bug...
>
>
> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory  wrote:
>
>> The new node does not see itself as part of the ring, it sees all others
>> but itself, so from that perspective the view is consistent.
>> The only problem is that the node never finishes to bootstrap. It stays in
>> this state for hours (It's been 20 hours now...)
>>
>>
>> $ bin/nodetool -p 9004 -h localhost streams
>>> Mode: Bootstrapping
>>> Not sending any streams.
>>> Not receiving any streams.
>>
>>
>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall  wrote:
>>
>>> Does the new node have itself in the list of seeds per chance? This
>>> could cause some issues if so.
>>>
>>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory  wrote:
>>> > I'm still at lost.   I haven't been able to resolve this. I tried
>>> > adding another node at a different location on the ring but this node
>>> > too remains stuck in the bootstrapping state for many hours without
>>> > any of the other nodes being busy with anti compaction or anything
>>> > else. I don't know what's keeping it from finishing the bootstrap,no
>>> > CPU, no io, files were already streamed so what is it waiting for?
>>> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
>>> > be anything addressing a similar issue so I figured there was no point
>>> > in upgrading. But let me know if you think there is.
>>> > Or any other advice...
>>> >
>>> > On Tuesday, January 4, 2011, Ran Tavory  wrote:
>>> >> Thanks Jake, but unfortunately the streams directory is empty so I
>>> don't think that any of the nodes is anti-compacting data right now or had
>>> been in the past 5 hours. It seems that all the data was already transferred
>>> to the joining host but the joining node, after having received the data
>>> would still remain in bootstrapping mode and not join the cluster. I'm not
>>> sure that *all* data was transferred (perhaps other nodes need to transfer
>>> more data) but nothing is actually happening so I assume all has been moved.
>>> >> Perhaps it's a configuration error from my part. Should I use I use
>>> AutoBootstrap=true ? Anything else I should look out for in the
>>> configuration file or something else?
>>> >>
>>> >>
>>> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani 
>>> wrote:
>>> >>
>>> >> In 0.6, locate the node doing anti-compaction and look in the
>>> "streams" subdirectory in the keyspace data dir to monitor the
>>> anti-compaction progress (it puts new SSTables for bootstrapping node in
>>> there)
>>> >>
>>> >>
>>> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory  wrote:
>>> >>
>>> >>
>>> >> Running nodetool decommission didn't help. Actually the node refused
>>> to decommission itself (b/c it wasn't part of the ring). So I simply stopped
>>> the process, deleted all the data directories and started it again. It
>>> worked in the sense of the node bootstrapped again but as before, after it
>>> had finished moving the data nothing happened for a long time (I'm still
>>> waiting, but nothing seems to be happening).
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> Any hints how to analyze a "stuck" bootstrapping node??thanks
>>> >> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory  wrote:
>>> >> Thanks Shimi, so indeed anticompaction was run on one of the other
>>> nodes from the same DC but to my understanding it has already ended. A few
>>> hour ago...
>>> >>
>>> >>
>>> >>
>>> >> I plenty of log messages such as [1] which ended a couple of hours
>>> ago, and I've seen the new node streaming and accepting the data from the
>>> node which performed the anticompaction and so far it was normal so it
>>> seemed that data is at its right place. But now the new node seems sort of
>>> stuck. None of the other nodes is anticompacting right now or had been
>>> anticompacting since then.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> The new node's CPU is close to zero, it's iostats are almost zero so I
>>> can't find another bottleneck that would keep it hanging.
>>> >> On the IRC someone suggested I'd maybe retry to join this node,
>>> e.g. decommission and rejoin it again. I'll try it now...
>>> >>

Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory
I was able to make the node join the ring but I'm confused.
When I first added the node, it was not in its own seeds
list. AFAIK this is how it's supposed to be. It was able to
transfer all data to itself from other nodes but then it stayed in the
bootstrapping state.
So what I did (and I don't know why it works) was add this node to the seeds
list in its own storage-conf.xml file and restart the server. Then I
finally saw it in the ring...
If I had added the node to its own seeds list when first joining it,
it would not have joined the ring, but doing it in two phases did work.
So it's either my misunderstanding or a bug...
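
The "stayed in the bootstrapping state" symptom can be checked mechanically. Below is a minimal Python sketch that looks at the text printed by `nodetool ... streams` and reports the idle-while-bootstrapping pattern seen in this thread. The function name is made up, and the three strings it matches are simply the ones quoted in this thread for 0.6.6; other versions may word them differently.

```python
def looks_stuck(streams_output):
    """True if the output shows bootstrap mode with no streams in either direction."""
    lines = [line.strip() for line in streams_output.splitlines()]
    return (
        "Mode: Bootstrapping" in lines
        and "Not sending any streams." in lines
        and "Not receiving any streams." in lines
    )


# Output as quoted in this thread, without the mail quoting prefixes.
sample = """Mode: Bootstrapping
Not sending any streams.
Not receiving any streams."""

print(looks_stuck(sample))  # True
```

A single positive result is only a hint; the node may be idle briefly between transfers, so it would make sense to sample repeatedly over several minutes before calling the bootstrap stuck.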

On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory  wrote:

> The new node does not see itself as part of the ring, it sees all others
> but itself, so from that perspective the view is consistent.
> The only problem is that the node never finishes to bootstrap. It stays in
> this state for hours (It's been 20 hours now...)
>
>
> $ bin/nodetool -p 9004 -h localhost streams
>> Mode: Bootstrapping
>> Not sending any streams.
>> Not receiving any streams.
>
>
> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall  wrote:
>
>> Does the new node have itself in the list of seeds per chance? This
>> could cause some issues if so.
>>
>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory  wrote:
>> > I'm still at lost.   I haven't been able to resolve this. I tried
>> > adding another node at a different location on the ring but this node
>> > too remains stuck in the bootstrapping state for many hours without
>> > any of the other nodes being busy with anti compaction or anything
>> > else. I don't know what's keeping it from finishing the bootstrap,no
>> > CPU, no io, files were already streamed so what is it waiting for?
>> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
>> > be anything addressing a similar issue so I figured there was no point
>> > in upgrading. But let me know if you think there is.
>> > Or any other advice...
>> >
>> > On Tuesday, January 4, 2011, Ran Tavory  wrote:
>> >> Thanks Jake, but unfortunately the streams directory is empty so I
>> don't think that any of the nodes is anti-compacting data right now or had
>> been in the past 5 hours. It seems that all the data was already transferred
>> to the joining host but the joining node, after having received the data
>> would still remain in bootstrapping mode and not join the cluster. I'm not
>> sure that *all* data was transferred (perhaps other nodes need to transfer
>> more data) but nothing is actually happening so I assume all has been moved.
>> >> Perhaps it's a configuration error from my part. Should I use I use
>> AutoBootstrap=true ? Anything else I should look out for in the
>> configuration file or something else?
>> >>
>> >>
>> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani  wrote:
>> >>
>> >> In 0.6, locate the node doing anti-compaction and look in the "streams"
>> subdirectory in the keyspace data dir to monitor the anti-compaction
>> progress (it puts new SSTables for bootstrapping node in there)
>> >>
>> >>
>> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory  wrote:
>> >>
>> >>
>> >> Running nodetool decommission didn't help. Actually the node refused to
>> decommission itself (b/c it wasn't part of the ring). So I simply stopped
>> the process, deleted all the data directories and started it again. It
>> worked in the sense of the node bootstrapped again but as before, after it
>> had finished moving the data nothing happened for a long time (I'm still
>> waiting, but nothing seems to be happening).
>> >>
>> >>
>> >>
>> >>
>> >> Any hints how to analyze a "stuck" bootstrapping node??thanks
>> >> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory  wrote:
>> >> Thanks Shimi, so indeed anticompaction was run on one of the other
>> nodes from the same DC but to my understanding it has already ended. A few
>> hour ago...
>> >>
>> >>
>> >>
>> >> I plenty of log messages such as [1] which ended a couple of hours ago,
>> and I've seen the new node streaming and accepting the data from the node
>> which performed the anticompaction and so far it was normal so it seemed
>> that data is at its right place. But now the new node seems sort of stuck.
>> None of the other nodes is anticompacting right now or had been
>> anticompacting since then.
>> >>
>> >>
>> >>
>> >>
>> >> The new node's CPU is close to zero, it's iostats are almost zero so I
>> can't find another bottleneck that would keep it hanging.
>> >> On the IRC someone suggested I'd maybe retry to join this node,
>> e.g. decommission and rejoin it again. I'll try it now...
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721
>> CompactionManager.java (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>> >>
>> >>
>> >>
>> >>
>> >>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683
>> CompactionManager.java 

Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory
The new node does not see itself as part of the ring; it sees all the others
but not itself, so from that perspective the view is consistent.
The only problem is that the node never finishes bootstrapping. It has stayed
in this state for hours (it's been 20 hours now...)

$ bin/nodetool -p 9004 -h localhost streams
> Mode: Bootstrapping
> Not sending any streams.
> Not receiving any streams.
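
If you want to poll for this state from a script, a rough sketch is below. It only parses text shaped like the output above; how you capture that text (for example by running nodetool from cron) is left out, and the function name is hypothetical.

```python
# The three lines reported by "nodetool ... streams" earlier in this message.
SAMPLE_OUTPUT = """Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.
"""


def idle_while_bootstrapping(streams_output):
    """True if the node reports Bootstrapping mode but no stream activity."""
    lines = [line.strip() for line in streams_output.splitlines() if line.strip()]
    mode = None
    for line in lines:
        if line.startswith("Mode:"):
            mode = line.split(":", 1)[1].strip()
            break
    no_streams = (
        "Not sending any streams." in lines
        and "Not receiving any streams." in lines
    )
    return mode == "Bootstrapping" and no_streams
```

A single idle reading proves little on its own, since anticompaction on the source nodes also shows up as "no streams" here; it is the same reading repeated over hours that suggests a stuck bootstrap.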


On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall  wrote:

> Does the new node have itself in the list of seeds per chance? This
> could cause some issues if so.
>
> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory  wrote:
> > I'm still at lost.   I haven't been able to resolve this. I tried
> > adding another node at a different location on the ring but this node
> > too remains stuck in the bootstrapping state for many hours without
> > any of the other nodes being busy with anti compaction or anything
> > else. I don't know what's keeping it from finishing the bootstrap,no
> > CPU, no io, files were already streamed so what is it waiting for?
> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
> > be anything addressing a similar issue so I figured there was no point
> > in upgrading. But let me know if you think there is.
> > Or any other advice...
> >
> > On Tuesday, January 4, 2011, Ran Tavory  wrote:
> >> Thanks Jake, but unfortunately the streams directory is empty so I don't
> think that any of the nodes is anti-compacting data right now or had been in
> the past 5 hours. It seems that all the data was already transferred to the
> joining host but the joining node, after having received the data would
> still remain in bootstrapping mode and not join the cluster. I'm not sure
> that *all* data was transferred (perhaps other nodes need to transfer more
> data) but nothing is actually happening so I assume all has been moved.
> >> Perhaps it's a configuration error from my part. Should I use I use
> AutoBootstrap=true ? Anything else I should look out for in the
> configuration file or something else?
> >>
> >>
> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani  wrote:
> >>
> >> In 0.6, locate the node doing anti-compaction and look in the "streams"
> subdirectory in the keyspace data dir to monitor the anti-compaction
> progress (it puts new SSTables for bootstrapping node in there)
> >>
> >>
> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory  wrote:
> >>
> >>
> >> Running nodetool decommission didn't help. Actually the node refused to
> decommission itself (b/c it wasn't part of the ring). So I simply stopped
> the process, deleted all the data directories and started it again. It
> worked in the sense of the node bootstrapped again but as before, after it
> had finished moving the data nothing happened for a long time (I'm still
> waiting, but nothing seems to be happening).
> >>
> >>
> >>
> >>
> >> Any hints how to analyze a "stuck" bootstrapping node??thanks
> >> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory  wrote:
> >> Thanks Shimi, so indeed anticompaction was run on one of the other nodes
> from the same DC but to my understanding it has already ended. A few hour
> ago...
> >>
> >>
> >>
> >> I plenty of log messages such as [1] which ended a couple of hours ago,
> and I've seen the new node streaming and accepting the data from the node
> which performed the anticompaction and so far it was normal so it seemed
> that data is at its right place. But now the new node seems sort of stuck.
> None of the other nodes is anticompacting right now or had been
> anticompacting since then.
> >>
> >>
> >>
> >>
> >> The new node's CPU is close to zero, it's iostats are almost zero so I
> can't find another bottleneck that would keep it hanging.
> >> On the IRC someone suggested I'd maybe retry to join this node,
> e.g. decommission and rejoin it again. I'll try it now...
> >>
> >>
> >>
> >>
> >>
> >>
> >> [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721
> CompactionManager.java (line 338) AntiCompacting
> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
> >>
> >>
> >>
> >>
> >>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
> (line 338) AntiCompacting
> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
> >>
> >>
> >>
> >>
> >>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
> (line 338) AntiCompacting
> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
> >>
> >>
> >

Re: Bootstrapping taking long

2011-01-04 Thread Nate McCall
Does the new node have itself in the list of seeds, by any chance? This
could cause some issues if so.

On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory  wrote:
> I'm still at lost.   I haven't been able to resolve this. I tried
> adding another node at a different location on the ring but this node
> too remains stuck in the bootstrapping state for many hours without
> any of the other nodes being busy with anti compaction or anything
> else. I don't know what's keeping it from finishing the bootstrap,no
> CPU, no io, files were already streamed so what is it waiting for?
> I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
> be anything addressing a similar issue so I figured there was no point
> in upgrading. But let me know if you think there is.
> Or any other advice...
>
> On Tuesday, January 4, 2011, Ran Tavory  wrote:
>> Thanks Jake, but unfortunately the streams directory is empty so I don't 
>> think that any of the nodes is anti-compacting data right now or had been in 
>> the past 5 hours. It seems that all the data was already transferred to the 
>> joining host but the joining node, after having received the data would 
>> still remain in bootstrapping mode and not join the cluster. I'm not sure 
>> that *all* data was transferred (perhaps other nodes need to transfer more 
>> data) but nothing is actually happening so I assume all has been moved.
>> Perhaps it's a configuration error from my part. Should I use I use 
>> AutoBootstrap=true ? Anything else I should look out for in the 
>> configuration file or something else?
>>
>>
>> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani  wrote:
>>
>> In 0.6, locate the node doing anti-compaction and look in the "streams" 
>> subdirectory in the keyspace data dir to monitor the anti-compaction 
>> progress (it puts new SSTables for bootstrapping node in there)
>>
>>
>> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory  wrote:
>>
>>
>> Running nodetool decommission didn't help. Actually the node refused to 
>> decommission itself (b/c it wasn't part of the ring). So I simply stopped 
>> the process, deleted all the data directories and started it again. It 
>> worked in the sense of the node bootstrapped again but as before, after it 
>> had finished moving the data nothing happened for a long time (I'm still 
>> waiting, but nothing seems to be happening).
>>
>>
>>
>>
>> Any hints how to analyze a "stuck" bootstrapping node??thanks
>> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory  wrote:
>> Thanks Shimi, so indeed anticompaction was run on one of the other nodes 
>> from the same DC but to my understanding it has already ended. A few hour 
>> ago...
>>
>>
>>
>> I plenty of log messages such as [1] which ended a couple of hours ago, and 
>> I've seen the new node streaming and accepting the data from the node which 
>> performed the anticompaction and so far it was normal so it seemed that data 
>> is at its right place. But now the new node seems sort of stuck. None of the 
>> other nodes is anticompacting right now or had been anticompacting since 
>> then.
>>
>>
>>
>>
>> The new node's CPU is close to zero, it's iostats are almost zero so I can't 
>> find another bottleneck that would keep it hanging.
>> On the IRC someone suggested I'd maybe retry to join this node, 
>> e.g. decommission and rejoin it again. I'll try it now...
>>
>>
>>
>>
>>
>>
>> [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java 
>> (line 338) AntiCompacting 
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>
>>
>>
>>
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java 
>> (line 338) AntiCompacting 
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>>
>>
>>
>>
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java 
>> (line 338) AntiCompacting 
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>>
>>
>>
>>
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java 
>> (line 338) AntiCompacting 
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>
>>
>>
>>
>>
>> On Tue, Jan 4, 2011 at 12:45 PM, shimi  wrote:
>>
>>
>>
>>
>>
>> In my experience most of the time it takes for a node to join the cluster is 
>> the anticompaction on the other nodes. The streaming part is very fast.
>> Check the other nodes logs to see if there is a

Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory
I'm still at a loss. I haven't been able to resolve this. I tried
adding another node at a different location on the ring, but this node
too remains stuck in the bootstrapping state for many hours without
any of the other nodes being busy with anticompaction or anything
else. I don't know what's keeping it from finishing the bootstrap: no
CPU, no IO, and the files were already streamed, so what is it waiting for?
I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
be anything addressing a similar issue, so I figured there was no point
in upgrading. But let me know if you think there is.
Any other advice is welcome too...

On Tuesday, January 4, 2011, Ran Tavory  wrote:
> Thanks Jake, but unfortunately the streams directory is empty so I don't 
> think that any of the nodes is anti-compacting data right now or had been in 
> the past 5 hours. It seems that all the data was already transferred to the 
> joining host but the joining node, after having received the data would still 
> remain in bootstrapping mode and not join the cluster. I'm not sure that 
> *all* data was transferred (perhaps other nodes need to transfer more data) 
> but nothing is actually happening so I assume all has been moved.
> Perhaps it's a configuration error from my part. Should I use I use 
> AutoBootstrap=true ? Anything else I should look out for in the configuration 
> file or something else?
>
>
> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani  wrote:
>
> In 0.6, locate the node doing anti-compaction and look in the "streams" 
> subdirectory in the keyspace data dir to monitor the anti-compaction progress 
> (it puts new SSTables for bootstrapping node in there)
>
>
> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory  wrote:
>
>
> Running nodetool decommission didn't help. Actually the node refused to 
> decommission itself (b/c it wasn't part of the ring). So I simply stopped the 
> process, deleted all the data directories and started it again. It worked in 
> the sense of the node bootstrapped again but as before, after it had finished 
> moving the data nothing happened for a long time (I'm still waiting, but 
> nothing seems to be happening).
>
>
>
>
> Any hints how to analyze a "stuck" bootstrapping node??thanks
> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory  wrote:
> Thanks Shimi, so indeed anticompaction was run on one of the other nodes from 
> the same DC but to my understanding it has already ended. A few hour ago...
>
>
>
> I plenty of log messages such as [1] which ended a couple of hours ago, and 
> I've seen the new node streaming and accepting the data from the node which 
> performed the anticompaction and so far it was normal so it seemed that data 
> is at its right place. But now the new node seems sort of stuck. None of the 
> other nodes is anticompacting right now or had been anticompacting since then.
>
>
>
>
> The new node's CPU is close to zero, it's iostats are almost zero so I can't 
> find another bottleneck that would keep it hanging.
> On the IRC someone suggested I'd maybe retry to join this node, 
> e.g. decommission and rejoin it again. I'll try it now...
>
>
>
>
>
>
> [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java 
> (line 338) AntiCompacting 
> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>
>
>
>
>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java 
> (line 338) AntiCompacting 
> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>
>
>
>
>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java 
> (line 338) AntiCompacting 
> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>
>
>
>
>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java 
> (line 338) AntiCompacting 
> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>
>
>
>
>
> On Tue, Jan 4, 2011 at 12:45 PM, shimi  wrote:
>
>
>
>
>
> In my experience most of the time it takes for a node to join the cluster is 
> the anticompaction on the other nodes. The streaming part is very fast.
> Check the other nodes logs to see if there is any node doing anticompaction.I 
> don't remember how much data I had in the cluster when I needed to add/remove 
> nodes. I do remember that it took a few hours.
>
>
>
>
>
>
> The node will join the ring only when it will finish the bootstrap.
> --
> /Ran
>
>

-- 
/Ran


Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory
Thanks Jake, but unfortunately the streams directory is empty, so I don't
think that any of the nodes is anticompacting data right now or has been in
the past 5 hours.
It seems that all the data was already transferred to the joining host, but
the joining node, after having received the data, still remains in
bootstrapping mode and does not join the cluster. I'm not sure that *all* data
was transferred (perhaps other nodes need to transfer more data), but nothing
is actually happening, so I assume all of it has been moved.
Perhaps it's a configuration error on my part. Should I use
AutoBootstrap=true? Is there anything else I should look out for in the
configuration file or elsewhere?


On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani  wrote:

> In 0.6, locate the node doing anti-compaction and look in the "streams"
> subdirectory in the keyspace data dir to monitor the anti-compaction
> progress (it puts new SSTables for bootstrapping node in there)
>
>
> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory  wrote:
>
>> Running nodetool decommission didn't help. Actually the node refused to
>> decommission itself (b/c it wasn't part of the ring). So I simply stopped
>> the process, deleted all the data directories and started it again. It
>> worked in the sense of the node bootstrapped again but as before, after it
>> had finished moving the data nothing happened for a long time (I'm still
>> waiting, but nothing seems to be happening).
>>
>> Any hints how to analyze a "stuck" bootstrapping node??
>> thanks
>>
>> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory  wrote:
>>
>>> Thanks Shimi, so indeed anticompaction was run on one of the other nodes
>>> from the same DC but to my understanding it has already ended. A few hour
>>> ago...
>>> I plenty of log messages such as [1] which ended a couple of hours ago,
>>> and I've seen the new node streaming and accepting the data from the node
>>> which performed the anticompaction and so far it was normal so it seemed
>>> that data is at its right place. But now the new node seems sort of stuck.
>>> None of the other nodes is anticompacting right now or had been
>>> anticompacting since then.
>>> The new node's CPU is close to zero, it's iostats are almost zero so I
>>> can't find another bottleneck that would keep it hanging.
>>>
>>> On the IRC someone suggested I'd maybe retry to join this node,
>>> e.g. decommission and rejoin it again. I'll try it now...
>>>
>>>
>>> [1]
>>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
>>> (line 338) AntiCompacting
>>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
>>> (line 338) AntiCompacting
>>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
>>> (line 338) AntiCompacting
>>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
>>> (line 338) AntiCompacting
>>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>>
>>> On Tue, Jan 4, 2011 at 12:45 PM, shimi  wrote:
>>>
>>>> In my experience most of the time it takes for a node to join the
>>>> cluster is the anticompaction on the other nodes. The streaming part is
>>>> very fast.
>>>> Check the other nodes logs to see if there is any node doing
>>>> anticompaction.
>>>> I don't remember how much data I had in the cluster when I needed to
>>>> add/remove nodes. I do remember that it took a few hours.
>>>>
>>>> The node will join the ring only when it will finish the bootstrap.
>>>>
>>>> Shimi
>>>>
>>>>
>>>> On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory  wrote:
>>>>
>>>>> I asked the same question on the IRC but no luck there, everyone's
>>>>> asleep ;)...
>>>>>
>>>>> Using 0.6.6 I'm adding a new node to the cluster.
>>>>> It starts out fine but then gets stuck on the bootstrapping state for
>>>>> too long. More than an hour and still counting.
>>>>>
>>>>> $ bin/nodetool -p 9004 -h localhost streams
>>>>>> Mode: Bootstrapping
>>>>>> Not sending any streams.
>>>>>> Not receiving any streams.
>>>>>
>>>>>
>>>>> It seemed to have streamed data from other nodes and indeed the load is
>>>>> non-zero but I'm not clear what's keeping it right now from finishing.
>>>>>
>>>>>> $ bin/nodetoo

Re: Bootstrapping taking long

2011-01-04 Thread shimi
You will have something new to talk about in your talk tomorrow :)

You said that the anticompaction ran on only a single node? I think that
your new node should get data from at least two other nodes (depending on
the replication factor). Maybe the problem is not in the new node.
In old versions (I think prior to 0.6.3) there was a case of stuck bootstrap
that required restarting the new node and the nodes that were supposed to
stream data to it. As far as I remember this case was resolved. I haven't
seen this problem since then.
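
To see which of the other nodes actually anticompacted, and when they last did, something like the following could be run over each node's log. It is only a sketch: it assumes each log entry sits on a single line in the same shape as the AntiCompacting lines quoted below (the mail client wrapped them), and the function name is made up.

```python
import re
from datetime import datetime

# Matches the start of an AntiCompacting entry and captures its timestamp
# (seconds precision; the milliseconds after the comma are ignored).
ANTICOMPACT_RE = re.compile(
    r"INFO \[COMPACTION-POOL:\d+\] "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} "
    r"CompactionManager\.java \(line \d+\) AntiCompacting"
)

# Two entries taken from the log excerpt in this thread, unwrapped.
SAMPLE_LOG = (
    " INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java"
    " (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader("
    "path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]\n"
    " INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java"
    " (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader("
    "path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]\n"
)


def last_anticompaction(log_text):
    """Return the latest AntiCompacting timestamp in log_text, or None."""
    stamps = [
        datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        for match in ANTICOMPACT_RE.finditer(log_text)
    ]
    return max(stamps) if stamps else None
```

A node whose log yields None never started anticompacting, which would fit the idea that the problem is on a source node and not on the new one.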

Shimi

On Tue, Jan 4, 2011 at 3:01 PM, Ran Tavory  wrote:

> Running nodetool decommission didn't help. Actually the node refused to
> decommission itself (b/c it wasn't part of the ring). So I simply stopped
> the process, deleted all the data directories and started it again. It
> worked in the sense of the node bootstrapped again but as before, after it
> had finished moving the data nothing happened for a long time (I'm still
> waiting, but nothing seems to be happening).
>
> Any hints how to analyze a "stuck" bootstrapping node??
> thanks
>
> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory  wrote:
>
>> Thanks Shimi, so indeed anticompaction was run on one of the other nodes
>> from the same DC but to my understanding it has already ended. A few hour
>> ago...
>> I plenty of log messages such as [1] which ended a couple of hours ago,
>> and I've seen the new node streaming and accepting the data from the node
>> which performed the anticompaction and so far it was normal so it seemed
>> that data is at its right place. But now the new node seems sort of stuck.
>> None of the other nodes is anticompacting right now or had been
>> anticompacting since then.
>> The new node's CPU is close to zero, it's iostats are almost zero so I
>> can't find another bottleneck that would keep it hanging.
>>
>> On the IRC someone suggested I'd maybe retry to join this node,
>> e.g. decommission and rejoin it again. I'll try it now...
>>
>>
>> [1]
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>
>> On Tue, Jan 4, 2011 at 12:45 PM, shimi  wrote:
>>
>>> In my experience most of the time it takes for a node to join the cluster
>>> is the anticompaction on the other nodes. The streaming part is very fast.
>>> Check the other nodes logs to see if there is any node doing
>>> anticompaction.
>>> I don't remember how much data I had in the cluster when I needed to
>>> add/remove nodes. I do remember that it took a few hours.
>>>
>>> The node will join the ring only when it will finish the bootstrap.
>>>
>>> Shimi
>>>
>>>
>>> On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory  wrote:
>>>
>>>> I asked the same question on the IRC but no luck there, everyone's
>>>> asleep ;)...
>>>>
>>>> Using 0.6.6 I'm adding a new node to the cluster.
>>>> It starts out fine but then gets stuck on the bootstrapping state for
>>>> too long. More than an hour and still counting.
>>>>
>>>> $ bin/nodetool -p 9004 -h localhost streams
>>>>> Mode: Bootstrapping
>>>>> Not sending any streams.
>>>>> Not receiving any streams.
>>>>
>>>>
>>>> It seemed to have streamed data from other nodes and indeed the load is
>>>> non-zero but I'm not clear what's keeping it right now from finishing.
>>>>
>>>>> $ bin/nodetool -p 9004 -h localhost info
>>>>> 51042355038140769519506191114765231716
>>>>> Load : 22.49 GB
>>>>> Generation No: 1294133781
>>>>> Uptime (seconds) : 1795
>>>>> Heap Memory (MB) : 315.31 / 6117.00
>>>>
>>>>
>>>> nodetool ring does not list this new node in the ring, although nodetool
>>>> can happily talk to the new node, it's just not listing itself as a member
>>>> of the ring. This is expected when the node is still bootstrapping, so the
>>>> question is still how long might the b

Re: Bootstrapping taking long

2011-01-04 Thread Jake Luciani
In 0.6, locate the node doing anti-compaction and look in the "streams"
subdirectory of the keyspace data dir to monitor the anti-compaction
progress (it puts the new SSTables for the bootstrapping node in there).
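
A small sketch of that check, in case it helps: it counts the files under the "streams" subdirectory of one keyspace data dir and sums their sizes, so two calls a minute apart show whether anything is still being written. The function name is hypothetical, and the path in the usage note is only the keyspace dir that appears in the log lines of this thread.

```python
import os


def streams_snapshot(keyspace_data_dir):
    """Return (file_count, total_bytes) for <keyspace_data_dir>/streams.

    A missing or empty streams directory yields (0, 0).
    """
    streams_dir = os.path.join(keyspace_data_dir, "streams")
    if not os.path.isdir(streams_dir):
        return (0, 0)
    file_count = 0
    total_bytes = 0
    for name in os.listdir(streams_dir):
        path = os.path.join(streams_dir, name)
        if os.path.isfile(path):
            file_count += 1
            total_bytes += os.path.getsize(path)
    return (file_count, total_bytes)
```

For example, `streams_snapshot("/outbrain/cassandra/data/outbrain_kvdb")` on the source node; a result of (0, 0) means nothing is staged for streaming at that moment.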

On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory  wrote:

> Running nodetool decommission didn't help. Actually the node refused to
> decommission itself (b/c it wasn't part of the ring). So I simply stopped
> the process, deleted all the data directories and started it again. It
> worked in the sense of the node bootstrapped again but as before, after it
> had finished moving the data nothing happened for a long time (I'm still
> waiting, but nothing seems to be happening).
>
> Any hints how to analyze a "stuck" bootstrapping node??
> thanks
>
> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory  wrote:
>
>> Thanks Shimi, so indeed anticompaction was run on one of the other nodes
>> from the same DC but to my understanding it has already ended. A few hour
>> ago...
>> I plenty of log messages such as [1] which ended a couple of hours ago,
>> and I've seen the new node streaming and accepting the data from the node
>> which performed the anticompaction and so far it was normal so it seemed
>> that data is at its right place. But now the new node seems sort of stuck.
>> None of the other nodes is anticompacting right now or had been
>> anticompacting since then.
>> The new node's CPU is close to zero, it's iostats are almost zero so I
>> can't find another bottleneck that would keep it hanging.
>>
>> On the IRC someone suggested I'd maybe retry to join this node,
>> e.g. decommission and rejoin it again. I'll try it now...
>>
>>
>> [1]
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>
>> On Tue, Jan 4, 2011 at 12:45 PM, shimi  wrote:
>>
>>> In my experience most of the time it takes for a node to join the cluster
>>> is the anticompaction on the other nodes. The streaming part is very fast.
>>> Check the other nodes logs to see if there is any node doing
>>> anticompaction.
>>> I don't remember how much data I had in the cluster when I needed to
>>> add/remove nodes. I do remember that it took a few hours.
>>>
>>> The node will join the ring only when it will finish the bootstrap.
>>>
>>> Shimi
>>>
>>>
>>> On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory  wrote:
>>>
>>>> I asked the same question on the IRC but no luck there, everyone's
>>>> asleep ;)...
>>>>
>>>> Using 0.6.6 I'm adding a new node to the cluster.
>>>> It starts out fine but then gets stuck on the bootstrapping state for
>>>> too long. More than an hour and still counting.
>>>>
>>>> $ bin/nodetool -p 9004 -h localhost streams
>>>>> Mode: Bootstrapping
>>>>> Not sending any streams.
>>>>> Not receiving any streams.
>>>>
>>>>
>>>> It seemed to have streamed data from other nodes and indeed the load is
>>>> non-zero but I'm not clear what's keeping it right now from finishing.
>>>>
>>>>> $ bin/nodetool -p 9004 -h localhost info
>>>>> 51042355038140769519506191114765231716
>>>>> Load : 22.49 GB
>>>>> Generation No: 1294133781
>>>>> Uptime (seconds) : 1795
>>>>> Heap Memory (MB) : 315.31 / 6117.00
>>>>
>>>>
>>>> nodetool ring does not list this new node in the ring, although nodetool
>>>> can happily talk to the new node, it's just not listing itself as a member
>>>> of the ring. This is expected when the node is still bootstrapping, so the
>>>> question is still how long might the bootstrap take and whether is it
>>>> stuck.
>>>>
>>>> The data ins't huge so I find it hard to believe that streaming or anti
>>>> compaction are the bottlenecks. I have ~20G on each node and the new node
>>>> already has just about that so it seems that all data had already been
>>>> streamed to it successfully, or at least most of t

Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory
Running nodetool decommission didn't help. Actually the node refused to
decommission itself (b/c it wasn't part of the ring). So I simply stopped
the process, deleted all the data directories and started it again. It
worked in the sense that the node bootstrapped again, but as before, after it
had finished moving the data nothing happened for a long time (I'm still
waiting, but nothing seems to be happening).

Any hints on how to analyze a "stuck" bootstrapping node?
Thanks

On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory  wrote:

> Thanks Shimi, so indeed anticompaction was run on one of the other nodes
> from the same DC but to my understanding it has already ended. A few hour
> ago...
> I plenty of log messages such as [1] which ended a couple of hours ago, and
> I've seen the new node streaming and accepting the data from the node which
> performed the anticompaction and so far it was normal so it seemed that data
> is at its right place. But now the new node seems sort of stuck. None of the
> other nodes is anticompacting right now or had been anticompacting since
> then.
> The new node's CPU is close to zero, it's iostats are almost zero so I
> can't find another bottleneck that would keep it hanging.
>
> On the IRC someone suggested I'd maybe retry to join this node,
> e.g. decommission and rejoin it again. I'll try it now...
>
>
> [1]
>  INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
> (line 338) AntiCompacting
> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
> (line 338) AntiCompacting
> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
> (line 338) AntiCompacting
> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
> (line 338) AntiCompacting
> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>
> On Tue, Jan 4, 2011 at 12:45 PM, shimi  wrote:
>
>> In my experience most of the time it takes for a node to join the cluster
>> is the anticompaction on the other nodes. The streaming part is very fast.
>> Check the other nodes logs to see if there is any node doing
>> anticompaction.
>> I don't remember how much data I had in the cluster when I needed to
>> add/remove nodes. I do remember that it took a few hours.
>>
>> The node will join the ring only when it will finish the bootstrap.
>>
>> Shimi
>>
>>
>> On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory  wrote:
>>
>>> I asked the same question on the IRC but no luck there, everyone's asleep
>>> ;)...
>>>
>>> Using 0.6.6 I'm adding a new node to the cluster.
>>> It starts out fine but then gets stuck on the bootstrapping state for too
>>> long. More than an hour and still counting.
>>>
>>> $ bin/nodetool -p 9004 -h localhost streams
>>>> Mode: Bootstrapping
>>>> Not sending any streams.
>>>> Not receiving any streams.
>>>
>>>
>>> It seemed to have streamed data from other nodes and indeed the load is
>>> non-zero but I'm not clear what's keeping it right now from finishing.
>>>
>>>> $ bin/nodetool -p 9004 -h localhost info
>>>> 51042355038140769519506191114765231716
>>>> Load : 22.49 GB
>>>> Generation No: 1294133781
>>>> Uptime (seconds) : 1795
>>>> Heap Memory (MB) : 315.31 / 6117.00
>>>
>>>
>>> nodetool ring does not list this new node in the ring, although nodetool
>>> can happily talk to the new node, it's just not listing itself as a member
>>> of the ring. This is expected when the node is still bootstrapping, so the
>>> question is still how long might the bootstrap take and whether is it stuck.
>>>
>>> The data ins't huge so I find it hard to believe that streaming or anti
>>> compaction are the bottlenecks. I have ~20G on each node and the new node
>>> already has just about that so it seems that all data had already been
>>> streamed to it successfully, or at least most of the data... So what is it
>>> waiting for now? (same question, rephrased... ;)
>>>
>>> I tried:
>>> 1. Restarting the new node. No good. All logs seem normal but at the end
>>> the node is still in bootstrap mode.
>>> 2. As someone suggested I increased the rpc timeout from 10k to 30k
>>> (RpcTimeoutInMillis) but that didn't seem to help. I did this only on the
>

Re: Bootstrapping taking long

2011-01-04 Thread Ran Tavory
Thanks Shimi. Indeed, anticompaction was run on one of the other nodes
from the same DC, but to my understanding it has already ended, a few hours
ago.
I saw plenty of log messages such as [1], which ended a couple of hours ago,
and I've seen the new node streaming and accepting the data from the node
which performed the anticompaction. So far it was normal, so it seemed that
the data is in its right place. But now the new node seems sort of stuck.
None of the other nodes is anticompacting right now or has been
anticompacting since then.
The new node's CPU is close to zero and its iostats are almost zero, so I
can't find another bottleneck that would keep it hanging.

On the IRC someone suggested I retry joining this node,
e.g. decommission it and rejoin it again. I'll try it now...


[1]
 INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
 INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
 INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
 INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
(line 338) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]

On Tue, Jan 4, 2011 at 12:45 PM, shimi  wrote:

> In my experience most of the time it takes for a node to join the cluster
> is the anticompaction on the other nodes. The streaming part is very fast.
> Check the other nodes logs to see if there is any node doing
> anticompaction.
> I don't remember how much data I had in the cluster when I needed to
> add/remove nodes. I do remember that it took a few hours.
>
> The node will join the ring only when it will finish the bootstrap.
>
> Shimi
>
>
> On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory  wrote:
>
>> I asked the same question on the IRC but no luck there, everyone's asleep
>> ;)...
>>
>> Using 0.6.6 I'm adding a new node to the cluster.
>> It starts out fine but then gets stuck on the bootstrapping state for too
>> long. More than an hour and still counting.
>>
>> $ bin/nodetool -p 9004 -h localhost streams
>>> Mode: Bootstrapping
>>> Not sending any streams.
>>> Not receiving any streams.
>>
>>
>> It seemed to have streamed data from other nodes and indeed the load is
>> non-zero but I'm not clear what's keeping it right now from finishing.
>>
>>> $ bin/nodetool -p 9004 -h localhost info
>>> 51042355038140769519506191114765231716
>>> Load : 22.49 GB
>>> Generation No: 1294133781
>>> Uptime (seconds) : 1795
>>> Heap Memory (MB) : 315.31 / 6117.00
>>
>>
>> nodetool ring does not list this new node in the ring, although nodetool
>> can happily talk to the new node, it's just not listing itself as a member
>> of the ring. This is expected when the node is still bootstrapping, so the
>> question is still how long might the bootstrap take and whether is it stuck.
>>
>> The data ins't huge so I find it hard to believe that streaming or anti
>> compaction are the bottlenecks. I have ~20G on each node and the new node
>> already has just about that so it seems that all data had already been
>> streamed to it successfully, or at least most of the data... So what is it
>> waiting for now? (same question, rephrased... ;)
>>
>> I tried:
>> 1. Restarting the new node. No good. All logs seem normal but at the end
>> the node is still in bootstrap mode.
>> 2. As someone suggested I increased the rpc timeout from 10k to 30k
>> (RpcTimeoutInMillis) but that didn't seem to help. I did this only on the
>> new node. Should I have done that on all (old) nodes as well? Or maybe only
>> on the ones that were supposed to stream data to that node.
>> 3. Logging level at DEBUG now but nothing interesting going on except
>> for occasional messages such as [1] or [2]
>>
>> So the question is: what's keeping the new node from finishing the
>> bootstrap and how can I check its status?
>> Thanks
>>
>> [1] DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line
>> 36) Disseminating load info ...
>> [2] DEBUG [RMI TCP Connection(22)-192.168.252.88] 2011-01-04 05:12:48,033
>> StorageService.java (line 1189) computing ranges for
>> 283

Re: Bootstrapping taking long

2011-01-04 Thread shimi
In my experience, most of the time it takes for a node to join the cluster is
spent on anticompaction on the other nodes. The streaming part is very fast.
Check the other nodes' logs to see if any node is doing anticompaction.
I don't remember how much data I had in the cluster when I needed to
add/remove nodes. I do remember that it took a few hours.

The node will join the ring only when it finishes the bootstrap.

Shimi
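
A minimal sketch of the log check suggested above, assuming each
AntiCompacting entry sits on a single line in the node's log file (the
samples in this thread are wrapped by the mail client). The function name and
the default log path are hypothetical; pass the real path of the log on each
of the other nodes.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent AntiCompacting entry in a
# Cassandra log file, so you can see when a node last anticompacted.
# The default path below is an assumption; pass your own as the first argument.
last_anticompaction() {
    log_file="${1:-/var/log/cassandra/system.log}"
    # grep keeps only the anticompaction entries; tail keeps the newest one.
    grep 'AntiCompacting' "$log_file" | tail -n 1
}
```

Running last_anticompaction on each of the other nodes and comparing the
timestamps with the current time shows whether any anticompaction has started
recently. It does not show whether that anticompaction has finished.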


On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory  wrote:

> I asked the same question on the IRC but no luck there, everyone's asleep
> ;)...
>
> Using 0.6.6 I'm adding a new node to the cluster.
> It starts out fine but then gets stuck on the bootstrapping state for too
> long. More than an hour and still counting.
>
> $ bin/nodetool -p 9004 -h localhost streams
>> Mode: Bootstrapping
>> Not sending any streams.
>> Not receiving any streams.
>
>
> It seemed to have streamed data from other nodes and indeed the load is
> non-zero but I'm not clear what's keeping it right now from finishing.
>
>> $ bin/nodetool -p 9004 -h localhost info
>> 51042355038140769519506191114765231716
>> Load : 22.49 GB
>> Generation No: 1294133781
>> Uptime (seconds) : 1795
>> Heap Memory (MB) : 315.31 / 6117.00
>
>
> nodetool ring does not list this new node in the ring, although nodetool
> can happily talk to the new node, it's just not listing itself as a member
> of the ring. This is expected when the node is still bootstrapping, so the
> question is still how long might the bootstrap take and whether is it stuck.
>
> The data ins't huge so I find it hard to believe that streaming or anti
> compaction are the bottlenecks. I have ~20G on each node and the new node
> already has just about that so it seems that all data had already been
> streamed to it successfully, or at least most of the data... So what is it
> waiting for now? (same question, rephrased... ;)
>
> I tried:
> 1. Restarting the new node. No good. All logs seem normal but at the end
> the node is still in bootstrap mode.
> 2. As someone suggested I increased the rpc timeout from 10k to 30k
> (RpcTimeoutInMillis) but that didn't seem to help. I did this only on the
> new node. Should I have done that on all (old) nodes as well? Or maybe only
> on the ones that were supposed to stream data to that node.
> 3. Logging level at DEBUG now but nothing interesting going on except
> for occasional messages such as [1] or [2]
>
> So the question is: what's keeping the new node from finishing the
> bootstrap and how can I check its status?
> Thanks
>
> [1] DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line 36)
> Disseminating load info ...
> [2] DEBUG [RMI TCP Connection(22)-192.168.252.88] 2011-01-04 05:12:48,033
> StorageService.java (line 1189) computing ranges for
> 28356863910078205288614550619314017621,
> 56713727820156410577229101238628035242,
>  85070591730234615865843651857942052863,
> 113427455640312821154458202477256070484,
> 141784319550391026443072753096570088105,
> 170141183460469231731687303715884105727
>
> --
> /Ran
>
>


Bootstrapping taking long

2011-01-04 Thread Ran Tavory
I asked the same question on the IRC but no luck there, everyone's asleep
;)...

Using 0.6.6 I'm adding a new node to the cluster.
It starts out fine but then gets stuck in the bootstrapping state for too
long. More than an hour and still counting.

$ bin/nodetool -p 9004 -h localhost streams
> Mode: Bootstrapping
> Not sending any streams.
> Not receiving any streams.


It seems to have streamed data from other nodes, and indeed the load is
non-zero, but I'm not clear what's keeping it from finishing right now.

> $ bin/nodetool -p 9004 -h localhost info
> 51042355038140769519506191114765231716
> Load : 22.49 GB
> Generation No: 1294133781
> Uptime (seconds) : 1795
> Heap Memory (MB) : 315.31 / 6117.00


nodetool ring does not list this new node in the ring. nodetool can
happily talk to the new node; it's just not listing itself as a member of
the ring. This is expected while the node is still bootstrapping, so the
question remains how long the bootstrap might take and whether it is stuck.

The data isn't huge, so I find it hard to believe that streaming or
anticompaction is the bottleneck. I have ~20G on each node and the new node
already has just about that, so it seems that all the data has already been
streamed to it successfully, or at least most of the data... So what is it
waiting for now? (same question, rephrased... ;)

I tried:
1. Restarting the new node. No good. All logs seem normal but in the end the
node is still in bootstrap mode.
2. As someone suggested, I increased the RPC timeout from 10k to 30k
(RpcTimeoutInMillis) but that didn't seem to help. I did this only on the
new node. Should I have done that on all (old) nodes as well? Or maybe only
on the ones that were supposed to stream data to that node?
3. Setting the logging level to DEBUG, but nothing interesting is going on
except for occasional messages such as [1] or [2].

So the question is: what's keeping the new node from finishing the bootstrap
and how can I check its status?
Thanks

[1] DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line 36)
Disseminating load info ...
[2] DEBUG [RMI TCP Connection(22)-192.168.252.88] 2011-01-04 05:12:48,033
StorageService.java (line 1189) computing ranges for
28356863910078205288614550619314017621,
56713727820156410577229101238628035242,
 85070591730234615865843651857942052863,
113427455640312821154458202477256070484,
141784319550391026443072753096570088105,
170141183460469231731687303715884105727

-- 
/Ran
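
One way to watch for the state described above (bootstrap mode with no
streams in either direction) is to test the text that nodetool streams
prints. This is only a sketch: the function name is hypothetical, and it
assumes the output wording matches the sample quoted in this message.

```shell
#!/bin/sh
# Hypothetical helper: read `nodetool streams` output on stdin and succeed
# only if the node reports bootstrap mode with no streams in either direction.
# The matched strings are copied from the output quoted in this thread.
is_idle_bootstrap() {
    output=$(cat)
    printf '%s\n' "$output" | grep -q 'Mode: Bootstrapping' &&
    printf '%s\n' "$output" | grep -q 'Not sending any streams' &&
    printf '%s\n' "$output" | grep -q 'Not receiving any streams'
}

# Example use, with the same port and host as in the message:
#   bin/nodetool -p 9004 -h localhost streams | is_idle_bootstrap \
#       && echo "bootstrapping, but no streams are active"
```

An idle result on its own does not prove the node is stuck, since the other
nodes may still be anticompacting before they start streaming. It only
becomes suspicious when it persists and no other node is anticompacting.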