[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-31 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426085#comment-13426085
 ] 

Jonathan Ellis commented on CASSANDRA-4427:
---

+1 then.

Nit: I'd also change sleep(delay) in the MigrationManager loop to sleep(1000), 
or even sleep(100).
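
The nit above amounts to polling with a short, fixed interval rather than sleeping for the whole delay between checks. A minimal sketch of that pattern follows; `ShortPoll` and `waitUntil` are hypothetical names for illustration, not the actual MigrationManager code.

```java
import java.util.function.BooleanSupplier;

final class ShortPoll
{
    // Hypothetical helper: checks `ready` every `intervalMillis` until it
    // returns true or `timeoutMillis` has elapsed. Returns whether it became ready.
    static boolean waitUntil(BooleanSupplier ready, long timeoutMillis, long intervalMillis)
    {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!ready.getAsBoolean())
        {
            if (System.nanoTime() >= deadline)
                return false;
            try
            {
                Thread.sleep(intervalMillis);
            }
            catch (InterruptedException e)
            {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

With a short interval, the caller notices the condition soon after it becomes true instead of up to one full delay later.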

> Restarting a failed bootstrap instajoins the ring
> -------------------------------------------------
>
>                 Key: CASSANDRA-4427
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Brandon Williams
>            Assignee: Jonathan Ellis
>             Fix For: 1.1.3
>
>         Attachments: 4427-4.txt, 4427-5.txt, 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-31 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426075#comment-13426075
 ] 

Brandon Williams commented on CASSANDRA-4427:
-

bq. Don't you still want the full ring delay to make sure you know about 
everyone in the cluster (so if you are picking a "balanced" token it does the 
Right Thing)?

Well, if we got any non-empty schema, a full gossip round has occurred so we 
should be good to go at that point, since it will have also populated our 
knowledge of the ring.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-31 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426074#comment-13426074
 ] 

Jonathan Ellis commented on CASSANDRA-4427:
---

Don't you still want the full ring delay to make sure you know about everyone 
in the cluster (so if you are picking a "balanced" token it does the Right 
Thing)?





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-31 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425991#comment-13425991
 ] 

Brandon Williams commented on CASSANDRA-4427:
-

+1

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.1.3
>
> Attachments: 4427-4.txt, 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-31 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425920#comment-13425920
 ] 

Brandon Williams commented on CASSANDRA-4427:
-

Here's the real problem:

{noformat}

 INFO 16:49:57,531 Starting up server gossip
 INFO 16:49:57,547 Enqueuing flush of Memtable-LocationInfo@1547338589(126/157 serialized/live bytes, 3 ops)
 INFO 16:49:57,548 Writing Memtable-LocationInfo@1547338589(126/157 serialized/live bytes, 3 ops)
 INFO 16:49:57,586 Completed flushing /var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-he-1-Data.db (234 bytes) for commitlog position ReplayPosition(segmentId=10938112371080118, position=595)
 INFO 16:49:57,616 Starting Messaging Service on port 7000
 INFO 16:49:59,634 Saved token not found. Using 113427455640312821154458202477256070484 from configuration
 INFO 16:49:59,636 Enqueuing flush of Memtable-LocationInfo@1088940267(53/66 serialized/live bytes, 2 ops)
 INFO 16:49:59,636 Writing Memtable-LocationInfo@1088940267(53/66 serialized/live bytes, 2 ops)
 INFO 16:49:59,652 Completed flushing /var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-he-2-Data.db (163 bytes) for commitlog position ReplayPosition(segmentId=10938112371080118, position=776)
 INFO 16:49:59,655 Node cassandra-3/10.179.111.137 state jump to normal
 INFO 16:49:59,656 Bootstrap/Replace/Move completed! Now serving reads.
 INFO 16:49:59,690 Binding thrift service to cassandra-3/10.179.111.137:9160
 INFO 16:49:59,694 Using TFastFramedTransport with a max frame size of 15728640 bytes.
 INFO 16:49:59,698 Using synchronous/threadpool thrift server on cassandra-3/10.179.111.137 : 9160
 INFO 16:49:59,699 Listening for thrift clients...
 INFO 16:49:59,873 Node /10.179.64.227 is now part of the cluster
 INFO 16:49:59,874 InetAddress /10.179.64.227 is now UP
 INFO 16:49:59,876 Enqueuing flush of Memtable-LocationInfo@1301257077(35/43 serialized/live bytes, 1 ops)
 INFO 16:49:59,877 Writing Memtable-LocationInfo@1301257077(35/43 serialized/live bytes, 1 ops)
 INFO 16:49:59,892 Completed flushing /var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-he-3-Data.db (89 bytes) for commitlog position ReplayPosition(segmentId=10938112371080118, position=874)
 INFO 16:49:59,894 Node /10.179.65.102 is now part of the cluster
 INFO 16:49:59,894 InetAddress /10.179.65.102 is now UP
{noformat}

Gossip hasn't quite discovered any other nodes yet when the schema check fires.

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-31 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425905#comment-13425905
 ] 

Jonathan Ellis commented on CASSANDRA-4427:
---

59adb24e-f3cd-3e02-97f0-5b395827453f is emptyVersion, so from that snippet it 
looks like it's working as designed.
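
The third group of the UUID quoted above starts with `3`, which marks it as a name-based (MD5) UUID, consistent with a fixed value computed once rather than a per-change schema version. A small sketch of a check against that marker follows; the `SchemaVersionCheck` class and `hasSchema` helper are hypothetical names, and only the UUID literal is taken from the comment.

```java
import java.util.UUID;

final class SchemaVersionCheck
{
    // The version quoted in the comment above, treated here as the "no schema" marker.
    static final UUID EMPTY_VERSION = UUID.fromString("59adb24e-f3cd-3e02-97f0-5b395827453f");

    // Hypothetical helper: true only when a node advertises a real, non-empty schema.
    // A null argument stands for "no schema version advertised at all".
    static boolean hasSchema(UUID advertised)
    {
        return advertised != null && !EMPTY_VERSION.equals(advertised);
    }
}
```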

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-29 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424595#comment-13424595
 ] 

Brandon Williams commented on CASSANDRA-4427:
-

bq. Added a quick fix for this case. If the cluster is so new that there is no 
SCHEMA state, then there's no actual schema info either.

LGTM.

bq. Granted, but surely two rounds is a better measure than the zero we had 
before. (Which apparently worked most of the time...) Remember, our goal is to 
avoid the full RING_DELAY sleep when we don't need to bootstrap.

I know.  It's a situation with no perfect solution unfortunately (but I agree 
2 > 0 ;)

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-29 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424590#comment-13424590
 ] 

Jonathan Ellis commented on CASSANDRA-4427:
---

bq. This doesn't quite work, because we're looking for the SCHEMA app state, 
which at startup won't always exist

Added a quick fix for this case.  If the cluster is so new that there is no 
SCHEMA state, then there's no actual schema info either.

bq. It's possible that you could have 3 seeds and all but one could be down, 
thus 2 gossip rounds doesn't guarantee you'll have any appstates

Granted, but surely two rounds is a better measure than the zero we had before. 
 (Which apparently worked most of the time...)  Remember, our goal is to avoid 
the full RING_DELAY sleep when we don't need to bootstrap.

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-29 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424580#comment-13424580
 ] 

Brandon Williams commented on CASSANDRA-4427:
-

Note: that was with autobootstrap disabled.  But, I'm also not convinced that 
waiting two gossiper rounds is sufficient either (alert the ring_delay police!)

It's possible that you could have 3 seeds and all but one could be down, thus 2 
gossip rounds doesn't guarantee you'll have any appstates.

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-29 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424563#comment-13424563
 ] 

Brandon Williams commented on CASSANDRA-4427:
-

This doesn't quite work, because we're looking for the SCHEMA app state, which 
at startup won't exist since the gossiper isn't even started yet:

{noformat}
ERROR [main] 2012-07-29 01:08:28,476 CassandraDaemon.java (line 335) Exception encountered during startup
java.lang.NullPointerException
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:527)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:475)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:366)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:228)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:318)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:361)
{noformat}


> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-28 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424311#comment-13424311
 ] 

Sylvain Lebresne commented on CASSANDRA-4427:
-

lgtm, +1

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-27 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424177#comment-13424177
 ] 

Jonathan Ellis commented on CASSANDRA-4427:
---

bq. I believe the schemaPresent condition shouldn't be negated

Right, fix pushed to same github branch.

bq. I would have put the initialization of Schema.emptyVersion in a static 
block to make it explicit that it's a one time initialization

I thought you couldn't declare emptyVersion final that way...  I was wrong, the 
compiler is smart enough to recognize the static block.  Also fixed.
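
The point about `final` can be illustrated with a static initializer block: the compiler accepts a `static final` field that is assigned exactly once inside the block, even when the assignment sits in a `try`. This is a sketch only; the `EmptySchema` class name and the derivation from an MD5 digest of no input are assumptions, not a quote of the patch.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

final class EmptySchema
{
    // Assigned exactly once in the static block below, so it can be final.
    static final UUID EMPTY_VERSION;

    static
    {
        try
        {
            // Assumed derivation: a name-based UUID over the MD5 digest of zero bytes.
            EMPTY_VERSION = UUID.nameUUIDFromBytes(MessageDigest.getInstance("MD5").digest());
        }
        catch (NoSuchAlgorithmException e)
        {
            // Every Java platform is required to provide MD5.
            throw new AssertionError(e);
        }
    }

    static boolean isEmpty(UUID schemaVersion)
    {
        return EMPTY_VERSION.equals(schemaVersion);
    }
}
```

Because the `catch` branch always throws, the field is definitely assigned whenever class initialization completes normally.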

bq. it could be nice to also log whether we're going to bootstrap or not and why 
in the other case.

Added a debug line.

bq. exclude ourselves when we check for schemaPresent

Done.  (Since we can't have one ourselves unless another does too -- or unless 
we already joined the ring successfully -- there is no loss of correctness.)

bq. this feels a bit bigger than what I'm plainly comfortable pushing in 1.0 at 
this point

+1, let's leave it as a known issue in 1.0.

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-27 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423909#comment-13423909
 ] 

Sylvain Lebresne commented on CASSANDRA-4427:
-

I suspect the test failures are due to the removal of the seeds special case 
and because our tests are not fully realistic. Namely, in the tests, while 
localhost is a seed, it gets a schema loaded before joinTokenRing is called, 
and so it ends up with schemaPresent = true and tries to bootstrap (even though 
it's the only node). That shouldn't happen in real life, but at least in the 
short term fixing the tests themselves is more work than it's worth, so maybe 
we can:
* Either bring back the isSeed test
* Or exclude ourselves when we check for schemaPresent

Any preference?

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.0.11, 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-27 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423777#comment-13423777
 ] 

Sylvain Lebresne commented on CASSANDRA-4427:
-

In the check for bootstrap:
{noformat}
if (DatabaseDescriptor.isAutoBootstrap()
    && (SystemTable.bootstrapInProgress() || (!SystemTable.bootstrapComplete() && !schemaPresent)))
{noformat}
I believe the schemaPresent condition shouldn't be negated. We want to skip 
bootstrap if there is no schema, but bootstrap if there is one.
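
With the negation removed, the decision described above could be sketched as a plain predicate. The class and parameter names are hypothetical stand-ins for `DatabaseDescriptor.isAutoBootstrap()`, `SystemTable.bootstrapInProgress()`, `SystemTable.bootstrapComplete()` and `schemaPresent`; the sketch shows the boolean logic only, not the patch itself.

```java
final class BootstrapDecision
{
    // Bootstrap when auto_bootstrap is on and either a previous attempt was
    // left in progress, or we have never completed one and the cluster has a schema.
    static boolean shouldBootstrap(boolean autoBootstrap,
                                   boolean bootstrapInProgress,
                                   boolean bootstrapComplete,
                                   boolean schemaPresent)
    {
        return autoBootstrap
               && (bootstrapInProgress || (!bootstrapComplete && schemaPresent));
    }
}
```

A node in a brand-new cluster (no schema, nothing in progress) skips bootstrap, while a node restarted after a failed attempt still bootstraps.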

Even with that fixed, this breaks some of the unit tests (BootStrapperTest, 
EmbeddedCassandraServiceTest, StreamingTransferTest and 
AntiEntropyServiceStandardTest). Namely:
{noformat}
junit] java.lang.RuntimeException: No other nodes seen!  Unable to bootstrap.If you intended to start a single-node cluster, you should make sure your broadcast_address (or listen_address) is listed as a seed.  Otherwise, you need to determine why the seed being contacted has no knowledge of the rest of the cluster.  Usually, this can be solved by giving all nodes the same seed list.
junit]  at org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:127)
junit]  at org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:109)
junit]  at org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:104)
junit]  at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:629)
junit]  at org.apache.cassandra.service.StorageService.initServer(StorageService.java:526)
junit]  at org.apache.cassandra.dht.BootStrapperTest.testTokenRoundtrip(BootStrapperTest.java:50)
{noformat}

On committing to 1.0, I'm not sure what the intention was, but this feels a bit 
bigger than what I'm plainly comfortable pushing in 1.0 at this point, and it 
feels like we can tell people on 1.0 to wipe the data dir on a failed bootstrap 
before retrying. That's not a strong opposition though, more an opinion.

Nits:
* Instead of calculateEmptySchema(), I would have put the initialization of 
Schema.emptyVersion in a static block to make it explicit that it's a one time 
initialization. Though if you did that on purpose because you don't like 
static blocks, that's good enough for me.
* We log when we detect a bootstrap failure, but it could be nice to also log 
whether we're going to bootstrap or not and why in the other case.


> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.0.11, 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-23 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420736#comment-13420736
 ] 

Jonathan Ellis commented on CASSANDRA-4427:
---

bq. there is still one behavior that the patch changes, that is it will always 
bootstrap non-seed nodes

You're right.  Okay, take five: https://github.com/jbellis/cassandra/tree/4427-5

4 patches here on top of Brandon's work.  The main ones are the 1st and 4th.  
In the first, I remove the seed special case since it's a subset of the empty 
schema case.  (Unless you're Doing It Wrong and adding seed nodes directly to 
an active cluster, which always surprises people when it burns them.  So I say 
good riddance.)

The first also adds a 2-gossip-round sleep so that (always assuming seeds are 
set correctly) we eliminate the risk of incorrectly thinking the schema is empty 
due to a race w/ gossip.  The fourth patch follows this up by basing the schema 
check on other peers' schema UUIDs instead of local data.  Checking local data 
is unlikely to be a problem today, but it is still a race-y approach, and the 
correct alternative was straightforward.
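
A peer-based schema check of the kind described could be sketched as follows. The `PeerSchemaCheck` class, the string endpoint keys and the map of advertised versions are all hypothetical; in particular, a `null` value stands for a peer that has not published any SCHEMA state yet.

```java
import java.util.Map;
import java.util.UUID;

final class PeerSchemaCheck
{
    // True when at least one node other than ourselves advertises a
    // schema version that is neither missing nor the empty version.
    static boolean schemaPresent(Map<String, UUID> versionsByEndpoint,
                                 String localEndpoint,
                                 UUID emptyVersion)
    {
        for (Map.Entry<String, UUID> entry : versionsByEndpoint.entrySet())
        {
            if (entry.getKey().equals(localEndpoint))
                continue; // our own (possibly stale) local state must not count
            UUID version = entry.getValue();
            if (version != null && !version.equals(emptyVersion))
                return true;
        }
        return false;
    }
}
```

Skipping the local endpoint keeps a schema that happens to be loaded locally from being mistaken for evidence of an existing cluster.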

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.0.11, 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-18 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417016#comment-13417016
 ] 

Brandon Williams commented on CASSANDRA-4427:
-

bq. The fact is that recording that bootstrap is in progress (along with the 
system table check) would allow us to fix the instajoin while keeping the current 
behavior unchanged otherwise, and I do feel that recording the info is not a 
bad idea in itself, so that would have my preference.

I tend to agree that having an explicit, persisted flag feels a lot less 
fragile than the current logic, and being able to indicate a failure to the 
user seems like a good improvement.
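
An explicit persisted flag of the kind discussed above could be sketched as a small three-state marker. Everything here is hypothetical: the `BootstrapMarker` class, the state names, and the use of a plain file where the real fix would presumably use a system table.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class BootstrapMarker
{
    enum State { NEEDS_BOOTSTRAP, IN_PROGRESS, COMPLETED }

    private final Path file;

    BootstrapMarker(Path file)
    {
        this.file = file;
    }

    // Persist the state before starting, and again after finishing, a bootstrap.
    void save(State state)
    {
        try
        {
            Files.write(file, state.name().getBytes(StandardCharsets.UTF_8));
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    // A missing marker means the node has never attempted to bootstrap.
    State load()
    {
        if (!Files.exists(file))
            return State.NEEDS_BOOTSTRAP;
        try
        {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            return State.valueOf(text);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    // Finding IN_PROGRESS at startup means the last attempt never finished.
    boolean previousAttemptFailed()
    {
        return load() == State.IN_PROGRESS;
    }
}
```

The marker makes a failed attempt visible on restart, so the node can report it and bootstrap again rather than joining the ring directly.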

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.0.11, 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-18 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416945#comment-13416945
 ] 

Sylvain Lebresne commented on CASSANDRA-4427:
-

bq. Adding "bootstrap in progress" concept does nothing for this one way or the 
other.

You're right, brain fart, sorry.

Anyway, there is still one behavior that the patch changes: it will 
always bootstrap non-seed nodes, while previously the system table check 
made sure we never bootstrapped a node in a new cluster, regardless of 
whether it was a seed or not.

It is clearly not a bad idea to set all the nodes as seeds when you start a new 
cluster, but I just want to point out that the behavior is changed, and 
I'm not sure everyone always sets all of their initial nodes as seeds today. I'll 
also note that bootstrapping some of the nodes in an initial cluster doesn't break 
anything; it just makes those nodes start much more slowly than they would 
otherwise.

I'm not sure how I feel about changing that behavior, especially in a minor 
release. The fact is that recording that bootstrap is in progress (along with 
the system table check) would allow us to fix the instajoin while keeping the 
current behavior unchanged otherwise, and I do feel that recording the info is 
not a bad idea in itself, so that would have my preference. But that is not an 
extremely strong preference either.

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.0.11, 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-18 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416921#comment-13416921
 ] 

Jonathan Ellis commented on CASSANDRA-4427:
---

bq. I believe this simpler fix doesn't handle the case of bootstrapping multiple 
nodes into an existing cluster.

We've never tried to prevent this, except by saying "thou shalt space 
bootstraps apart two minutes," because the only way to stop it is to drop the 
"balanced" token picking altogether.  Adding "bootstrap in progress" concept 
does nothing for this one way or the other.

bq. Namely, in that case, that will have a schema and so the node will have a 
system table by the time it checks for it and we'll end up picking the same 
token for multiple nodes.

This is exactly how it's supposed to work: if there's a schema, we use 
"existing cluster mode" and pick a token to divide the range of the heaviest 
node (and cross our fingers that the user is spacing things out enough between 
node additions).  If there's no schema, we use "new cluster mode" and pick a 
random token.
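
To make the two modes concrete, here is a minimal Java sketch. It is not the 
actual Cassandra code: the class, the method names, the load map and the 
2^127 token space are all assumptions chosen for illustration.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the two token-picking modes; not Cassandra's implementation.
class TokenPickSketch {
    // Assumed token space of size 2^127, for illustration only.
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    // "New cluster mode": no schema has been seen, so pick a random token.
    static BigInteger randomToken(SecureRandom rng) {
        return new BigInteger(127, rng); // uniform in [0, 2^127 - 1]
    }

    // "Existing cluster mode": bisect the range owned by the most loaded node.
    // loadByToken maps each node's token to its reported load and must be non-empty.
    static BigInteger balancedToken(TreeMap<BigInteger, Double> loadByToken) {
        BigInteger heaviest = null;
        double maxLoad = Double.NEGATIVE_INFINITY;
        for (Map.Entry<BigInteger, Double> e : loadByToken.entrySet()) {
            if (e.getValue() > maxLoad) {
                maxLoad = e.getValue();
                heaviest = e.getKey();
            }
        }
        // The heaviest node owns (previous token, own token]; wrap around the ring if needed.
        BigInteger prev = loadByToken.lowerKey(heaviest);
        if (prev == null) {
            prev = loadByToken.lastKey();
        }
        BigInteger width = heaviest.subtract(prev).mod(RING_SIZE);
        if (width.signum() == 0) {
            width = RING_SIZE; // a single node owns the whole ring
        }
        return prev.add(width.shiftRight(1)).mod(RING_SIZE);
    }

    // Choose the mode based on whether any schema is present.
    static BigInteger pickToken(boolean schemaPresent,
                                TreeMap<BigInteger, Double> loadByToken,
                                SecureRandom rng) {
        if (schemaPresent && !loadByToken.isEmpty()) {
            return balancedToken(loadByToken);
        }
        return randomToken(rng);
    }
}
```

Note that balancedToken is deterministic for a given view of the ring: two 
nodes that start at the same time and see the same loads get the same token, 
which is why bootstraps need to be spaced apart.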

Let the record show that back in CASSANDRA-3219 I said this was confusing 
behavior and we should add explicit initial_token modes instead of trying to 
make it magical. :)


> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.0.11, 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-17 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416041#comment-13416041
 ] 

Sylvain Lebresne commented on CASSANDRA-4427:
-

I believe this simpler fix doesn't handle the case of bootstrapping multiple 
nodes into an existing cluster. Namely, in that case, the cluster will have a 
schema and so the node will have a system table by the time it checks for it 
and we'll end up picking the same token for multiple nodes.

Also, I think checking for the existence of system tables is fairly fragile 
and I would prefer moving away from it. It is way too easy to screw that up by 
having something (anything) written to those system tables. Typically, I don't 
know if that fix works for multiple nodes started in a brand new cluster (where 
not all are seeds), because without careful checking I don't know if we can end 
up writing some info to the system tables before checking for getBootstrapToken.

Overall I do like the idea of registering that the bootstrap is in progress, 
because on top of (I think) fixing the problem in a non-fragile way, it also 
allows us better reporting. Even outside of the problem of generating tokens, I 
think it is reassuring for a user who restarts a node that failed to bootstrap 
to have the software acknowledge that it understands and correctly handles the 
situation.
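
As a rough illustration of the "bootstrap in progress" idea, the sketch below 
records an explicit state and decides what to do on restart from that state 
alone, rather than from whether system tables exist. The enum values, class 
and method names are hypothetical and are not taken from any attached patch.

```java
// Hypothetical sketch: an explicitly recorded bootstrap state drives the restart decision.
class BootstrapStateSketch {
    // State assumed to be persisted locally before, during and after bootstrap.
    enum BootstrapState { NEEDS_BOOTSTRAP, IN_PROGRESS, COMPLETED }

    enum StartupAction { BOOTSTRAP, RESUME_BOOTSTRAP, JOIN_RING }

    static StartupAction onStartup(BootstrapState persisted) {
        switch (persisted) {
            case COMPLETED:
                // Only a node that finished bootstrapping may join the ring directly.
                return StartupAction.JOIN_RING;
            case IN_PROGRESS:
                // A previous attempt failed part-way: report it and bootstrap again
                // instead of joining the ring immediately.
                return StartupAction.RESUME_BOOTSTRAP;
            default:
                return StartupAction.BOOTSTRAP;
        }
    }
}
```

With a state like this, a restart after a failed bootstrap can also log that 
it detected the interrupted attempt, which is the reporting benefit mentioned 
above.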

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Jonathan Ellis
> Fix For: 1.0.11, 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415714#comment-13415714
 ] 

Jonathan Ellis commented on CASSANDRA-4427:
---

Started trying to improve the comments and got stuck on the schema check: it's 
basically a no-op (except for the purposes of screwing up a partial bootstrap 
like this), since we perform the check before waiting for gossip to fill in the 
schema.

Simpler fix at https://github.com/jbellis/cassandra/tree/4427-4 to move the 
schema check into getBootstrapToken.

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Brandon Williams
> Fix For: 1.0.11, 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415631#comment-13415631
 ] 

Jonathan Ellis commented on CASSANDRA-4427:
---

+1

nit: worth adding a comment to explain wtf all the clauses of that if statement 
are, so we don't have to dig through ticket history next time

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Brandon Williams
> Fix For: 1.0.11, 1.1.3
>
> Attachments: 4427-v2.txt, 4427-v3.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-13 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413931#comment-13413931
 ] 

Jonathan Ellis commented on CASSANDRA-4427:
---

I think you're misreading the original seed logic...  !(isBootstrapped || 
isSeed) expands to !isBootstrapped && !isSeed.  Still need that so that 
single-node clusters don't try to bootstrap.
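
The expansion is just De Morgan's law. A small Java check over all four 
combinations, using hypothetical method names and only the two flags quoted 
above (the real if statement has more clauses):

```java
// Hypothetical helper showing that the two forms of the condition are equivalent.
class BootstrapDecision {
    // Form as written in the original seed logic.
    static boolean original(boolean isBootstrapped, boolean isSeed) {
        return !(isBootstrapped || isSeed);
    }

    // Expanded form via De Morgan's law.
    static boolean expanded(boolean isBootstrapped, boolean isSeed) {
        return !isBootstrapped && !isSeed;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean isBootstrapped : values) {
            for (boolean isSeed : values) {
                System.out.printf("isBootstrapped=%b isSeed=%b original=%b expanded=%b%n",
                        isBootstrapped, isSeed,
                        original(isBootstrapped, isSeed),
                        expanded(isBootstrapped, isSeed));
            }
        }
    }
}
```

Both forms are true only when the node is neither bootstrapped nor a seed, so 
a single-node cluster whose only node is its own seed never tries to 
bootstrap.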

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Brandon Williams
> Fix For: 1.0.11, 1.1.3
>
> Attachments: 4427-v2.txt, 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-12 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413244#comment-13413244
 ] 

Brandon Williams commented on CASSANDRA-4427:
-

bq. Indeed, in 1.0.0 we decided to draw this line based on whether a schema had 
been created or not

This seems more dangerous than it was worth, since you can easily receive even 
partial schema within a couple of seconds, realize you made some sort of 
mistake (forgot to mount the data dir, etc) and restart it, possibly wrecking 
your production app.

(The seed check still seems strange regardless)

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Brandon Williams
> Fix For: 1.0.11, 1.1.3
>
> Attachments: 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.





[jira] [Commented] (CASSANDRA-4427) Restarting a failed bootstrap instajoins the ring

2012-07-12 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413145#comment-13413145
 ] 

Jonathan Ellis commented on CASSANDRA-4427:
---

Here's what we were trying to address there:

bq. Now there is an actual new problem with 1.0.0. That problem is that when 
you start an initial cluster, i.e., when in 0.8 you would start nodes with 
auto-bootstrap=false, you do often end up starting nodes simultaneously. That 
is why older versions used a random token when auto-bootstrap was false. This 
problem does need to be fixed for 1.0.0 because it is a serious regression. 
However, my argument is that even though we now default to auto-bootstrap=true, 
that doesn't mean that there is no difference between setting up the initial 
nodes of a cluster and the later bootstrapping of nodes to add capacity to an 
existing cluster. Indeed, in 1.0.0 we decided to draw this line based on 
whether a schema had been created or not (we call the bootstrap() method based 
on that). Imho, this means that we have no bootstrap option and the "I have no 
schema" case is the old auto-bootstrap=false. So we should use a random token 
in that case and a balanced one otherwise, the same way we are doing it in 0.8.

> Restarting a failed bootstrap instajoins the ring
> -
>
> Key: CASSANDRA-4427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Brandon Williams
>Assignee: Brandon Williams
> Fix For: 1.0.11, 1.1.3
>
> Attachments: 4427.txt
>
>
> I think when we made auto_bootstrap = true the default, we broke the check 
> for the bootstrap flag, creating a dangerous situation.
