[jira] [Commented] (CASSANDRA-2644) Make bootstrap retry
[ https://issues.apache.org/jira/browse/CASSANDRA-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041155#comment-13041155 ]

Hudson commented on CASSANDRA-2644:
---
Integrated in Cassandra-0.8 #146 (See [https://builds.apache.org/hudson/job/Cassandra-0.8/146/])

> Make bootstrap retry
>
>                 Key: CASSANDRA-2644
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2644
>             Project: Cassandra
>          Issue Type: Bug
>   Affects Versions: 0.8.0 beta 2
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>             Fix For: 0.8.1
>
>         Attachments:
>           0001-Make-ExpiringMap-have-objects-with-specific-timeouts.patch,
>           0002-Make-bootstrap-retry-and-increment-timeout-for-every.patch
>
> We ran into a situation where we had rpc_timeout set to 1 second, and the
> node needing to compute the token took over a second (1.6 seconds). The
> bootstrapping node hangs forever without getting a token because the expiring
> map removes it before the reply comes back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
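The failure described in the issue can be pictured with a small expiring map whose entries each carry their own deadline. This is a minimal sketch, not Cassandra's `ExpiringMap`: the class name `PerEntryExpiringMap`, its methods, the injected clock, and the 30-second value are all assumptions made for illustration, and expiry is checked lazily on removal rather than by a background timer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical illustration only; this is not Cassandra's ExpiringMap.
final class PerEntryExpiringMap<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMs;

        Entry(V value, long expiresAtMs) {
            this.value = value;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long defaultTimeoutMs;
    private final LongSupplier clockMs; // injected so expiry can be simulated

    PerEntryExpiringMap(long defaultTimeoutMs, LongSupplier clockMs) {
        this.defaultTimeoutMs = defaultTimeoutMs;
        this.clockMs = clockMs;
    }

    // Stores the value with the map-wide default timeout.
    void put(K key, V value) {
        put(key, value, defaultTimeoutMs);
    }

    // Stores the value with a timeout specific to this entry.
    void put(K key, V value, long timeoutMs) {
        entries.put(key, new Entry<>(value, clockMs.getAsLong() + timeoutMs));
    }

    // Removes the entry and returns its value, or null if it is absent or expired.
    V remove(K key) {
        Entry<V> entry = entries.remove(key);
        if (entry == null || clockMs.getAsLong() >= entry.expiresAtMs) {
            return null;
        }
        return entry.value;
    }

    public static void main(String[] args) {
        long[] nowMs = {0};
        PerEntryExpiringMap<String, String> callbacks =
                new PerEntryExpiringMap<>(1_000, () -> nowMs[0]);

        callbacks.put("msg-1", "token callback");         // default 1 s, like rpc_timeout
        callbacks.put("msg-2", "token callback", 30_000); // longer per-entry timeout

        nowMs[0] = 1_600; // the reply arrives after 1.6 s
        System.out.println(callbacks.remove("msg-1")); // callback already expired
        System.out.println(callbacks.remove("msg-2")); // callback still registered
    }
}
```

With the default timeout, the callback is gone by the time the 1.6-second reply arrives, which matches the hang reported in the issue. An entry registered with its own, longer timeout is still there to receive the reply.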
[jira] [Commented] (CASSANDRA-2644) Make bootstrap retry
[ https://issues.apache.org/jira/browse/CASSANDRA-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041150#comment-13041150 ]

Hudson commented on CASSANDRA-2644:
---
Integrated in Cassandra #912 (See [https://builds.apache.org/hudson/job/Cassandra/912/])
[jira] [Commented] (CASSANDRA-2644) Make bootstrap retry
[ https://issues.apache.org/jira/browse/CASSANDRA-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041045#comment-13041045 ]

Chris Goffinet commented on CASSANDRA-2644:
---
Made changes to the 02 patch and committed it to 0.8.1. Thanks!
[jira] [Commented] (CASSANDRA-2644) Make bootstrap retry
[ https://issues.apache.org/jira/browse/CASSANDRA-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035098#comment-13035098 ]

Jonathan Ellis commented on CASSANDRA-2644:
---
> But there are still cases that retries will recover from... flapping/down nodes

Fair enough, but increasing the timeout is still unwarranted. Let's just make it wait for max(DEFAULT_TIMEOUT, BOOTSTRAP_TIMEOUT) with B_T equal to, say, 30s.

Committed patch 01 to 0.8.1 branch, btw.
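The fixed wait suggested in the comment above amounts to taking the larger of two values. A minimal sketch follows, assuming a hypothetical helper `tokenRequestTimeoutMs` and a constant `BOOTSTRAP_TIMEOUT_MS`; neither name is taken from the committed patch, and the argument stands in for the configured rpc_timeout.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names are illustrative, not from the Cassandra source.
final class BootstrapTimeoutSketch {
    // The 30 s floor proposed in the comment (B_T).
    static final long BOOTSTRAP_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(30);

    // Returns max(DEFAULT_TIMEOUT, BOOTSTRAP_TIMEOUT) in milliseconds.
    static long tokenRequestTimeoutMs(long defaultTimeoutMs) {
        return Math.max(defaultTimeoutMs, BOOTSTRAP_TIMEOUT_MS);
    }

    public static void main(String[] args) {
        // A 1 s rpc_timeout is raised to the 30 s floor.
        System.out.println(tokenRequestTimeoutMs(1_000));
        // A configured timeout above the floor is left unchanged.
        System.out.println(tokenRequestTimeoutMs(60_000));
    }
}
```

Under this rule the 1.6-second token computation from the issue description would fit comfortably inside the wait, without the timeout growing between attempts.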
[jira] [Commented] (CASSANDRA-2644) Make bootstrap retry
[ https://issues.apache.org/jira/browse/CASSANDRA-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032871#comment-13032871 ]

Sylvain Lebresne commented on CASSANDRA-2644:
---
Not fully related to the discussion here, but streaming is another part of bootstrap, so I'll mention that CASSANDRA-2433 introduces a mechanism to handle unrecoverable failures during streaming (that is, streaming already retries on errors, but 1) it retries indefinitely, while CASSANDRA-2433 introduces a maximum retry count, and 2) it doesn't detect the other end being dead). Anyway, just referencing the ticket so that if this ticket becomes "make bootstrap handle failures better", we don't duplicate efforts.
[jira] [Commented] (CASSANDRA-2644) Make bootstrap retry
[ https://issues.apache.org/jira/browse/CASSANDRA-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032823#comment-13032823 ]

Stu Hood commented on CASSANDRA-2644:
---
Good point.
[jira] [Commented] (CASSANDRA-2644) Make bootstrap retry
[ https://issues.apache.org/jira/browse/CASSANDRA-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032741#comment-13032741 ]

Jonathan Ellis commented on CASSANDRA-2644:
---
I think the retry logic is a distraction here. If it doesn't work the first time because of anything other than "we didn't wait long enough" (i.e. it errored out) it's not likely to magically unbreak for the second. Suggest just giving it a long retry to begin with.
[jira] [Commented] (CASSANDRA-2644) Make bootstrap retry
[ https://issues.apache.org/jira/browse/CASSANDRA-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032734#comment-13032734 ]

Stu Hood commented on CASSANDRA-2644:
---
+1 Thanks Chris!
[jira] [Commented] (CASSANDRA-2644) Make bootstrap retry
[ https://issues.apache.org/jira/browse/CASSANDRA-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032715#comment-13032715 ]

Chris Goffinet commented on CASSANDRA-2644:
---
I have a patch for this that I'll be adding within the next day. I make ExpiringMap support custom timeouts per object, and make bootstrap getToken retry, exponentially increasing the timeout until the retry limit is reached.
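The retry approach described above could look roughly like the following. This is only a sketch under stated assumptions: `RetryingTokenRequest` and `requestWithBackoff` are hypothetical names, the token request is modelled as a plain `Callable`, and doubling is one possible reading of "exponentially increasing". It is not the code from the attached patch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of retrying a request with a growing timeout.
final class RetryingTokenRequest {
    // Runs the request up to maxAttempts times, doubling the wait after each timeout.
    // Failures other than a timeout are propagated to the caller without retrying.
    static <T> T requestWithBackoff(Callable<T> request, long initialTimeoutMs, int maxAttempts)
            throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            long timeoutMs = initialTimeoutMs;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> reply = executor.submit(request);
                try {
                    return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    reply.cancel(true); // abandon the slow attempt
                    timeoutMs *= 2;     // wait twice as long next time
                }
            }
            throw new TimeoutException("no reply after " + maxAttempts + " attempts");
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated peer: the first two requests stall, the third answers at once.
        String token = requestWithBackoff(() -> {
            if (calls.incrementAndGet() < 3) {
                Thread.sleep(10_000);
            }
            return "token";
        }, 50, 5);
        System.out.println(token + " after " + calls.get() + " attempts");
    }
}
```

The attempt limit keeps the loop bounded, so a peer that never answers produces an error instead of a bootstrap that hangs forever.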