[jira] [Commented] (CASSANDRA-3112) Make repair fail when an unexpected error occurs

2013-05-28 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668760#comment-13668760
 ] 

Jonathan Ellis commented on CASSANDRA-3112:
---

What is the scope of this ticket?  Should it be wontfixed or moved to 2.1?

 Make repair fail when an unexpected error occurs
 

 Key: CASSANDRA-3112
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3112
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: repair
 Fix For: 2.0

 Attachments: 0003-Report-streaming-errors-back-to-repair-v4.patch, 
 0004-Reports-validation-compaction-errors-back-to-repair-v4.patch


 CASSANDRA-2433 makes nodetool repair fail if a node participating in the 
 repair dies before completing its part of the repair. This handles most of 
 the situations where repair previously hung, but repair can still hang if an 
 unexpected error occurs either during merkle tree creation (say, on-disk 
 corruption triggers an IOError) or during streaming (though I'm not sure what 
 could make streaming fail other than 'one of the nodes died', besides a bug).
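
 To illustrate the hang described above, here is a minimal sketch, not 
 Cassandra's actual repair code. It assumes the waiting side blocks on a 
 future; the class RepairFailureSketch and the methods buildTree, runStep and 
 await are hypothetical names invented for this example. The point is only 
 that an unexpected error has to be reported to the waiter, otherwise the 
 waiter never wakes up.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RepairFailureSketch {
    /** Hypothetical stand-in for a validation step that may hit on-disk corruption. */
    static void buildTree(boolean corrupt) {
        if (corrupt) throw new IllegalStateException("simulated on-disk corruption");
    }

    /** Runs the step on a worker thread and reports success or failure to the waiter. */
    static CompletableFuture<String> runStep(ExecutorService pool, boolean corrupt) {
        CompletableFuture<String> result = new CompletableFuture<>();
        pool.execute(() -> {
            try {
                buildTree(corrupt);
                result.complete("tree built");
            } catch (Throwable t) {
                // Without this call the waiter is never notified and blocks until its timeout.
                result.completeExceptionally(t);
            }
        });
        return result;
    }

    /** Waits for the step and returns a short description of how the wait ended. */
    static String await(CompletableFuture<String> step) {
        try {
            return step.get(5, TimeUnit.SECONDS);
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (TimeoutException e) {
            return "hung";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    /** Runs one step end to end on a private single-thread pool. */
    static String demo(boolean corrupt) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return await(runStep(pool, corrupt));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(false));
        System.out.println(demo(true));
    }
}
```

 With the catch block in place the failing step surfaces as an error to the 
 caller instead of leaving it waiting.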

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-3112) Make repair fail when an unexpected error occurs

2013-05-28 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668807#comment-13668807
 ] 

Yuki Morishita commented on CASSANDRA-3112:
---

I'm working on CASSANDRA-5426, and this ticket may be a duplicate of it. 
CASSANDRA-5426 is targeted for the 2.0.0 release.



[jira] [Commented] (CASSANDRA-3112) Make repair fail when an unexpected error occurs

2013-03-11 Thread Jason Wee (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598641#comment-13598641
 ] 

Jason Wee commented on CASSANDRA-3112:
--

In StreamOutSession.java, the logger call in the method convict(...) has two 
placeholders but is given only one argument. Is a second argument missing?

logger.error("StreamOutSession {} failed because {} died or was restarted/removed", endpoint);
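
To show what the mismatch above produces, here is a small self-contained 
sketch. It does not use the real logging library; the format method below is 
a hypothetical imitation of "{}" substitution, written on the assumption that 
the logger fills placeholders in order and leaves any unmatched placeholder as 
literal text. The address 10.0.0.1 is made up for the example.

```java
public class PlaceholderSketch {
    /** Substitutes each "{}" in order; placeholders without an argument are left as-is. */
    static String format(String pattern, Object... args) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        for (Object arg : args) {
            int at = pattern.indexOf("{}", from);
            if (at < 0) break; // more arguments than placeholders
            out.append(pattern, from, at).append(arg);
            from = at + 2;
        }
        return out.append(pattern.substring(from)).toString();
    }

    /** Counts the "{}" placeholders in a pattern. */
    static int countPlaceholders(String pattern) {
        int count = 0;
        for (int at = pattern.indexOf("{}"); at >= 0; at = pattern.indexOf("{}", at + 2)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String pattern = "StreamOutSession {} failed because {} died or was restarted/removed";
        // Two placeholders, one argument: the second "{}" stays in the output.
        System.out.println(countPlaceholders(pattern));
        System.out.println(format(pattern, "10.0.0.1"));
    }
}
```

Under that assumption the endpoint lands in the first placeholder (the 
session slot) and the second placeholder is printed unfilled, so the message 
is misleading rather than merely incomplete.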




[jira] [Commented] (CASSANDRA-3112) Make repair fail when an unexpected error occurs

2011-12-27 Thread Vijay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176420#comment-13176420
 ] 

Vijay commented on CASSANDRA-3112:
--

But do you know what is the reason for it making no progress? Because unless 
we know what can cause it, I'm not sure what to fix.
It is usually in the streaming phase. I think adding a SoTimeout might fix 
it, but it is so random that I couldn't reproduce it in my tests, although I 
am definitely seeing it in production.
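
As a rough illustration of the SoTimeout idea, here is a minimal sketch using 
only java.net. It is not the streaming code; the class SoTimeoutSketch and the 
method readTimesOut are hypothetical names, and the "silent peer" is a local 
server socket that accepts the connection and then never writes. 
Socket.setSoTimeout makes a blocking read throw SocketTimeoutException 
instead of waiting indefinitely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SoTimeoutSketch {
    /** Returns true if a read from a silent peer is cut short by the socket timeout. */
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket silentPeer = server.accept()) {
            // The peer is connected but never sends a byte.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // would block indefinitely without the timeout
                return false;
            } catch (SocketTimeoutException e) {
                // The stalled read is now visible to the caller, who can fail or retry.
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

A timeout only turns a silent stall into an error; what to do with that error 
(fail the session, retry) is a separate decision.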

How can we lose messages? Isn't TCP supposed to prevent this?
Once you send the message, the other node might get restarted (without 
validating or starting anything) or the sockets can get reset. Actually, I 
think that when I posted this message it was because of CASSANDRA-3577. There 
is nothing like hints or a retry for the messages sent for repairs.

I understand this isn't in the scope of this ticket, but I still think there 
should be a way to orchestrate repairs with slightly more complicated logic, 
and I will try to do some parts of it in the other ticket.




 Make repair fail when an unexpected error occurs
 

 Key: CASSANDRA-3112
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3112
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: repair
 Fix For: 1.1

 Attachments: 0003-Report-streaming-errors-back-to-repair-v4.patch, 
 0004-Reports-validation-compaction-errors-back-to-repair-v4.patch






[jira] [Commented] (CASSANDRA-3112) Make repair fail when an unexpected error occurs

2011-12-01 Thread Vijay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161327#comment-13161327
 ] 

Vijay commented on CASSANDRA-3112:
--

Hi Sylvain,

I have seen the following issues with repairs, especially in AWS multi-DC 
deployments:
1) The stream session or the stream makes no progress (read timeout/rpc 
timeout; a socket timeout might help).
2) Validation compaction completes and the result tree is sent but not received?
3) The repair request is sent but the receiving node doesn't receive it?
4) When a big repair runs for hours, it would be better to retry only the 
failed part rather than do a full retry.
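
Point 4 can be sketched roughly as follows. This is only an illustration of 
retrying the failed pieces, not a proposal for the actual repair code; the 
class PartialRetrySketch, the method repairWithPartialRetry, and the range 
names "A", "B" and "C" are all hypothetical, and a "range" here is just a 
string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class PartialRetrySketch {
    /**
     * Attempts every range, then re-attempts only the ranges that failed,
     * for at most maxRounds rounds. Returns the ranges still failing at the end.
     */
    static <R> List<R> repairWithPartialRetry(List<R> ranges, Predicate<R> repairOne, int maxRounds) {
        List<R> pending = new ArrayList<>(ranges);
        for (int round = 0; round < maxRounds && !pending.isEmpty(); round++) {
            List<R> failed = new ArrayList<>();
            for (R range : pending) {
                if (!repairOne.test(range)) failed.add(range);
            }
            pending = failed; // successful ranges are never repeated
        }
        return pending;
    }

    /** Simulates three ranges where "B" fails on its first attempt only. */
    static String demo() {
        Map<String, Integer> attempts = new TreeMap<>();
        Predicate<String> repairOne = range -> {
            int n = attempts.merge(range, 1, Integer::sum);
            return !(range.equals("B") && n == 1);
        };
        List<String> remaining = repairWithPartialRetry(Arrays.asList("A", "B", "C"), repairOne, 3);
        return "remaining=" + remaining + " attempts=" + attempts;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the simulated run only the failed range is attempted a second time, which 
is the saving that matters when a full repair takes hours.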

Do you think it is worth addressing these in a separate ticket? Otherwise I 
will close CASSANDRA-3487.


 Make repair fail when an unexpected error occurs
 

 Key: CASSANDRA-3112
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3112
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: repair
 Fix For: 1.0.6

 Attachments: 0003-Report-streaming-errors-back-to-repair-v4.patch, 
 0004-Reports-validation-compaction-errors-back-to-repair-v4.patch

