[jira] [Updated] (CASSANDRA-13636) No documentation on how to handle error in repair

Jeff Jirsa (JIRA) Fri, 19 Jan 2018 17:13:33 -0800

     [ https://issues.apache.org/jira/browse/CASSANDRA-13636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jeff Jirsa updated CASSANDRA-13636:
-----------------------------------
    Description: 
After a node went down and was restarted, I ran an incremental repair. It exited with the following error:
{code:java}
[2017-06-26 04:01:12,241] Repair command #39 finished in 0 seconds
[2017-06-26 04:01:12,250] Starting repair command #40, repairing keyspace fgp with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 790)
[2017-06-26 04:01:12,468] Did not get positive replies from all endpoints. List of failed endpoint(s): [10.0.2.13]
[2017-06-26 04:01:12,469] Repair command #40 finished with error
error: Repair job has failed with the error message: [2017-06-26 04:01:12,468] Did not get positive replies from all endpoints. List of failed endpoint(s): [10.0.2.13]
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: [2017-06-26 04:01:12,468] Did not get positive replies from all endpoints. List of failed endpoint(s): [10.0.2.13]
        at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115)
        at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)
{code}
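The failed endpoint is reported only inside the repair output text. When repairs are scripted, it can help to extract it automatically so you can go straight to that node's log. This is a minimal sketch that assumes the message format shown above. The class name `FailedEndpoints` is hypothetical. The comma separator for multiple endpoints is also an assumption, because the output above lists only one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FailedEndpoints {
    // Captures the bracketed list after "List of failed endpoint(s):" in nodetool repair output.
    private static final Pattern FAILED =
        Pattern.compile("List of failed endpoint\\(s\\): \\[([^\\]]*)\\]");

    /** Returns the endpoints reported as failed, or an empty list if the message is absent. */
    public static List<String> parse(String repairOutput) {
        List<String> endpoints = new ArrayList<>();
        Matcher m = FAILED.matcher(repairOutput);
        if (m.find()) {
            // Assumption: multiple endpoints are separated by commas.
            for (String ep : m.group(1).split(",")) {
                String trimmed = ep.trim();
                if (!trimmed.isEmpty()) {
                    endpoints.add(trimmed);
                }
            }
        }
        return endpoints;
    }

    public static void main(String[] args) {
        String line = "[2017-06-26 04:01:12,468] Did not get positive replies from all endpoints. "
            + "List of failed endpoint(s): [10.0.2.13]";
        System.out.println(parse(line)); // prints [10.0.2.13]
    }
}
```

The endpoint returned here (10.0.2.13 in this report) is the node whose log should be checked for the underlying cause.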
On the node that did not provide a positive reply, the logs showed:
{code:java}
INFO  04:01:12 [repair #176c3760-5a24-11e7-868e-97841c634787] Starting anticompaction for system_traces.events on 0/[] sstables
INFO  04:01:12 [repair #176c3760-5a24-11e7-868e-97841c634787] Completed anticompaction successfully
INFO  04:01:12 [repair #176c3760-5a24-11e7-868e-97841c634787] Completed anticompaction successfully
ERROR 04:01:12 Table with id ffcc2ef0-3122-11e7-8f76-b3cac7d588b7 was dropped during prepare phase of repair
{code}
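The ERROR line identifies the table only by its id. The following minimal sketch assumes the log format shown above. It extracts that id and builds a CQL query that looks it up in `system_schema.tables`, the schema table in Cassandra 3.0. The class and method names are hypothetical. Running the query with cqlsh on each node and comparing the results is an assumed diagnostic step for spotting a schema mismatch, not a documented recovery procedure.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DroppedTableLookup {
    // Captures the UUID in "Table with id <uuid> was dropped during prepare phase of repair".
    private static final Pattern DROPPED = Pattern.compile(
        "Table with id ([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}) was dropped");

    /** Returns the table id from the log line, if the line is the "was dropped" error. */
    public static Optional<String> tableId(String logLine) {
        Matcher m = DROPPED.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /**
     * Builds (but does not run) a CQL lookup. The id column is not part of the
     * primary key of system_schema.tables, so ALLOW FILTERING is required.
     */
    public static String lookupQuery(String id) {
        return "SELECT keyspace_name, table_name FROM system_schema.tables WHERE id = "
            + id + " ALLOW FILTERING;";
    }

    public static void main(String[] args) {
        String line = "ERROR 04:01:12 Table with id ffcc2ef0-3122-11e7-8f76-b3cac7d588b7 "
            + "was dropped during prepare phase of repair";
        // Prints the SELECT statement for the extracted id, if one was found.
        tableId(line).map(DroppedTableLookup::lookupQuery).ifPresent(System.out::println);
    }
}
```

An empty result on one node but not on others would suggest that the nodes disagree about the table, which matches the "dropped during prepare phase" message.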
I was unable to find documentation that describes this situation or how to recover from it. Running a full repair results in the same error.

  was:
After a node went down and was restarted, I ran an incremental repair. It exited with the following error:

[2017-06-26 04:01:12,241] Repair command #39 finished in 0 seconds
[2017-06-26 04:01:12,250] Starting repair command #40, repairing keyspace fgp with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 790)
[2017-06-26 04:01:12,468] Did not get positive replies from all endpoints. List of failed endpoint(s): [10.0.2.13]
[2017-06-26 04:01:12,469] Repair command #40 finished with error
error: Repair job has failed with the error message: [2017-06-26 04:01:12,468] Did not get positive replies from all endpoints. List of failed endpoint(s): [10.0.2.13]
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: [2017-06-26 04:01:12,468] Did not get positive replies from all endpoints. List of failed endpoint(s): [10.0.2.13]
        at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115)
        at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
        at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)

On the node that did not provide a positive reply, the logs showed:

INFO  04:01:12 [repair #176c3760-5a24-11e7-868e-97841c634787] Starting anticompaction for system_traces.events on 0/[] sstables
INFO  04:01:12 [repair #176c3760-5a24-11e7-868e-97841c634787] Completed anticompaction successfully
INFO  04:01:12 [repair #176c3760-5a24-11e7-868e-97841c634787] Completed anticompaction successfully
ERROR 04:01:12 Table with id ffcc2ef0-3122-11e7-8f76-b3cac7d588b7 was dropped during prepare phase of repair

I was unable to find documentation that describes this situation or how to recover from it. Running a full repair results in the same error.






> No documentation on how to handle error in repair
> -------------------------------------------------
>
>                 Key: CASSANDRA-13636
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13636
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Documentation and Website
>         Environment: 6 nodes running Apache Cassandra 3.0.13 in a Docker environment.
>            Reporter: David Ryan
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

