Re: Repair giving error

2018-01-22 Thread Alain RODRIGUEZ
Hello,

Some other thoughts:

- Are you using secured internode communication (in which case port 7001 is
used instead)?
- A rolling restart might help, have you tried restarting a few / all the
nodes?

This issue is very weird and I am only making rough guesses here. It is not
an issue I have seen in the past, so it might help to see the raw outputs
('nodetool status', keyspace replication strategy, WARN or ERROR logs, ...)
and also the exact command you are running.
Also, have any operations been run on this cluster recently that might have
triggered this (RF change, snitch change, new DC, or any other major
change)? It is good for us to know the history to get a feel for what the
cluster state might currently be.
Did this same command use to work and now fails, or are repairs something
you are trying to add that has never worked so far?
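
To gather those raw outputs in one go, something along these lines could help. It is only a sketch: it assumes 'nodetool' is on the PATH of the node where it runs, and the helper names (run_command, collect) are made up for this example. It only covers the two nodetool commands named in this thread; the replication strategy and the logs would still have to be collected separately.

```python
import subprocess

# The two nodetool commands mentioned in this thread.
COMMANDS = (
    ("nodetool", "status"),
    ("nodetool", "describecluster"),
)


def run_command(argv):
    """Run one command and return its combined stdout and stderr as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


def collect(commands=COMMANDS, runner=run_command):
    """Build a plain-text report: each command line followed by its output.

    `runner` is injectable so the report format can be checked without a
    Cassandra node at hand.
    """
    report = []
    for argv in commands:
        report.append("$ " + " ".join(argv))
        report.append(runner(argv).rstrip())
    return "\n".join(report)
```

Calling print(collect()) on a node would then produce text that can be pasted into a reply as-is.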

Some context might help us to help you :-),

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



Re: Repair giving error

2018-01-18 Thread Akshit Jain
Hi Alain,
Thanks for the response.
I'm using Cassandra 3.10.
'nodetool status' shows all the nodes up.
No schema disagreement.
Port 7000 is open.

Regards
Akshit Jain
9891724697



Re: Repair giving error

2018-01-18 Thread Alain RODRIGUEZ
Hello,

It looks like a communication issue.

What Cassandra version are you using?
What's the result of 'nodetool status'?
Any schema disagreement (see 'nodetool describecluster')?
Is port 7000 open, and are the nodes communicating with each other? (A
successful ping does not prove that the connection is up, even though it is
good to know the machine is there and up :)).
Any other errors you could see in the logs?
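
Since ping only shows that the machine answers, a TCP-level check of the storage port is more telling. The snippet below is a minimal sketch of such a check, not an official tool: the function names are invented for this example, and port 7000 is used as the default because it is the port discussed in this thread.

```python
import socket


def can_connect(host, port=7000, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_peers(hosts, port=7000, timeout=3.0):
    """Map each host to whether its port accepted a TCP connection."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Running check_peers() from the node that starts the repair, with the addresses listed as failed endpoints, would show whether those nodes are reachable on the storage port from that node. A successful connection only shows the port accepts TCP connections; it says nothing about the repair messages themselves.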

You might want to consider Reaper, an open source project my coworkers have
been working on (and are maintaining) that aims at making repairs more
efficient and easier to manage, as repair is one of the trickiest operations
for a Cassandra operator to handle: http://cassandra-reaper.io/. I did not
work on this project directly, but we have had good feedback and like this
tool ourselves.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com






Repair giving error

2018-01-13 Thread Akshit Jain
I have a 10-node C* cluster with 4-5 keyspaces.
I tried to perform nodetool repair one by one for each keyspace.
For some keyspaces the repair passed, but for some it gave the error below.
I am not able to figure out what is causing this issue. The replica nodes
are up and I am able to ping them from this node.
Any suggestions?

*Error I am getting on incremental repair:*

[2018-01-10 12:50:14,047] Did not get positive replies from all endpoints.
List of failed endpoint(s): [a.b.c.d, e.f.g.h]

-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: [2018-01-10 12:50:14,047] Did not get positive replies from all endpoints. List of failed endpoint(s): [a.b.c.d, e.f.g.h]
    at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115)
    at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)
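
When the repair is run per keyspace from a script, it can be handy to pull the failed endpoints out of this message and compare them across keyspaces. The following is a rough sketch that relies only on the message wording shown above; the function name is made up for this example, and other Cassandra versions may word the message differently.

```python
import re
from typing import List

# Matches the bracketed list that follows "List of failed endpoint(s):".
_FAILED = re.compile(r"List of failed endpoint\(s\):\s*\[([^\]]*)\]")


def failed_endpoints(message: str) -> List[str]:
    """Return the endpoints named in a 'Did not get positive replies' error."""
    # Drop zero-width spaces, which mail clients sometimes insert.
    cleaned = message.replace("\u200b", "")
    match = _FAILED.search(cleaned)
    if not match:
        return []
    parts = (item.strip() for item in match.group(1).split(","))
    return [item for item in parts if item]
```

If the same endpoints come back for every failing keyspace, that would point at those particular nodes rather than at the keyspaces.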