[ 
https://issues.apache.org/jira/browse/CASSANDRA-12280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Roth resolved CASSANDRA-12280.
---------------------------------------
       Resolution: Not A Bug
    Reproduced In: 3.7, 3.0.8, 3.9  (was: 3.0.8, 3.7, 3.9)

Seems like this was a configuration and/or kernel issue.

For all those who are interested what I did to fix that issue:

- Set tcp keepalive to these values: 
https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html.
 Before the total timeout was at about 15min. 90s as in the article fails 
faster. Maybe that helps that things to resume faster causing less hangs.
- otc_coalescing_strategy = DISABLED. Don't know if it had a lot of impact but 
it had no negative effects. I should double test that but ... I'm glad it works 
right now.
- Removed a suspicious host. That was the only host with a different kernel and 
it seemed that streams from / to this host hung more often than others.

After these changes I did not recognize any (obvious and noticeable) hangs any 
more.

> nodetool repair hangs
> ---------------------
>
>                 Key: CASSANDRA-12280
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12280
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Benjamin Roth
>
> nodetool repair hangs when repairing a keyspace, does not hang when 
> repairting table/mv by table/mv.
> Command executed (both variants make it hang):
> nodetool repair likes like dislike_by_source_mv like_by_contact_mv 
> match_valid_mv like_out dislike match match_by_contact_mv like_valid_mv 
> like_out_by_source_mv
> OR
> nodetool repair likes
> Logs:
> https://gist.github.com/brstgt/bf8b20fa1942d29ab60926ede7340b75
> Nodetool output:
> https://gist.github.com/brstgt/3aa73662da4b0190630ac1aad6c90a6f
> Schema:
> https://gist.github.com/brstgt/3fd59e0166f86f8065085532e3638097



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to