[jira] [Commented] (CASSANDRA-10829) cleanup + repair generates a lot of logs
[ https://issues.apache.org/jira/browse/CASSANDRA-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098337#comment-15098337 ] Carl Yeksigian commented on CASSANDRA-10829: +1 > cleanup + repair generates a lot of logs > > > Key: CASSANDRA-10829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10829 > Project: Cassandra > Issue Type: Bug > Environment: 5 nodes on Cassandra 2.1.11 (on Debian) >Reporter: Fabien Rousseau >Assignee: Marcus Eriksson > Fix For: 2.1.x > > > One of our node generates a lot of cassandra logs (int the 10 MB/s) and CPU > usage has increased (by a factor 2-3). > This was most probably triggered by a "nodetool snapshot" while a cleanup was > already running on this node. > An example of those logs: > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > (I removed SSTableReader information because it's rather long... I can share > it privately if needed) > Note that the date has not been changed (only 1ms between logs) > It should not generate that gigantic amount of logs :) > This is probably linked to: > https://issues.apache.org/jira/browse/CASSANDRA-9637 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10829) cleanup + repair generates a lot of logs
[ https://issues.apache.org/jira/browse/CASSANDRA-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097863#comment-15097863 ] Marcus Eriksson commented on CASSANDRA-10829: - pushed a fix and reran tests > cleanup + repair generates a lot of logs > > > Key: CASSANDRA-10829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10829 > Project: Cassandra > Issue Type: Bug > Environment: 5 nodes on Cassandra 2.1.11 (on Debian) >Reporter: Fabien Rousseau >Assignee: Marcus Eriksson > Fix For: 2.1.x > > > One of our node generates a lot of cassandra logs (int the 10 MB/s) and CPU > usage has increased (by a factor 2-3). > This was most probably triggered by a "nodetool snapshot" while a cleanup was > already running on this node. > An example of those logs: > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > (I removed SSTableReader information because it's rather long... I can share > it privately if needed) > Note that the date has not been changed (only 1ms between logs) > It should not generate that gigantic amount of logs :) > This is probably linked to: > https://issues.apache.org/jira/browse/CASSANDRA-9637 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10829) cleanup + repair generates a lot of logs
[ https://issues.apache.org/jira/browse/CASSANDRA-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083978#comment-15083978 ] Carl Yeksigian commented on CASSANDRA-10829: I think we need to make {{finished}} thread-safe, since we're submitting those to the compaction executor, which may be multi-threaded. > cleanup + repair generates a lot of logs > > > Key: CASSANDRA-10829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10829 > Project: Cassandra > Issue Type: Bug > Environment: 5 nodes on Cassandra 2.1.11 (on Debian) >Reporter: Fabien Rousseau >Assignee: Marcus Eriksson > Fix For: 2.1.x > > > One of our node generates a lot of cassandra logs (int the 10 MB/s) and CPU > usage has increased (by a factor 2-3). > This was most probably triggered by a "nodetool snapshot" while a cleanup was > already running on this node. > An example of those logs: > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > (I removed SSTableReader information because it's rather long... I can share > it privately if needed) > Note that the date has not been changed (only 1ms between logs) > It should not generate that gigantic amount of logs :) > This is probably linked to: > https://issues.apache.org/jira/browse/CASSANDRA-9637 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10829) cleanup + repair generates a lot of logs
[ https://issues.apache.org/jira/browse/CASSANDRA-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082705#comment-15082705 ] Marcus Eriksson commented on CASSANDRA-10829: - right, forgot to trigger a new build for testall, did that now (no code changes) > cleanup + repair generates a lot of logs > > > Key: CASSANDRA-10829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10829 > Project: Cassandra > Issue Type: Bug > Environment: 5 nodes on Cassandra 2.1.11 (on Debian) >Reporter: Fabien Rousseau >Assignee: Marcus Eriksson > Fix For: 2.1.x > > > One of our node generates a lot of cassandra logs (int the 10 MB/s) and CPU > usage has increased (by a factor 2-3). > This was most probably triggered by a "nodetool snapshot" while a cleanup was > already running on this node. > An example of those logs: > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > (I removed SSTableReader information because it's rather long... I can share > it privately if needed) > Note that the date has not been changed (only 1ms between logs) > It should not generate that gigantic amount of logs :) > This is probably linked to: > https://issues.apache.org/jira/browse/CASSANDRA-9637 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10829) cleanup + repair generates a lot of logs
[ https://issues.apache.org/jira/browse/CASSANDRA-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081735#comment-15081735 ] Carl Yeksigian commented on CASSANDRA-10829: Looks good, but the test failures look like we need to use a {{ConcurrentHashSet}} for {{finished}} since we're getting {{IllegalStateException}}s. > cleanup + repair generates a lot of logs > > > Key: CASSANDRA-10829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10829 > Project: Cassandra > Issue Type: Bug > Environment: 5 nodes on Cassandra 2.1.11 (on Debian) >Reporter: Fabien Rousseau >Assignee: Marcus Eriksson > Fix For: 2.1.x > > > One of our node generates a lot of cassandra logs (int the 10 MB/s) and CPU > usage has increased (by a factor 2-3). > This was most probably triggered by a "nodetool snapshot" while a cleanup was > already running on this node. > An example of those logs: > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > (I removed SSTableReader information because it's rather long... I can share > it privately if needed) > Note that the date has not been changed (only 1ms between logs) > It should not generate that gigantic amount of logs :) > This is probably linked to: > https://issues.apache.org/jira/browse/CASSANDRA-9637 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10829) cleanup + repair generates a lot of logs
[ https://issues.apache.org/jira/browse/CASSANDRA-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066392#comment-15066392 ] Marcus Eriksson commented on CASSANDRA-10829: - good point, pushed new commit on top of the old one and triggered new builds in cassci. > cleanup + repair generates a lot of logs > > > Key: CASSANDRA-10829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10829 > Project: Cassandra > Issue Type: Bug > Environment: 5 nodes on Cassandra 2.1.11 (on Debian) >Reporter: Fabien Rousseau >Assignee: Marcus Eriksson > Fix For: 2.1.x > > > One of our node generates a lot of cassandra logs (int the 10 MB/s) and CPU > usage has increased (by a factor 2-3). > This was most probably triggered by a "nodetool snapshot" while a cleanup was > already running on this node. > An example of those logs: > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > (I removed SSTableReader information because it's rather long... I can share > it privately if needed) > Note that the date has not been changed (only 1ms between logs) > It should not generate that gigantic amount of logs :) > This is probably linked to: > https://issues.apache.org/jira/browse/CASSANDRA-9637 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10829) cleanup + repair generates a lot of logs
[ https://issues.apache.org/jira/browse/CASSANDRA-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064320#comment-15064320 ] Carl Yeksigian commented on CASSANDRA-10829: We need to ensure we only {{unmarkCompacting}} the sstables which we haven't already unmarked, otherwise we could be running a different compaction operation on the previously unmarked sstables and unmark it again after this operation is finished. > cleanup + repair generates a lot of logs > > > Key: CASSANDRA-10829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10829 > Project: Cassandra > Issue Type: Bug > Environment: 5 nodes on Cassandra 2.1.11 (on Debian) >Reporter: Fabien Rousseau >Assignee: Marcus Eriksson > Fix For: 2.1.x > > > One of our node generates a lot of cassandra logs (int the 10 MB/s) and CPU > usage has increased (by a factor 2-3). > This was most probably triggered by a "nodetool snapshot" while a cleanup was > already running on this node. > An example of those logs: > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > (I removed SSTableReader information because it's rather long... I can share > it privately if needed) > Note that the date has not been changed (only 1ms between logs) > It should not generate that gigantic amount of logs :) > This is probably linked to: > https://issues.apache.org/jira/browse/CASSANDRA-9637 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10829) cleanup + repair generates a lot of logs
[ https://issues.apache.org/jira/browse/CASSANDRA-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063695#comment-15063695 ] Marcus Eriksson commented on CASSANDRA-10829: - correct except: bq. rate limit is already on 2.2+ (not related to this ticket) no, that is the only thing that is needed on 2.2+ > cleanup + repair generates a lot of logs > > > Key: CASSANDRA-10829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10829 > Project: Cassandra > Issue Type: Bug > Environment: 5 nodes on Cassandra 2.1.11 (on Debian) >Reporter: Fabien Rousseau >Assignee: Marcus Eriksson > > One of our node generates a lot of cassandra logs (int the 10 MB/s) and CPU > usage has increased (by a factor 2-3). > This was most probably triggered by a "nodetool snapshot" while a cleanup was > already running on this node. > An example of those logs: > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > (I removed SSTableReader information because it's rather long... I can share > it privately if needed) > Note that the date has not been changed (only 1ms between logs) > It should not generate that gigantic amount of logs :) > This is probably linked to: > https://issues.apache.org/jira/browse/CASSANDRA-9637 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10829) cleanup + repair generates a lot of logs
[ https://issues.apache.org/jira/browse/CASSANDRA-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062366#comment-15062366 ] Fabien Rousseau commented on CASSANDRA-10829: - If I undertstand well: - log rate limit is now once every 100ms - rate limit is already on 2.2+ (not related to this ticket) - compacting sstables are now released sooner (as soon as it has been compacted, and no more at the end of the whole operation) So, thanks a lot for the fix. > cleanup + repair generates a lot of logs > > > Key: CASSANDRA-10829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10829 > Project: Cassandra > Issue Type: Bug > Environment: 5 nodes on Cassandra 2.1.11 (on Debian) >Reporter: Fabien Rousseau >Assignee: Marcus Eriksson > > One of our node generates a lot of cassandra logs (int the 10 MB/s) and CPU > usage has increased (by a factor 2-3). > This was most probably triggered by a "nodetool snapshot" while a cleanup was > already running on this node. > An example of those logs: > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > (I removed SSTableReader information because it's rather long... I can share > it privately if needed) > Note that the date has not been changed (only 1ms between logs) > It should not generate that gigantic amount of logs :) > This is probably linked to: > https://issues.apache.org/jira/browse/CASSANDRA-9637 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10829) cleanup + repair generates a lot of logs
[ https://issues.apache.org/jira/browse/CASSANDRA-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062136#comment-15062136 ] Marcus Eriksson commented on CASSANDRA-10829: - Problem is that we use the sstables marked as 'compacting' to create the canonical view (to get the original sstable instances) and when running cleanup, scrub, upgradesstables and anticompaction we mark each sstable as compacted as we finish it, but we keep it as compacting until the entire operation is done. This means that we have sstables which are marked as compacted in the CANONICAL_SSTABLES and we cannot reference them and it makes us log the 'spinning ... ' message. There is also an issue with the log rate limiting, it was meant to only log every 100ms, but it logs every iteration. Patch that unmarks compacting once the sstable finished and fixes the rate limiting: https://github.com/krummas/cassandra/commits/marcuse/10829 and tests: https://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-10829-dtest/ https://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-10829-testall/ seems only the rate limiting fix applies to 2.2+ > cleanup + repair generates a lot of logs > > > Key: CASSANDRA-10829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10829 > Project: Cassandra > Issue Type: Bug > Environment: 5 nodes on Cassandra 2.1.11 (on Debian) >Reporter: Fabien Rousseau >Assignee: Marcus Eriksson > > One of our node generates a lot of cassandra logs (int the 10 MB/s) and CPU > usage has increased (by a factor 2-3). > This was most probably triggered by a "nodetool snapshot" while a cleanup was > already running on this node. > An example of those logs: > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > (I removed SSTableReader information because it's rather long... I can share > it privately if needed) > Note that the date has not been changed (only 1ms between logs) > It should not generate that gigantic amount of logs :) > This is probably linked to: > https://issues.apache.org/jira/browse/CASSANDRA-9637 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10829) cleanup + repair generates a lot of logs
[ https://issues.apache.org/jira/browse/CASSANDRA-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059786#comment-15059786 ] Marcus Eriksson commented on CASSANDRA-10829: - I couldn't reproduce this with just a snapshot, triggering a repair while cleanup was running did reproduce, working on that > cleanup + repair generates a lot of logs > > > Key: CASSANDRA-10829 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10829 > Project: Cassandra > Issue Type: Bug > Environment: 5 nodes on Cassandra 2.1.11 (on Debian) >Reporter: Fabien Rousseau >Assignee: Marcus Eriksson > > One of our node generates a lot of cassandra logs (int the 10 MB/s) and CPU > usage has increased (by a factor 2-3). > This was most probably triggered by a "nodetool snapshot" while a cleanup was > already running on this node. > An example of those logs: > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,794 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1923 Spinning trying to > capture released readers [...] > 2015-12-08 09:15:17,795 INFO > [ValidationExecutor:689]ColumnFamilyStore.java:1924 Spinning trying to > capture all readers [...] > (I removed SSTableReader information because it's rather long... I can share > it privately if needed) > Note that the date has not been changed (only 1ms between logs) > It should not generate that gigantic amount of logs :) > This is probably linked to: > https://issues.apache.org/jira/browse/CASSANDRA-9637 -- This message was sent by Atlassian JIRA (v6.3.4#6332)