[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221883#comment-16221883 ] Jan Urbański commented on CASSANDRA-13123: I just wanted to say thanks for fixing that test (aka cleaning up my mess). I was squinting at this for a while, but could not figure out the byteman/commitlog init interaction...

> Draining a node might fail to delete all inactive commitlogs
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Jan Urbański
> Assignee: Jan Urbański
> Fix For: 3.0.16, 3.11.2, 4.0
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 13123-trunk.txt
>
> After issuing a drain command, it's possible that not all of the inactive commitlogs are removed.
> The drain command shuts down the CommitLog instance, which in turn shuts down the CommitLogSegmentManager. This has the effect of discarding any pending management tasks it might have, like the removal of inactive commitlogs.
> This in turn leads to an excessive amount of commitlogs being left behind after a drain and a lengthy recovery after a restart. With a fleet of dozens of nodes, each of them leaving several GB of commitlogs after a drain and taking up to two minutes to recover them on restart, the additional time required to restart the entire fleet becomes noticeable.
> This problem is not present in 3.x or trunk because of the CLSM rewrite done in CASSANDRA-8844.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220925#comment-16220925 ] Jeff Jirsa commented on CASSANDRA-13123: +1 (on splitting, and on your patch)
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220919#comment-16220919 ] Blake Eggleston commented on CASSANDRA-13123: I don't think these 2 tests can be in the same test class without being run in a specific order. {{testCompressedCommitLogBackpressure}} needs its byteman rules set up before the commit log is started. So if {{testShutdownWithPendingTasks}} sets up its schema and successfully runs first, the other will hang. I have a branch where each test is in its own class [here|https://github.com/bdeggleston/cassandra/tree/13123-fix-3.0]; let me know if there are any objections.
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213367#comment-16213367 ] Ariel Weisberg commented on CASSANDRA-13123: The new test that was added doesn't set up its schema at all. It relies on the schema created during the previous test. Easy fix there. The original test (before the new one was added) configures DatabaseDescriptor before loading the commit log (after which it can't be reloaded). The new test loads the commit log before this configuration occurs, causing the original test to fail because the commit log configuration is wrong. If you move the configuration into @BeforeClass, the backpressure that is supposed to be in the original test doesn't occur and the test fails. No fragile test goes unpunished I guess. I think there might be an interaction with Byteman that I don't understand.
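Editorial sketch of the ordering problem described in the comment above. This is not Cassandra code: `InitOrderSketch`, `Log`, and the `compression` field are hypothetical stand-ins for DatabaseDescriptor and the commit log singleton, and the sketch only assumes that the singleton snapshots its configuration on first use and cannot be reloaded afterwards.

```java
final class InitOrderSketch {
    /** Hypothetical global setting, standing in for a DatabaseDescriptor option. */
    static volatile String compression = "none";

    /** Hypothetical lazily created singleton that reads the setting once, on first use. */
    static final class Log {
        private static Log instance;
        final String compressionAtStart;

        private Log(String compressionAtStart) {
            this.compressionAtStart = compressionAtStart;
        }

        static synchronized Log get() {
            if (instance == null)
                instance = new Log(compression); // configuration is frozen here
            return instance;
        }

        /** Only for this demo: forget the singleton so each scenario starts fresh. */
        static synchronized void resetForDemo() {
            instance = null;
        }
    }

    /** Order used by the original test: configure first, then touch the log. */
    static String configureThenLoad() {
        compression = "none";
        Log.resetForDemo();
        compression = "lz4";
        return Log.get().compressionAtStart;
    }

    /** Order seen when another test touches the log first: the later setting is ignored. */
    static String loadThenConfigure() {
        compression = "none";
        Log.resetForDemo();
        Log.get();           // some earlier test initializes the singleton
        compression = "lz4"; // too late, the singleton already captured "none"
        return Log.get().compressionAtStart;
    }

    public static void main(String[] args) {
        System.out.println("configure, then load: " + configureThenLoad());
        System.out.println("load, then configure: " + loadThenConfigure());
    }
}
```

Under these assumptions the two methods return different values for the same final setting, which is the kind of order dependence that putting each test in its own class sidesteps.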
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16189523#comment-16189523 ] Joshua McKenzie commented on CASSANDRA-13123:

bq. suspect it may be a test ordering issue (if the two tests are run in one order they pass, in the other they fail, so probably setup/teardown conditions).

The brittleness of CL startup/teardown in unit testing was a pretty significant pain in the ass when I was working on CDC. Stupp and I have both bumped up against that in the memorable recent past and tidied things up a bit, but I suspect it will require a more invasive re-arch of the segment allocation and CL startup/shutdown to get it really ironed out.
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187136#comment-16187136 ] Jeff Jirsa commented on CASSANDRA-13123: I noticed it on 3.0 branch, I haven't had time to investigate but I suspect it may be a test ordering issue (if the two tests are run in one order they pass, in the other they fail, so probably setup/teardown conditions). The first failure I see in cassci (datastax's CI environment, which I don't have access to other than the public read-only view) is http://cassci.datastax.com/job/cassandra-3.0_testall/954/ , which is the build after this change was committed ( http://cassci.datastax.com/job/cassandra-3.0_testall/953/ ).

It also fails in:
http://cassci.datastax.com/job/cassandra-3.0_testall/964/
http://cassci.datastax.com/job/cassandra-3.0_testall/963/
http://cassci.datastax.com/job/cassandra-3.0_testall/956/

So % wise, it seems like 4 failures in the 15 builds since introduction.
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187024#comment-16187024 ] Jan Urbański commented on CASSANDRA-13123: Ugh, I'll take a look as soon as I can, thanks for the heads up. That's on master, right?
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186704#comment-16186704 ] Jeff Jirsa commented on CASSANDRA-13123: Hi folks, Pretty sure this commit breaks {{CommitLogSegmentManagerTest}} - have seen a pretty sharp rise in failures, and reverting this commit seems to solve them.

{code}
[junit] INFO 23:34:52 Initializing CommitLogTest.Standard1
[junit] INFO 23:34:52 Initializing CommitLogTest.Standard2
[junit] - ---
[junit] Testcase: testShutdownWithPendingTasks(org.apache.cassandra.db.commitlog.CommitLogSegmentManagerTest): FAILED
[junit] null
[junit] junit.framework.AssertionFailedError
[junit]     at org.apache.cassandra.db.Keyspace.open(Keyspace.java:105)
[junit]     at org.apache.cassandra.db.commitlog.CommitLogSegmentManagerTest.testShutdownWithPendingTasks(CommitLogSegmentManagerTest.java:147)
[junit]     at org.jboss.byteman.contrib.bmunit.BMUnitRunner$10.evaluate(BMUnitRunner.java:371)
[junit]     at org.jboss.byteman.contrib.bmunit.BMUnitRunner$6.evaluate(BMUnitRunner.java:241)
[junit]     at org.jboss.byteman.contrib.bmunit.BMUnitRunner$1.evaluate(BMUnitRunner.java:75)
[junit]
[junit]
[junit] Testcase: testCompressedCommitLogBackpressure(org.apache.cassandra.db.commitlog.CommitLogSegmentManagerTest): FAILED
[junit] expected:<3> but was:<1>
[junit] junit.framework.AssertionFailedError: expected:<3> but was:<1>
[junit]     at org.apache.cassandra.Util.spinAssertEquals(Util.java:535)
[junit]     at org.apache.cassandra.db.commitlog.CommitLogSegmentManagerTest.testCompressedCommitLogBackpressure(CommitLogSegmentManagerTest.java:112)
[junit]     at org.jboss.byteman.contrib.bmunit.BMUnitRunner$9.evaluate(BMUnitRunner.java:342)
[junit]     at org.jboss.byteman.contrib.bmunit.BMUnitRunner$6.evaluate(BMUnitRunner.java:241)
[junit]     at org.jboss.byteman.contrib.bmunit.BMUnitRunner$1.evaluate(BMUnitRunner.java:75)
[junit]
[junit]
[junit] Test org.apache.cassandra.db.commitlog.CommitLogSegmentManagerTest FAILED
[delete] Deleting directory /Users/jjirsa/Desktop/Dev/cassandra/build/test/cassandra/commitlog:0
[delete] Deleting directory /Users/jjirsa/Desktop/Dev/cassandra/build/test/cassandra/data:0
[delete] Deleting directory /Users/jjirsa/Desktop/Dev/cassandra/build/test/cassandra/saved_caches:0
[junitreport] Processing /Users/jjirsa/Desktop/Dev/cassandra/build/test/TESTS-TestSuites.xml to /var/folders/nq/4w83hn7s3h13dc5wxmvcdn9wgn/T/null1048031913
[junitreport] Loading stylesheet jar:file:/usr/local/ant/lib/ant-junit.jar!/org/apache/tools/ant/taskdefs/optional/junit/xsl/junit-frames.xsl
[junitreport] Transform time: 256ms
[junitreport] Deleting: /var/folders/nq/4w83hn7s3h13dc5wxmvcdn9wgn/T/null1048031913
{code}
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178816#comment-16178816 ] Jan Urbański commented on CASSANDRA-13123: No worries, thanks for the commit!
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163579#comment-16163579 ] Jason Brown commented on CASSANDRA-13123: Sorry this fell off my review radar (more than) a few months ago. For the last month, however, I've been trying to run this patch, rebased on 3.0/3.11/trunk, on circleci and the results have almost always been broken (in ways seemingly unrelated to this ticket). I've run it locally and everything seemed legit, and I've now run the utests on the apache jenkins server, and things were good (a few completely unrelated things failed);

||3.0||3.11||trunk||
|[apache dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/9/]|[apache dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/10/]|[apache dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/11/]|

Running the [dtests|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/304/] now (only for 3.0), and if it looks good I'll commit.
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962568#comment-15962568 ] Jan Urbański commented on CASSANDRA-13123: We've been running it for weeks with no problems, so +1 from me.
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962360#comment-15962360 ] Nate McCall commented on CASSANDRA-13123: Ping [~jasobrown] [~wulczer] Are we good to commit on this then?
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831886#comment-15831886 ] Jason Brown commented on CASSANDRA-13123: Pushed code for cassci to run tests. I think the change to {{CommitLogSegmentManager}} is probably legit, but I want to look at the test a little more before +1'ing it

||3.0||3.11||trunk||
|[branch|https://github.com/jasobrown/cassandra/tree/13123-3.0]|[branch|https://github.com/jasobrown/cassandra/tree/13123-3.11]|[branch|https://github.com/jasobrown/cassandra/tree/13123-trunk]|
|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13123-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13123-3.11-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13123-trunk-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13123-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13123-3.11-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13123-trunk-testall/]|
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830254#comment-15830254 ] Jan Urbański commented on CASSANDRA-13123: [~jasobrown] I haven't had the chance to try this out in production yet, I'll try to do that tomorrow. The initial commitlog replay takes up to two minutes for each of our nodes right now and if I understand correctly, after a drain all commitlogs except for at most two would be deleted, so the initial replay phase would be reduced to essentially zero. The shutdown phase might take a bit longer, because it'll have to wait for those commitlogs to be deleted, of course. The exact improvement depends on the number of CLs left behind after a drain - on machines with heavily contended disks it can be a lot, on lightly loaded ones it might be 0. As to when we're doing drains, it's on every restart (it's part of the restart procedure that we have).
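Editorial sketch of the trade-off described in the comment above: a manager that is stopped abruptly drops its queued deletions, while one that waits for them finishes the work at the cost of a slower shutdown. This is not the patch attached to the ticket and not Cassandra's CommitLogSegmentManager; `DrainSketch` and its method are hypothetical, and a plain `java.util.concurrent` executor stands in for the manager thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DrainSketch {
    /**
     * Queues pendingDeletes hypothetical "delete inactive segment" tasks behind a
     * blocked task, shuts the executor down, and returns how many deletions ran.
     */
    static int deletedAfterShutdown(int pendingDeletes, boolean waitForPending) {
        ExecutorService manager = Executors.newSingleThreadExecutor();
        AtomicInteger deleted = new AtomicInteger();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch gate = new CountDownLatch(1);
        try {
            // Keep the single worker busy so the deletions below stay queued.
            manager.execute(() -> {
                started.countDown();
                try {
                    gate.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await();
            for (int i = 0; i < pendingDeletes; i++)
                manager.execute(deleted::incrementAndGet);

            if (waitForPending) {
                manager.shutdown();    // accept no new work, but run what is queued
                gate.countDown();
            } else {
                manager.shutdownNow(); // interrupt the worker and drop queued tasks
            }
            manager.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return deleted.get();
    }

    public static void main(String[] args) {
        System.out.println("waited for pending tasks: " + deletedAfterShutdown(3, true));
        System.out.println("discarded pending tasks:  " + deletedAfterShutdown(3, false));
    }
}
```

In this sketch the waiting variant completes every queued deletion before returning, which mirrors the expectation that shutdown takes a bit longer while replay on restart has far less to do.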
[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829865#comment-15829865 ] Jason Brown commented on CASSANDRA-13123: [~wulczer] Thanks for the patch. We are at the critical bug fix stage with 2.2, so I'll only look at the patch for 3.0 and up. I've taken a quick look and things seem legit (need to think about it a bit more), but can you comment on any startup time improvement you've observed, if you've deployed this? Also, when are you issuing a drain? On normal node restarts, or only at "special" events, like upgrading a node?