[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-10-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221883#comment-16221883
 ] 

Jan Urbański commented on CASSANDRA-13123:
--

I just wanted to say thanks for fixing that test (aka cleaning up my mess). I 
was squinting at this for a while, but could not figure out the 
byteman/commitlog init interaction...

> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.0.16, 3.11.2, 4.0
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt
>
>
> After issuing a drain command, it's possible that not all of the inactive 
> commitlogs are removed.
> The drain command shuts down the CommitLog instance, which in turn shuts down 
> the CommitLogSegmentManager. This has the effect of discarding any pending 
> management tasks it might have, like the removal of inactive commitlogs.
> This in turn leads to an excessive amount of commitlogs being left behind 
> after a drain and a lengthy recovery after a restart. With a fleet of dozens 
> of nodes, each of them leaving several GB of commitlogs after a drain and 
> taking up to two minutes to recover them on restart, the additional time 
> required to restart the entire fleet becomes noticeable.
> This problem is not present in 3.x or trunk because of the CLSM rewrite done 
> in CASSANDRA-8844.
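The failure mode in the description can be reproduced in miniature with a plain `java.util.concurrent` executor standing in for the segment manager's background thread. This is a hedged sketch, not Cassandra code and not the committed patch: `DrainSketch`, `drain`, and the `CommitLog-N.log` names are hypothetical. `shutdownNow()` interrupts the worker and returns queued tasks without running them, which mirrors pending commitlog deletions being discarded. `shutdown()` followed by `awaitTermination()` lets already-queued tasks finish, which is one general way to avoid losing them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a manager thread that deletes inactive segments.
public class DrainSketch {

    // Queues one deletion task per inactive segment, then shuts the manager down.
    // Returns the names of the segments that were actually "deleted".
    static List<String> drain(int inactiveSegments, boolean waitForPendingTasks)
            throws InterruptedException {
        ExecutorService manager = Executors.newSingleThreadExecutor();
        List<String> deleted = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch busy = new CountDownLatch(1);

        // Occupy the single worker so the deletions below stay queued,
        // as they might behind slow disk I/O on a loaded node.
        manager.submit(() -> { busy.await(); return null; });

        for (int i = 0; i < inactiveSegments; i++) {
            String segment = "CommitLog-" + i + ".log";
            manager.execute(() -> deleted.add(segment));
        }

        if (waitForPendingTasks) {
            // Stop accepting new work, but let queued tasks run to completion.
            manager.shutdown();
            busy.countDown();
        } else {
            // Interrupt the worker and discard everything still queued.
            List<Runnable> dropped = manager.shutdownNow();
            System.out.println("discarded " + dropped.size() + " pending deletions");
        }
        manager.awaitTermination(5, TimeUnit.SECONDS);
        return deleted;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shutdownNow: " + drain(3, false).size() + " deleted");
        System.out.println("shutdown + await: " + drain(3, true).size() + " deleted");
    }
}
```

The trade-off matches the one noted later in the thread: waiting for queued deletions makes shutdown slightly longer, in exchange for less replay work on the next start.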



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-10-26 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220925#comment-16220925
 ] 

Jeff Jirsa commented on CASSANDRA-13123:


+1 (on splitting, and on your patch)




> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.0.16, 3.11.2, 4.0
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-10-26 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220919#comment-16220919
 ] 

Blake Eggleston commented on CASSANDRA-13123:
-

I don't think these 2 tests can be in the same test class without being run in 
a specific order. {{testCompressedCommitLogBackpressure}} needs its Byteman 
rules set up before the commit log is started. So if 
{{testShutdownWithPendingTasks}} sets up its schema and successfully runs 
first, the other will hang.

I have a branch where each test is in its own class 
[here|https://github.com/bdeggleston/cassandra/tree/13123-fix-3.0]; let me know 
if there are any objections.

> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.0.16, 3.11.2, 4.0
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-10-20 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213367#comment-16213367
 ] 

Ariel Weisberg commented on CASSANDRA-13123:


The new test that was added doesn't set up its schema at all. It relies on the 
schema created during the previous test. Easy fix there.

The original test (before the new one was added) configures DatabaseDescriptor 
before loading the commit log (after which it can't be reloaded). The new test 
loads the commit log before this configuration occurs, causing the original test 
to fail because the commit log configuration is wrong.

If you move the configuration into @BeforeClass, the backpressure that is 
supposed to be in the original test doesn't occur and the test fails. No 
fragile test goes unpunished, I guess. I think there might be an interaction with 
Byteman that I don't understand.
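The ordering problem described above can be modelled without JUnit or Byteman: a singleton that reads a process-wide setting only once, on first access, makes the outcome depend on which test touches it first. This is a hedged sketch under stated assumptions, not Cassandra code: `Config` is a hypothetical stand-in for DatabaseDescriptor, `Log` for the commit log, and `reset()` models the fresh state a test gets when it does not share a loaded commit log with another test (the motivation for splitting the tests into separate classes).

```java
// Hypothetical model, not Cassandra code: Config stands in for
// DatabaseDescriptor and Log for the commit log singleton.
public class InitOrderSketch {

    static final class Config {
        // Process-wide, mutable setting that tests may change.
        static boolean compression = false;
    }

    static final class Log {
        private static Log instance;
        final boolean compressed;

        private Log(boolean compressed) {
            this.compressed = compressed;
        }

        // The configuration is read exactly once, on first access.
        static synchronized Log get() {
            if (instance == null) {
                instance = new Log(Config.compression);
            }
            return instance;
        }

        // Models starting with no log loaded and default configuration.
        static synchronized void reset() {
            instance = null;
            Config.compression = false;
        }
    }

    // Like the original test: configure first, then rely on that configuration.
    static boolean compressedTest() {
        Config.compression = true;
        return Log.get().compressed;
    }

    // Like the new test: touches the log without configuring anything.
    static void plainTest() {
        Log.get();
    }

    // Runs both tests against shared state; returns whether compressedTest passed.
    static boolean runTogether(boolean plainFirst) {
        Log.reset();
        if (plainFirst) {
            plainTest();
            return compressedTest();
        }
        boolean passed = compressedTest();
        plainTest();
        return passed;
    }

    public static void main(String[] args) {
        System.out.println("compressed test first: " + (runTogether(false) ? "pass" : "fail"));
        System.out.println("plain test first: " + (runTogether(true) ? "pass" : "fail"));
    }
}
```

Only the "plain test first" order fails, which is consistent with the intermittent, order-dependent failures reported in CI.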

> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.0.16, 3.11.2, 4.0
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-10-03 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16189523#comment-16189523
 ] 

Joshua McKenzie commented on CASSANDRA-13123:
-

bq. I suspect it may be a test ordering issue (if the two tests are run in one 
order they pass, in the other they fail, so probably setup/teardown conditions).
The brittleness of CL startup/teardown in unit testing was a pretty significant 
pain in the ass when I was working on CDC. Stupp and I have both bumped up 
against that in the memorable recent past and tidied things up a bit, but I 
suspect it will require a more invasive re-arch of the segment allocation and 
CL startup/shutdown to get it really ironed out.

> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.0.15, 3.11.1, 4.0
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-09-30 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187136#comment-16187136
 ] 

Jeff Jirsa commented on CASSANDRA-13123:


I noticed it on the 3.0 branch. I haven't had time to investigate, but I suspect 
it may be a test ordering issue (if the two tests are run in one order they pass, 
in the other they fail, so probably setup/teardown conditions).

The first failure I see in cassci (DataStax's CI environment, which I don't 
have access to other than the public read-only view) is 
http://cassci.datastax.com/job/cassandra-3.0_testall/954/ , which is the build 
after this change was committed 
(http://cassci.datastax.com/job/cassandra-3.0_testall/953/).

It also fails in:
http://cassci.datastax.com/job/cassandra-3.0_testall/964/
http://cassci.datastax.com/job/cassandra-3.0_testall/963/
http://cassci.datastax.com/job/cassandra-3.0_testall/956/

So, percentage-wise, that is 4 failures in the 15 builds since the change was introduced (about 27%).





> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.0.15, 3.11.1, 4.0
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-09-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187024#comment-16187024
 ] 

Jan Urbański commented on CASSANDRA-13123:
--

Ugh, I'll take a look as soon as I can, thanks for the heads up. That's on 
master, right?

> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.0.15, 3.11.1, 4.0
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-09-29 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186704#comment-16186704
 ] 

Jeff Jirsa commented on CASSANDRA-13123:


Hi folks,

Pretty sure this commit breaks {{CommitLogSegmentManagerTest}} - have seen a 
pretty sharp rise in failures, and reverting this commit seems to solve them.

{code}
[junit] INFO  23:34:52 Initializing CommitLogTest.Standard1
[junit] INFO  23:34:52 Initializing CommitLogTest.Standard2
[junit] -  ---
[junit] Testcase: testShutdownWithPendingTasks(org.apache.cassandra.db.commitlog.CommitLogSegmentManagerTest):  FAILED
[junit] null
[junit] junit.framework.AssertionFailedError
[junit] at org.apache.cassandra.db.Keyspace.open(Keyspace.java:105)
[junit] at org.apache.cassandra.db.commitlog.CommitLogSegmentManagerTest.testShutdownWithPendingTasks(CommitLogSegmentManagerTest.java:147)
[junit] at org.jboss.byteman.contrib.bmunit.BMUnitRunner$10.evaluate(BMUnitRunner.java:371)
[junit] at org.jboss.byteman.contrib.bmunit.BMUnitRunner$6.evaluate(BMUnitRunner.java:241)
[junit] at org.jboss.byteman.contrib.bmunit.BMUnitRunner$1.evaluate(BMUnitRunner.java:75)
[junit]
[junit]
[junit] Testcase: testCompressedCommitLogBackpressure(org.apache.cassandra.db.commitlog.CommitLogSegmentManagerTest):   FAILED
[junit] expected:<3> but was:<1>
[junit] junit.framework.AssertionFailedError: expected:<3> but was:<1>
[junit] at org.apache.cassandra.Util.spinAssertEquals(Util.java:535)
[junit] at org.apache.cassandra.db.commitlog.CommitLogSegmentManagerTest.testCompressedCommitLogBackpressure(CommitLogSegmentManagerTest.java:112)
[junit] at org.jboss.byteman.contrib.bmunit.BMUnitRunner$9.evaluate(BMUnitRunner.java:342)
[junit] at org.jboss.byteman.contrib.bmunit.BMUnitRunner$6.evaluate(BMUnitRunner.java:241)
[junit] at org.jboss.byteman.contrib.bmunit.BMUnitRunner$1.evaluate(BMUnitRunner.java:75)
[junit]
[junit]
[junit] Test org.apache.cassandra.db.commitlog.CommitLogSegmentManagerTest FAILED
   [delete] Deleting directory /Users/jjirsa/Desktop/Dev/cassandra/build/test/cassandra/commitlog:0
   [delete] Deleting directory /Users/jjirsa/Desktop/Dev/cassandra/build/test/cassandra/data:0
   [delete] Deleting directory /Users/jjirsa/Desktop/Dev/cassandra/build/test/cassandra/saved_caches:0
[junitreport] Processing /Users/jjirsa/Desktop/Dev/cassandra/build/test/TESTS-TestSuites.xml to /var/folders/nq/4w83hn7s3h13dc5wxmvcdn9wgn/T/null1048031913
[junitreport] Loading stylesheet jar:file:/usr/local/ant/lib/ant-junit.jar!/org/apache/tools/ant/taskdefs/optional/junit/xsl/junit-frames.xsl
[junitreport] Transform time: 256ms
[junitreport] Deleting: /var/folders/nq/4w83hn7s3h13dc5wxmvcdn9wgn/T/null1048031913
{code}


> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.0.15, 3.11.1, 4.0
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-09-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178816#comment-16178816
 ] 

Jan Urbański commented on CASSANDRA-13123:
--

No worries, thanks for the commit!

> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.0.15, 3.11.1, 4.0
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-09-12 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163579#comment-16163579
 ] 

Jason Brown commented on CASSANDRA-13123:
-

Sorry, this fell off my review radar (more than) a few months ago. For the last 
month, however, I've been trying to run this patch, rebased on 3.0/3.11/trunk, 
on CircleCI, and the results have almost always been broken (in ways seemingly 
unrelated to this ticket). I've run it locally and everything seemed legit, and 
I've now run the utests on the Apache Jenkins server, and things were good (a 
few completely unrelated things failed):

||3.0||3.11||trunk||
|[apache dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/9/]|[apache dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/10/]|[apache dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/11/]|

Running the 
[dtests|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/304/] 
now (only for 3.0), and if it looks good I'll commit.

> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.8
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-04-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962568#comment-15962568
 ] 

Jan Urbański commented on CASSANDRA-13123:
--

We've been running it for weeks with no problems, so +1 from me.

> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.8
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-04-09 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962360#comment-15962360
 ] 

Nate McCall commented on CASSANDRA-13123:
-

Ping [~jasobrown] [~wulczer] Are we good to commit on this then? 

> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.8
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-01-20 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831886#comment-15831886
 ] 

Jason Brown commented on CASSANDRA-13123:
-

Pushed code for cassci to run tests. I think the change to 
{{CommitLogSegmentManager}} is probably legit, but I want to look at the test a 
little more before +1'ing it.

||3.0||3.11||trunk||
|[branch|https://github.com/jasobrown/cassandra/tree/13123-3.0]|[branch|https://github.com/jasobrown/cassandra/tree/13123-3.11]|[branch|https://github.com/jasobrown/cassandra/tree/13123-trunk]|
|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13123-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13123-3.11-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13123-trunk-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13123-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13123-3.11-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13123-trunk-testall/]|


> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.8
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-01-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830254#comment-15830254
 ] 

Jan Urbański commented on CASSANDRA-13123:
--

[~jasobrown] I haven't had the chance to try this out in production yet; I'll 
try to do that tomorrow. The initial commitlog replay takes up to two minutes 
for each of our nodes right now, and if I understand correctly, after a drain 
all commitlogs except for at most two would be deleted, so the initial replay 
phase would be reduced to essentially zero. The shutdown phase might take a bit 
longer, of course, because it'll have to wait for those commitlogs to be 
deleted.

The exact improvement depends on the number of CLs left behind after a drain - 
on machines with heavily contended disks it can be a lot, on lightly loaded 
ones it might be 0.

As to when we're doing drains, it's on every restart (it's part of the restart 
procedure that we have).

> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
>Assignee: Jan Urbański
> Fix For: 3.8
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs

2017-01-19 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829865#comment-15829865
 ] 

Jason Brown commented on CASSANDRA-13123:
-

[~wulczer] Thanks for the patch. We are at the critical-bug-fix stage with 2.2, 
so I'll only look at the patch for 3.0 and up. I've taken a quick look and 
things seem legit (I need to think about it a bit more), but can you comment on 
any improvement in startup time you've observed, if you've deployed this?

Also, when are you issuing a drain? On normal node restarts, or only at 
"special" events, like upgrading a node?

> Draining a node might fail to delete all inactive commitlogs
> 
>
> Key: CASSANDRA-13123
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jan Urbański
> Fix For: 3.8
>
> Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)