[jira] [Commented] (CASSANDRA-11479) BatchlogManager unit tests failing on truncate race condition

2021-09-08 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411791#comment-17411791
 ] 

Benjamin Lerer commented on CASSANDRA-11479:


It does not seem to be a problem anymore.

> BatchlogManager unit tests failing on truncate race condition
> -------------------------------------------------------------
>
> Key: CASSANDRA-11479
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11479
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Compaction
> Reporter: Joel Knighton
> Assignee: Yuki Morishita
> Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
> Attachments: TEST-org.apache.cassandra.batchlog.BatchlogManagerTest.log
>
>
> Example on CI 
> [here|http://cassci.datastax.com/job/trunk_testall/818/testReport/junit/org.apache.cassandra.batchlog/BatchlogManagerTest/testLegacyReplay_compression/].
> This seems to have only started happening relatively recently (within the 
> last month or two).
> As far as I can tell, this is only showing up on BatchlogManagerTests purely 
> because it is an aggressive user of truncate. The assertion is hit in the 
> setUp method, so it can happen before any of the test methods. The assertion 
> occurs because a compaction is happening when truncate wants to discard 
> SSTables; trace level logs suggest that this compaction is submitted after 
> the pause on the CompactionStrategyManager.
> This should be reproducible by running BatchlogManagerTest in a loop - it 
> takes up to half an hour in my experience. A trace-level log from such a run 
> is attached - grep for my added log message "SSTABLES COMPACTING WHEN 
> DISCARDING" to find when the assert is hit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-11479) BatchlogManager unit tests failing on truncate race condition

2019-04-01 Thread mck (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807202#comment-16807202
 ] 

mck commented on CASSANDRA-11479:

[~yukim] and [~jkni], is this still a problem?

I ran the following over 12 hours without a single failure on the 
{{cassandra-2.2}} branch.
{code}
while : ; do ant test -Dtest.name=BatchlogManagerTest ; [[ "$?" -eq 0 ]] || break ; done
{code}




[jira] [Commented] (CASSANDRA-11479) BatchlogManager unit tests failing on truncate race condition

2018-06-06 Thread Kurt Greaves (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503195#comment-16503195
 ] 

Kurt Greaves commented on CASSANDRA-11479:

We probably shouldn't remove {{waitForCessation(Iterable cfss)}}; overloading 
it instead would avoid breaking the CompactionManager API between versions.
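
One way to read this suggestion, as a minimal sketch: keep the existing one-argument method and have it delegate to a new overload. The class below is a stand-in, not Cassandra's CompactionManager. The String table names, the extra Predicate parameter, and the integer return value are assumptions made purely for illustration, since the patch's new signature is not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Stand-in class for illustration only; NOT the real CompactionManager.
class CessationApiSketch
{
    // Existing signature, kept so that callers written against the old API still compile.
    static int waitForCessation(Iterable<String> tables)
    {
        return waitForCessation(tables, table -> true);
    }

    // Hypothetical new overload; the extra parameter is an assumption.
    // Returns the number of tables waited on, only to make the sketch observable.
    static int waitForCessation(Iterable<String> tables, Predicate<String> shouldWait)
    {
        int waited = 0;
        for (String table : tables)
            if (shouldWait.test(table))
                waited++;
        return waited;
    }

    public static void main(String[] args)
    {
        List<String> tables = Arrays.asList("batches", "other_table");
        System.out.println(waitForCessation(tables));
        System.out.println(waitForCessation(tables, t -> t.equals("batches")));
    }
}
```

With this shape, old call sites keep their behavior and only new call sites opt into the extra argument.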




[jira] [Commented] (CASSANDRA-11479) BatchlogManager unit tests failing on truncate race condition

2017-02-13 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863235#comment-15863235
 ] 

Joel Knighton commented on CASSANDRA-11479:

More on my plate than Yuki's, I believe. The patch made sense to me, but I was 
waiting for a chance to dig deeper into the related compaction code before 
giving it a final OK. In the process, it slipped through the cracks quite 
badly. I'd be happy to do that, but it likely wouldn't happen in the next few 
days if you're interested in taking it on instead.



[jira] [Commented] (CASSANDRA-11479) BatchlogManager unit tests failing on truncate race condition

2017-02-10 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861990#comment-15861990
 ] 

Jeff Jirsa commented on CASSANDRA-11479:


[~yukim] / [~jkni] - is this still on your (collective) plates?





[jira] [Commented] (CASSANDRA-11479) BatchlogManager unit tests failing on truncate race condition

2016-04-21 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253216#comment-15253216
 ] 

Yuki Morishita commented on CASSANDRA-11479:


||branch||testall||dtest||
|[11479-2.2|https://github.com/yukim/cassandra/tree/11479-2.2]|[testall|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-11479-2.2-testall/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-11479-2.2-dtest/lastCompletedBuild/testReport/]|

I created a patch for 2.2 to see if this works.
It adds one more condition to {{isCompacting}} that checks whether the table 
is in {{compactingCF}}.
A table is added to {{compactingCF}} in {{submitBackground}} and removed at 
the end of {{BackgroundCompactionCandidate#run}}.
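
The idea can be sketched roughly as follows. This is not the patch itself: the class is a stand-in with tables represented as plain strings, and every name other than {{isCompacting}}, {{compactingCF}} and {{submitBackground}} is hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the compaction bookkeeping; tables are plain strings here.
class IsCompactingSketch
{
    // Tables whose SSTables are currently marked as compacting.
    private final Set<String> tablesWithCompactingSSTables = ConcurrentHashMap.newKeySet();
    // Tables with a background compaction submitted but not yet finished
    // (plays the role of compactingCF).
    private final Map<String, Integer> compactingCF = new ConcurrentHashMap<>();

    void submitBackground(String table)
    {
        compactingCF.merge(table, 1, Integer::sum);
    }

    // Hypothetical hook for the end of the background candidate's run,
    // called whether or not it actually compacted anything.
    void backgroundRunFinished(String table)
    {
        compactingCF.computeIfPresent(table, (t, n) -> n > 1 ? n - 1 : null);
    }

    void markSSTablesCompacting(String table)
    {
        tablesWithCompactingSSTables.add(table);
    }

    boolean isCompacting(String table)
    {
        return tablesWithCompactingSSTables.contains(table) // existing condition
            || compactingCF.containsKey(table);             // added condition
    }

    public static void main(String[] args)
    {
        IsCompactingSketch manager = new IsCompactingSketch();
        manager.submitBackground("batches");
        // No SSTables are marked yet, but the submitted candidate is already visible.
        System.out.println(manager.isCompacting("batches"));
        manager.backgroundRunFinished("batches");
        System.out.println(manager.isCompacting("batches"));
    }
}
```

The point of the extra condition is that a caller waiting on {{isCompacting}} also sees a candidate that has been submitted but has not yet marked any SSTables.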



[jira] [Commented] (CASSANDRA-11479) BatchlogManager unit tests failing on truncate race condition

2016-04-01 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222438#comment-15222438
 ] 

Yuki Morishita commented on CASSANDRA-11479:


Code-wise, it looks like a compaction can still be running during 
{{runWithCompactionDisabled}}.

From the attached log (thank you!):

{noformat}
TRACE [MemtableFlushWriter:1] 2016-04-01 15:02:32,335 Scheduling a background 
task check for system.batches with SizeTieredCompactionStrategy
{noformat}

Here, a compaction was submitted for the flushed SSTable.

{noformat}
TRACE [main] 2016-04-01 15:02:32,337 Cancelling in-progress compactions for 
batches
TRACE [CompactionExecutor:1] 2016-04-01 15:02:32,337 Compaction buckets are []
TRACE [main] 2016-04-01 15:02:32,337 Compactions successfully cancelled
{noformat}

The in-progress compactions were then cancelled, but because the compaction 
strategy (STCS) had not yet grabbed any SSTables to compact, no compaction 
task had been created. So

{code:java}
// interrupt in-progress compactions
CompactionManager.instance.interruptCompactionForCFs(selfWithAuxiliaryCfs, interruptValidation);
CompactionManager.instance.waitForCessation(selfWithAuxiliaryCfs);
{code}

had nothing to interrupt or wait for: there was no {{CompactionTask}} and no 
SSTables were in compaction yet.

So,

{noformat}
TRACE [main] 2016-04-01 15:02:32,337 Discarding sstable data for truncated CF + 
indexes
{noformat}

truncate proceeded, but

{noformat}
TRACE [CompactionExecutor:1] 2016-04-01 15:02:32,337 Compaction buckets are [[BigTableReader(path='build/test/cassandra/data:0/system/batches-919a4bc57a333573b03e13fc3f68b465/ma-5-big-Data.db'), BigTableReader(path='build/test/cassandra/data:0/system/batches-919a4bc57a333573b03e13fc3f68b465/ma-4-big-Data.db')]]
{noformat}

in the CompactionExecutor, [STCS was still eagerly trying to grab compaction 
candidates|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java#L180], 
and it succeeded, marking those two SSTables as {{compacting}}.
Thus, inside the truncate logic,

{code:java}
public ReplayPosition discardSSTables(long truncatedAt)
{
    assert data.getCompacting().isEmpty() : data.getCompacting();

}
{code}
throws an {{AssertionError}} reporting that some SSTables are still compacting.

{noformat}
INFO  [CompactionExecutor:1] 2016-04-01 15:02:32,345 Compaction interrupted: 
Compaction@919a4bc5-7a33-3573-b03e-13fc3f68b465(system, batches, 0/240554)bytes
{noformat}

Eventually, the submitted {{CompactionTask}} was aborted, because 
{{runWithCompactionDisabled}} pauses the compaction strategy with {{pause()}} 
as its first step.
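
The interleaving described above can be replayed in a small single-threaded model. This is only a sketch under simplifying assumptions: the class and all of its method names are hypothetical stand-ins, SSTables are plain strings, and the two threads from the log are replaced by a fixed call order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of one table's compaction state; not Cassandra code.
class TruncateRaceSketch
{
    private final Set<String> compacting = new HashSet<>();
    private boolean paused;
    private boolean taskExists;

    // First step of running with compaction disabled.
    void pause()
    {
        paused = true;
    }

    // Interrupt and wait for cessation: this can only act on a task that already exists.
    boolean interruptAndWait()
    {
        if (!taskExists)
            return false;
        taskExists = false;
        compacting.clear();
        return true;
    }

    // Candidate selection on the executor thread: marks SSTables as compacting
    // and creates the task, even though the strategy was paused earlier.
    void selectCandidates(List<String> sstables)
    {
        compacting.addAll(sstables);
        taskExists = true;
    }

    // Mirrors the condition asserted in discardSSTables.
    boolean safeToDiscard()
    {
        return compacting.isEmpty();
    }

    // The task notices the pause when it runs and aborts, releasing its SSTables.
    void runTask()
    {
        if (taskExists && paused)
        {
            compacting.clear();
            taskExists = false;
        }
    }

    public static void main(String[] args)
    {
        TruncateRaceSketch table = new TruncateRaceSketch();
        table.pause();
        System.out.println("interrupted a task: " + table.interruptAndWait());
        table.selectCandidates(Arrays.asList("ma-4-big", "ma-5-big"));
        System.out.println("safe to discard: " + table.safeToDiscard());
        table.runTask();
        System.out.println("safe after abort: " + table.safeToDiscard());
    }
}
```

In this model the discard check fails in the window between candidate selection and the task's abort, which is the window the assertion hits.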
