[ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280241#comment-16280241
 ] 

Paulo Motta edited comment on CASSANDRA-13948 at 12/6/17 2:35 PM:
------------------------------------------------------------------

bq. +1 on the 3.11 patch

Thanks for the review!

bq. need to check the trunk patch as well, cancelling "ready to commit" (i hope)

The merge went smoothly; most of the conflicts were related to CASSANDRA-9143, 
so I 
[updated|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-13948#diff-f9c882c974db60a710cf1f195cfdb801R113]
 {{CompactionStrategyManagerTest}} to mark a subset of sstables as repaired or 
pending repair and verify that repaired and pending-repair sstables are 
assigned the correct strategies.

However, there were two test failures on the trunk branch after the merge:
1. 
[testSetLocalCompactionStrategy|https://github.com/pauloricardomg/cassandra/blob/41416af426c41cdd38e157f38fb440342bca4dd0/test/unit/org/apache/cassandra/db/compaction/CompactionsCQLTest.java#L163]
2. 
[disk_balance_bootstrap_test|https://github.com/pauloricardomg/cassandra-dtest/blob/73d7a8e1deb5eab05867d804933621062c2f6762/disk_balance_test.py#L34]

1. This test was failing 
[here|https://github.com/pauloricardomg/cassandra/blob/41416af426c41cdd38e157f38fb440342bca4dd0/test/unit/org/apache/cassandra/db/compaction/CompactionsCQLTest.java#L175]
 because {{ALTER TABLE t WITH gc_grace_seconds = 1000}} caused the manually set 
compaction strategy to be replaced by the strategy defined in the schema. After 
investigation, it turned out that the disk boundaries were being invalidated by 
the schema reload (introduced by CASSANDRA-9425), and 
{{maybeReload(TableMetadata)}} was reloading the compaction strategies with the 
schema settings instead of the manually set ones. To fix this, I split 
{{maybeReload}} into the original {{maybeReload(TableMetadata)}}, which is 
called externally by {{ColumnFamilyStore}} and reloads the strategies only when 
the schema table parameters change, and {{maybeReloadDiskBoundaries}}, which is 
used internally and reloads the compaction strategies with the same table 
settings when the disk boundaries are invalidated 
([here|https://github.com/apache/cassandra/commit/de5916e7c4f37736d5e1d06f0fc2b9c082b6bb99]).
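The split can be illustrated with a minimal standalone sketch (simplified stand-in classes; {{StrategyManager}} and the string-valued parameters are hypothetical, not Cassandra's actual API): schema-driven reloads adopt the new schema parameters, while boundary-driven reloads keep whatever parameters are currently in effect, so a manual override survives an unrelated schema change.

```java
import java.util.Objects;

// Hypothetical model of the maybeReload split, not the real Cassandra classes.
class StrategyManager
{
    private String activeParams;        // compaction parameters currently in effect
    private String schemaParams;        // parameters defined in the table schema
    private boolean boundariesValid = true;

    StrategyManager(String schemaParams)
    {
        this.schemaParams = schemaParams;
        this.activeParams = schemaParams;
    }

    // A manual override, e.g. set via JMX, bypassing the schema.
    void setLocalStrategy(String params) { activeParams = params; }

    void invalidateBoundaries() { boundariesValid = false; }

    // Called externally on schema changes: adopt new params only if they changed.
    void maybeReload(String newSchemaParams)
    {
        maybeReloadDiskBoundaries();
        if (Objects.equals(newSchemaParams, schemaParams))
            return;                     // schema unchanged: keep the manual override
        schemaParams = newSchemaParams;
        activeParams = newSchemaParams;
    }

    // Called internally: rebuild strategies with the *same* active params.
    void maybeReloadDiskBoundaries()
    {
        if (boundariesValid)
            return;
        boundariesValid = true;         // re-partition sstables; params untouched
    }

    String activeParams() { return activeParams; }
}
```

With this split, an {{ALTER TABLE ... WITH gc_grace_seconds = 1000}} that invalidates the boundaries triggers only the boundary reload path and no longer clobbers a manually set strategy.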

2. Since the local ranges are not yet defined when the bootstrapping node 
starts, the disk boundaries are empty. Before CASSANDRA-9425 the boundaries 
were invalidated during keyspace construction 
([here|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/config/Schema.java#L388]),
 so the correct boundaries were used during streaming. After CASSANDRA-9425 
the boundaries are no longer reloaded during keyspace creation 
([here|https://github.com/apache/cassandra/blob/4c80eeece37d79f434078224a0504400ae10a20d/src/java/org/apache/cassandra/schema/Schema.java#L138]),
 so the empty boundaries were used during streaming and the disks became 
imbalanced. I hit exactly the same problem on CASSANDRA-14083 
([here|https://issues.apache.org/jira/browse/CASSANDRA-14083?focusedCommentId=16272918&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16272918]).
 The solution is to invalidate the disk boundaries after the tokens are set 
during bootstrap 
([here|https://github.com/apache/cassandra/commit/a37bbda45142e1b351908a4ff5196eb08e92082b]).
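The staleness the fix addresses can be sketched in isolation (a hypothetical {{DiskBoundaryManager}} stand-in, not Cassandra's actual class): boundaries computed before the node owns any tokens are empty and stay cached, so they must be explicitly invalidated once the bootstrap tokens are set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of cached disk boundaries, not the real Cassandra class.
class DiskBoundaryManager
{
    private List<Long> cached;          // cached per-disk boundary tokens
    private boolean valid = false;

    List<Long> getBoundaries(List<Long> localTokens)
    {
        if (!valid)
        {
            cached = computeBoundaries(localTokens);
            valid = true;               // cached until explicitly invalidated
        }
        return cached;
    }

    void invalidate() { valid = false; }

    private static List<Long> computeBoundaries(List<Long> localTokens)
    {
        // The real code splits the owned ranges evenly across the data
        // directories; for this sketch we just echo the tokens back.
        return new ArrayList<>(localTokens);
    }
}
```

Without the {{invalidate()}} call after the tokens are set, every later lookup keeps returning the empty boundaries computed at startup, which is exactly why streaming wrote to the wrong disks.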

After these two fixes the tests pass (the remaining failures appear unrelated; 
test screenshots from internal CI linked below):

||3.11||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.11...pauloricardomg:3.11-13948]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-13948]|[branch|https://github.com/apache/cassandra-dtest/compare/master...pauloricardomg:13948]|
|[testall|https://issues.apache.org/jira/secure/attachment/12900864/3.11-13948-testall.png]|[testall|https://issues.apache.org/jira/secure/attachment/12900862/trunk-13948-testall.png]| |
|[dtest|https://issues.apache.org/jira/secure/attachment/12900865/3.11-13948-dtest.png]|[dtest|https://issues.apache.org/jira/secure/attachment/12900863/trunk-13948-dtest.png]| |


> Reload compaction strategies when JBOD disk boundary changes
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-13948
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13948
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>             Fix For: 3.11.x, 4.x
>
>         Attachments: 13948dtest.png, 13948testall.png, 3.11-13948-dtest.png, 
> 3.11-13948-testall.png, debug.log, dtest13948.png, dtest2.png, 
> threaddump-cleanup.txt, threaddump.txt, trace.log, trunk-13948-dtest.png, 
> trunk-13948-testall.png
>
>
> The thread dump below shows a race between an sstable replacement by the 
> {{IndexSummaryRedistribution}} and 
> {{AbstractCompactionTask.getNextBackgroundTask}}:
> {noformat}
> Thread 94580: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information 
> may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, 
> line=175 (Compiled frame)
>  - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() 
> @bci=1, line=836 (Compiled frame)
>  - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node,
>  int) @bci=67, line=870 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) 
> @bci=17, line=1199 (Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, 
> line=943 (Compiled frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable,
>  java.lang.Iterable) @bci=359, line=483 (Interpreted frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification,
>  java.lang.Object) @bci=53, line=555 (Interpreted frame)
>  - 
> org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection,
>  java.util.Collection, org.apache.cassandra.db.compaction.OperationType, 
> java.lang.Throwable) @bci=50, line=409 (Interpreted frame)
>  - 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable)
>  @bci=157, line=227 (Interpreted frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable)
>  @bci=61, line=116 (Compiled frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit()
>  @bci=2, line=200 (Interpreted frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish()
>  @bci=5, line=185 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries()
>  @bci=559, line=130 (Interpreted frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution)
>  @bci=9, line=1420 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution)
>  @bci=4, line=250 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() 
> @bci=30, line=228 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() 
> @bci=4, line=125 (Interpreted frame)
>  - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 
> (Interpreted frame)
>  - 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run()
>  @bci=4, line=118 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 
> (Compiled frame)
>  - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled 
> frame)
>  - 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
>  @bci=1, line=180 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() 
> @bci=37, line=294 (Compiled frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 
> (Interpreted frame)
>  - 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable)
>  @bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> {noformat}
> {noformat}
> Thread 94573: (state = IN_JAVA)
>  - java.util.HashMap$HashIterator.nextNode() @bci=95, line=1441 (Compiled 
> frame; information may be imprecise)
>  - java.util.HashMap$KeyIterator.next() @bci=1, line=1461 (Compiled frame)
>  - 
> org.apache.cassandra.db.lifecycle.View$3.apply(org.apache.cassandra.db.lifecycle.View)
>  @bci=20, line=268 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.View$3.apply(java.lang.Object) @bci=5, 
> line=265 (Compiled frame)
>  - 
> org.apache.cassandra.db.lifecycle.Tracker.apply(com.google.common.base.Predicate,
>  com.google.common.base.Function) @bci=13, line=133 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.Tracker.tryModify(java.lang.Iterable, 
> org.apache.cassandra.db.compaction.OperationType) @bci=31, line=99 (Compiled 
> frame)
>  - 
> org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(int)
>  @bci=84, line=139 (Compiled frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(int)
>  @bci=105, line=119 (Interpreted frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run()
>  @bci=84, line=265 (Interpreted frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 
> (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 
> (Interpreted frame)
>  - 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable)
>  @bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> {noformat}
> This particular node remained in this state forever, indicating 
> {{LeveledCompactionStrategyTask.getNextBackgroundTask}} was looping 
> indefinitely.
> What happened is that sstable references were replaced on the tracker by the 
> {{IndexSummaryRedistribution}} thread, so the 
> {{AbstractCompactionStrategy.getNextBackgroundTask}} could not create the 
> transaction with the old references, and the {{IndexSummaryRedistribution}} 
> could not update the sstable reference in the compaction strategy because 
> {{AbstractCompactionStrategy.getNextBackgroundTask}} was holding the 
> {{CompactionStrategyManager}} lock.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
