[ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267272#comment-16267272
 ] 

Paulo Motta edited comment on CASSANDRA-13948 at 11/30/17 1:00 AM:
-------------------------------------------------------------------

Thanks for the review!

After rebasing this on top of CASSANDRA-13215 and addressing your latest 
comments, I noticed a few things which could be improved and did the following 
updates:
* Since blacklisting a directory will refresh the disk boundaries, we only need 
to reload strategies when the disk boundary changes or the table parameters 
change. To avoid equals comparison every time we call {{maybeReload}}, I moved 
the {{isOutOfDate}} check from the {{DiskBoundaryManager}} to the 
{{DiskBoundaries}} object - which is invalidated when there are any boundary 
changes. 
([commit|https://github.com/pauloricardomg/cassandra/commit/662cd063ca2e1c382ba3cd5dc8032b0d3f12683c])
* I thought that it no longer makes sense to expose the compaction strategy 
index to outside the compaction strategy manager since it's possible to get the 
correct disk placement directly from {{CFS.getDiskBoundaries}}. This should 
prevent races when the {{CompactionStrategyManager}} reloads boundaries between 
successive calls to {{CSM.getCompactionStrategyIndex}}. [This 
commit|https://github.com/pauloricardomg/cassandra/commit/abd1340b000d4596d71f00e5de8507de967ee7a5]
 updates {{relocatesstables}} and {{scrub}} to use {{CFS.getDiskBoundaries}} 
instead, and make {{CSM.getCompactionStrategyIndex}} private.
* I found it a bit hard to reason about when to use {{maybeReload}} to write 
the documentation and made its use consistent across 
{{CompactionStrategyManager}} on [this 
commit|https://github.com/pauloricardomg/cassandra/commit/c0926e99edb1ffdcda16640eda6faf8e78da9e46])
 (as you suggested before) along with the documentation. I kept the previous 
call to {{maybeReload}} from {{ColumnFamilyStore.reload}}, but we could 
probably avoid this and make {{maybeReload}} private-only as this is being 
called on pretty much every operation.

It feels like we can simplify this and get rid of these locks altogether (or at 
least greatly reduce their scope) by encapsulating the disk boundaries and 
compaction strategies in an immutable object accessed with an atomic reference 
and pessimistically cancel any tasks with an old placement when the strategies 
are reloaded. This is a significant refactor of {{CompactionStrategyManager}} 
so we should probably do it another ticket.

I submitted internal CI with the [latest 
branch|https://github.com/pauloricardomg/cassandra/tree/3.11-13948] and will 
post the results here when ready. I will create a trunk version after this 
follow-up is reviewed.


was (Author: pauloricardomg):
Thanks for the review!

After rebasing this on top of CASSANDRA-13215 and addressing your latest 
comments, I noticed a few things which could be improved and did the following 
updates:
* Since blacklisting a directory will refresh the disk boundaries, we only need 
to reload strategies when the disk boundary changes or the table parameters 
change. To avoid equals comparison every time we call {{maybeReload}}, I moved 
the {{isOutOfDate}} check from the {{DiskBoundaryManager}} to the 
{{DiskBoundaries}} object - which is invalidated when there are any boundary 
changes. 
([commit|https://github.com/pauloricardomg/cassandra/commit/662cd063ca2e1c382ba3cd5dc8032b0d3f12683c])
* I thought that it no longer makes sense to expose the compaction strategy 
index to outside the compaction strategy manager since it's possible to get the 
correct disk placement directly from {{CFS.getDiskBoundaries}}. This should 
prevent races when the {{CompactionStrategyManager}} reloads boundaries between 
successive calls to {{CSM.getCompactionStrategyIndex}}. [This 
commit|https://github.com/pauloricardomg/cassandra/commit/abd1340b000d4596d71f00e5de8507de967ee7a5]
 updates {{relocatesstables}} and {{scrub}} to use {{CFS.getDiskBoundaries}} 
instead, and make {{CSM.getCompactionStrategyIndex}} private.
* I found it a bit hard to reason about when to use {{maybeReload}} to write 
the documentation and made its use consistent across 
{{CompactionStrategyManager}} on [this 
commit|https://github.com/pauloricardomg/cassandra/commit/8518d6c4f001641da36d6fd58474ed3b50476326])
 (as you suggested before) along with the documentation. I kept the previous 
call to {{maybeReload}} from {{ColumnFamilyStore.reload}}, but we could 
probably avoid this and make {{maybeReload}} private-only as this is being 
called on pretty much every operation.

It feels like we can simplify this and get rid of these locks altogether (or at 
least greatly reduce their scope) by encapsulating the disk boundaries and 
compaction strategies in an immutable object accessed with an atomic reference 
and pessimistically cancel any tasks with an old placement when the strategies 
are reloaded. This is a significant refactor of {{CompactionStrategyManager}} 
so we should probably do it another ticket.

I submitted internal CI with the [latest 
branch|https://github.com/pauloricardomg/cassandra/tree/3.11-13948] and will 
post the results here when ready. I will create a trunk version after this 
follow-up is reviewed.

> Reload compaction strategies when JBOD disk boundary changes
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-13948
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13948
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>             Fix For: 3.11.x, 4.x
>
>         Attachments: debug.log, dtest13948.png, dtest2.png, 
> threaddump-cleanup.txt, threaddump.txt, trace.log
>
>
> The thread dump below shows a race between an sstable replacement by the 
> {{IndexSummaryRedistribution}} and 
> {{AbstractCompactionTask.getNextBackgroundTask}}:
> {noformat}
> Thread 94580: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information 
> may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, 
> line=175 (Compiled frame)
>  - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() 
> @bci=1, line=836 (Compiled frame)
>  - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node,
>  int) @bci=67, line=870 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) 
> @bci=17, line=1199 (Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, 
> line=943 (Compiled frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable,
>  java.lang.Iterable) @bci=359, line=483 (Interpreted frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification,
>  java.lang.Object) @bci=53, line=555 (Interpreted frame)
>  - 
> org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection,
>  java.util.Collection, org.apache.cassandra.db.compaction.OperationType, 
> java.lang.Throwable) @bci=50, line=409 (Interpreted frame)
>  - 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable)
>  @bci=157, line=227 (Interpreted frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable)
>  @bci=61, line=116 (Compiled frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit()
>  @bci=2, line=200 (Interpreted frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish()
>  @bci=5, line=185 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries()
>  @bci=559, line=130 (Interpreted frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution)
>  @bci=9, line=1420 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution)
>  @bci=4, line=250 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() 
> @bci=30, line=228 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() 
> @bci=4, line=125 (Interpreted frame)
>  - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 
> (Interpreted frame)
>  - 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run()
>  @bci=4, line=118 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 
> (Compiled frame)
>  - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled 
> frame)
>  - 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
>  @bci=1, line=180 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() 
> @bci=37, line=294 (Compiled frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 
> (Interpreted frame)
>  - 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable)
>  @bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> {noformat}
> {noformat}
> Thread 94573: (state = IN_JAVA)
>  - java.util.HashMap$HashIterator.nextNode() @bci=95, line=1441 (Compiled 
> frame; information may be imprecise)
>  - java.util.HashMap$KeyIterator.next() @bci=1, line=1461 (Compiled frame)
>  - 
> org.apache.cassandra.db.lifecycle.View$3.apply(org.apache.cassandra.db.lifecycle.View)
>  @bci=20, line=268 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.View$3.apply(java.lang.Object) @bci=5, 
> line=265 (Compiled frame)
>  - 
> org.apache.cassandra.db.lifecycle.Tracker.apply(com.google.common.base.Predicate,
>  com.google.common.base.Function) @bci=13, line=133 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.Tracker.tryModify(java.lang.Iterable, 
> org.apache.cassandra.db.compaction.OperationType) @bci=31, line=99 (Compiled 
> frame)
>  - 
> org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(int)
>  @bci=84, line=139 (Compiled frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(int)
>  @bci=105, line=119 (Interpreted frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run()
>  @bci=84, line=265 (Interpreted frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 
> (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 
> (Interpreted frame)
>  - 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable)
>  @bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> {noformat}
> This particular node remain in this state forever, indicating 
> {{LeveledCompactionStrategyTask.getNextBackgroundTask}} was looping 
> indefinitely.
> What happened is that sstable references were replaced on the tracker by the 
> {{IndexSummaryRedistribution}} thread, so the 
> {{AbstractCompactionStrategy.getNextBackgroundTask}} could not create the 
> transaction with the old references, and the {{IndexSummaryRedistribution}} 
> could not update the sstable reference in the compaction strategy because 
> {{AbstractCompactionStrategy.getNextBackgroundTask}} was holding the 
> {{CompactionStrategyManager}} lock.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to