[ https://issues.apache.org/jira/browse/KUDU-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346980#comment-15346980 ]
Todd Lipcon commented on KUDU-1495:
-----------------------------------
Seems like the issue is this block:
{code}
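// Exits only once no instance of this op is running.
while (iter->first->running_ > 0) {
  // Block until a running instance of the op signals completion.
  op->cond_->Wait();
  // Look the op up again after waking; ops_ may have changed while we waited.
  iter = ops_.find(op);
  CHECK(iter != ops_.end()) << "Tried to unregister " << op->name()
                            << ", but another thread unregistered it while we were "
                            << "waiting for it to complete";
}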
{code}
We have 4 running compactions, so each time one finishes, running_ drops to 3 rather than 0:
the loop keeps waiting, the op doesn't get unregistered, and another compaction gets scheduled.
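To make the counting concrete, here is a minimal single-threaded model, assuming every worker thread is busy with the same op and that the scheduler refills a freed thread with that op whenever it is still registered and runnable. `CompletionsUntilUnregister` and its parameters are hypothetical, not Kudu code; the model only tracks the running count that the loop above waits on.

{code}
#include <cassert>

// Hypothetical, single-threaded model of the behavior described above; the
// function and its parameters are illustrative, not part of Kudu.
// Every worker thread starts out running an instance of the same registered op,
// and the unregister loop can only exit once the running count reaches zero.
// Returns the number of completions before the count hits zero, or -1 if it
// never does within max_completions.
int CompletionsUntilUnregister(int num_threads, bool scheduler_refills,
                               int max_completions) {
  assert(num_threads > 0);
  int running = num_threads;  // all threads are compacting this tablet
  for (int completed = 1; completed <= max_completions; ++completed) {
    --running;                           // one compaction finishes and signals
    if (running == 0) return completed;  // the waiter's loop condition is now false
    if (scheduler_refills) ++running;    // op still runnable: freed thread is refilled
  }
  return -1;
}
{code}

With 4 threads and refilling enabled, the count oscillates between 3 and 4 and the function returns -1; with refilling disabled, it drains after 4 completions.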
Seems like we should either (a) not mark compactions as runnable on a QUIESCING
tablet, or (b) have the maintenance manager's 'unregister' wait for running ops
before unregistering them (or somehow mark them unrunnable while waiting).
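One way to picture the "mark them unrunnable while waiting" variant of option (b) is the sketch below. It is a simplified stand-in, not Kudu's MaintenanceManager: `OpRegistry`, `Op`, `TryStart`, `Finish`, `IsUnregistering`, and `Unregister` are hypothetical names. The key ordering is that the op is flagged under the lock before waiting, so the scheduler can no longer start new instances and the running count can only decrease.

{code}
#include <condition_variable>
#include <mutex>
#include <set>
#include <thread>

// Hypothetical registry illustrating option (b); names are illustrative only.
struct Op {
  int running = 0;             // in-flight instances; guarded by OpRegistry::lock_
  bool unregistering = false;  // set once Unregister() begins; guarded by lock_
};

class OpRegistry {
 public:
  void Register(Op* op) {
    std::lock_guard<std::mutex> l(lock_);
    ops_.insert(op);
  }

  // Scheduler side: refuse ops that are unknown or being unregistered.
  bool TryStart(Op* op) {
    std::lock_guard<std::mutex> l(lock_);
    if (ops_.count(op) == 0 || op->unregistering) return false;
    ++op->running;
    return true;
  }

  // Worker side: one instance finished; wake any thread waiting in Unregister().
  void Finish(Op* op) {
    std::lock_guard<std::mutex> l(lock_);
    --op->running;
    cond_.notify_all();
  }

  bool IsUnregistering(Op* op) {
    std::lock_guard<std::mutex> l(lock_);
    return op->unregistering;
  }

  // Mark first, then wait: no new instance can start, so running only decreases.
  void Unregister(Op* op) {
    std::unique_lock<std::mutex> l(lock_);
    op->unregistering = true;
    cond_.wait(l, [op] { return op->running == 0; });
    ops_.erase(op);
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  std::set<Op*> ops_;
};

// Returns true if the scheduler was refused while Unregister() was waiting.
bool DemoRefusesNewStarts() {
  OpRegistry reg;
  Op op;
  reg.Register(&op);
  if (!reg.TryStart(&op)) return false;              // one compaction in flight
  std::thread waiter([&] { reg.Unregister(&op); });  // blocks: running == 1
  while (!reg.IsUnregistering(&op)) std::this_thread::yield();
  bool refused = !reg.TryStart(&op);                 // scheduler tries to refill
  reg.Finish(&op);                                   // in-flight instance completes
  waiter.join();                                     // Unregister() can now return
  return refused && !reg.TryStart(&op);              // op is gone after unregister
}
{code}

Option (a) would instead keep the registry unchanged and have the op itself report "not runnable" once its tablet is quiescing; either way, the scheduler stops refilling threads with the dying tablet's compactions.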
> Deleted tablets may not quiesce maintenance operations in a timely fashion
> --------------------------------------------------------------------------
>
> Key: KUDU-1495
> URL: https://issues.apache.org/jira/browse/KUDU-1495
> Project: Kudu
> Issue Type: Bug
> Components: tserver
> Affects Versions: 0.9.0
> Reporter: Todd Lipcon
>
> With multiple maintenance manager threads, if a tablet is very
> under-compacted and you delete it, the tablet will get stuck in the 'QUIESCING'
> state for quite some time. New maintenance operations appear to keep
> starting on that tablet even though it is quiescing, so the tablet can
> keep running compactions long after its table has been deleted.