[ https://issues.apache.org/jira/browse/CASSANDRA-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975274#action_12975274 ]

Jonathan Ellis commented on CASSANDRA-809:
------------------------------------------

Updating for the past 10 months' worth of changes:

bq. Our node that hit this condition is essentially dead (its not gossiping or 
accepting any writes or reads, but is still alive).

This is basically fixed now that flow control is implemented (CASSANDRA-685) 
and refined (CASSANDRA-1358).
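
For readers who haven't followed those tickets, a minimal sketch of the flow-control idea, assuming the approach is a bounded stage queue that drops requests already older than the rpc timeout instead of letting them pile up behind a stalled stage. The class and field names below are illustrative only, not the actual Cassandra code:

{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: drop work that has already outlived the rpc timeout
// rather than queueing it indefinitely behind a stalled stage.
public class DroppingStage
{
    private static final long RPC_TIMEOUT_MS = 10_000;

    private final BlockingQueue<TimestampedTask> queue = new ArrayBlockingQueue<>(1024);

    /** A task stamped with its creation time so its age can be checked later. */
    private static final class TimestampedTask
    {
        final long createdAtMs = System.currentTimeMillis();
        final Runnable work;
        TimestampedTask(Runnable work) { this.work = work; }
    }

    public boolean submit(Runnable work)
    {
        // Bounded queue: if the stage is saturated, the caller learns immediately
        // instead of blocking (no caller-runs, no unbounded growth).
        return queue.offer(new TimestampedTask(work));
    }

    public void runOne() throws InterruptedException
    {
        TimestampedTask task = queue.poll(100, TimeUnit.MILLISECONDS);
        if (task == null)
            return;
        long ageMs = System.currentTimeMillis() - task.createdAtMs;
        if (ageMs > RPC_TIMEOUT_MS)
        {
            // The coordinator has already timed this request out; doing the work
            // now only adds load, so drop it (and ideally count the drop).
            return;
        }
        task.work.run();
    }
}
{code}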

bq. It appears that there are n threads for n data directories that we flush 
to, but they're not dedicated to a data directory. We should have a thread per 
data directory and have that thread dedicated to that directory 

At least until we cap sstable size (CASSANDRA-1608?), one data volume is going 
to be the recommended configuration, so this is low priority.

bq. if a disk fills up, we stop trying to write to it
bq. if we're about to write more data to a disk than space available, we don't 
try and write to that disk

Cassandra has always done these two on compaction.  I'm less sure about flush.
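
For the compaction case, the check amounts to picking a data directory only if the estimated output would fit. A hedged sketch of that selection logic; the directory list and size estimate here are illustrative and not the actual DatabaseDescriptor/Table APIs:

{code}
import java.io.File;
import java.util.List;

// Illustrative sketch: choose a data directory only if the estimated write fits,
// preferring the volume with the most usable space.
public class DataDirectorySelector
{
    private final List<File> dataDirectories;

    public DataDirectorySelector(List<File> dataDirectories)
    {
        this.dataDirectories = dataDirectories;
    }

    /**
     * @param estimatedBytes expected size of the sstable about to be written
     * @return the directory with the most usable space that can hold it, or null
     *         if every volume is too full (the caller should error out rather
     *         than fill a disk)
     */
    public File selectFor(long estimatedBytes)
    {
        File best = null;
        long bestFree = -1;
        for (File dir : dataDirectories)
        {
            long free = dir.getUsableSpace();
            if (free >= estimatedBytes && free > bestFree)
            {
                best = dir;
                bestFree = free;
            }
        }
        return best;
    }
}
{code}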

The nice thing about writes is that erroring out is almost identical to being 
completely down for ConsistencyLevel purposes.
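
To spell out the arithmetic: with replication factor 3, QUORUM needs 3/2 + 1 = 2 acknowledgements, and a replica that errors out on a full disk contributes zero acks exactly like a replica that is down. A tiny illustrative snippet (not Cassandra code):

{code}
// Illustrative arithmetic only: an erroring replica and a down replica both
// contribute zero acks, so the ConsistencyLevel outcome is the same.
public class QuorumMath
{
    static int quorumFor(int replicationFactor)
    {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args)
    {
        int rf = 3;
        int acks = 2;          // two healthy replicas acknowledged the write
        int erroredOrDown = 1; // one replica errored (or is down): same effect
        boolean writeSucceeds = acks >= quorumFor(rf);
        System.out.println("QUORUM for rf=" + rf + " is " + quorumFor(rf)
                           + ", write succeeds: " + writeSucceeds);
    }
}
{code}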

bq. we balance data relatively evenly between disks

Also low priority, given the above.

bq. if a disk is misbehaving for a period of time, we stop using it and assume 
that data is lost (potentially notify an operator as well)

This is the biggest problem right now: if a disk/volume goes down, the rest of 
the node (in particular gossip) will keep functioning, so other nodes will 
continue trying to read from it.

Short term, the best fix for this is to provide timeout information to the 
dynamic snitch (CASSANDRA-1905) so it can route around such nodes.
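
A hedged sketch of that short-term idea, assuming the dynamic snitch orders replicas by an exponentially weighted latency score and that timeouts are fed in as a large latency penalty. The names below are illustrative, not the actual CASSANDRA-1905 patch:

{code}
import java.net.InetAddress;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: track a per-endpoint latency score and sort replicas by it,
// so endpoints that keep timing out (e.g. a node with a dead data volume) fall to
// the back of the read ordering.
public class LatencyAwareReplicaOrderer
{
    private static final double ALPHA = 0.75;             // weight of the newest sample
    private static final double TIMEOUT_PENALTY_MS = 10_000;

    private final Map<InetAddress, Double> scores = new ConcurrentHashMap<>();

    public void recordLatency(InetAddress endpoint, double latencyMs)
    {
        scores.merge(endpoint, latencyMs,
                     (old, fresh) -> ALPHA * fresh + (1 - ALPHA) * old);
    }

    public void recordTimeout(InetAddress endpoint)
    {
        // A timeout counts as a very slow response, pushing the score up quickly.
        recordLatency(endpoint, TIMEOUT_PENALTY_MS);
    }

    /** Order candidate replicas from best (lowest score) to worst for reads. */
    public void sortByScore(List<InetAddress> replicas)
    {
        replicas.sort(Comparator.comparingDouble(
                (InetAddress e) -> scores.getOrDefault(e, 0.0)));
    }
}
{code}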

> Full disk can result in being marked down
> -----------------------------------------
>
>                 Key: CASSANDRA-809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-809
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ryan King
>             Fix For: 0.8
>
>
> We had a node fill up the disk under one of two data directories. The result 
> was that the node stopped making progress. The problem appears to be this 
> (I'll update with more details as we find them):
> When new tasks are put onto most queues in Cassandra, if there isn't a thread 
> in the pool to handle the task immediately, the task is run in the caller's 
> thread (org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor:69 sets 
> the caller-runs policy). The queue in question here is the queue that manages 
> flushes, which is enqueued to from various places in our code (and therefore 
> likely from multiple threads). Assuming that the full disk meant that no 
> threads doing flushing could make progress (it appears that way), eventually 
> any thread that calls the flush code would become stalled.
> Assuming our analysis is right (and we're still looking into it), we need to 
> make a change. Here's a proposal so far:
> SHORT TERM:
> * Change the ThreadPoolExecutor policy to not be caller-runs. This will let 
> other threads make progress in the event that one pool is stalled.
> LONG TERM
> * It appears that there are n threads for n data directories that we flush 
> to, but they're not dedicated to a data directory. We should have a thread 
> per data directory and have that thread dedicated to that directory
> * Perhaps we could use the failure detector on disks?
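
For context on the caller-runs behavior described in the quoted report above, a minimal standalone sketch of how ThreadPoolExecutor.CallerRunsPolicy pulls the submitting thread into running a task once the pool and its queue are saturated; with a task that blocks indefinitely (standing in for a flush to a full disk), the submitter stalls too. This is a generic java.util.concurrent illustration, not the DebuggableThreadPoolExecutor code itself:

{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Generic illustration of the failure mode: with CallerRunsPolicy, once the pool
// and its queue are full, execute() runs the task on the submitting thread. If the
// task blocks (as a flush to a full disk effectively does), the submitter blocks too.
public class CallerRunsStall
{
    public static void main(String[] args)
    {
        CountDownLatch never = new CountDownLatch(1); // simulates a flush that cannot complete
        Runnable stuckFlush = () -> {
            try { never.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        };

        ThreadPoolExecutor flushPool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(1),                 // tiny queue so it saturates quickly
                new ThreadPoolExecutor.CallerRunsPolicy());   // the policy at issue

        flushPool.execute(stuckFlush); // occupies the single worker thread
        flushPool.execute(stuckFlush); // fills the queue
        System.out.println("Submitting a third task; with CallerRunsPolicy this thread runs it and hangs.");
        flushPool.execute(stuckFlush); // rejected by the queue -> runs on main thread -> main thread stalls
        System.out.println("Never reached.");
    }
}
{code}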
