[ 
https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970102#action_12970102
 ] 

Peter Schuller commented on CASSANDRA-1470:
-------------------------------------------

Just to clarify, then: as jbellis surmised, my comments were indeed based on
the fact that writes will be synchronous. In particular, what write caching
normally gives you is the ability to defer the actual writing such that:

(1) future writes can be coalesced with past writes, which in the extreme case
turns seek-bound I/O into huge slabs of sequential I/O
(2) re-written pages aren't re-written on disk
(3) the program can continue (e.g. churning CPU) without stopping to wait for
disk I/O
(4) the size of the individual writes the application happens to make is
decoupled from the way the data gets written out to disk (see the sketch after
this list)
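
Roughly what (1) and (4) mean at the application level: a plain buffer in
front of the file coalesces many small writes into large sequential ones, much
as the page cache does transparently for buffered I/O. An untested Java sketch
(file name made up):

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class CoalescedWriter {
        public static void main(String[] args) throws IOException {
            // The 1 MB buffer coalesces many odd-sized application writes
            // into large sequential writes, analogous to what the page
            // cache does for ordinary buffered I/O.
            try (BufferedOutputStream out = new BufferedOutputStream(
                    new FileOutputStream("coalesce-demo.bin"), 1 << 20)) {
                byte[] smallRecord = new byte[137]; // odd-sized app write
                for (int i = 0; i < 100_000; i++) {
                    out.write(smallRecord); // no syscall on most calls
                }
            } // close() flushes the final partial buffer
        }
    }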

Using direct I/O in the general case is difficult because there is a lot of
logic in the kernel to implement write caching in a way that works for
arbitrary workloads. But with Cassandra, we:

(1) are not concerned with re-writing pages
(2) are not concerned with mixing seek-bound and streaming I/O
(3) are specifically after writing large amounts of data, and can select when
to flush in-memory buffers (see the direct I/O sketch after this list)
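
A sketch of the aligned-buffer discipline direct I/O imposes, using the
ExtendedOpenOption.DIRECT flag available in newer JDKs (10+) - illustrative
only, not what the attached patches (which go through JNA) do:

    import com.sun.nio.file.ExtendedOpenOption;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.*;

    public class DirectWriteSketch {
        public static void main(String[] args) throws IOException {
            // The target file system must support O_DIRECT (tmpfs does not).
            Path path = Paths.get("direct-demo.bin");
            int align = (int) Files.getFileStore(Paths.get(".")).getBlockSize();
            try (FileChannel ch = FileChannel.open(path,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                    ExtendedOpenOption.DIRECT)) {
                // O_DIRECT requires buffer address, file position and
                // transfer size to be aligned to the block size.
                ByteBuffer buf = ByteBuffer.allocateDirect(align * 8)
                                           .alignedSlice(align);
                while (buf.hasRemaining()) {
                    ch.write(buf); // bypasses the page cache entirely
                }
            }
        }
    }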

So the problem becomes easier. But still, each direct write will essentially
behave like a write() followed by an fsync(), with the performance
implications that has (though not necessarily exactly; e.g. an asynchronous
write() followed by fsync() might sit in an I/O queue waiting if the fsync()
doesn't heighten the priority of the previous write, etc. - depending on exact
kernel behavior and whatnot).
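
In FileChannel terms, the per-write cost is roughly this pattern - with
O_DIRECT the kernel does the equivalent implicitly on every write. Untested
sketch, names made up:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class SyncWriteCost {
        // Each chunk pays the full round trip to stable storage before
        // the caller can proceed - roughly what one direct write costs.
        static void writeSync(FileChannel ch, ByteBuffer chunk) throws IOException {
            while (chunk.hasRemaining()) {
                ch.write(chunk);
            }
            ch.force(false); // flush data (not metadata) to disk
        }

        public static void main(String[] args) throws IOException {
            try (FileChannel ch = FileChannel.open(Paths.get("sync-demo.bin"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                for (int i = 0; i < 8; i++) {
                    writeSync(ch, ByteBuffer.allocate(1 << 20)); // 1 MB chunks
                }
            }
        }
    }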

As far as I know, given large chunks being written, we really should be able
to achieve throughput similar to that of the background writing done by the
kernel. With one major caveat: if the writing is single-threaded, the lack of
an asynchronous syscall API means that the thread will not be able to keep
busy with CPU-bound activity while waiting for the actual write. So while the
writing, when it does happen, really should have the potential to be
efficient, if one does want to simultaneously be CPU bound in e.g. compaction,
the writing would have to happen from a background thread.
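
Something like the following, assuming the compaction thread hands filled
buffers to a single writer thread (untested sketch, all names made up):

    import java.nio.ByteBuffer;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class BackgroundWriter {
        private final ExecutorService writer = Executors.newSingleThreadExecutor();
        private Future<?> inFlight; // at most one write outstanding

        // Called from the compaction thread with a filled buffer; blocks
        // only if the previous buffer hasn't finished writing yet, so
        // CPU-bound merge work overlaps with the synchronous I/O.
        void submit(ByteBuffer full) throws Exception {
            if (inFlight != null) {
                inFlight.get(); // back-pressure: wait for the previous write
            }
            inFlight = writer.submit(() -> writeDirect(full));
        }

        private void writeDirect(ByteBuffer buf) {
            // ... the blocking (direct) write happens here, off the
            // compaction thread ...
        }
    }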

However, note that the CPU waiting is not necessarily as bad as it sounds. If
your compaction is heavily CPU bound, the effect will be small in relative
terms because very little time is spent doing the I/O anyway. If the
compaction is heavily disk bound, you don't really care anyway, since any
additional time the compaction thread spends blocked on writes just *lessens*
the negative impact of compaction, because it decreases the effect on live
traffic.

The most significant effect should be seen when compaction is reasonably
balanced between CPU and disk; in the extreme case one could see up to a
halving of compaction speed, in a situation without live traffic further
delaying the I/O.
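
To put rough numbers on that: if a compaction needs, say, 60 seconds of CPU
work and 60 seconds of disk writing, overlapped writing finishes in about 60
seconds, while strictly alternating CPU work and synchronous writes takes 120
- the halving above. Skew the ratio in either direction and the penalty
shrinks.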

I hope I'm being clear :) (And definitely do correct me if I'm overlooking 
something.) I feel a bit bad commenting all the time without actually putting 
up any code...


> use direct io for compaction
> ----------------------------
>
>                 Key: CASSANDRA-1470
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1470
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 0.7.1
>
>         Attachments: 1470-v2.txt, 1470.txt, CASSANDRA-1470-for-0.6.patch, 
> CASSANDRA-1470-v10-for-0.7.patch, CASSANDRA-1470-v11-for-0.7.patch, 
> CASSANDRA-1470-v12-0.7.patch, CASSANDRA-1470-v2.patch, 
> CASSANDRA-1470-v3-0.7-with-LastErrorException-support.patch, 
> CASSANDRA-1470-v4-for-0.7.patch, CASSANDRA-1470-v5-for-0.7.patch, 
> CASSANDRA-1470-v6-for-0.7.patch, CASSANDRA-1470-v7-for-0.7.patch, 
> CASSANDRA-1470-v8-for-0.7.patch, CASSANDRA-1470-v9-for-0.7.patch, 
> CASSANDRA-1470.patch, 
> use.DirectIORandomAccessFile.for.commitlog.against.1022235.patch
>
>
> When compaction scans through a group of sstables, it forces out the data
> in the OS buffer cache being used for hot reads, which can have a dramatic
> negative effect on performance.
