[jira] Commented: (CASSANDRA-1470) use direct io for compaction

Peter Schuller (JIRA) Fri, 10 Dec 2010 08:53:27 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970232#action_12970232 ]


Peter Schuller commented on CASSANDRA-1470:
-------------------------------------------

@jake:

Pretty good idea to combine the two like this. It especially works if the new 
pages written can get intelligently pulled in (or rather "not dropped").

A few things:

(1) In order for DONTNEED to be effective you have to fsync() (well, 
fdatasync() on Linux) first, because dirty pages that have not yet been 
written back will not be dropped. This will have similar performance 
implications to direct I/O (see my long post earlier on in this ticket too), 
but at least removes the need to carefully ensure writes happen in chunks. 
Instead, the fsync() frequency will have to be considered and traded off.

(2) Remember that DONTNEED will affect the data globally for the system, 
meaning that a compaction that reads and does DONTNEED will actively evict 
data from sstables being actively used. (Again, see my longer post earlier in 
this issue.) So you'd have to use mincore() when reading too in order to avoid 
evicting actively used data. (Note: Not doing so may be *worse* than current 
behavior, in addition to not causing an improvement, so I think this is 
important.)
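
To illustrate point (1), here is a minimal sketch of the write side: write 
some data, sync it, and only then advise the kernel to drop the synced range. 
The function names write_and_drop() and _sync_and_drop() and the sync_every 
parameter are hypothetical and only for illustration; this is not taken from 
any of the attached patches. It assumes a platform where Python exposes 
os.posix_fadvise (e.g. Linux) and skips the advice elsewhere.

```python
import os


def _sync_and_drop(fd, start, end):
    """Sync the file, then advise the kernel to drop bytes [start, end)."""
    if end > start:
        # Dirty pages are not dropped by DONTNEED, so write them back first.
        sync = getattr(os, "fdatasync", os.fsync)
        sync(fd)
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, start, end - start, os.POSIX_FADV_DONTNEED)
    return end


def write_and_drop(path, chunks, sync_every=4):
    """Write chunks to path, syncing and dropping every sync_every chunks.

    sync_every is the knob being traded: a lower value keeps fewer pages
    in the cache but syncs more often. Returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    dropped = 0  # everything before this offset was synced and advised
    written = 0
    try:
        for i, chunk in enumerate(chunks, 1):
            view = memoryview(chunk)
            while len(view) > 0:  # os.write may write fewer bytes than asked
                n = os.write(fd, view)
                view = view[n:]
                written += n
            if i % sync_every == 0:
                dropped = _sync_and_drop(fd, dropped, written)
        dropped = _sync_and_drop(fd, dropped, written)
    finally:
        os.close(fd)
    return dropped
```

Unlike direct I/O, the chunk sizes passed to write_and_drop() do not need any 
particular alignment; only the sync frequency has to be chosen.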

But given that those are eventually addressed, mincore+advise seems like a 
pretty good combination.

One issue I can think of is that while mincore() gives you information in bulk 
for many pages, posix_fadvise() does not allow the equivalent: each call 
covers a single contiguous range. So we'd expect potentially quite a large 
number of posix_fadvise() calls, assuming in-core data is scattered across a 
large file. That might be significant in some cases (e.g. if half of the pages 
are in core, you may end up approaching one posix_fadvise() call per page 
read).
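
A rough sketch of the call-count concern: given a per-page residency snapshot 
of the kind mincore() would return before the compaction read, the pages that 
were not already in core can be coalesced into contiguous (offset, length) 
ranges, each of which costs one posix_fadvise() call. The function name 
droppable_ranges() and the list-of-booleans input are assumptions made for 
illustration; obtaining the snapshot from mincore() itself is not shown.

```python
def droppable_ranges(was_resident, page_size=4096):
    """Coalesce pages that were NOT in core into (offset, length) ranges.

    was_resident[i] is True if page i was already cached before the
    compaction read it, so it should be left alone. Each returned range
    would need one posix_fadvise(..., POSIX_FADV_DONTNEED) call.
    """
    ranges = []
    run_start = None  # first page of the current non-resident run, if any
    for page, resident in enumerate(was_resident):
        if not resident and run_start is None:
            run_start = page
        elif resident and run_start is not None:
            ranges.append((run_start * page_size,
                           (page - run_start) * page_size))
            run_start = None
    if run_start is not None:
        ranges.append((run_start * page_size,
                       (len(was_resident) - run_start) * page_size))
    return ranges


# Half the pages in core, clustered: one call covers all droppable pages.
clustered = droppable_ranges([True] * 4 + [False] * 4)
# Half the pages in core, scattered: one call per page that gets dropped.
scattered = droppable_ranges([True, False] * 4)
```

With the clustered layout the eight pages need a single call; with the 
scattered layout the same number of droppable pages needs four.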


> use direct io for compaction
> ----------------------------
>
>                 Key: CASSANDRA-1470
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1470
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 0.7.1
>
>         Attachments: 1470-v2.txt, 1470.txt, CASSANDRA-1470-for-0.6.patch, 
> CASSANDRA-1470-v10-for-0.7.patch, CASSANDRA-1470-v11-for-0.7.patch, 
> CASSANDRA-1470-v12-0.7.patch, CASSANDRA-1470-v2.patch, 
> CASSANDRA-1470-v3-0.7-with-LastErrorException-support.patch, 
> CASSANDRA-1470-v4-for-0.7.patch, CASSANDRA-1470-v5-for-0.7.patch, 
> CASSANDRA-1470-v6-for-0.7.patch, CASSANDRA-1470-v7-for-0.7.patch, 
> CASSANDRA-1470-v8-for-0.7.patch, CASSANDRA-1470-v9-for-0.7.patch, 
> CASSANDRA-1470.patch, 
> use.DirectIORandomAccessFile.for.commitlog.against.1022235.patch
>
>
> When compaction scans through a group of sstables, it forces out the data 
> in the os buffer cache being used for hot reads, which can have a dramatic 
> negative effect on performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
