[ 
https://issues.apache.org/jira/browse/CASSANDRA-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137426#comment-13137426
 ] 

Peter Schuller commented on CASSANDRA-3248:
-------------------------------------------

XFS should detect write barrier support and cause fsync() to actually penetrate 
the cache (unless the SATA drive is lying about flushing its cache). 
Interesting that you seemed to be getting caching behavior still. LVM or 
anything in between that breaks write barriers?

(Not truly relevant to the test, but might be a relevant data point to see a 
case in practice where write barriers aren't working when they are expected to.)

                
> CommitLog writer should call fdatasync instead of fsync
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3248
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3248
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.6.13, 0.7.9, 0.8.6, 1.0.0, 1.1
>         Environment: Linux
>            Reporter: Zhu Han
>            Assignee: Brandon Williams
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> CommitLogSegment uses SequentialWriter to flush the buffered data to the log 
> device. It depends on FileDescriptor#sync(), which invokes fsync() and so also 
> forces the file attributes to disk.
> However, at least on Linux, fdatasync() is good enough for commit log flush:
> bq. fdatasync() is similar to fsync(), but does not flush modified metadata 
> unless that metadata is needed in order to allow a subsequent data retrieval 
> to be  correctly handled.  For example, changes to st_atime or st_mtime 
> (respectively, time of last access and time of last modification; see 
> stat(2)) do not require flushing because they are not necessary for a 
> subsequent data read to be handled correctly.  On the other hand, a change to 
> the file size (st_size,  as  made  by  say  ftruncate(2)),  would require a 
> metadata flush.
> File size is synced to disk by fdatasync() as well. Although the commit log 
> recovery logic sorts the commit log segments by their modification timestamp, 
> that dependency can be removed safely, IMHO.
> I checked the native code of JRE 6. On Linux and Solaris, 
> FileChannel#force(false) invokes fdatasync(). On Windows, the false flag does 
> not have any impact.
> On my log device (commodity SATA HDD, write cache disabled), there is a large 
> performance gap between fsync() and fdatasync():
> {quote}
> $ sysbench --test=fileio --num-threads=1  --file-num=1 --file-total-size=10G 
> --file-fsync-all=on --file-fsync-mode={color:red}fdatasync{color} 
> --file-test-mode=seqwr --max-time=600 --file-block-size=2K  --max-requests=0 
> run
> {color:blue}54.90{color} Requests/sec executed
>    per-request statistics:
>          min:                                  8.29ms
>          avg:                                 18.18ms
>          max:                                108.36ms
>          approx.  95 percentile:              25.02ms
> $ sysbench --test=fileio --num-threads=1  --file-num=1 --file-total-size=10G 
> --file-fsync-all=on --file-fsync-mode={color:red}fsync{color} 
> --file-test-mode=seqwr --max-time=600 --file-block-size=2K  --max-requests=0 
> run
> {color:blue}28.08{color} Requests/sec executed
>     per-request statistics:
>          min:                                 33.28ms
>          avg:                                 35.61ms
>          max:                                911.87ms
>          approx.  95 percentile:              41.69ms
> {quote}
> I do think this is a very critical performance improvement.
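
For illustration, here is a minimal Java sketch of the difference the description is about: syncing after every append with FileChannel#force(true) versus FileChannel#force(false). This is not Cassandra's SequentialWriter or CommitLogSegment code. The class name CommitLogSyncSketch, the method appendAndSync, and the record counts are hypothetical, and the mapping of force(false) to fdatasync() on Linux is taken from the reporter's reading of the JRE 6 native code rather than guaranteed by the API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; names here are illustrative, not Cassandra APIs.
public class CommitLogSyncSketch {

    /**
     * Appends `count` zero-filled records of `recordSize` bytes to `file`,
     * forcing the channel after each record. Returns elapsed nanoseconds.
     *
     * syncMetadata = true  asks for content and metadata to be forced
     * syncMetadata = false asks only for content to be forced (per the
     *                      issue text, fdatasync() on Linux/Solaris JRE 6)
     */
    static long appendAndSync(Path file, int count, int recordSize,
                              boolean syncMetadata) throws IOException {
        ByteBuffer record = ByteBuffer.allocate(recordSize);
        long start = System.nanoTime();
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            for (int i = 0; i < count; i++) {
                record.clear();                 // reset position/limit to reuse the buffer
                while (record.hasRemaining()) { // write() may be partial
                    channel.write(record);
                }
                channel.force(syncMetadata);    // sync point after every record
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commitlog-sync");
        Path withMetadata = dir.resolve("force-true.log");
        Path dataOnly = dir.resolve("force-false.log");

        int records = 200;      // assumption: small run, just to see a trend
        int recordSize = 2048;  // matches the 2K block size in the sysbench run

        long metaNanos = appendAndSync(withMetadata, records, recordSize, true);
        long dataNanos = appendAndSync(dataOnly, records, recordSize, false);

        System.out.printf("force(true):  %.2f ms per record%n",
                metaNanos / 1e6 / records);
        System.out.printf("force(false): %.2f ms per record%n",
                dataNanos / 1e6 / records);
    }
}
```

Any gap this shows depends heavily on the filesystem, the drive's write cache setting, and whether write barriers are honored, so it should be read as a rough probe rather than a substitute for the sysbench numbers above.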


        
