Hi,
I've been thinking about this for a while now - does Ceph really need a 
journal? Filesystems are already pretty good at committing data to disk when 
asked (and much faster too), we have external journals in XFS and Ext4...
In a scenario where a client does an ordinary write, there's no need to flush it 
anywhere (the app didn't ask for it), so it ends up in the pagecache and gets 
committed eventually.
If a client asks for the data to be flushed then fdatasync/fsync on the 
filestore object takes care of that, including ordering and stuff.
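
To make the two write cases concrete, here is a minimal sketch in Python. It
assumes a plain file stands in for a filestore object; the helper name
write_object and the "durable" flag are hypothetical and only for illustration,
this is not how Ceph's FileStore is actually implemented.

```python
import os
import tempfile

# Use fdatasync where the platform provides it, otherwise fall back to fsync.
_sync = getattr(os, "fdatasync", os.fsync)


def write_object(path, data, durable):
    """Write data to path (hypothetical stand-in for a filestore object).

    durable=False: the data only reaches the pagecache and the kernel
    writes it back later on its own schedule.
    durable=True: the call returns only after the data is flushed to disk.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        while offset < len(data):
            # os.write may write fewer bytes than requested, so loop.
            offset += os.write(fd, data[offset:])
        if durable:
            _sync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        obj = os.path.join(tmp, "object")
        write_object(obj, b"ordinary write", durable=False)
        write_object(obj, b"flushed write", durable=True)
```

Note that for a newly created file, an fsync on the parent directory is also
needed before the directory entry itself is guaranteed to be on disk.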
For reads, you just read from filestore (no need to differentiate between 
filestore/journal) - pagecache gives you the right version already.
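
A small sketch of the read case, again in Python and again assuming a plain
file in place of a filestore object (the function name read_sees_latest_write
is made up for this example). A reader on a separate file descriptor sees the
latest overwrite even though nothing was ever fsync-ed:

```python
import os
import tempfile


def read_sees_latest_write(path):
    """Overwrite the same range twice without any fsync, then read it back
    through a second file descriptor."""
    wfd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    rfd = os.open(path, os.O_RDONLY)
    try:
        os.pwrite(wfd, b"version-1", 0)
        os.pwrite(wfd, b"version-2", 0)  # overwrite in place, no flush
        # The read is served from the pagecache and returns the newest data.
        return os.pread(rfd, 9, 0)
    finally:
        os.close(wfd)
        os.close(rfd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(read_sees_latest_write(os.path.join(tmp, "object")))
```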
 
Or is the journal there to achieve some tiering for writes when running 
spindles with SSDs? This is IMO the only thing ordinary filesystems don't do 
out of the box, even when the filesystem journal is put on an SSD - the data 
gets flushed to the spindle whenever it is fsync-ed (even with data=journal). 
But in reality, most of the data will hit the spindle either way, and when you 
run with SSDs it will always be much slower. And even for tiering there are 
already many options (bcache, flashcache or even ZFS L2ARC) that are much more 
performant and proven stable. I think the fact that people have a need to 
combine Ceph with stuff like that already proves the point.

So a very interesting scenario would be to disable the Ceph journal and at most 
use data=journal on ext4. The complexity of the data path would drop 
significantly, latencies would decrease, CPU time would be saved...
I just feel that Ceph has lots of unnecessary complexity inside that duplicates 
what filesystems (and the pagecache...) have been doing for a while now without 
eating most of our CPU cores - why don't we use that? Is it possible to disable 
the journal completely?

Did I miss something that makes the journal essential?

Jan

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
