Jan, I believe the block-device (vs. filesystem) OSD layout is addressed by NewStore/BlueStore:
http://tracker.ceph.com/projects/ceph/wiki/NewStore_(new_osd_backend)

--
Alex Gorbachev
Storcium

On Thu, Jan 28, 2016 at 4:32 PM, Jan Schermer <j...@schermer.cz> wrote:
> You can't run a Ceph OSD without a journal. The journal is always there.
> If you don't have a journal partition, then there's a "journal" file on
> the OSD filesystem that does the same thing. If it's a partition, then
> this file turns into a symlink.
>
> You will always be better off with the journal on a separate partition
> because of the way the writeback cache in Linux works (someone correct me
> if I'm wrong).
> The journal needs to flush to disk quite often, and Linux is not always
> able to flush only the journal data. You can't defer metadata flushing
> forever, and calling fsync() makes all the dirty data flush as well.
> ext2/3/4 also flushes data to the filesystem periodically (every 5 s, I
> think?), which will momentarily send the latency of the journal through
> the roof.
> (I'll leave researching how exactly XFS does it to those who care about
> that "filesystem'o'thing".)
>
> P.S. I feel very strongly that this whole concept is fundamentally
> broken. We already have a journal for the filesystem which is time-proven,
> well behaved and, above all, fast. Instead there's this reinvented wheel
> which supposedly does it better in userspace while not really avoiding the
> filesystem journal either. It would maybe make sense if the OSD stored the
> data on a block device directly, avoiding the filesystem altogether. But
> it would still do the same bloody thing, and (no disrespect) ext4 does
> this better than Ceph ever will.
>
>> On 28 Jan 2016, at 20:01, Tyler Bishop <tyler.bis...@beyondhosting.net>
>> wrote:
>>
>> This is an interesting topic that I've been waiting for.
>>
>> Right now we run the journal as a partition on the data disk. I've built
>> drives without journals and the write performance seems okay, but random
>> IO performance is poor in comparison to what it should be.
>>
>> Tyler Bishop
>> Chief Technical Officer
>> 513-299-7108 x10
>> tyler.bis...@beyondhosting.net
>>
>> ------------------------------
>> From: "Bill WONG" <wongahsh...@gmail.com>
>> To: "ceph-users" <ceph-users@lists.ceph.com>
>> Sent: Thursday, January 28, 2016 1:36:01 PM
>> Subject: [ceph-users] SSD Journal
>>
>> Hi,
>> I have tested an SSD journal with SATA disks and it works perfectly. Now
>> I am testing a full-SSD Ceph cluster. With a full-SSD cluster, do I
>> still need an SSD as the journal disk?
>>
>> [Assume I do not have PCIe SSD flash, which performs better than a
>> normal SSD disk.]
>>
>> Please give some ideas on a full-SSD Ceph cluster... thank you!
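The file-vs-symlink distinction Jan describes can be checked directly on an OSD host. A minimal sketch, assuming the default data path for osd.0 (`/var/lib/ceph/osd/ceph-0`); the path and the helper name are illustrative, not from the thread:

```python
import os

def journal_layout(path):
    """Classify an OSD journal path as Jan describes it:
    a symlink points at a dedicated journal partition, while a
    regular file means the journal lives on the OSD filesystem.
    Check islink() first, because isfile() follows symlinks."""
    if os.path.islink(path):
        return "partition: " + os.readlink(path)
    if os.path.isfile(path):
        return "file on OSD filesystem"
    return "missing"

# Default Ceph layout for osd.0 (path is an assumption):
print(journal_layout("/var/lib/ceph/osd/ceph-0/journal"))
```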
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
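Jan's point that fsync() cannot return until the file's dirty pages reach stable storage, so journal writes pay for whatever flushing the kernel bundles in, can be illustrated with a small timing sketch. The file name and 4 KiB payload are arbitrary choices for the example:

```python
import os
import time

def timed_fsync_write(path, payload):
    """Write payload to path and fsync, returning elapsed seconds.
    The fsync() call blocks until the data is durably on disk, which
    is where journal-write latency comes from."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.monotonic()
        os.write(fd, payload)
        os.fsync(fd)
        return time.monotonic() - start
    finally:
        os.close(fd)

elapsed = timed_fsync_write("journal-test.bin", b"\0" * 4096)
print(f"4 KiB write+fsync took {elapsed * 1000:.2f} ms")
os.remove("journal-test.bin")
```

Running this on the OSD's data filesystem while it is under load would show the latency spikes Jan attributes to periodic filesystem flushes.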