Yes, I was talking about the client fsync (sorry, I was not clear in my mail). But thanks for the information about fsync in the ceph osd; I understand now how things work.
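As a rough illustration of the commit path described in the quoted reply below, here is a minimal Python sketch; it is not Ceph's actual code. It assumes a single local journal file and ignores replication, and the names journal_append and handle_write and the "COMMIT" return value are hypothetical, chosen only for illustration. What it shows is the ordering: the record is written to the journal, fdatasync() is called on the journal file, and only then is the commit reported.

```python
import os
import tempfile

# os.fdatasync is missing on some platforms; fall back to os.fsync there.
_sync = getattr(os, "fdatasync", os.fsync)


def journal_append(fd: int, payload: bytes) -> None:
    """Append one length-prefixed record to the journal and make it durable."""
    record = len(payload).to_bytes(4, "big") + payload
    view = memoryview(record)
    while view:
        # os.write may write fewer bytes than requested, so loop until done.
        written = os.write(fd, view)
        view = view[written:]
    # Ask the kernel to push the data to stable storage before returning.
    _sync(fd)


def handle_write(journal_fd: int, payload: bytes) -> str:
    """Hypothetical request handler: report the commit only after the sync."""
    journal_append(journal_fd, payload)
    return "COMMIT"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "journal.bin")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        try:
            print(handle_write(fd, b"object data"))
        finally:
            os.close(fd)
```

In this model the latency seen by the client depends on the journal's fdatasync() only; writing the same data back to the storage disk can happen later.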
About faking: I just meant that if the client does an fsync, the fsync doesn't force a flush to the disk platter, only to the journal. With zfs, I had not enabled nocacheflush, so a write was always committed to the journal AND the disk platter before the ack to the client (so latency was bad...).

Thanks again!

----- Original Message -----

From: "Sage Weil" <s...@inktank.com>
To: "Alexandre DERUMIER" <aderum...@odiso.com>
Cc: "Sage Weil" <s...@inktank.com>, ceph-devel@vger.kernel.org
Sent: Tuesday, 22 May 2012 17:17:56
Subject: Re: write cache disabling recommendations for journal and storage disks ?

On Tue, 22 May 2012, Alexandre DERUMIER wrote:
> Thanks Sage,
>
> yes, newer kernels haven't needed the barrier option since 2.6.37, if I
> remember (support of REQ_FLUSH/FUA).
>
>
> Just to be sure:
>
> If the client does an fsync or fdatasync, does the write go only to the
> journal, to be flushed to disk after 30 seconds?
>
> Or does it force the write to be committed to disk?

Oh, are you talking about the *ceph* client doing an fsync? In that case,
it waits for a COMMIT from the osd, which happens when all replicas have
written to the journal (or fs, whichever is durable first). The OSD itself
is calling fdatasync() on the journal file.

> Does the journal fake the fsync? (with zfs, this is the nocacheflush=1
> system variable)

I'm not sure what you mean by "faking" the fsync...

sage

>
>
> ----- Original Message -----
>
> From: "Sage Weil" <s...@inktank.com>
> To: "Alexandre DERUMIER" <aderum...@odiso.com>
> Cc: ceph-devel@vger.kernel.org
> Sent: Tuesday, 22 May 2012 16:41:50
> Subject: Re: write cache disabling recommendations for journal and storage
> disks ?
>
> On Tue, 22 May 2012, Alexandre DERUMIER wrote:
> > Hi, I have some questions about disabling write cache
> > "
> > http://ceph.com/docs/master/config-cluster/file-system-recommendations/
> >
> > Ceph aims for data safety, which means that when the application receives
> > notice that data was written to the disk, that data was actually written to
> > the disk. For old kernels (<2.6.33), disable the write cache if the journal
> > is on a raw disk. Newer kernels should work fine.
> >
> > Use hdparm to disable write caching on the hard disk:
> >
> > hdparm -W 0 /dev/hda 0
> > "
>
> To clarify: on newer kernels, calling fsync() or fdatasync() flushes the
> disk's write cache, so this isn't something you need to worry about at
> all.
>
> > Cache on journal disk:
> >
> > What happens if we have a power failure while data is in the cache of the
> > journal disk (ssd with/without supercapacitor), so the write is acked
> > but not really written to disk?
>
> The ack is only sent if the client requests it, and normally the client
> does not. Which means the client didn't get the ack, and will resend the
> request to the other replicas once the failed OSD is marked down.
>
> > Cache on storage disks:
> > What happens if we have a power failure when the write is committed to
> > the journal, but the data is in the cache of the storage disks and not
> > yet on the platters?
>
> On newer kernels, the file system is careful to flush the disk cache any
> time durability matters (e.g., during a journal commit); there is no need
> to disable it on that disk. If the write is durable in the journal, it
> will be applied to the fs on ceph-osd restart.
>
> > Maybe the best way is to disable write cache on both (journal and
> > storage disks) ?
>
> If you have an old kernel, disable it on the journal, and (if you're using
> ext3) mount with -o barrier=1. On newer kernels, I believe barriers are
> (finally) the default.
>
> sage

--
Alexandre Derumier
Systems Engineer
Phone: 03 20 68 88 90
Fax: 03 20 68 90 81
45 Bvd du Général Leclerc 59100 Roubaix - France
12 rue Marivaux 75002 Paris - France

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html