Yes, I was talking about client fsync. (Sorry, I was not clear in my mail)

But thanks for the information about fsync in the ceph osd; I understand now
how things work.

About faking: I just meant that when the client does an fsync, the fsync
doesn't force a flush to the disk platter, only to the journal.
With zfs, I had not enabled nocacheflush, so every write was committed to the
journal AND the disk platter before the ack to the client (so latency was bad...).

Thanks again !



----- Original Message -----

From: "Sage Weil" <s...@inktank.com>
To: "Alexandre DERUMIER" <aderum...@odiso.com>
Cc: "Sage Weil" <s...@inktank.com>, ceph-devel@vger.kernel.org
Sent: Tuesday, 22 May 2012 17:17:56
Subject: Re: write cache disabling recommendations for journal and storage disks?

On Tue, 22 May 2012, Alexandre DERUMIER wrote: 
> Thanks Sage, 
> 
> Yes, newer kernels haven't needed the barrier option since 2.6.37, if I
> remember correctly (support of REQ_FLUSH/FUA).
> 
> 
> Just to be sure: 
> 
> If a client does an fsync or fdatasync, does the write go only to the
> journal, to be flushed to disk after 30 seconds?
>
> Or does it force the write to be committed to disk?

Oh, are you talking about the *ceph* client doing an fsync? In that case, 
it waits for a COMMIT from the osd, which happens when all replicas have 
written to the journal (or fs, whichever is durable first). 

The OSD itself is calling fdatasync() on the journal file. 
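
As a rough illustration of that pattern (not Ceph's actual code), the sketch below appends a record to a journal file and calls fdatasync() before reporting a commit. The names journal_append and handle_write, the length-prefixed record format, and the "COMMIT" return value are all hypothetical. It falls back to os.fsync on platforms where os.fdatasync is unavailable.

```python
import os

# fdatasync() flushes file data (and the metadata needed to read it back);
# fall back to fsync() on platforms that do not provide it.
_sync = getattr(os, "fdatasync", os.fsync)


def journal_append(fd: int, payload: bytes) -> None:
    """Append one length-prefixed record and make it durable before returning."""
    record = len(payload).to_bytes(4, "big") + payload
    view = memoryview(record)
    # os.write() may write fewer bytes than requested, so loop until done.
    while view:
        written = os.write(fd, view)
        view = view[written:]
    # Only after this call returns is the record considered durable.
    _sync(fd)


def handle_write(fd: int, payload: bytes) -> str:
    """Hypothetical request handler: commit is reported only after the sync."""
    journal_append(fd, payload)
    return "COMMIT"
```

The point of the ordering is that the commit is never reported before the sync call has returned, so a client that saw the commit can rely on the record being in the journal after a crash.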

> Does the journal fake the fsync ? (with zfs, this is the nocacheflush=1 
> system variable) 

I'm not sure what you mean by "faking" the fsync... 

sage 


> 
> 
> ----- Original Message -----
>
> From: "Sage Weil" <s...@inktank.com>
> To: "Alexandre DERUMIER" <aderum...@odiso.com>
> Cc: ceph-devel@vger.kernel.org
> Sent: Tuesday, 22 May 2012 16:41:50
> Subject: Re: write cache disabling recommendations for journal and storage
> disks?
> 
> On Tue, 22 May 2012, Alexandre DERUMIER wrote: 
> > Hi, I have some questions about disabling write cache 
> > " 
> > http://ceph.com/docs/master/config-cluster/file-system-recommendations/ 
> > 
> > Ceph aims for data safety, which means that when the application receives 
> > notice that data was written to the disk, that data was actually written to 
> > the disk. For old kernels (<2.6.33), disable the write cache if the journal 
> > is on a raw disk. Newer kernels should work fine. 
> > 
> > Use hdparm to disable write caching on the hard disk: 
> > 
> > hdparm -W 0 /dev/hda
> > " 
> 
> To clarify: on newer kernels, calling fsync() or fdatasync() flushes the 
> disk's write cache, so this isn't something you need to worry about at 
> all. 
> 
> > Cache on the journal disk:
> >
> > What happens on a power failure if data is still in the cache of the
> > journal disk (SSD with or without a supercapacitor), so the write is
> > acked but not really written to disk?
> 
> The ack is only sent if the client requests it, and normally the client 
> does not. Which means the client didn't get the ack, and will resend the 
> request to the other replicas once the failed OSD is marked down. 
> 
> > Cache on the storage disks:
> > What happens on a power failure if the write is committed to the journal
> > but is still in the cache of the storage disks and not yet on the platters?
> 
> On newer kernels, the file system is careful to flush the disk cache any 
> time durability matters (e.g., during a journal commit); there is no need 
> to disable it on that disk. If the write is durable in the journal, it 
> will be applied to the fs on ceph-osd restart. 
> 
> > Maybe the best way is to disable the write cache on both (journal and
> > storage disks)?
> 
> If you have an old kernel, disable it on the journal, and (if you're using
> ext3) mount with -o barrier. On newer kernels, I believe barrier is
> (finally) the default.
> 
> sage 
>
> --
> Alexandre Derumier
> Ingénieur Système 
> Fixe : 03 20 68 88 90 
> Fax : 03 20 68 90 81 
> 45 Bvd du Général Leclerc 59100 Roubaix - France 
> 12 rue Marivaux 75002 Paris - France 
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
> the body of a message to majord...@vger.kernel.org 
> More majordomo info at http://vger.kernel.org/majordomo-info.html 
> 
> 

