On Monday 29 August 2011 at 15:27 Adriano dos Santos Fernandes wrote:

> > 
> > The default for ext3/4 is data=ordered  and, from the kernel docs:
> I read divergent things about this. Some say that the ext3 default changed
> to writeback, others say it depends on a kernel configure option.

I haven't updated my kernel documentation since May 2010, but it seems
consistent with the info in these links:

 http://www.mjmwired.net/kernel/Documentation/filesystems/ext3.txt
 http://www.mjmwired.net/kernel/Documentation/filesystems/ext4.txt

Both say that the kernel default is data=ordered. However, distros can change 
this, so it is important to double check. The simplest way to do that is with:

  cat /proc/mounts
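
If there are many mounts, the output is easier to read when filtered. Here is a
rough sketch in Python that pulls out only the options relevant to this thread.
The function name data_modes and the choice of which options to report are my
own assumptions, not anything defined by the kernel. Note that a missing data=
entry only means the option was not listed explicitly, so the kernel or distro
default applies.

```python
def data_modes(mounts_text):
    """Return {mount_point: [options of interest]} for ext3/ext4 entries.

    mounts_text is the raw text of /proc/mounts. Each line has the form:
    device mount_point fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and filesystems other than ext3/ext4.
        if len(fields) < 4 or fields[2] not in ("ext3", "ext4"):
            continue
        options = fields[3].split(",")
        # Keep only journalling mode, barrier and sync/async options.
        interesting = [
            opt for opt in options
            if opt.startswith(("data=", "barrier", "nobarrier"))
            or opt in ("sync", "async")
        ]
        result[fields[1]] = interesting
    return result
```

To use it on a live system, pass in the contents of the file, e.g.
data_modes(open("/proc/mounts").read()).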


> >    "All data are forced directly out to the main file system prior to its
> > metadata being committed to the journal."
> > 
> > So presumably as long as data=ordered then a barrier flush will always
> > imply that all data is written to disc.
> 
> Does that mean that FW=ON (i.e., O_SYNC mode) doesn't guarantee that a
> commit reported as succeeded may really succeed if a fast power loss
> happens and the hard disk has a non-battery based cache and barriers are
> disabled?

That is how I understand it. Each level seems to play smoke and mirrors. If 
just one level does asynchronous writes then all timings will be wrong and 
there is a risk to data integrity.

The levels are:

Application - we can set FW=ON or OFF. If ON then we are saying write 
everything to disc immediately. If FW=OFF then we see a massive performance 
gain on small test runs (especially if super* is used.)

Filesystem - ext3 (and others) are mounted async by default (at least for 
opensuse). I've done disc i/o tests with the partition mounted async that show 
anomalies in disc iops. The only way to remove the anomalies was to mount 
with sync. I had previously mounted with barrier=1 but that was 
insufficient. (FW=ON in all cases, of course.)

Disc drive - Modern consumer drives are shipped with write cache = on. In 
theory the capacitors store sufficient energy to flush the cache to disc in 
the event of power failure. Either way, if write caching is on then test 
results will be skewed. If the cache is not saturated then tests will appear 
to be quick (but the data has not actually been written to disc). If the cache 
is saturated then results for test B will be distorted by the delayed writes 
from test A.
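
For the drive level, one way to see what the kernel believes about the write
cache is the queue/write_cache attribute under /sys/block, which recent kernels
expose (it reads "write back" or "write through"). The sketch below is only
that, a sketch: the function names are hypothetical, the attribute is absent on
older kernels, and it reports how the kernel treats the device rather than
proving what the drive firmware actually does.

```python
def parse_write_cache(text):
    """Interpret the contents of /sys/block/<dev>/queue/write_cache.

    Returns True for "write back" (volatile cache in use), False for
    "write through", and None for anything unrecognised.
    """
    value = text.strip().lower()
    if value == "write back":
        return True
    if value == "write through":
        return False
    return None


def write_cache_enabled(device):
    """Read the attribute for a device name such as "sda".

    Returns None if the attribute cannot be read, e.g. on kernels
    that do not provide it or for a device that does not exist.
    """
    path = "/sys/block/%s/queue/write_cache" % device
    try:
        with open(path) as handle:
            return parse_write_cache(handle.read())
    except OSError:
        return None
```

A result of True for the device under test would be a hint that timings may be
flattered by the cache, as described above.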


> 
> And considering that O_SYNC and barrier are on, does it imply that any
> page write will make the metadata flush, or must something else be done?
> 

Hmmm. I think I've answered that in the section above. One thing is for sure - 
you know that the writes are synchronous when the performance drops massively 
:-)
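
A crude way to see that drop for yourself is to time the same writes with and
without O_SYNC. The following is a minimal sketch, not how Firebird itself does
it: the function names, the 8192-byte block size and the count of 50 writes are
arbitrary choices of mine. If the two timings come out close to each other on a
real disc, that suggests some level below the application is still caching.

```python
import os
import tempfile
import time


def time_writes(path, sync, count=50, size=8192):
    """Write `count` blocks of `size` bytes and return the elapsed seconds.

    With sync=True the file is opened with O_SYNC, so each write() should
    only return once the OS considers the data to be on stable storage.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        # O_SYNC is not defined on every platform; fall back to no flag.
        flags |= getattr(os, "O_SYNC", 0)
    block = b"\0" * size
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed


def compare(directory=None):
    """Return (buffered_seconds, synced_seconds, final_file_size)."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        path = os.path.join(tmp, "probe.dat")
        buffered = time_writes(path, sync=False)
        synced = time_writes(path, sync=True)
        size = os.path.getsize(path)
    return buffered, synced, size
```

Pass a directory on the partition you want to test, otherwise the temporary
file lands wherever the system default is (often a tmpfs, which tells you
nothing about the disc).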


Paul
-- 
Paul Reeves
http://www.ibphoenix.com
Specialists in Firebird support

Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
