Greg Smith wrote:
Rodger Donaldson wrote:
Cyril Scetbon wrote:
Does anyone know what differences between Linux kernels
2.6.29 and 2.6.30 could cause this big a difference (TPS x7!)?
http://www.phoronix.com/scan.php?page=article&item=linux_2624_2633&num=2

http://www.csamuel.org/2009/04/11/default-ext3-mode-changing-in-2630

Yeah, I realized I answered the wrong question--Cyril wanted to know "why was 2.6.30 so much faster?", not "why did 2.6.33 get so much slower?", which is what I was focusing on. There's a good intro to what happened to speed up 2.6.30 at http://lwn.net/Articles/328363/ , with the short version being "the kernel stopped caring about data integrity at all in 2.6.30 by switching ext3 to data=writeback as its default". To give you an idea how wacky this is: less than a year ago Linus himself was ranting about how terrible that specific implementation was: http://lkml.org/lkml/2009/3/24/415 http://lkml.org/lkml/2009/3/24/460 and making it the default exposes everyone who upgrades to a newer kernel to that regression in behavior.
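If you're wondering which mode your own ext3 mounts ended up with, the effective options are visible in /proc/mounts; a mount showing no explicit data= option is running on whatever default the kernel was built with. Here's a minimal C sketch of such a check, using glibc's <mntent.h> interface (the program and its output format are just illustrative):

/*
 * Minimal sketch: report the journaling mode of each mounted ext3
 * filesystem by scanning /proc/mounts with glibc's <mntent.h>
 * interface. A mount with no explicit data= option is running on
 * the kernel's compiled-in default.
 */
#include <stdio.h>
#include <string.h>
#include <mntent.h>

int main(void)
{
    FILE *mounts = setmntent("/proc/mounts", "r");
    struct mntent *m;

    if (mounts == NULL) {
        perror("setmntent");
        return 1;
    }
    while ((m = getmntent(mounts)) != NULL) {
        if (strcmp(m->mnt_type, "ext3") != 0)
            continue;
        char *mode = hasmntopt(m, "data");
        if (mode) {
            /* hasmntopt points into the full option string; stop at the comma */
            printf("%s on %s: %.*s\n", m->mnt_fsname, m->mnt_dir,
                   (int) strcspn(mode, ","), mode);
        } else {
            printf("%s on %s: data=<kernel default>\n",
                   m->mnt_fsname, m->mnt_dir);
        }
    }
    endmntent(mounts);
    return 0;
}

Setting an explicit data=ordered in /etc/fstab sidesteps the default change entirely, whatever kernel you're running.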

I'm just patiently waiting for Chris Mason (who works for Oracle--they care about doing the right thing here too) to replace Ted Ts'o as the person driving filesystem development in Linux land. That his "data=guarded" implementation was only partially merged into 2.6.30, and instead combined with this awful default change, speaks volumes about how far the Linux development priorities are out of sync (pun intended) with what database users expect. See http://www.h-online.com/open/news/item/Kernel-Log-What-s-coming-in-2-6-30-File-systems-New-and-revamped-file-systems-741319.html for a summary of how that drama played out. I let out a howling laugh when I read "The rest have been put on hold, with the development cycle already entering the stabilisation phase." Linux kernel development hasn't had a stabilization phase in years.

It's interesting that we have pgbench available as a lens to watch all this through, because in its default TPC-B-like mode it has an interesting property: if performance on regular hardware comes back too fast, data integrity must be broken, because regular drives can't do physical commits very often. What Phoronix should be doing is testing the raw fsync rate first, using something like sysbench[1], and if those numbers come back higher than the disk's rotation rate, declare the combination unusable for PostgreSQL purposes rather than reporting the fake numbers.

[1] http://www.westnet.com/~gsmith/content/postgresql/pg-benchmarking.pdf , page 26
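To put a number on "can't do physical commits very often": a 7200 RPM drive spins 120 times a second, so an honest commit-to-platter rate tops out around 120 fsyncs/sec (a 15K RPM drive manages 250). Below is a minimal C sketch of the sort of raw fsync test I mean--essentially what sysbench's fsync test measures; the file name and iteration count are arbitrary:

/*
 * Minimal sketch of a raw fsync-rate test (names and counts are
 * arbitrary). Rewrites one 8 kB block and forces it to stable
 * storage, over and over, then reports the achieved rate.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>

int main(void)
{
    const int iterations = 1000;
    char block[8192];
    struct timeval start, end;

    int fd = open("fsync-test.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    memset(block, 'x', sizeof(block));

    gettimeofday(&start, NULL);
    for (int i = 0; i < iterations; i++) {
        /* rewrite the same block, then force it to the platter */
        if (pwrite(fd, block, sizeof(block), 0) != (ssize_t) sizeof(block)) {
            perror("pwrite");
            return 1;
        }
        if (fsync(fd) != 0) {
            perror("fsync");
            return 1;
        }
    }
    gettimeofday(&end, NULL);

    double secs = (end.tv_sec - start.tv_sec)
        + (end.tv_usec - start.tv_usec) / 1e6;
    printf("%d fsyncs in %.2f s = %.0f fsyncs/sec\n",
           iterations, secs, iterations / secs);

    close(fd);
    unlink("fsync-test.dat");
    return 0;
}

If a plain SATA disk reports thousands of fsyncs per second here, something in the stack is acknowledging writes before they're actually durable, and any pgbench TPS figure from that setup is meaningless.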

Thank you Greg, that was exactly the answer I was waiting for. Everyone should pay close attention to the changes behind such surprising numbers!

Regards

--
Cyril SCETBON

