Greg Smith wrote:
Rodger Donaldson wrote:
Cyril Scetbon wrote:
Does anyone know what differences between Linux kernels 2.6.29 and
2.6.30 could cause such a big difference (TPS x 7!)?
http://www.phoronix.com/scan.php?page=article&item=linux_2624_2633&num=2
http://www.csamuel.org/2009/04/11/default-ext3-mode-changing-in-2630
Yeah, I realized I answered the wrong question--Cyril wanted to know
"why was 2.6.30 so much faster?", not "why did 2.6.33 get so much
slower?", which is what I was focusing on. There's a good intro to
what happened to speed up 2.6.30 at http://lwn.net/Articles/328363/ ,
with the short version being "the kernel stopped caring about data
integrity at all in 2.6.30 by switching to writeback as its default".
To give you an idea of how wacky this is, less than a year ago Linus
himself was ranting about how terrible that specific implementation
was: http://lkml.org/lkml/2009/3/24/415
http://lkml.org/lkml/2009/3/24/460 and making it the default exposes a
regression to bad behavior to everyone who upgrades to a newer kernel.
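For anyone who wants to check whether their own system picked up the
new default, here is a minimal Python sketch (my addition, not part of
the thread; the script name is made up) that reads /proc/mounts and
reports the data= journaling mode of each ext3 filesystem. If no data=
option is listed, the kernel's compiled-in default is in effect
(data=ordered before 2.6.30, data=writeback from 2.6.30 on):

# check_ext3_data_mode.py -- rough sketch only.
# Report the data= journaling mode of every mounted ext3 filesystem.
def ext3_data_modes(mounts_path="/proc/mounts"):
    modes = {}
    with open(mounts_path) as mounts:
        for line in mounts:
            fields = line.split()
            if len(fields) < 4 or fields[2] != "ext3":
                continue
            mountpoint, options = fields[1], fields[3]
            mode = None
            for opt in options.split(","):
                if opt.startswith("data="):
                    mode = opt
            # No data= option listed means the kernel default applies.
            modes[mountpoint] = mode or "data=<kernel default>"
    return modes

if __name__ == "__main__":
    for mountpoint, mode in ext3_data_modes().items():
        print("%s mounted with %s" % (mountpoint, mode))

If it turns out you are running with data=writeback, putting an
explicit data=ordered in that filesystem's /etc/fstab options gets the
old behavior back; as far as I know ext3 refuses to switch journaling
modes on a live remount, so an unmount or reboot is needed.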
I'm just patiently waiting for Chris Mason (who works for Oracle--they
care about doing the right thing here too) to replace Ted Tso as the
person driving filesystem development in Linux land. That his
"data=guarded" implementation was only partially merged into 2.6.30,
and instead combined with this awful default change, speaks volumes
about how far the Linux development priorities are out of sync (pun
intended) with what database users expect. See
http://www.h-online.com/open/news/item/Kernel-Log-What-s-coming-in-2-6-30-File-systems-New-and-revamped-file-systems-741319.html
for a summary of how that drama played out. The part that made me let
out a howling laugh was the line "The rest have been put on hold, with
the development cycle already entering the stabilisation phase."
Linux kernel development hasn't had a stabilization phase in years.
It's interesting that we have pgbench available as a lens to watch all
this through, because its TPC-B-like default mode has a useful
property: if performance on regular hardware gets too
fast, it means data integrity must be broken, because regular drives
can't do physical commits very often. What Phoronix should be doing
is testing simple fsync rate using something like sysbench first[1],
and if those numbers come back higher than the drive's rotation rate
(a 7200 RPM disk can only manage about 120 physical commits per
second), declare the combination unusable for PostgreSQL purposes
rather than reporting the fake numbers.
[1]
http://www.westnet.com/~gsmith/content/postgresql/pg-benchmarking.pdf
, page 26
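As a rough stand-in for that sysbench test, here is a short Python
sketch of the sanity check I'm describing; the file name, iteration
count, and the 7200 RPM figure are all just illustrative defaults:

# fsync_rate.py -- illustrative sketch only; not the sysbench test itself.
# Repeatedly overwrite one 8 kB page and fsync it, then compare the
# measured commits/sec against the ceiling implied by the drive's
# rotational speed (RPM / 60).
import os, time

def fsync_rate(path="fsync_test.dat", iterations=1000, rpm=7200):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        block = b"x" * 8192                  # one PostgreSQL-sized page
        start = time.time()
        for _ in range(iterations):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, block)
            os.fsync(fd)                     # ask for a physical commit
        elapsed = time.time() - start
    finally:
        os.close(fd)
        os.unlink(path)
    rate = iterations / elapsed
    ceiling = rpm / 60.0                     # at most one commit per rotation
    print("%.0f fsyncs/sec; rotational ceiling ~%.0f/sec" % (rate, ceiling))
    if rate > ceiling:
        print("suspicious: writes are being cached somewhere, not committed")

if __name__ == "__main__":
    fsync_rate()

That ceiling only applies to a plain drive; a battery-backed write
cache can legitimately report higher rates, which is why the check
only condemns results from ordinary hardware.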
Thank you Greg, that was exactly the answer I was waiting for. Everyone
should look closely at what changed whenever such surprising numbers
are reported!
Regards
--
Cyril SCETBON