Hi,

On 2014-07-26 12:50:30 +0200, Fabien COELHO wrote:
> >> The default blocksize is currently 8k, which is not necessarily
> >> optimal for all setups, especially with SSDs where the latency is
> >> much lower than on HDDs.
> >
> > I don't think that really follows.
>
> The rationale, which may be proven false, is that with an SSD the
> latency penalty for reading and writing randomly vs sequentially is
> much lower than for an HDD, so there is less incentive to group stuff
> in larger chunks on that account.

A higher number of blocks has overhead unrelated to this though:
increased waste/lower storage density as it becomes more frequent that
tuples don't fit into a page; more locks; a higher number of buffer
headers; more toasted rows; smaller toast chunks; more vacuuming/heap
pruning WAL records; ...

Now obviously there's also an inverse to this, otherwise we'd all be
using 1GB page sizes. But I don't think storage latency has much to do
with it - it's imo more about write amplification (i.e. turning a
single row update into an 8/4/16/32 kB write).

> >> There is a case for different values with significant impact on
> >> performance (up to a not-to-be-sneezed-at 10% on a pgbench run on
> >> SSD, see http://www.cybertec.at/postgresql-block-sizes-getting-started/),
> >> and ISTM that the ability to align PostgreSQL block size to the
> >> underlying FS/HW block size would be nice.
> >
> > I don't think that benchmark is very meaningful. Way too small
> > scale, way too short runtime (there'll be barely any checkpoints,
> > hot pruning, vacuum at all).
>
> These benchmarks have the merit of existing and of being consistent
> (the smaller the blocksize, the better the performance), and ISTM that
> the performance results suggest this is worth investigating.

Well, it's easy to make claims that aren't meaningful with bad
benchmarks. Those numbers are *far* too low for the presented SSD -
invalidating the entire thing. That's the speed you'd expect for
rotating media, not an SSD. My laptop has the 1TB variant of that disk
and I get nearly 10x that number of TPS - with a parallel make running,
a profiler started, and assertions enabled. This isn't an actual
benchmark, sorry. It's SEO.

> Possibly the "small" scale means that data fit in memory, so the
> benchmarks as run emphasize write performance linked to the
> INSERTs/UPDATEs.

Well, the generated data is 160MB in size. Nobody with a concurrent
write-heavy OLTP load has that little data.

> What would you suggest as meaningful for scale and run time, say on a
> dual-core 8GB memory 256GB SSD laptop?

At the very least scale 100 - then it likely doesn't fit into internal
caches on common consumer drives anymore. But more importantly the test
has to run over several checkpoint cycles, so hot pruning and vacuuming
are also measured.
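To make that concrete, here's a rough sketch of what a more meaningful
comparison run could look like; the install paths, client counts and
run length are only illustrative and would need tuning to the machine:

    # Block size is a compile-time option, so each size to be compared
    # needs its own build and its own freshly initdb'd cluster.
    ./configure --with-blocksize=4 --prefix=$HOME/pg-bs4
    make -j4 install
    $HOME/pg-bs4/bin/initdb -D $HOME/pg-bs4/data
    $HOME/pg-bs4/bin/pg_ctl -D $HOME/pg-bs4/data -l bs4.log start

    # Scale 100 is on the order of 1.5GB of data, large enough to get
    # past drive-internal caches.
    $HOME/pg-bs4/bin/pgbench -i -s 100 postgres

    # Run long enough to span several checkpoint cycles, so checkpoint
    # I/O, hot pruning and vacuuming are part of what's measured.
    $HOME/pg-bs4/bin/pgbench -c 8 -j 2 -T 3600 postgres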
Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services