Re: Initdb-time block size specification

2023-09-05 Thread Hannu Krosing
Sure, I was just hoping that somebody already knew without needing to specifically check :) And as I see in David's response, the tests are actually broken for other sizes. I'll see if I can (convince somebody to) set this up . Cheers Hannu On Tue, Sep 5, 2023 at 10:23 PM Andres Freund wrote:

Re: Initdb-time block size specification

2023-09-05 Thread David Christensen
On Tue, Sep 5, 2023 at 2:52 PM Hannu Krosing wrote: > > Something I also asked at this years Unconference - Do we currently > have Build Farm animals testing with different page sizes ? > > I'd say that testing all sizes from 4KB up (so 4, 8, 16, 32) should be > done at least before each release

Re: Initdb-time block size specification

2023-09-05 Thread Andres Freund
Hi, On 2023-09-05 21:52:18 +0200, Hannu Krosing wrote: > Something I also asked at this years Unconference - Do we currently > have Build Farm animals testing with different page sizes ? You can check that yourself as easily as anybody else. Greetings, Andres Freund

Re: Initdb-time block size specification

2023-09-05 Thread Hannu Krosing
Something I also asked at this years Unconference - Do we currently have Build Farm animals testing with different page sizes ? I'd say that testing all sizes from 4KB up (so 4, 8, 16, 32) should be done at least before each release if not continuously. -- Cheers Hannu On Tue, Sep 5, 2023 at

Re: Initdb-time block size specification

2023-09-05 Thread David Christensen
On Tue, Sep 5, 2023 at 9:04 AM Robert Haas wrote: > > On Sat, Sep 2, 2023 at 3:09 PM Tomas Vondra > wrote: > > Except that no one is forcing you to actually go 130mph or 32mph, right? > > You make it seem like this patch forces people to use some other page > > size, but that's clearly not what

Re: Initdb-time block size specification

2023-09-05 Thread Robert Haas
On Sat, Sep 2, 2023 at 3:09 PM Tomas Vondra wrote: > Except that no one is forcing you to actually go 130mph or 32mph, right? > You make it seem like this patch forces people to use some other page > size, but that's clearly not what it's doing - it gives you the option > to use smaller or larger

Re: Initdb-time block size specification

2023-09-02 Thread Tomas Vondra
On 9/1/23 16:57, Robert Haas wrote: > On Thu, Aug 31, 2023 at 2:32 PM David Christensen > wrote: >> Here's a patch atop the series which converts to 16-bit uints and >> passes regressions, but I don't consider well-vetted at this point. > > For what it's worth, my gut reaction to this patch

Re: Initdb-time block size specification

2023-09-01 Thread Robert Haas
On Thu, Aug 31, 2023 at 2:32 PM David Christensen wrote: > Here's a patch atop the series which converts to 16-bit uints and > passes regressions, but I don't consider well-vetted at this point. For what it's worth, my gut reaction to this patch series is similar to that of Andres: I think it

Re: Initdb-time block size specification

2023-08-31 Thread David Christensen
> I was definitely hand-waving additional implementation here for > non-native 128 bit support; the modulus algorithm as presented > requires 4 times the space as the divisor, so a uint16 implementation > should work for all 64-bit machines. Certainly open to other ideas or > implementations,

Re: Initdb-time block size specification

2023-08-31 Thread David Christensen
> + * pg_fastmod - calculates the modulus of a 32-bit number against a constant > + * divisor without using the division operator > + */ > +static inline uint32 pg_fastmod(uint32 n, uint32 divisor, uint64 fastinv) > +{ > +#ifdef HAVE_INT128 > + uint64_t lowbits = fastinv * n; > + return

Re: Initdb-time block size specification

2023-08-31 Thread John Naylor
On Thu, Aug 31, 2023 at 8:51 AM David Christensen < david.christen...@crunchydata.com> wrote: > 0005 - utility functions for fast div/mod operations; basically > montgomery multiplication +/* + * pg_fastmod - calculates the modulus of a 32-bit number against a constant + * divisor without using

Re: Initdb-time block size specification

2023-08-31 Thread David Christensen
Enclosed are TPC-H results for 1GB shared_buffers, 64MB work_mem on a 64GB laptop with SSD storage; everything else is default settings. TL;DR: unpatched version: 17.30 seconds, patched version: 17.15; there are some slight variations in runtime, but seems to be within the noise level at this

Re: Initdb-time block size specification

2023-07-04 Thread Peter Eisentraut
On 01.07.23 00:21, Tomas Vondra wrote: Right, that's the dance we do to protect against torn pages. But Andres suggested that if you have modern storage and configure it correctly, writing with 4kB pages would be atomic. So we wouldn't need to do this FPI stuff, eliminating pretty significant

Re: Initdb-time block size specification

2023-06-30 Thread Andres Freund
Hi, On 2023-06-30 16:28:59 -0500, David Christensen wrote: > On Fri, Jun 30, 2023 at 3:29 PM Andres Freund wrote: > >> And indeed. Comparing e.g. TPC-H, I see *massive* regressions. Some > >> queries > > are the same, sobut others regress by up to 70% (although more commonly > > around > >

Re: Initdb-time block size specification

2023-06-30 Thread Bruce Momjian
On Sat, Jul 1, 2023 at 01:18:34AM +0200, Tomas Vondra wrote: > What does "smartctl -a /dev/nvme0n1" say? There should be something like > this: > > Supported LBA Sizes (NSID 0x1) > Id Fmt Data Metadt Rel_Perf > 0 -4096 0 0 > 1 + 512 0 0 > > which says

Re: Initdb-time block size specification

2023-06-30 Thread Tomas Vondra
On 7/1/23 01:16, Bruce Momjian wrote: > On Fri, Jun 30, 2023 at 04:04:57PM -0700, Andres Freund wrote: >> Hi, >> >> On 2023-06-30 18:58:20 -0400, Bruce Momjian wrote: [1] On linux I think you need to use stat() to figure out the st_dev for a file, then look in /proc/self/mountinfo for

Re: Initdb-time block size specification

2023-06-30 Thread Bruce Momjian
On Fri, Jun 30, 2023 at 04:04:57PM -0700, Andres Freund wrote: > Hi, > > On 2023-06-30 18:58:20 -0400, Bruce Momjian wrote: > > > [1] On linux I think you need to use stat() to figure out the st_dev for a > > > file, then look in /proc/self/mountinfo for the block device, use the name > > > of

Re: Initdb-time block size specification

2023-06-30 Thread Tomas Vondra
On 7/1/23 00:59, Andres Freund wrote: > On 2023-06-30 18:37:39 -0400, Bruce Momjian wrote: >> On Sat, Jul 1, 2023 at 12:21:03AM +0200, Tomas Vondra wrote: >>> On 6/30/23 23:53, Bruce Momjian wrote: For a 4kB write, to say it is not partially written would be to require the operating

Re: Initdb-time block size specification

2023-06-30 Thread Andres Freund
Hi, On 2023-06-30 18:58:20 -0400, Bruce Momjian wrote: > > [1] On linux I think you need to use stat() to figure out the st_dev for a > > file, then look in /proc/self/mountinfo for the block device, use the name > > of the file to look in /sys/block/$d/queue/physical_block_size. > > I just got a

Re: Initdb-time block size specification

2023-06-30 Thread Bruce Momjian
On Fri, Jun 30, 2023 at 06:58:20PM -0400, Bruce Momjian wrote: > I just got a new server: > > https://momjian.us/main/blogs/blog/2023.html#June_28_2023 > > so tested this on my new M.2 NVME storage device: > > $ /sys/block/nvme0n1/queue/physical_block_size > 262144 > > that's

Re: Initdb-time block size specification

2023-06-30 Thread Andres Freund
On 2023-06-30 18:37:39 -0400, Bruce Momjian wrote: > On Sat, Jul 1, 2023 at 12:21:03AM +0200, Tomas Vondra wrote: > > On 6/30/23 23:53, Bruce Momjian wrote: > > > For a 4kB write, to say it is not partially written would be to require > > > the operating system to guarantee that the 4kB write is

Re: Initdb-time block size specification

2023-06-30 Thread Bruce Momjian
On Fri, Jun 30, 2023 at 03:51:18PM -0700, Andres Freund wrote: > > For a 4kB write, to say it is not partially written would be to require > > the operating system to guarantee that the 4kB write is not split into > > smaller writes which might each be atomic because smaller atomic writes > >

Re: Initdb-time block size specification

2023-06-30 Thread Tomas Vondra
On 7/1/23 00:05, Andres Freund wrote: > Hi, > > On 2023-06-30 23:27:45 +0200, Tomas Vondra wrote: >> On 6/30/23 23:11, Andres Freund wrote: >>> ... >>> >>> If we really wanted to do this - but I don't think we do - I'd argue for >>> working on the buildsystem support to build the postgres binary

Re: Initdb-time block size specification

2023-06-30 Thread Andres Freund
Hi, On 2023-06-30 17:53:34 -0400, Bruce Momjian wrote: > On Fri, Jun 30, 2023 at 11:42:30PM +0200, Tomas Vondra wrote: > > On 6/30/23 23:11, Andres Freund wrote: > > > I suspect you're going to see more benefits from going to a *lower* > > > setting > > > than a higher one. Some practical issues

Re: Initdb-time block size specification

2023-06-30 Thread Bruce Momjian
On Sat, Jul 1, 2023 at 12:21:03AM +0200, Tomas Vondra wrote: > On 6/30/23 23:53, Bruce Momjian wrote: > > For a 4kB write, to say it is not partially written would be to require > > the operating system to guarantee that the 4kB write is not split into > > smaller writes which might each be

Re: Initdb-time block size specification

2023-06-30 Thread Andres Freund
Hi, On 2023-06-30 23:42:30 +0200, Tomas Vondra wrote: > I wonder what are the conditions/options for disabling FPI. I kinda > assume it'd apply to new drives with 4k sectors, with properly aligned > partitions etc. But I haven't seen any particularly clear confirmation > that's correct. Yea,

Re: Initdb-time block size specification

2023-06-30 Thread Tomas Vondra
On 6/30/23 23:53, Bruce Momjian wrote: > On Fri, Jun 30, 2023 at 11:42:30PM +0200, Tomas Vondra wrote: >> >> >> On 6/30/23 23:11, Andres Freund wrote: >>> Hi, >>> >>> ... >>> >>> I suspect you're going to see more benefits from going to a *lower* setting >>> than a higher one. Some practical

Re: Initdb-time block size specification

2023-06-30 Thread Andres Freund
Hi, On 2023-06-30 23:27:45 +0200, Tomas Vondra wrote: > On 6/30/23 23:11, Andres Freund wrote: > > ... > > > > If we really wanted to do this - but I don't think we do - I'd argue for > > working on the buildsystem support to build the postgres binary multiple > > times, for 4, 8, 16 kB BLCKSZ

Re: Initdb-time block size specification

2023-06-30 Thread Bruce Momjian
On Fri, Jun 30, 2023 at 11:42:30PM +0200, Tomas Vondra wrote: > > > On 6/30/23 23:11, Andres Freund wrote: > > Hi, > > > > ... > > > > I suspect you're going to see more benefits from going to a *lower* setting > > than a higher one. Some practical issues aside, plenty of storage hardware > >

Re: Initdb-time block size specification

2023-06-30 Thread Tomas Vondra
On 6/30/23 23:11, Andres Freund wrote: > Hi, > > ... > > I suspect you're going to see more benefits from going to a *lower* setting > than a higher one. Some practical issues aside, plenty of storage hardware > these days would allow to get rid of FPIs if you go to 4k blocks (although it >

Re: Initdb-time block size specification

2023-06-30 Thread David Christensen
On Fri, Jun 30, 2023 at 4:17 PM Andres Freund wrote: > > Hi, > > On 2023-06-30 15:05:54 -0500, David Christensen wrote: > > > I am fairly certain this is going to be causing substantial performance > > > regressions. I think we should reject this even if we don't immediately > > > find > > >

Re: Initdb-time block size specification

2023-06-30 Thread David Christensen
On Fri, Jun 30, 2023 at 4:27 PM Tomas Vondra wrote: > On 6/30/23 23:11, Andres Freund wrote: > > ... > > > > If we really wanted to do this - but I don't think we do - I'd argue for > > working on the buildsystem support to build the postgres binary multiple > > times, for 4, 8, 16 kB BLCKSZ and

Re: Initdb-time block size specification

2023-06-30 Thread David Christensen
On Fri, Jun 30, 2023 at 4:12 PM Tomas Vondra wrote: > > > > On 6/30/23 22:05, David Christensen wrote: > > On Fri, Jun 30, 2023 at 2:39 PM Andres Freund wrote: > >> > >> ... > >> > >> Besides this, I've not really heard any convincing justification for > >> needing > >> this in the first place.

Re: Initdb-time block size specification

2023-06-30 Thread David Christensen
On Fri, Jun 30, 2023 at 3:29 PM Andres Freund wrote: >> And indeed. Comparing e.g. TPC-H, I see *massive* regressions. Some queries > are the same, sobut others regress by up to 70% (although more commonly around > 10-20%). Hmm, that is definitely not good. > That's larger than I thought,

Re: Initdb-time block size specification

2023-06-30 Thread Tomas Vondra
On 6/30/23 23:11, Andres Freund wrote: > ... > > If we really wanted to do this - but I don't think we do - I'd argue for > working on the buildsystem support to build the postgres binary multiple > times, for 4, 8, 16 kB BLCKSZ and having a wrapper postgres binary that just > exec's the

Re: Initdb-time block size specification

2023-06-30 Thread Andres Freund
Hi, On 2023-06-30 15:05:54 -0500, David Christensen wrote: > > I am fairly certain this is going to be causing substantial performance > > regressions. I think we should reject this even if we don't immediately > > find > > them, because it's almost guaranteed to cause some. > > What would be

Re: Initdb-time block size specification

2023-06-30 Thread Tomas Vondra
On 6/30/23 22:05, David Christensen wrote: > On Fri, Jun 30, 2023 at 2:39 PM Andres Freund wrote: >> >> ... >> >> Besides this, I've not really heard any convincing justification for needing >> this in the first place. > > Doing this would open up experiments in larger block sizes, so we >

Re: Initdb-time block size specification

2023-06-30 Thread Andres Freund
Hi, On 2023-06-30 14:09:55 -0500, David Christensen wrote: > On Fri, Jun 30, 2023 at 1:14 PM Tomas Vondra > > I wonder how to best evaluate/benchmark this. AFAICS what we want to > > measure is the extra cost of making the values dynamic (which means the > > compiler can't just optimize them

Re: Initdb-time block size specification

2023-06-30 Thread David Christensen
On Fri, Jun 30, 2023 at 2:39 PM Andres Freund wrote: > > Hi, > > On 2023-06-30 12:35:09 -0500, David Christensen wrote: > > As discussed somewhat at PgCon, enclosed is v1 of a patch to provide > > variable block sizes; basically instead of BLCKSZ being a compile-time > > constant, a single set of

Re: Initdb-time block size specification

2023-06-30 Thread Andres Freund
Hi, On 2023-06-30 12:35:09 -0500, David Christensen wrote: > As discussed somewhat at PgCon, enclosed is v1 of a patch to provide > variable block sizes; basically instead of BLCKSZ being a compile-time > constant, a single set of binaries can support all of the block sizes > Pg can support,

Re: Initdb-time block size specification

2023-06-30 Thread David Christensen
On Fri, Jun 30, 2023 at 1:14 PM Tomas Vondra wrote: > Do we really want to prefix the option with CLUSTER_? That seems to just > add a lot of noise into the patch, and I don't see much value in this > rename. I'd prefer keeping BLCKSZ and tweak just the couple places that > need "default" to use

Re: Initdb-time block size specification

2023-06-30 Thread Tomas Vondra
On 6/30/23 19:35, David Christensen wrote: > Hi, > > As discussed somewhat at PgCon, enclosed is v1 of a patch to provide > variable block sizes; basically instead of BLCKSZ being a compile-time > constant, a single set of binaries can support all of the block sizes > Pg can support, using the