Do you want me to try PG 16 without ICU or PG 15 with ICU? I can do that, but it will take a few days before the server is available.

On Mon, May 29, 2023 at 9:55 AM Peter Geoghegan <p...@bowt.ie> wrote:
> On Sun, May 28, 2023 at 2:42 PM David Rowley <dgrowle...@gmail.com> wrote:
> > c6e0fe1f2 might have helped improve some of that performance, but I
> > suspect there must be something else as ~3x seems much more than I'd
> > expect from reducing the memory overheads. Testing versions before
> > and after that commit might give a better indication.
>
> I'm virtually certain that this is due to the change in default
> collation provider, from libc to ICU. Mostly due to the fact that ICU
> is capable of using abbreviated keys, and the system libc isn't
> (unless you go out of your way to define TRUST_STRXFRM when building
> Postgres).
>
> Many individual test cases involving larger non-C collation text sorts
> showed similar improvements back when I worked on this. Offhand, I
> believe that 3x - 3.5x improvements in execution times were common
> with high entropy abbreviated keys on high cardinality input columns
> at that time (this was with glibc). Low cardinality inputs were more
> like 2.5x.
>
> I believe that ICU is faster than glibc in general -- even with
> TRUST_STRXFRM enabled. But the TRUST_STRXFRM thing is bound to be the
> most important factor here, by far.
>
> --
> Peter Geoghegan
>

--
Mark Callaghan
mdcal...@gmail.com