On Tue, 1 Aug 2023 at 13:25, Andres Freund <and...@anarazel.de> wrote: > There's a lot of larger numbers in the file, which likely reduces the impact > some. And there's the overhead of actually inserting the rows into the table, > making the difference appear smaller than it is.
It might be worth special casing the first digit as we can save doing the multiplication by 10 and the overflow checks on the first digit. That should make it slightly faster to parse smaller numbers. > COPY (SELECT generate_series(1, 1000) a, 10 b, 20 c, 30 d, 40 e, 50 f FROM > generate_series(1, 10000)) TO '/tmp/lotsaints_wide.copy'; > > psql -c 'DROP TABLE IF EXISTS lotsaints_wide; CREATE UNLOGGED TABLE > lotsaints_wide(a int, b int, c int, d int, e int, f int);' && \ > pgbench -n -P1 -f <( echo "COPY lotsaints_wide FROM > '/tmp/lotsaints_wide.copy' WHERE false") -t 5 > > 15: 2992.605 > HEAD: 3325.201 > fastpath1.patch 2932.606 > fastpath2.patch 2783.915 > > Interestingly fastpath1 is slower now, even though it wasn't with earlier > patches (which still is repeatable). I do not have the foggiest as to why. I'm glad to see that. I've adjusted the patch to add the fast path for the 16 and 64-bit versions of the function. I also added the special case for processing the first digit, which looks like: /* process the first digit */ digit = (*ptr - '0'); if (likely(digit < 10)) { ptr++; tmp = digit; } /* process remaining digits */ for (;;) I tried adding the "at least 1 digit check" by adding an else { goto slow; } in the above code, but it seems to generate slower code than just checking if (unlikely(ptr == s)) { goto slow; } after the loop. I also noticed that I wasn't getting the same performance after adjusting the 16 and 64 bit versions. I assume that's down to code alignment, but unsure of that. I ended up adjusting all the "while (*ptr)" loops into "for (;;)" loops since the NUL char check is handled by the "else break;". I also removed the needless NUL char check in the isspace loops. It can't be isspace and '\0'. I also replaced the isdigit() function call and replaced it for manually checking the digit range. I see my compiler (gcc12.2) effectively generates the same code as the unsigned char fast path version checking if (digit < 10). Once I did that, I got the performance back again. With your new test with the small-sized ints, I get: REL_15_STABLE: latency average = 1696.390 ms master @ d3a38318a latency average = 1928.803 ms master + fastpath1.patch: latency average = 1634.736 ms master + fastpath2.patch: latency average = 1628.704 ms master + fastpath3.patch latency average = 1590.417 ms I see no particular reason not to go ahead with the attached patch and get this thread closed off. Any objections? David
pg_strtoint_fastpath3.patch
Description: Binary data