Hi, On 2023-07-25 23:37:08 +1200, David Rowley wrote: > On Tue, 25 Jul 2023 at 17:34, Andres Freund <and...@anarazel.de> wrote: > I've not really studied the fix_COPY_DEFAULT.patch patch. Is there a > reason to delay committing that? It would be good to eliminate that > as a variable for the current performance regression.
Yea, I don't think there's a reason to hold off on that. Sawada-san, do you want to commit it? Or Andrew? > > prep: > > COPY (SELECT generate_series(1, 2000000) a, (random() * 100000 - > > 50000)::int b, 3243423 c) TO '/tmp/lotsaints.copy'; > > DROP TABLE lotsaints; CREATE UNLOGGED TABLE lotsaints(a int, b int, c int); > > > > benchmark: > > psql -qX -c 'truncate lotsaints' && pgbench -n -P1 -f <( echo "COPY > > lotsaints FROM '/tmp/lotsaints.copy';") -t 15 > > > > I disabled turbo mode, pinned the server to a single core of my Xeon Gold > > 5215: > > > > HEAD: 812.690 > > > > your patch: 821.354 > > > > strtoint from 8692f6644e7: 824.543 > > > > strtoint from 6b423ec677d^: 806.678 > > I'm surprised to see the imul version is faster. It's certainly not > what we found when working on 6b423ec67. What CPUs did you test it on? I'd not be surprised if this were heavily dependent on the microarchitecture. > I've been fooling around a bit to try to learn what might be going on. > I wrote 2 patches: > > 1) pg_strtoint_imul.patch: Effectively reverts 6b423ec67 and puts the > code how it likely would have looked had I not done that work at all. > (I've assumed that the hex, octal, binary parsing would have been > added using the overflow multiplication functions just as the base 10 > parsing). > 2) pg_strtoint_optimize.patch: I've made another pass over the > functions with the current overflow checks and optimized the digit > checking code so that it can be done in a single check like: if (digit > < 10). This can be done by assigning the result of *ptr - '0' to an > unsigned char. Anything less than '0' will wrap around and anything > above '9' will remain so. I've also removed a few inefficiencies with > the isspace checking. We didn't need to do "while (*ptr && > isspace(*ptr)). It's fine just to do while (isspace(*ptr)) since '\0' > isn't a space char. I also got rid of the isxdigit call. The > hexlookup array contains -1 for non-hex chars. We may as well check > the digit value is >= 0. > > Here are the results I get using your test as quoted above: > > master @ 62e9af4c63f + fix_COPY_DEFAULT.patch > > latency average = 568.102 ms > > master @ 62e9af4c63f + fix_COPY_DEFAULT.patch + pg_strtoint_optimize.patch > > latency average = 531.238 ms > > master @ 62e9af4c63f + fix_COPY_DEFAULT.patch + pg_strtoint_imul.patch > > latency average = 559.333 ms (surprisingly also faster on my machine) > > The optimized version of the pg_strtoint functions wins over the imul > patch. Could you test to see if this is also the case in your Xeon > machine? (these are the numbers with turbo disabled, if I enable it they're all in the 6xx ms range, but the variance is higher) fix_COPY_DEFAULT.patch 774.344 fix_COPY_DEFAULT.patch + pg_strtoint32_base_10_first.v2.patch 777.673 fix_COPY_DEFAULT.patch + pg_strtoint_optimize.patch 777.545 fix_COPY_DEFAULT.patch + pg_strtoint_imul.patch 795.298 fix_COPY_DEFAULT.patch + pg_strtoint_imul.patch + likely(isdigit()) 773.477 fix_COPY_DEFAULT.patch + pg_strtoint32_base_10_first.v2.patch + pg_strtoint_imul.patch 774.443 fix_COPY_DEFAULT.patch + pg_strtoint32_base_10_first.v2.patch + pg_strtoint_imul.patch + likely(isdigit()) 774.513 fix_COPY_DEFAULT.patch + pg_strtoint32_base_10_first.v2.patch + pg_strtoint_imul.patch + likely(isdigit()) + unlikely(*ptr == '_') 763.879 One idea I had was to add a fastpath that won't parse all strings, but will parse the strings that we would generate, and fall back to the more general variant if it fails. See the attached, rough, prototype: fix_COPY_DEFAULT.patch + fastpath.patch: 746.971 fix_COPY_DEFAULT.patch + fastpath.patch + isdigit.patch: 715.570 Now, the precise contents of this fastpath are not yet clear (wrt imul or not), but I think the idea has promise. > > (when I say strtoint from, I did not replace the goto labels, so that part > > is > > unchanged and unrelated) > > I didn't quite follow this. I meant that I didn't undo ereport()->ereturn(). Greetings, Andres Freund
diff --git c/src/backend/utils/adt/numutils.c i/src/backend/utils/adt/numutils.c index 471fbb7ee63..52bbff801d0 100644 --- c/src/backend/utils/adt/numutils.c +++ i/src/backend/utils/adt/numutils.c @@ -301,6 +301,53 @@ pg_strtoint32_safe(const char *s, Node *escontext) uint32 tmp = 0; bool neg = false; + /* + * Fast path recognizing the most common output. + */ + if (1) + { + int32 tmp_s = 0; + + /* leading spaces are uncommon => slow path */ + + /* handle - sign, + is uncommon => slow path */ + if (*ptr == '-') + { + ptr++; + neg = true; + } + + /* require at least one digit */ + if (unlikely(!isdigit((unsigned char) *ptr))) + goto slow; + + /* process digits */ + while (*ptr && isdigit((unsigned char) *ptr)) + { + int8 digit = (*ptr++ - '0'); + + if (unlikely(pg_mul_s32_overflow(tmp_s, 10, &tmp_s)) || + unlikely(pg_sub_s32_overflow(tmp_s, digit, &tmp_s))) + goto out_of_range; + } + + /* trailing spaces are uncommon => slow path */ + if (unlikely(*ptr != '\0')) + goto slow; + + if (!neg) + { + /* could fail if input is most negative number */ + if (unlikely(tmp == PG_INT32_MIN)) + goto out_of_range; + tmp = -tmp; + } + + return tmp; + } + +slow: + /* skip leading spaces */ while (likely(*ptr) && isspace((unsigned char) *ptr)) ptr++;
diff --git i/src/backend/utils/adt/numutils.c w/src/backend/utils/adt/numutils.c index 52bbff801d0..2df6a50371d 100644 --- i/src/backend/utils/adt/numutils.c +++ w/src/backend/utils/adt/numutils.c @@ -293,6 +293,11 @@ pg_strtoint32(const char *s) return pg_strtoint32_safe(s, NULL); } +#if 1 +#undef isdigit +#define isdigit(c) ((c) >= '0' && c <= '9') +#endif + int32 pg_strtoint32_safe(const char *s, Node *escontext) {