On Thu, Oct 8, 2015 at 11:37 AM, Robert Haas <robertmh...@gmail.com> wrote: > I'm not convinced. Doesn't this exact same concept get used for > over-the-wire communication between BE and LE machines? There, this > operation is spelled htonl/ntohl. Some systems even have htonll, but > I'm sure there are still a bunch that don't.
I continue to disagree with that. The spelling of the macro that you propose suggests that this process occurs at a relatively high level of abstraction, which is misleading. Datums that have abbreviated key bytes packed into them are in general kind of special. All the same, here is a revision of the patch series along those lines. I'll also have to update the UUID patch to independently note the same issues. I should point out that I did not call the macro DatumToBigEndian(), because it's actually the other way around. I called it DatumToLittleEndian(), since the unsigned integer comparator would work correctly on big-endian systems without calling any new macro (which is of course why the new macro does nothing on big-endian systems). We start off with a big endian Datum/unsigned integer on all platforms, and then we byteswap it to make it a little-endian unsigned integer if and when that's required (i.e. only on little-endian systems). -- Peter Geoghegan
From ddc2bce56f8d375a22d7a635dd402ad1e507b85c Mon Sep 17 00:00:00 2001 From: Peter Geoghegan <peter.geoghega...@gmail.com> Date: Sat, 3 Oct 2015 16:11:02 -0700 Subject: [PATCH 2/3] Add two text sort caching optimizations Firstly, cache strxfrm() blobs across calls made to the text SortSupport abbreviation routine. Secondly, cache strcoll() results across calls to the text comparison routine (SortSupport authoritative comparator only). The cost of doing this should be immeasurably small in cases where the optimization does not help, while the benefits in cases where the optimization is effective should be quite significant. --- src/backend/utils/adt/varlena.c | 75 ++++++++++++++++++++++++++++++++++++++--- 1 file changed, 71 insertions(+), 4 deletions(-) diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index fadd827..d814502 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -62,6 +62,9 @@ typedef struct char *buf2; /* 2nd string, or abbreviation strxfrm() buf */ int buflen1; int buflen2; + int last_len1; /* Length of last buf1 string/strxfrm() blob */ + int last_len2; /* Length of last buf2 string/strxfrm() blob */ + int last_returned; /* Last comparison result (cache) */ bool collate_c; hyperLogLogState abbr_card; /* Abbreviated key cardinality state */ hyperLogLogState full_card; /* Full key cardinality state */ @@ -1832,9 +1835,20 @@ btsortsupport_worker(SortSupport ssup, Oid collid) tss->buflen1 = TEXTBUFLEN; tss->buf2 = palloc(TEXTBUFLEN); tss->buflen2 = TEXTBUFLEN; + /* Start with invalid values */ + tss->last_len1 = -1; + tss->last_len2 = -1; #ifdef HAVE_LOCALE_T tss->locale = locale; #endif + /* + * To avoid somehow confusing a strxfrm() blob and an original string + * within bttextfastcmp_locale(), initialize last returned cache to a + * sentinel value. A platform-specific actual strcoll() return value + * of INT_MIN seems unlikely, but if that occurs it will not cause + * wrong answers. + */ + tss->last_returned = INT_MIN; tss->collate_c = collate_c; ssup->ssup_extra = tss; @@ -1897,6 +1911,7 @@ bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup) { text *arg1 = DatumGetTextPP(x); text *arg2 = DatumGetTextPP(y); + bool arg1_match; TextSortSupport *tss = (TextSortSupport *) ssup->ssup_extra; /* working state */ @@ -1915,6 +1930,11 @@ bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup) /* Fast pre-check for equality, as discussed in varstr_cmp() */ if (len1 == len2 && memcmp(a1p, a2p, len1) == 0) { + /* + * No change in buf1 or buf2 contents, so avoid changing last_len1 or + * last_len2. Existing contents of buffers might still be used by next + * call. + */ result = 0; goto done; } @@ -1932,10 +1952,43 @@ bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup) tss->buf2 = MemoryContextAlloc(ssup->ssup_cxt, tss->buflen2); } - memcpy(tss->buf1, a1p, len1); - tss->buf1[len1] = '\0'; - memcpy(tss->buf2, a2p, len2); - tss->buf2[len2] = '\0'; + /* + * We're likely to be asked to compare the same strings repeatedly, and + * memcmp() is so much cheaper than strcoll() that it pays to try to cache + * comparisons, even though in general there is no reason to think that + * that will work out (every text datum may be unique). Caching does not + * slow things down measurably when it doesn't work out, and can speed + * things up by rather a lot when it does. In part, this is because the + * memcmp() compares data from cachelines that are needed in L1 cache even + * when the last comparison's result cannot be reused. + */ + arg1_match = true; + if (len1 != tss->last_len1 || memcmp(tss->buf1, a1p, len1) != 0) + { + arg1_match = false; + memcpy(tss->buf1, a1p, len1); + tss->buf1[len1] = '\0'; + tss->last_len1 = len1; + } + + /* + * If we're comparing the same two strings as last time, we can return the + * same answer without calling strcoll() again. This is more likely than + * it seems (at least with moderate to low cardinality sets), because + * quicksort compares the same pivot against many values. + */ + if (len2 != tss->last_len2 || memcmp(tss->buf2, a2p, len2) != 0) + { + memcpy(tss->buf2, a2p, len2); + tss->buf2[len2] = '\0'; + tss->last_len2 = len2; + } + else if (arg1_match && tss->last_returned != INT_MIN) + { + /* Use result cached following last actual strcoll() call */ + result = tss->last_returned; + goto done; + } #ifdef HAVE_LOCALE_T if (tss->locale) @@ -1952,6 +2005,8 @@ bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup) if (result == 0) result = strcmp(tss->buf1, tss->buf2); + /* Cache result, perhaps saving an expensive strcoll() call next time */ + tss->last_returned = result; done: /* We can't afford to leak memory here. */ if (PointerGetDatum(arg1) != x) @@ -2034,9 +2089,19 @@ bttext_abbrev_convert(Datum original, SortSupport ssup) tss->buf1 = palloc(tss->buflen1); } + /* Might be able to reuse strxfrm() blob from last call */ + if (tss->last_len1 == len && + memcmp(tss->buf1, authoritative_data, len) == 0) + { + memcpy(pres, tss->buf2, Min(sizeof(Datum), tss->last_len2)); + /* No change affecting cardinality, so no hashing required */ + goto done; + } + /* Just like strcoll(), strxfrm() expects a NUL-terminated string */ memcpy(tss->buf1, authoritative_data, len); tss->buf1[len] = '\0'; + tss->last_len1 = len; for (;;) { @@ -2048,6 +2113,7 @@ bttext_abbrev_convert(Datum original, SortSupport ssup) #endif bsize = strxfrm(tss->buf2, tss->buf1, tss->buflen2); + tss->last_len2 = bsize; if (bsize < tss->buflen2) break; @@ -2105,6 +2171,7 @@ bttext_abbrev_convert(Datum original, SortSupport ssup) addHyperLogLog(&tss->abbr_card, hash); +done: /* * Byteswap on little-endian machines. * -- 1.9.1
From 48493d8806f11e92b5d585b217e0e13b69ecdeb0 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan <peter.geoghega...@gmail.com> Date: Fri, 3 Jul 2015 13:48:08 -0700 Subject: [PATCH 1/3] Use unsigned integer abbreviated keys for text Add DatumToLittleEndian() macro to allow string-like types to represent abbreviated keys as unsigned integers. DatumToLittleEndian() will perform a byteswap on little-endian platforms only. Using the new macro, arrange for text abbreviated keys to be represented such that a simple 3-way unsigned integer comparison using the new representation is correct. Replace the comparator along those lines. This is faster on at least some platforms, since unsigned integer comparisons are now used rather than memcmp() during sorting proper. --- src/backend/utils/adt/varlena.c | 28 +++++++++++++++++++--------- src/include/port/pg_bswap.h | 22 ++++++++++++++++++++++ 2 files changed, 41 insertions(+), 9 deletions(-) diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 2fbbf54..fadd827 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -26,6 +26,7 @@ #include "libpq/pqformat.h" #include "miscadmin.h" #include "parser/scansup.h" +#include "port/pg_bswap.h" #include "regex/regex.h" #include "utils/builtins.h" #include "utils/bytea.h" @@ -1967,25 +1968,25 @@ done: static int bttextcmp_abbrev(Datum x, Datum y, SortSupport ssup) { - char *a = (char *) &x; - char *b = (char *) &y; - int result; - - result = memcmp(a, b, sizeof(Datum)); - /* - * When result = 0, the core system will call bttextfastcmp_c() or + * When 0 is returned, the core system will call bttextfastcmp_c() or * bttextfastcmp_locale(). Even a strcmp() on two non-truncated strxfrm() * blobs cannot indicate *equality* authoritatively, for the same reason * that there is a strcoll() tie-breaker call to strcmp() in varstr_cmp(). */ - return result; + if (x > y) + return 1; + else if (x == y) + return 0; + else + return -1; } /* * Conversion routine for sortsupport. Converts original text to abbreviated * key representation. Our encoding strategy is simple -- pack the first 8 - * bytes of a strxfrm() blob into a Datum. + * bytes of a strxfrm() blob into a Datum (on little-endian machines, the 8 + * bytes are stored in reverse order), and treat it as an unsigned integer. */ static Datum bttext_abbrev_convert(Datum original, SortSupport ssup) @@ -2104,6 +2105,15 @@ bttext_abbrev_convert(Datum original, SortSupport ssup) addHyperLogLog(&tss->abbr_card, hash); + /* + * Byteswap on little-endian machines. + * + * This is needed so that bttextcmp_abbrev() (an unsigned integer 3-way + * comparator) works correctly on all platforms. If this was skipped, then + * the comparator would have to call memcmp() with a pair of pointers to + * the first byte of each abbreviated key, which is slower. + */ + res = DatumToLittleEndian(res); /* Don't leak memory here */ if (PointerGetDatum(authoritative) != original) pfree(authoritative); diff --git a/src/include/port/pg_bswap.h b/src/include/port/pg_bswap.h index 6555942..ac060ea 100644 --- a/src/include/port/pg_bswap.h +++ b/src/include/port/pg_bswap.h @@ -43,4 +43,26 @@ ((x >> 56) & 0x00000000000000ffUL)) #endif /* HAVE__BUILTIN_BSWAP64 */ +/* + * Rearrange the bytes of a Datum into little-endian order from big-endian + * order. On big-endian machines, this does nothing at all. Note that the C + * type Datum is an unsigned integer type on all platforms. + * + * One possible application of the DatumToLittleEndian() macro is to make + * bitwise comparisons cheaper. A simple 3-way comparison of Datums + * transformed by the macro (based on native, unsigned comparisons) will return + * the same result as a memcmp() of the corresponding original Datums, but can + * be much cheaper. It's generally safe to do this on big-endian systems + * without any special transformation occurring first. + */ +#ifdef WORDS_BIGENDIAN +#define DatumToLittleEndian(x) (x) +#else /* !WORDS_BIGENDIAN */ +#if SIZEOF_DATUM == 8 +#define DatumToLittleEndian(x) BSWAP64(x) +#else /* SIZEOF_DATUM != 8 */ +#define DatumToLittleEndian(x) BSWAP32(x) +#endif /* SIZEOF_DATUM == 8 */ +#endif /* WORDS_BIGENDIAN */ + #endif /* PG_BSWAP_H */ -- 1.9.1
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers