Am 17.07.22 um 08:00 schrieb Thomas Munro:
Actually ... is there a reason to bother with an intarray version at all, rather than going straight for an in-core anyarray function? It's not obvious to me that an int4-only version would have major performance advantages.Yeah, that seems like a good direction. If there is a performance advantage to specialising, then perhaps we only have to specialise on size, not type. Perhaps there could be a general function that internally looks out for typbyval && typlen == 4, and dispatches to a specialised 4-byte, and likewise for 8, if it can, and that'd already be enough to cover int, bigint, float etc, without needing specialisations for each type.
I played around with the idea of an anyarray shuffle(). The hard part was to deal with arrays with variable length elements, as they can not be swapped easily in place. I solved it by creating an intermediate array of references to the elements. I'll attach a patch with the proof of concept. Unfortunatly it is already about 5 times slower than the specialised version and i am not sure if it is worth going down that road.
Martin
From 85cb90c79459132c4220b8d78586dffe6e590ba9 Mon Sep 17 00:00:00 2001 From: Martin Kalcher <martin.kalc...@aboutsource.net> Date: Sun, 17 Jul 2022 18:06:04 +0200 Subject: [PATCH] introduce array_shuffle() --- src/backend/utils/adt/arrayfuncs.c | 73 ++++++++++++++++++++++++++++++ src/include/catalog/pg_proc.dat | 3 ++ 2 files changed, 76 insertions(+) diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c index fb167f226a..b6576ce178 100644 --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -6872,3 +6872,76 @@ trim_array(PG_FUNCTION_ARGS) PG_RETURN_DATUM(result); } + +Datum +array_shuffle(PG_FUNCTION_ARGS) +{ + ArrayType *array, + *result; + char *adat, + *rdat; + int nitems, + i; + Oid elemtype; + TypeCacheEntry typentry; + struct { char *dat; size_t size; } *refs; + + array = PG_GETARG_ARRAYTYPE_P(0); + + if (ARR_HASNULL(array)) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("shuffling arrays with null elements is not supported"))); + + if (ARR_NDIM(array) > 1) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("shuffling of multidimensional arrays is not supported"))); + + elemtype = ARR_ELEMTYPE(array); + typentry = *lookup_type_cache(elemtype, 0); + nitems = ArrayGetNItems(ARR_NDIM(array), ARR_DIMS(array)); + refs = palloc(nitems * (sizeof(char *) + sizeof(size_t))); + adat = ARR_DATA_PTR(array); + + /* + * To allow shuffling of arrays with variable length elements, we are going + * to shuffle references to the elements, because swapping variable length + * elements in an array is complicated. In a second step we will iterate the + * shuffled references and copy the elements to the destination array. + * We are using the "inside-out" variant of the Fisher-Yates algorithm to + * produce a shuffled array of references: Append each reference than swap it + * with a radomly-chosen refence. + */ + for (i = 0; i < nitems; i++) + { + int j = random() % (i + 1); + char *next = att_addlength_pointer(adat, typentry.typlen, adat); + next = (char *) att_align_nominal(next, typentry.typalign); + refs[i] = refs[j]; + refs[j].dat = adat; + refs[j].size = next - adat; + adat = next; + } + + result = create_array_envelope(ARR_NDIM(array), + ARR_DIMS(array), + ARR_LBOUND(array), + ARR_SIZE(array), + ARR_ELEMTYPE(array), + array->dataoffset); + + rdat = ARR_DATA_PTR(result); + + /* Collect all references and copy elements to the destination array */ + for (i = 0; i < nitems; i++) + { + memcpy(rdat, refs[i].dat, refs[i].size); + rdat += refs[i].size; + } + + pfree(refs); + PG_FREE_IF_COPY(array, 0); + + PG_RETURN_ARRAYTYPE_P(result); +} diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 2e41f4d9e8..56aff551d3 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -1681,6 +1681,9 @@ proname => 'arraycontjoinsel', provolatile => 's', prorettype => 'float8', proargtypes => 'internal oid internal int2 internal', prosrc => 'arraycontjoinsel' }, +{ oid => '7777', descr => 'shuffle array', + proname => 'array_shuffle', proisstrict => 'f', prorettype => 'anyarray', + proargtypes => 'anyarray', prosrc => 'array_shuffle' }, { oid => '764', descr => 'large object import', proname => 'lo_import', provolatile => 'v', proparallel => 'u', -- 2.37.1