Am 17.07.22 um 08:00 schrieb Thomas Munro:
Actually ... is there a reason to bother with an intarray version
at all, rather than going straight for an in-core anyarray function?
It's not obvious to me that an int4-only version would have
major performance advantages.

Yeah, that seems like a good direction.  If there is a performance
advantage to specialising, then perhaps we only have to specialise on
size, not type.  Perhaps there could be a general function that
internally looks out for typbyval && typlen == 4, and dispatches to a
specialised 4-byte, and likewise for 8, if it can, and that'd already
be enough to cover int, bigint, float etc, without needing
specialisations for each type.

I played around with the idea of an anyarray shuffle(). The hard part was to deal with arrays with variable length elements, as they can not be swapped easily in place. I solved it by creating an intermediate array of references to the elements. I'll attach a patch with the proof of concept. Unfortunatly it is already about 5 times slower than the specialised version and i am not sure if it is worth going down that road.

Martin
From 85cb90c79459132c4220b8d78586dffe6e590ba9 Mon Sep 17 00:00:00 2001
From: Martin Kalcher <martin.kalc...@aboutsource.net>
Date: Sun, 17 Jul 2022 18:06:04 +0200
Subject: [PATCH] introduce array_shuffle()

---
 src/backend/utils/adt/arrayfuncs.c | 73 ++++++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat    |  3 ++
 2 files changed, 76 insertions(+)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index fb167f226a..b6576ce178 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -6872,3 +6872,76 @@ trim_array(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(result);
 }
+
+Datum
+array_shuffle(PG_FUNCTION_ARGS)
+{
+	ArrayType  *array,
+			   *result;
+	char	   *adat,
+			   *rdat;
+	int		   nitems,
+			   i;
+	Oid 	   elemtype;
+	TypeCacheEntry typentry;
+	struct { char *dat; size_t size; } *refs;
+
+	array = PG_GETARG_ARRAYTYPE_P(0);
+
+	if (ARR_HASNULL(array))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("shuffling arrays with null elements is not supported")));
+
+	if (ARR_NDIM(array) > 1)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("shuffling of multidimensional arrays is not supported")));
+
+	elemtype = ARR_ELEMTYPE(array);
+	typentry = *lookup_type_cache(elemtype, 0);
+	nitems = ArrayGetNItems(ARR_NDIM(array), ARR_DIMS(array));
+	refs = palloc(nitems * (sizeof(char *) + sizeof(size_t)));
+	adat = ARR_DATA_PTR(array);
+
+	/*
+	 * To allow shuffling of arrays with variable length elements, we are going
+	 * to shuffle references to the elements, because swapping variable length
+	 * elements in an array is complicated. In a second step we will iterate the
+	 * shuffled references and copy the elements to the destination array.
+	 * We are using the "inside-out" variant of the Fisher-Yates algorithm to
+	 * produce a shuffled array of references: Append each reference than swap it
+	 * with a radomly-chosen refence.
+	 */
+	for (i = 0; i < nitems; i++)
+	{
+		int j = random() % (i + 1);
+		char *next = att_addlength_pointer(adat, typentry.typlen, adat);
+		next = (char *) att_align_nominal(next, typentry.typalign);
+		refs[i] = refs[j];
+		refs[j].dat = adat;
+		refs[j].size = next - adat;
+		adat = next;
+	}
+
+	result = create_array_envelope(ARR_NDIM(array),
+								   ARR_DIMS(array),
+								   ARR_LBOUND(array),
+								   ARR_SIZE(array),
+								   ARR_ELEMTYPE(array),
+								   array->dataoffset);
+
+	rdat = ARR_DATA_PTR(result);
+
+	/* Collect all references and copy elements to the destination array */
+	for (i = 0; i < nitems; i++)
+	{
+		memcpy(rdat, refs[i].dat, refs[i].size);
+		rdat += refs[i].size;
+	}
+
+	pfree(refs);
+	PG_FREE_IF_COPY(array, 0);
+
+	PG_RETURN_ARRAYTYPE_P(result);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 2e41f4d9e8..56aff551d3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1681,6 +1681,9 @@
   proname => 'arraycontjoinsel', provolatile => 's', prorettype => 'float8',
   proargtypes => 'internal oid internal int2 internal',
   prosrc => 'arraycontjoinsel' },
+{ oid => '7777', descr => 'shuffle array',
+  proname => 'array_shuffle', proisstrict => 'f', prorettype => 'anyarray',
+  proargtypes => 'anyarray', prosrc => 'array_shuffle' },
 
 { oid => '764', descr => 'large object import',
   proname => 'lo_import', provolatile => 'v', proparallel => 'u',
-- 
2.37.1

Reply via email to