Hello,

The attached patch improves the performance of array_length() by detoasting
only the overhead part of the datum.

Here is a test case:

postgres=# create table array_length_test as select array_agg(a) a from
generate_series(1, 10000) a, generate_series(1, 10000) b group by b;

Without the patch:

postgres=#  select sum(array_length(a, 1)) from array_length_test;
    sum
-----------
 100000000
(1 row)

Time: 199.002 ms

With the patch:

postgres=#  select sum(array_length(a, 1)) from array_length_test;
    sum
-----------
 100000000
(1 row)

Time: 34.599 ms

The motivation for patch is that some of our customers use arrays to store
a sequence of tens of thousands of events in each row. They often need to
get the last 10 event for each row, for which we do A[array_length(A, 1) -
9: 1000000] (assuming 1M is an upper-bound. we could use array_length()
instead of this constant too, but that is unnecessary if we know the
upper-bound and only slows down the query). Without this optimization,
array gets detoasted twice. With this patch, array_length() becomes much
faster, and the whole query saves few seconds.

Of course this technique is applicable to some other functions too, but
they have never become a bottleneck for me, so I decided to keep the
changes only to this function.

Another alternative I could think of was introducing python style slicing,
in which negative indexes start from end of array, so -10 means 10th
element from end. I thought this would be a bigger change and is probably
unnecessary, so I decided to improve array_length() instead.

Feedback is welcome.

Thanks,
   -- Hadi
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 91df184..5e1d9c2 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -1719,11 +1719,15 @@ array_upper(PG_FUNCTION_ARGS)
 Datum
 array_length(PG_FUNCTION_ARGS)
 {
-	ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);
+	Datum		arrdatum = PG_GETARG_DATUM(0);
 	int			reqdim = PG_GETARG_INT32(1);
+	ArrayType  *v;
 	int		   *dimv;
 	int			result;
 
+	v = (ArrayType *) PG_DETOAST_DATUM_SLICE(arrdatum, 0,
+											 ARR_OVERHEAD_NONULLS(MAXDIM));
+
 	/* Sanity check: does it look like an array at all? */
 	if (ARR_NDIM(v) <= 0 || ARR_NDIM(v) > MAXDIM)
 		PG_RETURN_NULL();
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to