Hello, The attached patch improves the performance of array_length() by detoasting only the overhead part of the datum.
Here is a test case: postgres=# create table array_length_test as select array_agg(a) a from generate_series(1, 10000) a, generate_series(1, 10000) b group by b; Without the patch: postgres=# select sum(array_length(a, 1)) from array_length_test; sum ----------- 100000000 (1 row) Time: 199.002 ms With the patch: postgres=# select sum(array_length(a, 1)) from array_length_test; sum ----------- 100000000 (1 row) Time: 34.599 ms The motivation for patch is that some of our customers use arrays to store a sequence of tens of thousands of events in each row. They often need to get the last 10 event for each row, for which we do A[array_length(A, 1) - 9: 1000000] (assuming 1M is an upper-bound. we could use array_length() instead of this constant too, but that is unnecessary if we know the upper-bound and only slows down the query). Without this optimization, array gets detoasted twice. With this patch, array_length() becomes much faster, and the whole query saves few seconds. Of course this technique is applicable to some other functions too, but they have never become a bottleneck for me, so I decided to keep the changes only to this function. Another alternative I could think of was introducing python style slicing, in which negative indexes start from end of array, so -10 means 10th element from end. I thought this would be a bigger change and is probably unnecessary, so I decided to improve array_length() instead. Feedback is welcome. Thanks, -- Hadi
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c index 91df184..5e1d9c2 100644 --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -1719,11 +1719,15 @@ array_upper(PG_FUNCTION_ARGS) Datum array_length(PG_FUNCTION_ARGS) { - ArrayType *v = PG_GETARG_ARRAYTYPE_P(0); + Datum arrdatum = PG_GETARG_DATUM(0); int reqdim = PG_GETARG_INT32(1); + ArrayType *v; int *dimv; int result; + v = (ArrayType *) PG_DETOAST_DATUM_SLICE(arrdatum, 0, + ARR_OVERHEAD_NONULLS(MAXDIM)); + /* Sanity check: does it look like an array at all? */ if (ARR_NDIM(v) <= 0 || ARR_NDIM(v) > MAXDIM) PG_RETURN_NULL();
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers