Thanks for your comment.

I see two possible directions for a solution:


1. Is it possible to create a C function that could accept something like
ROWSET(ARRAY[]) as an input parameter?
E.g., to call it as

SELECT array_rowset_glue((SELECT arrayfield FROM arraytable));

or something similar?


2. Is it possible to implement something like this in C?

array_buffer_init();
SELECT array_buffer_push(arrayfield) FROM arraytable;
ids := array_buffer_get();
array_buffer_free();

where array_buffer_push() is an aggregate function which returns void but,
as a side effect, appends arrayfield to a global array buffer for later
access with array_buffer_get().
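To make the intent of this buffer API concrete, here is a minimal sketch in Python (not PostgreSQL C code; the class and method names just mirror the hypothetical array_buffer_* functions proposed above). The point is that pushing chunks into a buffer and flattening once at the end avoids reallocating the growing result on every merge:

```python
# Sketch of the proposed array_buffer_* idea. Repeatedly concatenating
# arrays copies the growing result each time (O(n^2) total copying);
# buffering the chunks and gluing them once at the end is O(n).

class ArrayBuffer:
    def __init__(self):
        self._chunks = []            # array_buffer_init()

    def push(self, chunk):           # array_buffer_push(arrayfield)
        # Amortized O(1): only a reference is stored, nothing is copied.
        self._chunks.append(chunk)

    def get(self):                   # array_buffer_get()
        # Single pass over all elements: one allocation for the result.
        return [x for chunk in self._chunks for x in chunk]

# Stand-in for "SELECT arrayfield FROM arraytable":
rows = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

buf = ArrayBuffer()
for arrayfield in rows:
    buf.push(arrayfield)
ids = buf.get()
print(ids)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Whether the same pattern can be expressed as a PostgreSQL aggregate depends on the aggregate being allowed to mutate its internal state in place rather than producing a new array per row, which is exactly the open question here.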


On 10/10/07, Filip Rembiałkowski <[EMAIL PROTECTED]> wrote:
>
> 2007/10/10, Dmitry Koterov <[EMAIL PROTECTED]>:
> > Hello.
> >
> > I created an aggregate:
> >
> > CREATE AGGREGATE intarray_aggregate_push (_int4)
> > (
> >   STYPE = _int4,
> >   SFUNC = intarray_push_array,
> >   INITCOND = '{}'
> > );
> >
> > (or I may use _int_union instead of intarray_push_array; its speed is
> > practically the same in my case).
> > This aggregate merges together a list of integer[] arrays, resulting in
> > one big array with all the elements.
> >
> > Then I want to use this aggregate:
> >
> > SELECT intarray_aggregate_push(arrayfield)
> > FROM arraytable
> >
> > The table arraytable contains a lot of rows (about 5000), and each row
> > has an array of 5-10 elements, so the resulting array should contain
> > about 50000 elements.
> >
> > The query works, but it is too slow: about 1 second.
> >
> > The main problem is the speed of the intarray_aggregate_push function -
> > it is quite slow, because intarray_push_array reallocates the memory each
> > time I merge two arrays. I am pretty sure that the reallocation and
> > copying are the bottleneck, because if I use another, dummy aggregate:
> >
> > CREATE AGGREGATE intarray_aggregate_dummy (_int4)
> > (
> >   STYPE = _int4,
> >   SFUNC = dummy,
> >   INITCOND = '{}'
> > );
> >
> > CREATE OR REPLACE FUNCTION "public"."dummy" (a integer [], b integer [])
> > RETURNS integer [] AS
> > $body$ BEGIN RETURN a; END; $body$
> > LANGUAGE 'plpgsql' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER;
> >
> > where dummy() is a function which returns its first argument without any
> > modification, the speed grows dramatically - to about 25 ms (instead of
> > 1000 ms!).
> >
> > The question is: how could I optimize this, and is it possible at all in
> > Postgres? I just want to get one large array glued from a lot of smaller
> > arrays...
>
>
> 1. no wonder copying is the bottleneck - this is what the aggregate
> does, mostly.
>
> 2. you can use plain array_cat for this; in my test it is a few percent
> faster
>
> 3. in this case I guess the intarray contrib is not an option; AFAIK it
> was created only for speeding up searches, that is, int4[] lookups
>
> 4. to have the kind of optimization you talk about, we would need an
> aggregate operating (in this case, appending) directly on internal
> state. I'm not sure if this is possible in postgres
>
> 5. my results:
> your method (using intarray_push_array): 940 ms
> using array_cat: 860 ms
> same in PL/PgSQL: (LOOP, append) 800 ms
> same thing in Perl, no database (push array of arrays into one and
> print): 18 ms
>
>
> cheers, Filip
>
>
> --
> Filip Rembiałkowski
>
