Josh Berkus wrote:

intagg: what does this module do which is not already available
through the built-in array functions and operators?  Maybe I
don't understand what it does. Unnatributed in the README.  Move
to pgfoundry?

Short summary:

  Is there an equivalent of "int_array_enum()" built in?

  I use it for substantial (9X) performance improvements
  for doing joins similar to those described in its README.

  I think it can be used to do somewhat similar things with
  integer arrays that the SQL2003 UNNEST operator does
  on MULTISETs (but yeah, they're quite different too).


Long and boring, but with examples:

I find that it can speed up certain kinds of joins (like
those described in it's readme) drastically.  I have a
pretty big application that has a lot of joins that use
int_array_enum() to expand an array stored in one
column into something that looks like a table instead
of having a third join table connecting two tables.

Note that this is often much faster than the
array IN/ANY/SOME/NOT IN comparisons because when
planning the join it can all the various join plans
like hash joins; while the array operators seem to
just do linear searches of the arrays.

This trick is especially useful in conjunction with an aggregate
based on the "_int_union" function from the intarray/ contrib
module (similar to the FUSION operator for MULTISETS) when you
only want distinct values for that type of join. Sample queries
from an actual application showing a factor-of-9 performance
improvement(7 seconds to 800ms) are shown below.

-- similar to the standard FUSION operator for MULTISETS.
create aggregate intarray_union_agg (
    sfunc = _int_union,
    basetype = int[],
    stype = int[],
    initcond = '{-1}'
);
explain analyze select fac_nam  from userfeatures.point_features  join 
entity_facets using (entity_id)  where featureid=115 group by fac_nam;
-- Total runtime: 7125.322 ms
explain analyze select fac_nam from (select distinct int_array_enum(fac_ids) as 
fac_id from (select distinct fac_ids from entity_facids natural join 
point_features where featureid=115) as a) as a join facet_lookup using (fac_id);
-- Total runtime: 1297.558 ms
explain analyze select fac_nam from (select distinct int_array_enum(fac_ids) as 
fac_id from (select intarray_union_agg(fac_ids) as fac_ids from entity_facids 
natural join point_features where featureid=115) as a) as a join facet_lookup 
using (fac_id);
-- Total runtime: 803.187 ms

I don't have access to the system right now, so I don't have the
full table definitions - but the basic problem is that there are
many "facets" for each row in the "point_features" table and there
are many "features" with featureid=115.   The queries are trying
to find the names of each facet available from that set of point_features.



intarray: data_types/

Well, the array of int's data type is built in, so I think
this module is more about the functions, operators, and
indexes that it provides that operate on arrays of ints.
Would that make it fit better under functions/ in your new
directory tree?



If I had a vote, I'd think it nice if the intagg module got
merged with the intarray module (wherever it ends up) because
they really are quite complementary in providing useful
tools for manipulating arrays of ints.

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Reply via email to