Hi,

On 03/16/2016 12:03 PM, David Rowley wrote:
On 16 March 2016 at 10:34, David Rowley <david.row...@2ndquadrant.com> wrote:
If Haribabu's patch does all that's required for the numerical
aggregates for floating point types then the status of covered
aggregates is (in order of pg_aggregate.h):

* AVG() complete coverage
* SUM() complete coverage
* MAX() complete coverage
* MIN() complete coverage
* COUNT() complete coverage
* STDDEV + friends complete coverage
* regr_*,covar_pop,covar_samp,corr not touched these.
* bool*() complete coverage
* bitwise aggs. complete coverage
* Remaining are not touched. I see diminishing returns with making
these parallel for now. I think I might not be worth pushing myself
any harder to make these ones work.

Does what I have done + floating point aggs from Haribabu seem
reasonable for 9.6?

I've attached a series of patches.

Thanks. Haven't really done much testing of this, aside from reading through the patches and "it compiles".

Patch 1:
This is the parallel aggregate patch, not intended for review here.
However, all further patches are based on this, and this adds the
required planner changes to make it possible to test patches 2 and 3.

Patch 2:
This adds the serial/deserial aggregate infrastructure, pg_dump
support, CREATE AGGREGATE changes, and nodeAgg.c changes to have it
serialise and deserialise aggregate states when instructed to do so.

Patch 3:
This adds a boat load of serial/deserial functions, and combine
functions for most of the built-in numerical aggregate functions. It
also contains some regression tests which should really be in patch 2,
but I with patch 2 there's no suitable serialisation or
de-serialisation functions to test CREATE AGGREGATE with. I think
having them here is ok, as patch 2 is quite useless without patch 3
anyway.

I don't see how you could move the tests into #2 when the functions are defined in #3? IMHO this is the right place for the regression tests.


Another thing to note about this patch is that I've gone and created
serial/de-serial functions for when PolyNumAggState both require
sumX2, and don't require sumX2. I had thought about perhaps putting an
extra byte in the serial format to indicate if a sumX2 is included,
but I ended up not doing it this way. I don't really want these serial
formats getting too complex as we might like to do fun things like
pass them along to sharded servers one day, so it might be nice to
keep them simple.

Hmmm, I've noticed that while eyeballing the diffs, and I'm not sure if it's worth the additional complexity at this point. I mean, one byte is hardly going to make a measurable difference - we're probably wasting more than that due to alignment, for example.

As you've mentioned sharded servers - how stable should the serialized format be? I've assumed it to be transient, i.e. in the extreme case it might change after restarting a database - for the parallel aggregate that's clearly sufficient.

But if we expect this to eventually work in a sharded environment, that's going to be way more complicated. I guess in these cases we could rely on implicit format versioning, via the major the version (doubt we'd tweak the format in a minor version anyway).

I'm not sure the sharding is particularly useful argument at this point, considering we don't really know if the current format is actually a good match for that.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to