subject:"Re\: \[HACKERS\] Combining Aggregates"

On Tue, Apr 5, 2016 at 2:52 PM, Alvaro Herrera  wrote:
> Robert Haas wrote:
>> Now, let's suppose that the user sets up a sharded table and then
>> says: SELECT a, SUM(b), AVG(c) FROM sometab.  At this point, what we'd
>> like to have happen is that for each child foreign table, we go and
>> fetch partially aggregated results.  Those children might be running
>> any version of PostgreSQL - I was not assuming that we'd insist on
>> matching major versions, although of course that could be done - and
>> there would probably need to be a minimum version of PostgreSQL
>> anyway.  They could even be running some other database.  As long as
>> they can spit out partial aggregates in a format that we can
>> understand, we can deserialize those aggregates and run combine
>> functions on them.  But if the remote side is, say, MariaDB, it's
>> probably much easier to get it to spit out something that looks like a
>> PostgreSQL array than it is to make it spit out some bytea blob that's
>> in an entirely PostgreSQL-specific format.
>
> Basing parts of the Postgres sharding mechanism on FDWs sounds
> acceptable.  Trying to design things so that *any* FDW can be part of a
> shard, so that you have some shards in Postgres and other shards in
> MariaDB, seems ludicrous to me.  Down that path lies madness.

I'm doubtful that anyone wants to do the work to make that happen, but
I don't agree that we shouldn't care about whether it's possible.
Extensibility is a feature of the system that we've worked hard for,
and we shouldn't casually surrender it.  For example, postgres_fdw now
implements join pushdown, and I suspect few other FDW authors will
care to do the work to add similar support to their implementations.
But some may, and it's good that the code is structured in such a way
that they have the option.

Actually, though, MariaDB is a bad example.  What somebody's much more
likely to want to do is have PostgreSQL as a frontend accessing data
that's actually stored in Hadoop.  There are lots of SQL interfaces to
Hadoop already, so it's clearly a thing people want, and our SQL is
the best SQL (right?) so if you could put that on top of Hadoop
somebody'd probably like it.  I'm not planning to try it myself, but I
think it would be cool if somebody else did.  I have been very pleased
to see that many of the bits and pieces of infrastructure that I
created for parallel query (parallel background workers, DSM, shm_mq)
have attracted quite a bit of interest from other developers for
totally unrelated purposes, and I think anything we do around
horizontal scalability should be designed the same way: the goal
should be to work with PostgreSQL on the other side, but the bits that
can be made reusable for other purposes should be so constructed.

> In fact, trying to ensure cross-major-version compatibility already
> sounds like asking for too much.  Requiring matching major versions
> sounds a perfectly acceptable restricting to me.

I disagree.  One of the motivations (not the only one, by any means)
for wanting logical replication in PostgreSQL is precisely to get
around the fact that physical replication requires matching major
versions.  That restriction turns out to be fairly onerous, not least
because when you've got a cluster of several machines you'd rather
upgrade them one at a time rather than all at once.  That's going to
be even more true with a sharded cluster, which will probably tend to
involve more machines than a replication cluster, maybe a lot more.
If you say that the user has got to shut the entire thing down,
upgrade all the machines, and turn it all back on again, and just hope
it works, that's going to be really painful.  I think that we should
treat this more like we do with libpq, where each major release can
add new capabilities that new applications can use, but the old stuff
has got to keep working essentially forever.  Maybe the requirements
here are not quite so tight, because it's probably reasonable to say,
e.g. that you must upgrade every machine in the cluster to at least
release 11.1 before upgrading any machine to 11.3 or higher, but the
fewer such requirements we have, the better.  Getting people to
upgrade to new major releases is already fairly hard, and anything
that makes it harder is an unwelcome development from my point of
view.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-04-05 Thread Alvaro Herrera

Robert Haas wrote:

> Now, let's suppose that the user sets up a sharded table and then
> says: SELECT a, SUM(b), AVG(c) FROM sometab.  At this point, what we'd
> like to have happen is that for each child foreign table, we go and
> fetch partially aggregated results.  Those children might be running
> any version of PostgreSQL - I was not assuming that we'd insist on
> matching major versions, although of course that could be done - and
> there would probably need to be a minimum version of PostgreSQL
> anyway.  They could even be running some other database.  As long as
> they can spit out partial aggregates in a format that we can
> understand, we can deserialize those aggregates and run combine
> functions on them.  But if the remote side is, say, MariaDB, it's
> probably much easier to get it to spit out something that looks like a
> PostgreSQL array than it is to make it spit out some bytea blob that's
> in an entirely PostgreSQL-specific format.

Basing parts of the Postgres sharding mechanism on FDWs sounds
acceptable.  Trying to design things so that *any* FDW can be part of a
shard, so that you have some shards in Postgres and other shards in
MariaDB, seems ludicrous to me.  Down that path lies madness.

In fact, trying to ensure cross-major-version compatibility already
sounds like asking for too much.  Requiring matching major versions
sounds a perfectly acceptable restricting to me.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

On Tue, Apr 5, 2016 at 10:19 AM, Robert Haas  wrote:
> I'm going to concede the point that this shouldn't really be a
> priority for 9.6, but I might want to come back to it later.

It seems to me that if two aggregates are using the same transition
function, they ought to also be using the same combine, serialization,
and deserialization functions.  I wrote a query to find cases where
that wasn't so and found a few, which I changed before committing.

DATA(insert ( 2100  n 0 int8_avg_accum  numeric_poly_avg
int8_avg_combineint8_avg_serialize  int8_avg_deserialize
int8_avg_accum  int8_avg_accum_inv  numeric_poly_avgf f 0   2281
 17  48  228148  _null_ _null_ ));
DATA(insert ( 2107  n 0 int8_avg_accum  numeric_poly_sum
numeric_poly_combineint8_avg_serialize  int8_avg_deserialize
 int8_avg_accum  int8_avg_accum_inv  numeric_poly_sumf f 0   2281
  17  48  228148  _null_ _null_ ));

I changed the second of these from numeric_poly_combine to int8_avg_combine.

DATA(insert ( 2103  n 0 numeric_avg_accum   numeric_avg
numeric_avg_combine numeric_avg_serialize   numeric_avg_deserialize
numeric_avg_accum numeric_accum_inv numeric_avg f f 0   2281
 17  128 2281128 _null_ _null_ ));
DATA(insert ( 2114  n 0 numeric_avg_accum   numeric_sum
numeric_combine numeric_avg_serialize
numeric_avg_deserialize numeric_avg_accum numeric_accum_inv
numeric_sum f f 0   228117  128 2281128 _null_ _null_
));

I changed the second of these from numeric_combine to
numeric_avg_combine.  I also added a regression test for this.  Please
let me know if I'm off-base here.

Committed 0002+0003 with those changes, some minor cosmetic stuff, and
of course the obligatory catversion bump.  Oh, and fixed an OID
conflict with the patch Magnus just committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

On Tue, Apr 5, 2016 at 9:30 AM, David Rowley
 wrote:
> On 6 April 2016 at 01:01, Robert Haas  wrote:
>> On Tue, Apr 5, 2016 at 3:55 AM, David Rowley
>>> To be really honest, I'm quite worried that if I go and make this
>>> change then my time might be wasted as I really think making that
>>> change this late in the day is just setting this up for failure.  I
>>> really don't think we can bat this patch over the Pacific Ocean too
>>> many times before we find ourselves inside the feature freeze. Of
>>> course, if you really think it's no good, that's different, it can't
>>> be committed, but "I think it might be better" does not seem like a
>>> solid enough argument for me to want to risk trying this and delaying
>>> further for that.
>>
>> I think I agree.  Certainly, with what we've got here today, these are
>> not user-exposed, so I think we could certainly change them next time
>> around if need be.  If they ever become user-exposed, then maybe we
>> should think a little harder.
>
> I understand your concern. Painting ourselves into a corner would be pretty 
> bad.
>
> Although, I'm not so sure we'd want to go down the route of trying to
> stabilise the format, even into something human readable. As far as I
> know we've never mentioned anything about what's stored in the
> internal aggregate states in the documentation, and we've made some
> fairly hefty changes to these over the last few releases. For example;
> 959277a4, where internal int128 support was added to improve
> performance of various aggregates, like sum(bigint). I think if we're
> going to head down the route of trying to keep this format stable,
> then we're going to struggle to make improvements to aggregates in the
> future.

That might be true, but it's much more likely to be true if the
internal format is an opaque blob.  If the serialized transition state
for average is {count,sum} then we can switch between having the
internal representation being numeric and having it be int128 and
nothing breaks.  With the opaque blob, maybe nothing breaks, or maybe
something does.

> It's also not all that clear that all of the internal fields really
> make sense to other databases, for example NumericAggState's maxScale
> and maxScaleCount. These really only are needed for moving aggregates.
> Would we want some other database to have to magic up some numbers for
> these fields because our format requires it?

Well, if we want to do cross-node aggregation with that database on
the other side, those numbers are going to have to get magicked up
someplace along the line.

I'm going to concede the point that this shouldn't really be a
priority for 9.6, but I might want to come back to it later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-04-05 Thread David Rowley

On 6 April 2016 at 01:01, Robert Haas  wrote:
> On Tue, Apr 5, 2016 at 3:55 AM, David Rowley
>> To be really honest, I'm quite worried that if I go and make this
>> change then my time might be wasted as I really think making that
>> change this late in the day is just setting this up for failure.  I
>> really don't think we can bat this patch over the Pacific Ocean too
>> many times before we find ourselves inside the feature freeze. Of
>> course, if you really think it's no good, that's different, it can't
>> be committed, but "I think it might be better" does not seem like a
>> solid enough argument for me to want to risk trying this and delaying
>> further for that.
>
> I think I agree.  Certainly, with what we've got here today, these are
> not user-exposed, so I think we could certainly change them next time
> around if need be.  If they ever become user-exposed, then maybe we
> should think a little harder.

I understand your concern. Painting ourselves into a corner would be pretty bad.

Although, I'm not so sure we'd want to go down the route of trying to
stabilise the format, even into something human readable. As far as I
know we've never mentioned anything about what's stored in the
internal aggregate states in the documentation, and we've made some
fairly hefty changes to these over the last few releases. For example;
959277a4, where internal int128 support was added to improve
performance of various aggregates, like sum(bigint). I think if we're
going to head down the route of trying to keep this format stable,
then we're going to struggle to make improvements to aggregates in the
future.

It's also not all that clear that all of the internal fields really
make sense to other databases, for example NumericAggState's maxScale
and maxScaleCount. These really only are needed for moving aggregates.
Would we want some other database to have to magic up some numbers for
these fields because our format requires it?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

On Tue, Apr 5, 2016 at 3:55 AM, David Rowley
 wrote:
>> I think it might be a good idea if these patches made less use of
>> bytea and exposed the numeric transition values as, say, a 2-element
>> array of numeric.
>
> Well, if you have a look at NumericAggState you can see it's not quite
> as simple as an array of numeric, unless of course you'd be willing to
> spend the extra cycles, use the extra memory, and bandwidth to convert
> those int64's to numeric too, then it could be made to work. To do are
> you describe properly, we'd need a composite type.

Uggh, yeah.  Yuck.

> hmm, isn't that why we have a deserialisation functions? Do you see a
> use case where these won't be available?
...
> I've not yet read the design spec for sharding in Postgres. If there's
> something I can read over to get an idea of why you think this won't
> work, then maybe we can come to a good conclusion that way.  But if I
> take a guess, then I'd have imagined that we'd not support sharding
> over different major versions, and if we really needed to change the
> serialisation format later, then we could do so. We could even put a
> note in the documents that the serialisation format may change between
> major versions.

Well, OK, so here was my thought.  For the sake of simplicity, let's
suppose that creating a sharded table works more or less like what you
can already do today: create a parent table with a non-inherited CHECK
(false) constraint and then create some inheritance children that are
foreign tables on various remote servers.  Give those children CHECK
constraints that explicate the partitioning scheme.  This is probably
not actually how we want this to work in detail (e.g. we probably want
declarative partitioning) but the details don't matter very much for
purposes of what I'm trying to explain here so let's just ignore them
for the moment.

Now, let's suppose that the user sets up a sharded table and then
says: SELECT a, SUM(b), AVG(c) FROM sometab.  At this point, what we'd
like to have happen is that for each child foreign table, we go and
fetch partially aggregated results.  Those children might be running
any version of PostgreSQL - I was not assuming that we'd insist on
matching major versions, although of course that could be done - and
there would probably need to be a minimum version of PostgreSQL
anyway.  They could even be running some other database.  As long as
they can spit out partial aggregates in a format that we can
understand, we can deserialize those aggregates and run combine
functions on them.  But if the remote side is, say, MariaDB, it's
probably much easier to get it to spit out something that looks like a
PostgreSQL array than it is to make it spit out some bytea blob that's
in an entirely PostgreSQL-specific format.

Now, maybe that doesn't matter.  Even getting something like this
working with PostgreSQL as the remote side is going to be hard.
Moreover, for this to have any chance of working with anything other
than a compatible PostgreSQL server on the remote side, the FDW is
going to have to write bespoke code for each aggregate anyway, and
that code can always construct the requisite bytea blobs internally.
So who cares?  I can't point to any really tangible advantage of
having serialized transition states be human-readable, so maybe there
isn't one.  But I was thinking about it, for fear that there might be
some advantage that I'm missing.

> To be really honest, I'm quite worried that if I go and make this
> change then my time might be wasted as I really think making that
> change this late in the day is just setting this up for failure.  I
> really don't think we can bat this patch over the Pacific Ocean too
> many times before we find ourselves inside the feature freeze. Of
> course, if you really think it's no good, that's different, it can't
> be committed, but "I think it might be better" does not seem like a
> solid enough argument for me to want to risk trying this and delaying
> further for that.

I think I agree.  Certainly, with what we've got here today, these are
not user-exposed, so I think we could certainly change them next time
around if need be.  If they ever become user-exposed, then maybe we
should think a little harder.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-04-05 Thread David Rowley

On 5 April 2016 at 16:54, Robert Haas  wrote:
> On Wed, Mar 30, 2016 at 3:34 PM, David Rowley
>  wrote:
>> I wrote 0002 - 0004, these were reviewed by Tomas.
>> 0005 is Haribabu's patch: Reviewed by Tomas and I.
>
> I think it might be a good idea if these patches made less use of
> bytea and exposed the numeric transition values as, say, a 2-element
> array of numeric.

Well, if you have a look at NumericAggState you can see it's not quite
as simple as an array of numeric, unless of course you'd be willing to
spend the extra cycles, use the extra memory, and bandwidth to convert
those int64's to numeric too, then it could be made to work. To do are
you describe properly, we'd need a composite type.

>  Maybe this doesn't really matter, but it's not
> impossible that these values could be exposed somewhere, and a numeric
> is an awful lot more user-friendly than an essentially opaque bytea.

hmm, isn't that why we have a deserialisation functions? Do you see a
use case where these won't be available?

> One of the things I eventually want to figure out how to do with this
> is distributed aggregation across multiple shards, and I think it
> might be better to have the value being sent over the wire be
> something like {26.6,435.12} rather than \x1234...
>
> Thoughts?

I've not yet read the design spec for sharding in Postgres. If there's
something I can read over to get an idea of why you think this won't
work, then maybe we can come to a good conclusion that way.  But if I
take a guess, then I'd have imagined that we'd not support sharding
over different major versions, and if we really needed to change the
serialisation format later, then we could do so. We could even put a
note in the documents that the serialisation format may change between
major versions.

To be really honest, I'm quite worried that if I go and make this
change then my time might be wasted as I really think making that
change this late in the day is just setting this up for failure.  I
really don't think we can bat this patch over the Pacific Ocean too
many times before we find ourselves inside the feature freeze. Of
course, if you really think it's no good, that's different, it can't
be committed, but "I think it might be better" does not seem like a
solid enough argument for me to want to risk trying this and delaying
further for that.

But either way, we should come try to come to a consensus quickly. I
don't want to alienate you because you think I don't want to listen to
you. We need to act quickly or we'll only have a handful of the useful
aggregates working in parallel for 9.6. I'd rather we had them all. I
hope you do too.

Thanks for looking over the patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-04-05 Thread Simon Riggs

On 5 April 2016 at 05:54, Robert Haas  wrote:

> On Wed, Mar 30, 2016 at 3:34 PM, David Rowley
>  wrote:
> > I wrote 0002 - 0004, these were reviewed by Tomas.
> > 0005 is Haribabu's patch: Reviewed by Tomas and I.
>
> I think it might be a good idea if these patches made less use of
> bytea and exposed the numeric transition values as, say, a 2-element
> array of numeric.  Maybe this doesn't really matter, but it's not
> impossible that these values could be exposed somewhere, and a numeric
> is an awful lot more user-friendly than an essentially opaque bytea.
> One of the things I eventually want to figure out how to do with this
> is distributed aggregation across multiple shards, and I think it
> might be better to have the value being sent over the wire be
> something like {26.6,435.12} rather than \x1234...
>
> Thoughts?


Rewriting something that works fine just before the deadline isn't a good
plan.

"Might be better" isn't enough.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Combining Aggregates

2016-04-04 Thread Robert Haas

On Wed, Mar 30, 2016 at 3:34 PM, David Rowley
 wrote:
> I wrote 0002 - 0004, these were reviewed by Tomas.
> 0005 is Haribabu's patch: Reviewed by Tomas and I.

I think it might be a good idea if these patches made less use of
bytea and exposed the numeric transition values as, say, a 2-element
array of numeric.  Maybe this doesn't really matter, but it's not
impossible that these values could be exposed somewhere, and a numeric
is an awful lot more user-friendly than an essentially opaque bytea.
One of the things I eventually want to figure out how to do with this
is distributed aggregation across multiple shards, and I think it
might be better to have the value being sent over the wire be
something like {26.6,435.12} rather than \x1234...

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-30 Thread David Rowley

On 31 March 2016 at 00:48, Robert Haas  wrote:
> On Tue, Mar 29, 2016 at 11:14 PM, David Rowley
>  wrote:
>>> 0005:
>>> Haribabu's patch; no change from last time.
>
> So what's the distinction between 0002 and 0005?  And what is the
> correct author/reviewer attribution for each?

I wrote 0002 - 0004, these were reviewed by Tomas.
0005 is Haribabu's patch: Reviewed by Tomas and I.

The reason 0002 and 0005 are separate patches is just down to them
having different authors.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-30 Thread Robert Haas

On Tue, Mar 29, 2016 at 11:14 PM, David Rowley
 wrote:
> Many thanks Robert for committing the serialize states portion.

yw, sorry I didn't get an email out about that.

>> 0005:
>> Haribabu's patch; no change from last time.

So what's the distinction between 0002 and 0005?  And what is the
correct author/reviewer attribution for each?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-29 Thread David Rowley

On 26 March 2016 at 15:07, David Rowley  wrote:

Many thanks Robert for committing the serialize states portion.

> 0005:
> Haribabu's patch; no change from last time.

Just in case you jump ahead. I just wanted to re-highlight something
Haribabu mentioned a while ago about combining floating point states
[1]. This discussion did not happen on this thread, so to make sure it
does not get lost...

As of today aggregate calculations for floating point types can vary
depending on the order in which values are aggregated.

For example:

create table f8 (num float8);
insert into f8 select x/100.0 from generate_series(1,1) x(x);
select stddev(num order by num) from f8;
  stddev
--
 28.8689567990717
(1 row)

select stddev(num order by num desc) from f8;
  stddev
--
 28.8689567990716
(1 row)

select stddev(num order by random()) from f8;
  stddev
--
 28.8689567990715
(1 row)

And of course the execution plan can determine the order in which rows
are aggregated, even if the underlying data does not change.

Parallelising these aggregates increases the chances of seeing these
variations as the number of rows aggregated in each worker is going to
vary on each run, so the numerical anomalies will also vary between
each run.

I wrote in [1]:
> We do also warn about this in the manual: "Inexact means that some
> values cannot be converted exactly to the internal format and are
> stored as approximations, so that storing and retrieving a value might
> show slight discrepancies. Managing these errors and how they
> propagate through calculations is the subject of an entire branch of
> mathematics and computer science and will not be discussed here,
> except for the following points:" [1]
> [1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html

Does this text in the documents stretch as far as the variable results
from parallel aggregate for floating point types? or should we be more
careful and not parallelise these, similar to how we didn't end up
with inverse aggregate transition functions for floating point types?

I'm personally undecided, and would like to hear what others think.

[1] 
http://www.postgresql.org/message-id/cakjs1f_hplfhkd2ylfrsrmumbzwqkgvjcwx21b_xg1a-0pz...@mail.gmail.com
-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-25 Thread David Rowley

On 26 March 2016 at 15:07, David Rowley  wrote:
> Ok, so on further look at this I've decided to make changes and have
> it so the serialisation function can be dumb about memory contexts in
> the same way as finalize_aggregate() allows the final function to be
> dumb... notice at the end of the function it copies byref types into
> the correct memory context, if they're not already. So in the attached
> the serialisation function call code now does the same thing. In fact
> I decided to move all that code off into a function called
> finalize_partialaggregate().

Please disregard the previous 0001 patch in favour of the attached one.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Allow-INTERNAL-state-aggregates-to-participate-in-pa_2016-03-26a.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-24 Thread David Rowley

On 25 March 2016 at 06:17, Robert Haas  wrote:
> On Mon, Mar 21, 2016 at 2:18 PM, David Rowley
>  wrote:
>> I've attached 2 of the patches which are affected by the changes.
>
> I think the documentation for 0001 needs some work yet.  The
> additional paragraph that you've added...
>
> (1) doesn't seem to appear at a very logical place in the
> documentation - I think it should be much further down, as it's a
> minor detail.  Maybe document this the same way as the documentation
> patch you just sent for the combine-function stuff does it; and

Thanks. I also realised this when writing the combine documents fix.

> (2) isn't indented consistently with the surrounding paragraphs; and
>
> (3) is missing a closing  tag
>
> Also, I'd just cut this:
>
> +  This is required due to
> +  the process model being unable to pass references to INTERNAL
> +   types between different PostgreSQL
> +  processes.
>
> Instead, I'd change the earlier sentence in the paragraph, which
> currently reads:
>
> +  These
> +  functions are required in order to allow parallel aggregation for 
> aggregates
> +  with an stype of 
> +  INTERNAL.
>
> I'd replace the period at end with a comma and add "since
> INTERNAL values represent arbitrary in-memory data
> structures which can't be passed between processes".  I think that's a
> bit smoother.
>

In my rewrite I've incorporated these words.

Thanks for checking over this. A patch will follow in my response to
the next email.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-24 Thread Robert Haas

On Thu, Mar 24, 2016 at 1:17 PM, Robert Haas  wrote:
> I'm going to read through the code again now.

OK, I noticed another documentation problem: you need to update
catalogs.sgml for these new columns.

+* Validate the serial function, if present. We must ensure
that the return
+* Validate the de-serial function, if present. We must ensure that the

I think that you should refer to these consistently in the comments as
the "serialization function" and the "deserialization function", even
though the SQL syntax is different.  And unhyphenated.

+   /* check that we also got a serial type */
+   if (!OidIsValid(aggSerialType))
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
+errmsg("must specify serialization
type when specifying serialization function")));

I think that in parallel cases, this check is in DefineAggregate(),
not here.  See, e.g. "aggregate mfinalfunc must not be specified
without mstype".

Existing type parameters to CREATE AGGREGATE have IsPolymorphicType()
checks to enforce sanity in various ways, but you seem not to have
added that for the serial type.

+   /* don't call a strict serial function with
NULL input */
+   if (pertrans->serialfn.fn_strict &&
+   pergroupstate->transValueIsNull)
+   continue;

Shouldn't this instead set aggnulls[aggno] = true?  And doesn't the
hunk in combine_aggregates() have the same problem?

+   /*
+* serial and de-serial functions must match, if
present. Remember that
+* these will be InvalidOid if they're not required
for this agg node
+*/

Explain WHY they need to match.  And maybe update the overall comment
for the function.

+ "'-' AS
aggdeserialfn,aggmtransfn, aggminvtransfn, "

Whitespace.

In your pg_aggregate.h changes, avg(numeric) sets aggserialtype but no
aggserialfn or aggdeserialfn.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-24 Thread Robert Haas

On Mon, Mar 21, 2016 at 2:18 PM, David Rowley
 wrote:
> I've attached 2 of the patches which are affected by the changes.

I think the documentation for 0001 needs some work yet.  The
additional paragraph that you've added...

(1) doesn't seem to appear at a very logical place in the
documentation - I think it should be much further down, as it's a
minor detail.  Maybe document this the same way as the documentation
patch you just sent for the combine-function stuff does it; and

(2) isn't indented consistently with the surrounding paragraphs; and

(3) is missing a closing  tag

Also, I'd just cut this:

+  This is required due to
+  the process model being unable to pass references to INTERNAL
+   types between different PostgreSQL
+  processes.

Instead, I'd change the earlier sentence in the paragraph, which
currently reads:

+  These
+  functions are required in order to allow parallel aggregation for aggregates
+  with an stype of 
+  INTERNAL.

I'd replace the period at end with a comma and add "since
INTERNAL values represent arbitrary in-memory data
structures which can't be passed between processes".  I think that's a
bit smoother.

I'm going to read through the code again now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-24 Thread Robert Haas

On Thu, Mar 24, 2016 at 5:22 AM, David Rowley
 wrote:
> On 21 January 2016 at 08:06, Robert Haas  wrote:
>> I re-reviewed this and have committed most of it with only minor
>> kibitizing.  A few notes:
>
> I realised today that the combinefunc is rather undocumented. I've
> attached a patch which aims to fix this.
>
> Comments are welcome.

Looks good to me.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-24 Thread David Rowley

On 21 January 2016 at 08:06, Robert Haas  wrote:
> I re-reviewed this and have committed most of it with only minor
> kibitizing.  A few notes:

I realised today that the combinefunc is rather undocumented. I've
attached a patch which aims to fix this.

Comments are welcome.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


add_combinefunc_documents.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-21 Thread David Rowley

On 20 March 2016 at 16:48, David Rowley  wrote:
> I've attached another series of patches:
>
> 0001: This is the latest Parallel Aggregate Patch, not intended for
> review here, but is required for the remaining patches. This patch has
> changed quite a bit from the previous one that I posted here, and the
> remaining patches needed re-based due to those changes.
>
> 0002: Adds serial/de-serial function support to CREATE AGGREGATE,
> contains minor fix-ups from last version.
>
> 0003: Adds various combine/serial/de-serial functions for the standard
> set of aggregates, apart from most float8 aggregates.
>
> 0004: Adds regression tests for 0003 pg_aggregate.h changes.
>
> 0005: Documents to mention which aggregate functions support partial mode.
>
> 0006: Re-based version of Haribabu's floating point aggregate support,
> containing some fixes by me.

Now that parallel aggregate has been committed, 0002 no longer applies.
Please find attached a new 0001 patch, intended to replace the
previous 0001 and 0002.

The previous 0003, 0004, 0005 and 0006 should still apply onto the new 0001.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Allow-INTERNAL-state-aggregates-to-participate-in-pa_2016-03-22.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-21 Thread Robert Haas

On Sat, Mar 19, 2016 at 11:48 PM, David Rowley
 wrote:
> 0002: Adds serial/de-serial function support to CREATE AGGREGATE,
> contains minor fix-ups from last version.

This looks pretty good, but don't build_aggregate_serialfn_expr and
build_aggregate_deserialfn_expr compile down to identical machine
code?  Keeping two copies of the same code with different parameter
names is a degree of neatnik-ism I'm not sure I can swallow.

The only caller to make_partialgroup_input_target() passes true for
the additional argument.  That doesn't seem right.

Maybe error messages should refer to "aggregate serialization
function" and "aggregate deserialization function" instead of
"aggregate serial function" and "aggregate de-serial function".

- *   Other behavior is also supported and is controlled by the
'combineStates'
- *   and 'finalizeAggs'. 'combineStates' controls whether the trans func or
- *   the combine func is used during aggregation.  When 'combineStates' is
- *   true we expect other (previously) aggregated states as input
rather than
- *   input tuples. This mode facilitates multiple aggregate stages which
- *   allows us to support pushing aggregation down deeper into
the plan rather
- *   than leaving it for the final stage. For example with a query such as:
+ *   Other behavior is also supported and is controlled by the
'combineStates',
+ *   'finalizeAggs' and 'serialStates' parameters. 'combineStates' controls
+ *   whether the trans func or the combine func is used during aggregation.
+ *   When 'combineStates' is true we expect other (previously) aggregated
+ *   states as input rather than input tuples. This mode
facilitates multiple
+ *   aggregate stages which allows us to support pushing aggregation down
+ *   deeper into the plan rather than leaving it for the final stage. For
+ *   example with a query such as:

I'd omit this hunk.  The serialStates thing is really separate from
what's being talked about here, and you discuss it further down.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-20 Thread Haribabu Kommi

On Sun, Mar 20, 2016 at 2:23 PM, David Rowley
 wrote:
>
> I've had a look over this. I had to first base it on the 0005 patch,
> as it seemed like the pg_aggregate.h changes didn't include the
> serialfn and deserialfn changes, and an OID was being consumed by
> another function I added in patch 0003.
>
> On testing I also noticed some wrong results, which on investigation,
> are due to the wrong array elements being added together.
>
> For example:
>
> postgres=# select stddev(num) from f;
>   stddev
> --
>  28867.5149028984
> (1 row)
>
>
> postgres=# set max_parallel_degree=8;
> SET
> postgres=# select stddev(num) from f;
>  stddev
> 
>   0
> (1 row)
>
> + N += transvalues2[0];
> + sumX += transvalues2[1];
> + CHECKFLOATVAL(sumX, isinf(transvalues1[1]) || isinf(transvalues2[1]), true);
> + sumX2 += transvalues1[2];
>
> The last line should use transvalues2[2], not transvalues1[2].

Thanks.

> There's also quite a few mistakes in float8_regr_combine()
>
> + sumX2 += transvalues2[2];
> + CHECKFLOATVAL(sumX2, isinf(transvalues1[2]) || isinf(transvalues2[1]), 
> true);
>
> Wrong array element on isinf() check
>
> + sumY2 += transvalues2[4];
> + CHECKFLOATVAL(sumY2, isinf(transvalues1[4]) || isinf(transvalues2[3]), 
> true);
>
> Wrong array element on isinf() check
>
> + sumXY += transvalues2[5];
> + CHECKFLOATVAL(sumXY, isinf(transvalues1[5]) || isinf(transvalues2[1]) ||
> +  isinf(transvalues2[3]), true);
>
> Wrong array element on isinf() check, and the final
> isinf(transvalues2[3]) check does not need to be there.

Thanks for the changes, I just followed the float8_regr_accum function while
writing float8_regr_combine function. Now I understood isinf proper usage.


Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-20 Thread Tomas Vondra


Hi,

On 03/20/2016 04:48 AM, David Rowley wrote:

On 17 March 2016 at 14:25, Tomas Vondra  wrote:

On 03/16/2016 12:03 PM, David Rowley wrote:

Patch 2:
This adds the serial/deserial aggregate infrastructure, pg_dump
support, CREATE AGGREGATE changes, and nodeAgg.c changes to have it
serialise and deserialise aggregate states when instructed to do so.

Patch 3:
This adds a boat load of serial/deserial functions, and combine
functions for most of the built-in numerical aggregate functions. It
also contains some regression tests which should really be in patch 2,
but I with patch 2 there's no suitable serialisation or
de-serialisation functions to test CREATE AGGREGATE with. I think
having them here is ok, as patch 2 is quite useless without patch 3
anyway.


I don't see how you could move the tests into #2 when the functions are
defined in #3? IMHO this is the right place for the regression tests.


Yeah, but the CREATE AGGREGATE changes which are being tested in 0003
were actually added in 0002. I think 0002 is the right place to test
the changes to CREATE AGGREGATE, but since there's a complete lack of
functions to use, then I've just delayed until 0003.


Another thing to note about this patch is that I've gone and created
serial/de-serial functions for when PolyNumAggState both require
sumX2, and don't require sumX2. I had thought about perhaps putting an
extra byte in the serial format to indicate if a sumX2 is included,
but I ended up not doing it this way. I don't really want these serial
formats getting too complex as we might like to do fun things like
pass them along to sharded servers one day, so it might be nice to
keep them simple.



Hmmm, I've noticed that while eyeballing the diffs, and I'm not
sure if it's worth the additional complexity at this point. I mean,
one byte is hardly going to make a measurable difference - we're
probably wasting more than that due to alignment, for example.


I don't think any alignment gets done here. Look at pq_sendint(). Did
you mean the complexity of having extra functions, or having the
extra byte to check in the deserial func?


I haven't realized alignment does not apply here, but I still think the 
extra byte would be negligible in the bigger scheme of things. For 
example we're serializing the state into a bytea, which adds up to 4B 
header. So I don't think adding 1B would make any measurable difference.


But that assumes adding the 1B header would actually make the code 
simpler. Now that I think about it, that may not the case - in the end 
you'd probably get a function with two branches, with about the same 
size as the two functions combined. So not really an improvement.



As you've mentioned sharded servers - how stable should the
serialized format be? I've assumed it to be transient, i.e. in the
extreme case it might change after restarting a database - for the
parallel aggregate that's clearly sufficient.

But if we expect this to eventually work in a sharded environment,
that's going to be way more complicated. I guess in these cases we
could rely on implicit format versioning, via the major the
version (doubt we'd tweak the format in a minor version anyway).

I'm not sure the sharding is particularly useful argument at this
point, considering we don't really know if the current format is
actually a good match for that.


Perhaps you're right. At this stage I've no idea if we'd want to
support shards on varying major versions. I think probably not,
since the node write functions might not be compatible with the node
read functions on the other server.


OK, so let's ignore the sharded setup for now.


I've attached another series of patches:

0001: This is the latest Parallel Aggregate Patch, not intended for
review here, but is required for the remaining patches. This patch has
changed quite a bit from the previous one that I posted here, and the
remaining patches needed re-based due to those changes.

0002: Adds serial/de-serial function support to CREATE AGGREGATE,
contains minor fix-ups from last version.

0003: Adds various combine/serial/de-serial functions for the standard
set of aggregates, apart from most float8 aggregates.

0004: Adds regression tests for 0003 pg_aggregate.h changes.

0005: Documents to mention which aggregate functions support partial mode.

0006: Re-based version of Haribabu's floating point aggregate support,
containing some fixes by me.


I went through the changes and found nothing suspicious, except maybe 
for the wording in one of the doc patches:


All types apart from floating-point types

It may not be entirely clear for the readers, but this does not include 
"numeric" data type, only float4 and float8. But I don't think this 
really matters unless we fail to commit the last patch adding functions 
even for those data types.


Also, I think it's probably the right time to fix the failures in 
opr_sanity regression tests. After all, we'll only whitelist a bunch of

Re: [HACKERS] Combining Aggregates

2016-03-19 Thread David Rowley

On 18 March 2016 at 13:39, Haribabu Kommi  wrote:
> On Thu, Mar 17, 2016 at 10:59 PM, Tomas Vondra
>  wrote:
>> Hi,
>>
>> On 03/17/2016 12:53 PM, David Rowley wrote:
>>>
>> ...
>>>
>>>
>>> I just had a quick skim over the patch and noticed the naming
>>> convention you're using for the combine function is *_pl, and you have
>>> float8_pl. There's already a function named float8pl() which is quite
>>> close to what you have. I've been sticking to *_combine() for these,
>>> so maybe float8_combine() and float8_regr_combine() are better names.
>>
>>
>> +1 to the _combine naming convention.
>
> Thanks for the input. Makes sense, updated patch is attached with
> the changes.

I've had a look over this. I had to first base it on the 0005 patch,
as it seemed like the pg_aggregate.h changes didn't include the
serialfn and deserialfn changes, and an OID was being consumed by
another function I added in patch 0003.

On testing I also noticed some wrong results, which on investigation,
are due to the wrong array elements being added together.

For example:

postgres=# select stddev(num) from f;
  stddev
--
 28867.5149028984
(1 row)


postgres=# set max_parallel_degree=8;
SET
postgres=# select stddev(num) from f;
 stddev

  0
(1 row)

+ N += transvalues2[0];
+ sumX += transvalues2[1];
+ CHECKFLOATVAL(sumX, isinf(transvalues1[1]) || isinf(transvalues2[1]), true);
+ sumX2 += transvalues1[2];

The last line should use transvalues2[2], not transvalues1[2].

There's also quite a few mistakes in float8_regr_combine()

+ sumX2 += transvalues2[2];
+ CHECKFLOATVAL(sumX2, isinf(transvalues1[2]) || isinf(transvalues2[1]), true);

Wrong array element on isinf() check

+ sumY2 += transvalues2[4];
+ CHECKFLOATVAL(sumY2, isinf(transvalues1[4]) || isinf(transvalues2[3]), true);

Wrong array element on isinf() check

+ sumXY += transvalues2[5];
+ CHECKFLOATVAL(sumXY, isinf(transvalues1[5]) || isinf(transvalues2[1]) ||
+  isinf(transvalues2[3]), true);

Wrong array element on isinf() check, and the final
isinf(transvalues2[3]) check does not need to be there.

I've re-based the patch and fixed these already, so will send updated
patches shortly.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-19 Thread David Rowley

On 17 March 2016 at 16:30, Haribabu Kommi  wrote:
> On Wed, Mar 16, 2016 at 10:08 PM, David Rowley
>  wrote:
>> On 16 March 2016 at 23:54, Haribabu Kommi  wrote:
>>> On Wed, Mar 16, 2016 at 8:34 AM, David Rowley
>>>  wrote:
 Yes me too, so I spent several hours yesterday writing all of the
 combine functions and serialisation/deserialisation that are required
 for all of SUM(), AVG() STDDEV*(). I also noticed that I had missed
 using some existing functions for bool_and() and bool_or() so I added
 those to pg_aggregate.h. I'm just chasing down a crash bug on
 HAVE_INT128 enabled builds, so should be posting a patch quite soon. I
 didn't touch the FLOAT4 and FLOAT8 aggregates as I believe Haribabu
 has a patch for that over on the parallel aggregate thread. I've not
 looked at it in detail yet.
>>>
>>> The additional combine function patch that I posted handles all float4 and
>>> float8 aggregates. There is an OID conflict with the latest source code,
>>> I will update the patch and post it in that thread.
>>
>> Thanks! I just send a series of patches which add a whole host of
>> serial/deserial functions, and a patch which adds some documentation.
>> Maybe you could base your patch on the 0005 patch, and update the
>> documents to remove the "All types apart from floating-point types"
>> text and replace that with "Yes".
>
> Here I attached updated float aggregates patch based on 0005 patch.

Great! Thanks for sending that.

I just had a quick skim over the patch and noticed the naming
convention you're using for the combine function is *_pl, and you have
float8_pl. There's already a function named float8pl() which is quite
close to what you have. I've been sticking to *_combine() for these,
so maybe float8_combine() and float8_regr_combine() are better names.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-19 Thread Tomas Vondra

Hi,

On 03/17/2016 12:53 PM, David Rowley wrote:
>
...
>

I just had a quick skim over the patch and noticed the naming
convention you're using for the combine function is *_pl, and you have
float8_pl. There's already a function named float8pl() which is quite
close to what you have. I've been sticking to *_combine() for these,
so maybe float8_combine() and float8_regr_combine() are better names.

+1 to the _combine naming convention.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-19 Thread Haribabu Kommi

On Wed, Mar 16, 2016 at 10:08 PM, David Rowley
 wrote:
> On 16 March 2016 at 23:54, Haribabu Kommi  wrote:
>> On Wed, Mar 16, 2016 at 8:34 AM, David Rowley
>>  wrote:
>>> Yes me too, so I spent several hours yesterday writing all of the
>>> combine functions and serialisation/deserialisation that are required
>>> for all of SUM(), AVG() STDDEV*(). I also noticed that I had missed
>>> using some existing functions for bool_and() and bool_or() so I added
>>> those to pg_aggregate.h. I'm just chasing down a crash bug on
>>> HAVE_INT128 enabled builds, so should be posting a patch quite soon. I
>>> didn't touch the FLOAT4 and FLOAT8 aggregates as I believe Haribabu
>>> has a patch for that over on the parallel aggregate thread. I've not
>>> looked at it in detail yet.
>>
>> The additional combine function patch that I posted handles all float4 and
>> float8 aggregates. There is an OID conflict with the latest source code,
>> I will update the patch and post it in that thread.
>
> Thanks! I just send a series of patches which add a whole host of
> serial/deserial functions, and a patch which adds some documentation.
> Maybe you could base your patch on the 0005 patch, and update the
> documents to remove the "All types apart from floating-point types"
> text and replace that with "Yes".

Here I attached updated float aggregates patch based on 0005 patch.

Regards,
Hari Babu
Fujitsu Australia


0006-float-aggregates-17-03-2016.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-19 Thread Tomas Vondra


Hi,

On 03/16/2016 12:03 PM, David Rowley wrote:

On 16 March 2016 at 10:34, David Rowley  wrote:

If Haribabu's patch does all that's required for the numerical
aggregates for floating point types then the status of covered
aggregates is (in order of pg_aggregate.h):

* AVG() complete coverage
* SUM() complete coverage
* MAX() complete coverage
* MIN() complete coverage
* COUNT() complete coverage
* STDDEV + friends complete coverage
* regr_*,covar_pop,covar_samp,corr not touched these.
* bool*() complete coverage
* bitwise aggs. complete coverage
* Remaining are not touched. I see diminishing returns with making
these parallel for now. I think I might not be worth pushing myself
any harder to make these ones work.

Does what I have done + floating point aggs from Haribabu seem
reasonable for 9.6?


I've attached a series of patches.


Thanks. Haven't really done much testing of this, aside from reading 
through the patches and "it compiles".


Patch 1:
This is the parallel aggregate patch, not intended for review here.
However, all further patches are based on this, and this adds the
required planner changes to make it possible to test patches 2 and 3.

Patch 2:
This adds the serial/deserial aggregate infrastructure, pg_dump
support, CREATE AGGREGATE changes, and nodeAgg.c changes to have it
serialise and deserialise aggregate states when instructed to do so.

Patch 3:
This adds a boat load of serial/deserial functions, and combine
functions for most of the built-in numerical aggregate functions. It
also contains some regression tests which should really be in patch 2,
but I with patch 2 there's no suitable serialisation or
de-serialisation functions to test CREATE AGGREGATE with. I think
having them here is ok, as patch 2 is quite useless without patch 3
anyway.


I don't see how you could move the tests into #2 when the functions are 
defined in #3? IMHO this is the right place for the regression tests.




Another thing to note about this patch is that I've gone and created
serial/de-serial functions for when PolyNumAggState both require
sumX2, and don't require sumX2. I had thought about perhaps putting an
extra byte in the serial format to indicate if a sumX2 is included,
but I ended up not doing it this way. I don't really want these serial
formats getting too complex as we might like to do fun things like
pass them along to sharded servers one day, so it might be nice to
keep them simple.


Hmmm, I've noticed that while eyeballing the diffs, and I'm not sure if 
it's worth the additional complexity at this  point. I mean, one byte is 
hardly going to make a measurable difference - we're probably wasting 
more than that due to alignment, for example.


As you've mentioned sharded servers - how stable should the serialized 
format be? I've assumed it to be transient, i.e. in the extreme case it 
might change after restarting a database - for the parallel aggregate 
that's clearly sufficient.


But if we expect this to eventually work in a sharded environment, 
that's going to be way more complicated. I guess in these cases we could 
rely on implicit format versioning, via the major the version (doubt 
we'd tweak the format in a minor version anyway).


I'm not sure the sharding is particularly useful argument at this point, 
considering we don't really know if the current format is actually a 
good match for that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-19 Thread Haribabu Kommi

On Thu, Mar 17, 2016 at 10:59 PM, Tomas Vondra
 wrote:
> Hi,
>
> On 03/17/2016 12:53 PM, David Rowley wrote:
>>
> ...
>>
>>
>> I just had a quick skim over the patch and noticed the naming
>> convention you're using for the combine function is *_pl, and you have
>> float8_pl. There's already a function named float8pl() which is quite
>> close to what you have. I've been sticking to *_combine() for these,
>> so maybe float8_combine() and float8_regr_combine() are better names.
>
>
> +1 to the _combine naming convention.

Thanks for the input. Makes sense, updated patch is attached with
the changes.

Regards,
Hari Babu
Fujitsu Australia


0006-float-aggregates-18-03-2016.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-16 Thread David Rowley

On 16 March 2016 at 23:54, Haribabu Kommi  wrote:
> On Wed, Mar 16, 2016 at 8:34 AM, David Rowley
>  wrote:
>> Yes me too, so I spent several hours yesterday writing all of the
>> combine functions and serialisation/deserialisation that are required
>> for all of SUM(), AVG() STDDEV*(). I also noticed that I had missed
>> using some existing functions for bool_and() and bool_or() so I added
>> those to pg_aggregate.h. I'm just chasing down a crash bug on
>> HAVE_INT128 enabled builds, so should be posting a patch quite soon. I
>> didn't touch the FLOAT4 and FLOAT8 aggregates as I believe Haribabu
>> has a patch for that over on the parallel aggregate thread. I've not
>> looked at it in detail yet.
>
> The additional combine function patch that I posted handles all float4 and
> float8 aggregates. There is an OID conflict with the latest source code,
> I will update the patch and post it in that thread.

Thanks! I just send a series of patches which add a whole host of
serial/deserial functions, and a patch which adds some documentation.
Maybe you could base your patch on the 0005 patch, and update the
documents to remove the "All types apart from floating-point types"
text and replace that with "Yes".

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-16 Thread Haribabu Kommi

On Wed, Mar 16, 2016 at 8:34 AM, David Rowley
 wrote:
> On 16 March 2016 at 06:39, Tomas Vondra  wrote:
>> After looking at the parallel aggregate patch, I also looked at this one, as
>> it's naturally related. Sadly I haven't found any issue that I could nag
>> about ;-) The patch seems well baked, as it was in the oven for quite a long
>> time already.
>
> Thanks for looking at this.
>
>> The one concern I do have is that it only adds (de)serialize functions for
>> SUM(numeric) and AVG(numeric). I think it's reasonable not to include that
>> into the patch, but it will look a bit silly if that's all that gets into
>> 9.6.
>
> Yes me too, so I spent several hours yesterday writing all of the
> combine functions and serialisation/deserialisation that are required
> for all of SUM(), AVG() STDDEV*(). I also noticed that I had missed
> using some existing functions for bool_and() and bool_or() so I added
> those to pg_aggregate.h. I'm just chasing down a crash bug on
> HAVE_INT128 enabled builds, so should be posting a patch quite soon. I
> didn't touch the FLOAT4 and FLOAT8 aggregates as I believe Haribabu
> has a patch for that over on the parallel aggregate thread. I've not
> looked at it in detail yet.

The additional combine function patch that I posted handles all float4 and
float8 aggregates. There is an OID conflict with the latest source code,
I will update the patch and post it in that thread.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-15 Thread David Rowley

On 16 March 2016 at 06:39, Tomas Vondra  wrote:
> After looking at the parallel aggregate patch, I also looked at this one, as
> it's naturally related. Sadly I haven't found any issue that I could nag
> about ;-) The patch seems well baked, as it was in the oven for quite a long
> time already.

Thanks for looking at this.

> The one concern I do have is that it only adds (de)serialize functions for
> SUM(numeric) and AVG(numeric). I think it's reasonable not to include that
> into the patch, but it will look a bit silly if that's all that gets into
> 9.6.

Yes me too, so I spent several hours yesterday writing all of the
combine functions and serialisation/deserialisation that are required
for all of SUM(), AVG() STDDEV*(). I also noticed that I had missed
using some existing functions for bool_and() and bool_or() so I added
those to pg_aggregate.h. I'm just chasing down a crash bug on
HAVE_INT128 enabled builds, so should be posting a patch quite soon. I
didn't touch the FLOAT4 and FLOAT8 aggregates as I believe Haribabu
has a patch for that over on the parallel aggregate thread. I've not
looked at it in detail yet.

If Haribabu's patch does all that's required for the numerical
aggregates for floating point types then the status of covered
aggregates is (in order of pg_aggregate.h):

* AVG() complete coverage
* SUM() complete coverage
* MAX() complete coverage
* MIN() complete coverage
* COUNT() complete coverage
* STDDEV + friends complete coverage
* regr_*,covar_pop,covar_samp,corr not touched these.
* bool*() complete coverage
* bitwise aggs. complete coverage
* Remaining are not touched. I see diminishing returns with making
these parallel for now. I think I might not be worth pushing myself
any harder to make these ones work.

Does what I have done + floating point aggs from Haribabu seem
reasonable for 9.6?

> I think it would be good to actually document which aggregates support
> parallel execution, and which don't. Currently the docs don't mention this
> at all, and it's tricky for regular users to deduce that as it's related to
> the type of the internal state and not to the input types. An example of
> that is the SUM(bigint) example mentioned above.

I agree. I will look for a suitable place.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-15 Thread Tomas Vondra

Hi,

On 03/14/2016 05:45 AM, David Rowley wrote:

On 14 March 2016 at 15:20, David Rowley wrote:

Current patch:
I've now updated the patch to base it on top of the parallel aggregate
patch in [2]. To apply the attached, you must apply [2] first!

...

[2]
http://www.postgresql.org/message-id/cakjs1f9tr-+9akwz1xuhunejz7gkkfo45b+4fcnj8dkrdzy...@mail.gmail.com

The attached fixes a small thinko in apply_partialaggref_nodes().

After looking at the parallel aggregate patch, I also looked at this
one, as it's naturally related. Sadly I haven't found any issue that I
could nag about ;-) The patch seems well baked, as it was in the oven
for quite a long time already.

The one concern I do have is that it only adds (de)serialize functions
for SUM(numeric) and AVG(numeric). I think it's reasonable not to
include that into the patch, but it will look a bit silly if that's all
that gets into 9.6.

It's true plenty of aggregates is supported out of the box, as they use
fixed-length data types for the aggregate state. But imagine users happy
that their aggregate queries get parallelized, only to find out that
sum(int8) disables that :-/

So I think we should try to support as many of the aggregates using
internal as possible - it seems quite straight-forward to do that at
least for the aggregates using the same internal representation (avg,
sum, var_samp, var_pop, variance, ...). That's 26 aggregates out of the
45 with aggtranstype=INTERNAL.

For the remaining aggregates (jsonb_*. json_*, string_agg, array_agg,
percentile_) it's probably more complicated to serialize the state, and
maybe even impossible to do that correctly (e.g. string_agg accumulates
the strings in the order as received, but partial paths would build
private lists - not sure if this can actually happen in practice).

I think it would be good to actually document which aggregates support
parallel execution, and which don't. Currently the docs don't mention
this at all, and it's tricky for regular users to deduce that as it's
related to the type of the internal state and not to the input types. An
example of that is the SUM(bigint) example mentioned above.

That's actually the one thing I think the patch is missing.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-03-11 Thread David Steele

On 1/20/16 11:42 PM, David Rowley wrote:
> On 21 January 2016 at 08:06, Robert Haas  > wrote:
> 
> On Wed, Jan 20, 2016 at 7:38 AM, David Rowley
> >
> wrote:
> > Agreed. So I've attached a version of the patch which does not have any 
> of
> > the serialise/deserialise stuff in it.
> 
> I re-reviewed this and have committed most of it with only minor
> kibitizing.  A few notes:
> 
> 
> I've attached the re-based remainder, which includes the serial/deserial
> again.
> 
> I'll submit this part to March 'fest, where hopefully we'll also have
> something to utilise it.

As far as I can tell this is the most recent version of the patch but it
doesn't apply on master.  I'm guessing that's mostly because of Tom's
UPP patch.

This is a long and confusing thread and I should know since I just read
through the entire thing.  I'm setting this back to "waiting for author"
and I'd like see a rebased patch and a summary of what's in the patch
before setting it back to "needs review".

-- 
-David
da...@pgmasters.net

signature.asc
Description: OpenPGP digital signature

Re: [HACKERS] Combining Aggregates

2016-02-07 Thread Haribabu Kommi

On Thu, Jan 21, 2016 at 3:42 PM, David Rowley
 wrote:
> On 21 January 2016 at 08:06, Robert Haas  wrote:
>>
>> On Wed, Jan 20, 2016 at 7:38 AM, David Rowley
>>  wrote:
>> > Agreed. So I've attached a version of the patch which does not have any
>> > of
>> > the serialise/deserialise stuff in it.
>>
>> I re-reviewed this and have committed most of it with only minor
>> kibitizing.  A few notes:
>
>
> I've attached the re-based remainder, which includes the serial/deserial
> again.
>
> I'll submit this part to March 'fest, where hopefully we'll also have
> something to utilise it.
>

While testing parallel aggregate with float4 and float8 types based on
the latest patch,
I found the following problems,

+ /*
+ * For partial aggregation we must adjust the return types of
+ * the Aggrefs
+ */
+ if (!aggplan->finalizeAggs)
+ set_partialagg_aggref_types(root, plan,
+ aggplan->serialStates);

[...]

+ aggref->aggtype = aggform->aggserialtype;
+ else
+ aggref->aggtype = aggform->aggtranstype;

Changing the aggref->aggtype with aggtranstype or aggserialtype will
only gets it changed in
partial aggregate plan, as set_upper_references starts from the top
plan and goes
further. Because of this reason, the targetlist contains for the node
below finalize
aggregate are still points to original type only.

To fix this problem, I tried updating the targetlist aggref->aggtype
with transtype during
aggregate plan itself, that leads to a problem in setting upper plan
references. This is
because, while fixing the aggregate reference of upper plans after
partial aggregate,
the aggref at upper plan nodes doesn't match with aggref that is
coming from partial
aggregate node because of aggtype difference in _equalAggref function.

COMPARE_SCALAR_FIELD(aggtype);

Temporarily i corrected it compare it against aggtranstype and
aggserialtype also then
it works fine. I don't see that change as correct approach. Do you
have any better idea
to solve this problem?

Regards,
Hari Babu
Fujitsu Australia

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-27 Thread Robert Haas

On Sat, Jan 23, 2016 at 6:26 PM, Jeff Janes  wrote:
> On Fri, Jan 22, 2016 at 4:53 PM, David Rowley
>  wrote:
>>
>> It seems that I must have mistakenly believed that non-existing
>> columns for previous versions were handled using the power of magic.
>> Turns out that I was wrong, and they need to be included as dummy
>> columns in the queries for previous versions.
>>
>> The attached fixes the problem.
>
> Yep, that fixes it.

Committed.  Apologies for the delay.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-23 Thread Jeff Janes

On Fri, Jan 22, 2016 at 4:53 PM, David Rowley
 wrote:
>
> It seems that I must have mistakenly believed that non-existing
> columns for previous versions were handled using the power of magic.
> Turns out that I was wrong, and they need to be included as dummy
> columns in the queries for previous versions.
>
> The attached fixes the problem.

Yep, that fixes it.

Thanks,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-22 Thread Jeff Janes

On Wed, Jan 20, 2016 at 11:06 AM, Robert Haas  wrote:
> On Wed, Jan 20, 2016 at 7:38 AM, David Rowley
>  wrote:
>> Agreed. So I've attached a version of the patch which does not have any of
>> the serialise/deserialise stuff in it.
>
> I re-reviewed this and have committed most of it with only minor
> kibitizing.  A few notes:


This commit has broken pg_dump.  At least, I think this is the thread
referencing this commit:


commit a7de3dc5c346e07e0439275982569996e645b3c2
Author: Robert Haas 
Date:   Wed Jan 20 13:46:50 2016 -0500
Support multi-stage aggregation.


If I add an aggregate to an otherwise empty 9.4.5 database:


CREATE OR REPLACE FUNCTION first_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
SELECT $1;
$$;

-- And then wrap an aggregate around it
-- aggregates do not support create or replace, alas.
CREATE AGGREGATE first (
sfunc= first_agg,
basetype = anyelement,
stype= anyelement
);


And then run pg_dump from 9.6.this on it, I get:


pg_dump: column number -1 is out of range 0..17
Segmentation fault (core dumped)

Program terminated with signal 11, Segmentation fault.
#0  0x00416b0b in dumpAgg (fout=0x1e551e0, agginfo=0x1e65ec0)
at pg_dump.c:12670
12670if (strcmp(aggcombinefn, "-") != 0)
(gdb) bt
#0  0x00416b0b in dumpAgg (fout=0x1e551e0, agginfo=0x1e65ec0)
at pg_dump.c:12670
#1  0x0041df7a in main (argc=,
argv=) at pg_dump.c:810


Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-22 Thread David Rowley

On 23 January 2016 at 09:17, Jeff Janes  wrote:
> On Wed, Jan 20, 2016 at 11:06 AM, Robert Haas  wrote:
>> On Wed, Jan 20, 2016 at 7:38 AM, David Rowley
>>  wrote:
>>> Agreed. So I've attached a version of the patch which does not have any of
>>> the serialise/deserialise stuff in it.
>>
>> I re-reviewed this and have committed most of it with only minor
>> kibitizing.  A few notes:
>
>
> This commit has broken pg_dump.  At least, I think this is the thread
> referencing this commit:
>
...
> pg_dump: column number -1 is out of range 0..17
> Segmentation fault (core dumped)
>
> Program terminated with signal 11, Segmentation fault.
> #0  0x00416b0b in dumpAgg (fout=0x1e551e0, agginfo=0x1e65ec0)
> at pg_dump.c:12670
> 12670if (strcmp(aggcombinefn, "-") != 0)
> (gdb) bt
> #0  0x00416b0b in dumpAgg (fout=0x1e551e0, agginfo=0x1e65ec0)
> at pg_dump.c:12670
> #1  0x0041df7a in main (argc=,
> argv=) at pg_dump.c:810

Yikes :-( I will look at this with priority in the next few hours.

Thanks for the report.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-22 Thread Robert Haas

On Thu, Jan 21, 2016 at 4:08 PM, David Rowley
 wrote:
> It's quite simple to test how much of a win it'll be in the serial
> case today, and yes, it's not much, but it's a bit.
>
> create table t1 as select x from generate_series(1,100) x(x);
> vacuum analyze t1;
> select count(*) from (select * from t1 union all select * from t1) t;
>   count
> -
>  200
> (1 row)
>
> Time: 185.793 ms
>
> -- Mock up pushed down aggregation by using sum() as a combine
> function for count(*)
> select sum(c) from (select count(*) c from t1 union all select
> count(*) from t1) t;
>sum
> -
>  200
> (1 row)
>
> Time: 162.076 ms
>
> Not particularly incredible, but we don't normally turn our noses up
> at a 14% improvement, so let's just see how complex it will be to
> implement, once the upper planner changes are done.

Mumble mumble.  Why is that even any faster?  Just because we avoid
the projection overhead of the Append node, which is a no-op here
anyway?  If so, a planner change is one thought, but perhaps we also
ought to look at whether we can't reduce the execution-time overhead.

> But as you mention about lack of ability to make use of pre-sorted
> Path for each branch of the UNION ALL; I was really hoping Tom's patch
> will improve that part by allowing the planner to choose a pre-sorted
> Path and perform a MergeAppend instead of an Append, which would allow
> pre-sorted input into a GroupAggregate node.

I won't hazard a guess on that point...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-22 Thread David Rowley

On 23 January 2016 at 09:44, David Rowley  wrote:
> On 23 January 2016 at 09:17, Jeff Janes  wrote:
>> On Wed, Jan 20, 2016 at 11:06 AM, Robert Haas  wrote:
>>> On Wed, Jan 20, 2016 at 7:38 AM, David Rowley
>>>  wrote:
 Agreed. So I've attached a version of the patch which does not have any of
 the serialise/deserialise stuff in it.
>>>
>>> I re-reviewed this and have committed most of it with only minor
>>> kibitizing.  A few notes:
>>
>>
>> This commit has broken pg_dump.  At least, I think this is the thread
>> referencing this commit:
>>
> ...
>> pg_dump: column number -1 is out of range 0..17
>> Segmentation fault (core dumped)
>>
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x00416b0b in dumpAgg (fout=0x1e551e0, agginfo=0x1e65ec0)
>> at pg_dump.c:12670
>> 12670if (strcmp(aggcombinefn, "-") != 0)
>> (gdb) bt
>> #0  0x00416b0b in dumpAgg (fout=0x1e551e0, agginfo=0x1e65ec0)
>> at pg_dump.c:12670
>> #1  0x0041df7a in main (argc=,
>> argv=) at pg_dump.c:810
>
> Yikes :-( I will look at this with priority in the next few hours.
>
> Thanks for the report.

It seems that I must have mistakenly believed that non-existing
columns for previous versions were handled using the power of magic.
Turns out that I was wrong, and they need to be included as dummy
columns in the queries for previous versions.

The attached fixes the problem.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


combine_aggs_pg_dump_fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-21 Thread Robert Haas

On Wed, Jan 20, 2016 at 9:33 PM, Haribabu Kommi
 wrote:
> Here I attached updated patch of parallel aggregate based on the latest
> changes in master. Still it lack of cost comparison of normal aggregate to
> parallel aggregate because of difficulty. This cost comparison is required
> in parallel aggregate as this is having some regression when the number
> of groups are less in the query plan.

Please keep this on its own thread rather than colonizing this one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-21 Thread Robert Haas

On Wed, Jan 20, 2016 at 8:32 PM, David Rowley
 wrote:
> At the moment I think everything which will use this is queued up behind the
> pathification of the grouping planner which Tom is working on. I think
> naturally Parallel Aggregate makes sense to work on first, given all the
> other parallel stuff in this release. I plan on working on that that by
> either assisting Haribabu, or... whatever else it takes.

Great!

> The other two usages which I have thought of are;
>
> 1) Aggregating before UNION ALL, which might be fairly simple after the
> grouping planner changes, as it may just be a matter of considering another
> "grouping path" which partially aggregates before the UNION ALL, and
> performs the final grouping stage after UNION ALL. At this stage it's hard
> to say how that will work as I'm not sure how far changes to the grouping
> planner will go. Perhaps Tom can comment?

I hope he will, but in the meantime let me ask how this does us any
good.  UNION ALL turns into an Append node.  Pushing aggregation
through an Append doesn't make anything faster AFAICS *unless* you can
further optimize beginning at that point.  For example, if one or both
sides of the Append node have a Gather under them, and you can push
the partial aggregation down into the Gather; or if you hit a Foreign
Scan and can have it do partial aggregation on the remote side, you
win.  But if you just have:

Finalize Aggregate
-> Append
  -> Partial Aggregate
-> Thing One
  -> Partial Aggregate
-> Thing Two

Instead of:

Aggregate
-> Append
  -> Thing One
  -> Thing Two

...then I don't see the point, at least not for a single-group
Aggregate or HashAggregate.  For a GroupAggregate, Thing One and Thing
Two might need to be sorted, and sorting two or more smaller data sets
might be faster than sorting one larger data set, but I can't see us
winning anything very big here.

To be clear, I'm not saying we shouldn't do this.  I just don't think
it does much *by itself*.  If you combine it with other optimizations
that let the aggregation be pushed further down the plan tree, it
could win big.

> 2) Group before join. e.g select p.description,sum(s.qty) from sale s inner
> join s.product_id = p.product_id group by s.product_id group by
> p.description;  I have a partial patch which implements this, although I was
> a bit stuck on if I should invent the concept of "GroupingPaths", or just
> inject alternative subquery relations which are already grouped by the
> correct clause.  The problem with "GroupingPaths" was down to the row
> estimates currently come from the RelOptInfo and is set in
> set_baserel_size_estimates() which always assumes the ungrouped number of
> rows, which is not what's needed if the grouping is already performed. I was
> holding off to see how Tom does this in the grouping planner changes.

Your sample query has two GROUP BY clauses; I think you meant to
include only the second one.  This seems like it can win big, because
it's possible that joining first means joining against a much larger
relation, whereas aggregating first might reduce "sale" to a volume of
data that fits in work_mem, after which you can hash join.  On the
other hand, if you add WHERE p.description LIKE '%snuffleupagus%' then
the aggregate-first strategy figures to lose, assuming things are
indexed properly.

On the substantive question you raise, I hope Tom will weigh in here.
However, I'll give you my take.  It seems to me that you are likely to
run into problems not only with the row counts but also with target
lists and width estimates.  All of those things change once you
aggregate.  It seems clear that you are going to need a GroupingPath
here, but it also seems really clear that you can't take a path where
some aggregation has been done and use add_path to toss it on the pile
for the same RelOptInfo as unaggregated paths to which it is not
comparable.  Right now, there's a RelOptInfo corresponding to each
subset of the joinrels for which we build paths; but in this new
world, I think you'll end up with multiple joinrels for each
RelOptInfo, one per

In your example above, you'd end up with RelOptInfos for (1) {s} with
no aggregation, (2) {p} with no aggregation, (3) {s p} with no
aggregation, (4) {s} aggregated by s.product_id and (5) {s p}
aggregated by p.description.  You can generate paths for (5) either by
joining (2) to (4) or by aggregating (3), and the cheapest path for
(5) is the overall winner.  It seems a bit tricky to determine which
aggregations are worth considering, and how to represent that in the
RelOptInfo, but overall that feels like the right idea.

I'm not sure how much this is going to overlap with the work Tom is
already doing.  I think his principal goal is to let the planning of a
subquery return multiple candidate paths.  That's almost certain to
involve creating something like GroupingPath, though I can't predict
the details, and I suspect it's also likely that he'll

Re: [HACKERS] Combining Aggregates

2016-01-21 Thread Haribabu Kommi

On Thu, Jan 21, 2016 at 3:52 PM, David Rowley
 wrote:
> On 21 January 2016 at 15:53, Haribabu Kommi 
> wrote:
>>
>> On Thu, Jan 21, 2016 at 1:33 PM, Haribabu Kommi
>>  wrote:
>> >
>> > Here I attached updated patch of parallel aggregate based on the latest
>> > changes in master. Still it lack of cost comparison of normal aggregate
>> > to
>> > parallel aggregate because of difficulty. This cost comparison is
>> > required
>> > in parallel aggregate as this is having some regression when the number
>> > of groups are less in the query plan.
>> >
>>
>> Updated patch is attached after removing a warning in building group
>> aggregate path.
>
>
> Hi,
>
> Thanks for updating the patch. I'd like to look at this with priority, but
> can you post it on the Parallel Agg thread? that way anyone following there
> can chime in over there rather than here.  I've still got a bit of work to
> do (in the not too distant future) on the serial/deserial part, so would be
> better to keep this thread for discussion on that.

Thanks for the details. Sorry for sending parallel aggregate patch in
this thread.
I will take care of it from next time onward.


Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-21 Thread David Rowley

On 22 January 2016 at 06:56, Robert Haas  wrote:
> On Wed, Jan 20, 2016 at 8:32 PM, David Rowley
>  wrote:
>> The other two usages which I have thought of are;
>>
>> 1) Aggregating before UNION ALL, which might be fairly simple after the
>> grouping planner changes, as it may just be a matter of considering another
>> "grouping path" which partially aggregates before the UNION ALL, and
>> performs the final grouping stage after UNION ALL. At this stage it's hard
>> to say how that will work as I'm not sure how far changes to the grouping
>> planner will go. Perhaps Tom can comment?
>
> I hope he will, but in the meantime let me ask how this does us any
> good.  UNION ALL turns into an Append node.  Pushing aggregation
> through an Append doesn't make anything faster AFAICS *unless* you can
> further optimize beginning at that point.  For example, if one or both
> sides of the Append node have a Gather under them, and you can push
> the partial aggregation down into the Gather; or if you hit a Foreign
> Scan and can have it do partial aggregation on the remote side, you
> win.  But if you just have:
>
> Finalize Aggregate
> -> Append
>   -> Partial Aggregate
> -> Thing One
>   -> Partial Aggregate
> -> Thing Two
>
> Instead of:
>
> Aggregate
> -> Append
>   -> Thing One
>   -> Thing Two
>
> ...then I don't see the point, at least not for a single-group
> Aggregate or HashAggregate.  For a GroupAggregate, Thing One and Thing
> Two might need to be sorted, and sorting two or more smaller data sets
> might be faster than sorting one larger data set, but I can't see us
> winning anything very big here.
>
> To be clear, I'm not saying we shouldn't do this.  I just don't think
> it does much *by itself*.  If you combine it with other optimizations
> that let the aggregation be pushed further down the plan tree, it
> could win big.

Yes, I agree, it's not a big win, at least not in the case of a serial
plan. If each branch of the UNION ALL could be processed in parallel,
then that's different.

It's quite simple to test how much of a win it'll be in the serial
case today, and yes, it's not much, but it's a bit.

create table t1 as select x from generate_series(1,100) x(x);
vacuum analyze t1;
select count(*) from (select * from t1 union all select * from t1) t;
  count
-
 200
(1 row)

Time: 185.793 ms

-- Mock up pushed down aggregation by using sum() as a combine
function for count(*)
select sum(c) from (select count(*) c from t1 union all select
count(*) from t1) t;
   sum
-
 200
(1 row)

Time: 162.076 ms

Not particularly incredible, but we don't normally turn our noses up
at a 14% improvement, so let's just see how complex it will be to
implement, once the upper planner changes are done.

But as you mention about lack of ability to make use of pre-sorted
Path for each branch of the UNION ALL; I was really hoping Tom's patch
will improve that part by allowing the planner to choose a pre-sorted
Path and perform a MergeAppend instead of an Append, which would allow
pre-sorted input into a GroupAggregate node.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

On 21 January 2016 at 08:06, Robert Haas  wrote:

> On Wed, Jan 20, 2016 at 7:38 AM, David Rowley
>  wrote:
> > Agreed. So I've attached a version of the patch which does not have any
> of
> > the serialise/deserialise stuff in it.
>
> I re-reviewed this and have committed most of it with only minor
> kibitizing.  A few notes:
>

Great! Thank you for committing.

- I changed the EXPLAIN code, since it failed to display the aggregate
> node's mode of operation in non-text format.
>

Oops, thanks for fixing.


> - I removed some of the planner support, since I'm not sure that it's
> completely correct and I'm very sure that it contains more code
> duplication than I'm comfortable with (set_combineagg_references in
> particular).  Please feel free to resubmit this part, perhaps in
> combination with code that actually uses it for something.  But please
> also think about whether there's a way to reduce the duplication, and
> maybe consider adding some comments, too.
>

That part is a gray area. Wasn't that sure it belonged to this patch
either. The problem is, that without it an Aggref's return type is wrong
when the node is in combine mode, which might not be a problem now as we
don't have a planner yet that generates these nodes.


> - I'm not sure that you made the right call regarding reusing
> transfn_oid and transfn for combine functions, vs. adding separate
> fields.  It is sort of logical and has the advantage of keeping the
> data structure smaller, but it might also be thought confusing or
> inappropriate on other grounds.  But I think we can change it later if
> that comes to seem like the right thing to do, so I've left it the way
> you had it for now.


Sharing the variable, as horrid as that might seem does simplify some code.
For example if you look at find_compatible_pertrans(), then you see:

if (aggtransfn != pertrans->transfn_oid ||
aggtranstype != pertrans->aggtranstype)
continue;

This would need to be changed to something like:

if (!aggstate->combineStates &&
(aggtransfn != pertrans->transfn_oid ||
aggtranstype != pertrans->aggtranstype))
continue;
else if (aggstate->combineStates &&
(aggcombinefn != pertrans->combinefn_oid ||
aggtranstype != pertrans->aggtranstype))
continue;

And we'd then have to pass aggcombinefn to that function too.

What might be better would be to rename the variable to something name that
screams something a bit more generic.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Combining Aggregates

On 21 January 2016 at 04:59, Robert Haas  wrote:

> On Wed, Jan 20, 2016 at 7:53 AM, David Rowley
>  wrote:
> > On 21 January 2016 at 01:44, Robert Haas  wrote:
> >>
> >> On Wed, Jan 20, 2016 at 7:38 AM, David Rowley
> >>  wrote:
> >> >> To my mind, priority #1 ought to be putting this fine new
> >> >> functionality to some use.  Expanding it to every aggregate we've got
> >> >> seems like a distinctly second priority.  That's not to say that it's
> >> >> absolutely gotta go down that way, but those would be my priorities.
> >> >
> >> > Agreed. So I've attached a version of the patch which does not have
> any
> >> > of
> >> > the serialise/deserialise stuff in it.
> >> >
> >> > I've also attached a test patch which modifies the grouping planner to
> >> > add a
> >> > Partial Aggregate node, and a final aggregate node when it's possible.
> >> > Running the regression tests with this patch only shows up variances
> in
> >> > the
> >> > EXPLAIN outputs, which is of course expected.
> >>
> >> That seems great as a test, but what's the first patch that can put
> >> this to real and permanent use?
> >
> > There's no reason why parallel aggregates can't use the
> > combine_aggregate_state_d6d480b_2016-01-21.patch patch.
>
> I agree.  Are you going to work on that?  Are you expecting me to work
> on that?  Do you think we can use Haribabu's patch?  What other
> applications are in play in the near term, if any?

At the moment I think everything which will use this is queued up behind
the pathification of the grouping planner which Tom is working on. I think
naturally Parallel Aggregate makes sense to work on first, given all the
other parallel stuff in this release. I plan on working on that that by
either assisting Haribabu, or... whatever else it takes.

The other two usages which I have thought of are;

1) Aggregating before UNION ALL, which might be fairly simple after the
grouping planner changes, as it may just be a matter of considering another
"grouping path" which partially aggregates before the UNION ALL, and
performs the final grouping stage after UNION ALL. At this stage it's hard
to say how that will work as I'm not sure how far changes to the grouping
planner will go. Perhaps Tom can comment?

2) Group before join. e.g select p.description,sum(s.qty) from sale s inner
join s.product_id = p.product_id group by s.product_id group by
p.description;  I have a partial patch which implements this, although I
was a bit stuck on if I should invent the concept of "GroupingPaths", or
just inject alternative subquery relations which are already grouped by the
correct clause.  The problem with "GroupingPaths" was down to the row
estimates currently come from the RelOptInfo and is set
in set_baserel_size_estimates() which always assumes the ungrouped number
of rows, which is not what's needed if the grouping is already performed. I
was holding off to see how Tom does this in the grouping planner changes.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Combining Aggregates

On 21 January 2016 at 15:53, Haribabu Kommi 
wrote:

> On Thu, Jan 21, 2016 at 1:33 PM, Haribabu Kommi
>  wrote:
> >
> > Here I attached updated patch of parallel aggregate based on the latest
> > changes in master. Still it lack of cost comparison of normal aggregate
> to
> > parallel aggregate because of difficulty. This cost comparison is
> required
> > in parallel aggregate as this is having some regression when the number
> > of groups are less in the query plan.
> >
>
> Updated patch is attached after removing a warning in building group
> aggregate path.

Hi,

Thanks for updating the patch. I'd like to look at this with priority, but
can you post it on the Parallel Agg thread? that way anyone following there
can chime in over there rather than here.  I've still got a bit of work to
do (in the not too distant future) on the serial/deserial part, so would be
better to keep this thread for discussion on that.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Combining Aggregates

2016-01-20 Thread Haribabu Kommi

On Thu, Jan 21, 2016 at 12:32 PM, David Rowley
 wrote:
> On 21 January 2016 at 04:59, Robert Haas  wrote:
>>
>> On Wed, Jan 20, 2016 at 7:53 AM, David Rowley
>>  wrote:
>> > On 21 January 2016 at 01:44, Robert Haas  wrote:
>> >>
>> >> On Wed, Jan 20, 2016 at 7:38 AM, David Rowley
>> >>  wrote:
>> >> >> To my mind, priority #1 ought to be putting this fine new
>> >> >> functionality to some use.  Expanding it to every aggregate we've
>> >> >> got
>> >> >> seems like a distinctly second priority.  That's not to say that
>> >> >> it's
>> >> >> absolutely gotta go down that way, but those would be my priorities.
>> >> >
>> >> > Agreed. So I've attached a version of the patch which does not have
>> >> > any
>> >> > of
>> >> > the serialise/deserialise stuff in it.
>> >> >
>> >> > I've also attached a test patch which modifies the grouping planner
>> >> > to
>> >> > add a
>> >> > Partial Aggregate node, and a final aggregate node when it's
>> >> > possible.
>> >> > Running the regression tests with this patch only shows up variances
>> >> > in
>> >> > the
>> >> > EXPLAIN outputs, which is of course expected.
>> >>
>> >> That seems great as a test, but what's the first patch that can put
>> >> this to real and permanent use?
>> >
>> > There's no reason why parallel aggregates can't use the
>> > combine_aggregate_state_d6d480b_2016-01-21.patch patch.
>>
>> I agree.  Are you going to work on that?  Are you expecting me to work
>> on that?  Do you think we can use Haribabu's patch?  What other
>> applications are in play in the near term, if any?
>
>
> At the moment I think everything which will use this is queued up behind the
> pathification of the grouping planner which Tom is working on. I think
> naturally Parallel Aggregate makes sense to work on first, given all the
> other parallel stuff in this release. I plan on working on that that by
> either assisting Haribabu, or... whatever else it takes.
>

Here I attached updated patch of parallel aggregate based on the latest
changes in master. Still it lack of cost comparison of normal aggregate to
parallel aggregate because of difficulty. This cost comparison is required
in parallel aggregate as this is having some regression when the number
of groups are less in the query plan.

Regards,
Hari Babu
Fujitsu Australia


parallelagg_poc_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-20 Thread Haribabu Kommi

On Thu, Jan 21, 2016 at 1:33 PM, Haribabu Kommi
 wrote:
>
> Here I attached updated patch of parallel aggregate based on the latest
> changes in master. Still it lack of cost comparison of normal aggregate to
> parallel aggregate because of difficulty. This cost comparison is required
> in parallel aggregate as this is having some regression when the number
> of groups are less in the query plan.
>

Updated patch is attached after removing a warning in building group
aggregate path.

Regards,
Hari Babu
Fujitsu Australia


parallelagg_poc_v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-20 Thread Robert Haas

On Wed, Jan 20, 2016 at 7:53 AM, David Rowley
 wrote:
> On 21 January 2016 at 01:44, Robert Haas  wrote:
>>
>> On Wed, Jan 20, 2016 at 7:38 AM, David Rowley
>>  wrote:
>> >> To my mind, priority #1 ought to be putting this fine new
>> >> functionality to some use.  Expanding it to every aggregate we've got
>> >> seems like a distinctly second priority.  That's not to say that it's
>> >> absolutely gotta go down that way, but those would be my priorities.
>> >
>> > Agreed. So I've attached a version of the patch which does not have any
>> > of
>> > the serialise/deserialise stuff in it.
>> >
>> > I've also attached a test patch which modifies the grouping planner to
>> > add a
>> > Partial Aggregate node, and a final aggregate node when it's possible.
>> > Running the regression tests with this patch only shows up variances in
>> > the
>> > EXPLAIN outputs, which is of course expected.
>>
>> That seems great as a test, but what's the first patch that can put
>> this to real and permanent use?
>
> There's no reason why parallel aggregates can't use the
> combine_aggregate_state_d6d480b_2016-01-21.patch patch.

I agree.  Are you going to work on that?  Are you expecting me to work
on that?  Do you think we can use Haribabu's patch?  What other
applications are in play in the near term, if any?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-20 Thread Robert Haas

On Wed, Jan 20, 2016 at 7:38 AM, David Rowley
 wrote:
> Agreed. So I've attached a version of the patch which does not have any of
> the serialise/deserialise stuff in it.

I re-reviewed this and have committed most of it with only minor
kibitizing.  A few notes:

- I changed the EXPLAIN code, since it failed to display the aggregate
node's mode of operation in non-text format.

- I removed some of the planner support, since I'm not sure that it's
completely correct and I'm very sure that it contains more code
duplication than I'm comfortable with (set_combineagg_references in
particular).  Please feel free to resubmit this part, perhaps in
combination with code that actually uses it for something.  But please
also think about whether there's a way to reduce the duplication, and
maybe consider adding some comments, too.

- I'm not sure that you made the right call regarding reusing
transfn_oid and transfn for combine functions, vs. adding separate
fields.  It is sort of logical and has the advantage of keeping the
data structure smaller, but it might also be thought confusing or
inappropriate on other grounds.  But I think we can change it later if
that comes to seem like the right thing to do, so I've left it the way
you had it for now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-20 Thread Robert Haas

On Wed, Jan 20, 2016 at 7:38 AM, David Rowley
 wrote:
>> To my mind, priority #1 ought to be putting this fine new
>> functionality to some use.  Expanding it to every aggregate we've got
>> seems like a distinctly second priority.  That's not to say that it's
>> absolutely gotta go down that way, but those would be my priorities.
>
> Agreed. So I've attached a version of the patch which does not have any of
> the serialise/deserialise stuff in it.
>
> I've also attached a test patch which modifies the grouping planner to add a
> Partial Aggregate node, and a final aggregate node when it's possible.
> Running the regression tests with this patch only shows up variances in the
> EXPLAIN outputs, which is of course expected.

That seems great as a test, but what's the first patch that can put
this to real and permanent use?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

On 21 January 2016 at 01:44, Robert Haas  wrote:

> On Wed, Jan 20, 2016 at 7:38 AM, David Rowley
>  wrote:
> >> To my mind, priority #1 ought to be putting this fine new
> >> functionality to some use.  Expanding it to every aggregate we've got
> >> seems like a distinctly second priority.  That's not to say that it's
> >> absolutely gotta go down that way, but those would be my priorities.
> >
> > Agreed. So I've attached a version of the patch which does not have any
> of
> > the serialise/deserialise stuff in it.
> >
> > I've also attached a test patch which modifies the grouping planner to
> add a
> > Partial Aggregate node, and a final aggregate node when it's possible.
> > Running the regression tests with this patch only shows up variances in
> the
> > EXPLAIN outputs, which is of course expected.
>
> That seems great as a test, but what's the first patch that can put
> this to real and permanent use?


There's no reason why parallel aggregates can't use
the combine_aggregate_state_d6d480b_2016-01-21.patch patch.

In this patch I've changed aggregates_allow_partial() so that it properly
determines what is possible based on the aggregates which are in the query.
This now, of course restricts the aggregates to "internal only" when the
agg state type is INTERNAL, providing there's a combine function, of course.

Parallel aggregate should work with all the MAX() and MIN() functions and a
handful of other ones, I've managed to borrow various existing function as
the combine function for many aggregates:

# select count(*) from pg_aggregate where aggcombinefn <> 0;
 count
---
58
(1 row)

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Combining Aggregates

On 20 January 2016 at 10:54, Robert Haas  wrote:

> On Tue, Jan 19, 2016 at 4:50 PM, David Rowley
>  wrote:
> >> Oh, one more point: is there any reason why all of this needs to be a
> >> single (giant) patch?  I feel like the handling of internal states
> >> could be a separate patch from the basic patch to allow partial
> >> aggregation and aggregate-combining, and that the latter could be
> >> committed first if we had a reasonably final version of it.  That
> >> seems like it would be easier from a review standpoint, and might
> >> allow more development to proceed in parallel, too.
> >
> > I didn't ever really imagine that I'd include any actual new combinefns
> or
> > serialfn/deserialfn in this patch. One set has of course now ended up in
> > there, I can move these off to the test patch for now.
> >
> > Did you imagine that the first patch in the series would only add the
> > aggcombinefn column to pg_aggregate and leave the aggserialfn stuff until
> > later?
>
> I think that would be a sensible way to proceed.
>
> > I thought it seemed better to get the infrastructure committed in one
> > hit, then add a bunch of new combinefn, serialfn, deserialfns for
> existing
> > aggregates in follow-on commits.
>
> To my mind, priority #1 ought to be putting this fine new
> functionality to some use.  Expanding it to every aggregate we've got
> seems like a distinctly second priority.  That's not to say that it's
> absolutely gotta go down that way, but those would be my priorities.


Agreed. So I've attached a version of the patch which does not have any of
the serialise/deserialise stuff in it.

I've also attached a test patch which modifies the grouping planner to add
a Partial Aggregate node, and a final aggregate node when it's possible.
Running the regression tests with this patch only shows up variances in the
EXPLAIN outputs, which is of course expected.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


combine_aggregate_state_d6d480b_2016-01-21.patch
Description: Binary data


combine_aggs_test_v6.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-19 Thread David Rowley

On 20 January 2016 at 05:58, Robert Haas  wrote:

> > On Mon, Dec 21, 2015 at 4:02 AM, David Rowley
> >  wrote:
> >> Now, there has been talk of this previously, on various threads, but I
> don't
> >> believe any final decisions were made on how exactly it should be done.
> At
> >> the moment I plan to make changes as follows:
>
> Oh, one more point: is there any reason why all of this needs to be a
> single (giant) patch?  I feel like the handling of internal states
> could be a separate patch from the basic patch to allow partial
> aggregation and aggregate-combining, and that the latter could be
> committed first if we had a reasonably final version of it.  That
> seems like it would be easier from a review standpoint, and might
> allow more development to proceed in parallel, too.

I didn't ever really imagine that I'd include any actual new combinefns or
serialfn/deserialfn in this patch. One set has of course now ended up in
there, I can move these off to the test patch for now.

Did you imagine that the first patch in the series would only add the
aggcombinefn column to pg_aggregate and leave the aggserialfn stuff until
later? I thought it seemed better to get the infrastructure committed in
one hit, then add a bunch of new combinefn, serialfn, deserialfns for
existing aggregates in follow-on commits.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Combining Aggregates

2016-01-19 Thread David Rowley

On 20 January 2016 at 05:56, Robert Haas  wrote:

>
> On Mon, Dec 21, 2015 at 4:02 AM, David Rowley
>  wrote:
> > Now, there has been talk of this previously, on various threads, but I
> don't
> > believe any final decisions were made on how exactly it should be done.
> At
> > the moment I plan to make changes as follows:
> >
> >  Add 3 new columns to pg_aggregate, aggserialfn, aggdeserialfn and
> > aggserialtype These will only be required when aggtranstype is INTERNAL.
>
> Check.
>
> > Perhaps we should disallow CREATE AGGREAGET from accepting them for any
> > other type...
>
> Well, we should definitely not accept them and have them be
> meaningless.  We should either reject them or accept them and then use
> them.  I can't immediately think of a reason to serialize one
> non-internal type as another, but maybe there is one.
>
>
In the latest patch I'm disallowing serial and deserial functions for non
INTERNAL state aggregates during CREATE AGGREGATE.
I think it's best to keep this disabled for now, and do so until we
discover some reason that we might want want to enable it. If we enabled
it, and later decided that was a dumb idea, then it'll be much harder to
add the restriction later, since it may cause errors for users who have
created their own aggregates.

> > Add a new bool field to nodeAgg's state named serialStates. If this is
> field
> > is set to true then when we're in finalizeAgg = false mode, we'll call
> the
> > serialfn on the agg state instead of the finalfn. This will allow the
> > serialized state to be stored in the tuple and sent off to the main
> backend.
> > The combine agg node should also be set to serialStates = true, so that
> it
> > knows to deserialize instead of just assuming that the agg state is of
> type
> > aggtranstype.
>
> I'm not quite sure, but it sounds like you might be overloading
> serialStates with two different meanings here.
>

Hmm, only in the sense that serialStates means "serialise" and
"deserialise". The only time that both could occur in the same node is if
combineStates=true and finalizeAggs=false (in other words 3 or more
aggregation stages).

Let's say we are performing aggregation in 3 stages, stage 1 is operating
on normal values and uses the transfn on these values, but does not
finalise the states. If serialStates = true here then, if one exists, we
call the serialfn, passing in the aggregated state, and pass the return
value of that function up to the next node. Now, this next (middle) node is
important, serialStates must also be true here so that the executor knows
to deserialise the previously serialised states. Now, this node uses the
combinefn to merge states which require, and then since serialStates is
true, it also (re)serialises those new states again. Now if there was some
reason that this middle node should deserialise, but not re-serialise those
states, then we may need an extra parameter to instruct it to do so. I
guess this may be required if we were to perform some partial aggregation
and combining again within a single process (in which case we'd not need to
serialise INTERNAL states, we can just pass the pointer to them in the
Tuple),  but then we might require the states to be serialised in order to
hand them over to the main process, from a worker process.

I can imagine cases where we might want to do this in the future, so
perhaps it is worth it:

Imagine:

SELECT COUNT(*) FROM (SELECT * FROM (SELECT * FROM a UNION ALL SELECT *
FROM b) AS ab UNION ALL (SELECT * FROM c UNION ALL SELECT * FROM d)) abcd;

Where the planner may decide to, on 1 worker perform count(*) on a then b,
and combine that into ab, while doing the same for c and d on some other
worker process.

> I believe this should allow us to not cause any performance regressions by
> > moving away from INTERNAL agg states. It should also be very efficient
> for
> > internal states such as Int8TransTypeData, as this struct merely has 2
> int64
> > fields which should be very simple to stuff into a bytea varlena type. We
> > don't need to mess around converting the ->count and ->sum into a
> strings or
> > anything.
>
> I think it would be more user-friendly to emit these as 2-element
> integer arrays rather than bytea.  Sure, bytea is fine when PostgreSQL
> is talking to itself, but if you are communicating with an external
> system, it's a different situation.  If the remote system is
> PostgreSQL then you are again OK, except for the data going over the
> wire being incomprehensible and maybe byte-order-dependent, but what
> if you want some other database system to do partial aggregation and
> then further aggregate the result?  You don't want the intermediate
> state to be some kooky thing that only another PostgreSQL database has
> a chance of generating correctly.
>

If we do that then we also need to invent composite database types for the
more complicated internal states such as NumericAggState.
We should

Re: [HACKERS] Combining Aggregates

On Tue, Jan 19, 2016 at 4:50 PM, David Rowley
 wrote:
>> Oh, one more point: is there any reason why all of this needs to be a
>> single (giant) patch?  I feel like the handling of internal states
>> could be a separate patch from the basic patch to allow partial
>> aggregation and aggregate-combining, and that the latter could be
>> committed first if we had a reasonably final version of it.  That
>> seems like it would be easier from a review standpoint, and might
>> allow more development to proceed in parallel, too.
>
> I didn't ever really imagine that I'd include any actual new combinefns or
> serialfn/deserialfn in this patch. One set has of course now ended up in
> there, I can move these off to the test patch for now.
>
> Did you imagine that the first patch in the series would only add the
> aggcombinefn column to pg_aggregate and leave the aggserialfn stuff until
> later?

I think that would be a sensible way to proceed.

> I thought it seemed better to get the infrastructure committed in one
> hit, then add a bunch of new combinefn, serialfn, deserialfns for existing
> aggregates in follow-on commits.

To my mind, priority #1 ought to be putting this fine new
functionality to some use.  Expanding it to every aggregate we've got
seems like a distinctly second priority.  That's not to say that it's
absolutely gotta go down that way, but those would be my priorities.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

On Mon, Jan 18, 2016 at 11:24 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Here is a patch that helps a good deal.  I changed things so that when
>> we create a context, we always allocate at least 1kB.
>
> That's going to kill performance in some other cases; subtransactions
> in particular rely on the subtransaction's TransactionContext not causing
> any actual malloc unless something gets put into the TransactionContext.

Sorry, I guess I wasn't clear: it increases the allocation for the
context node itself to 1kB and uses the extra space to store a few
allocations.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

On Tue, Jan 19, 2016 at 12:22 AM, Tomas Vondra
 wrote:
> I dare to claim that if expanded objects require a separate memory context
> per aggregate state (I don't see why would be the case), it's a dead end.
> Not so long ago we've fixed exactly this issue in array_agg(), which
> addressed a bunch of reported OOM issues and improved array_agg()
> performance quite a bit. It'd be rather crazy introducing the same problem
> to all aggregate functions.

Expanded objects require a separate memory context per Datum.  I think
I know how to rejigger things so that the overhead of that is
minimized as much as possible, but it's still 200 bytes for the
AllocSetContext + ~16 bytes for the name + 32 bytes for an AllocBlock
+ 48 bytes for an ExpandedObjectHeader.  That's about 300 bytes of
overhead per expanded datum, which means on this test case the
theoretical minimum memory burn for this approach is about 300 MB (and
in practice somewhat more).  Ouch.

The upshot seems to be that expanded objects aren't going to actually
be useful in any situation where you might have very many of them,
because the memory overhead is just awful.  That's rather
disappointing.  Even if you could get rid of the requirement for a
memory context per Datum - and it seems like you could if you added a
method defined to reparent the object to a designated context, which
looks like a totally reasonable innovation - somebody's probably going
to say, not without some justification, that 48 bytes of
ExpandedObjectHeader per object is an unaffordable luxury here.  The
structure has 8 bytes of unnecessary padding in it that we could elide
easily enough, but after that there's not really much way to squeeze
it down without hurting performance in some case cases where this
mechanism is fast today.

So I guess I'm going to agree that this approach is doomed.  Rats.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

On Tue, Jan 19, 2016 at 11:56 AM, Robert Haas  wrote:
> [ rewinding to here from the detour I led us on ]
>
> On Mon, Dec 21, 2015 at 4:02 AM, David Rowley
>  wrote:
>> Now, there has been talk of this previously, on various threads, but I don't
>> believe any final decisions were made on how exactly it should be done. At
>> the moment I plan to make changes as follows:

Oh, one more point: is there any reason why all of this needs to be a
single (giant) patch?  I feel like the handling of internal states
could be a separate patch from the basic patch to allow partial
aggregation and aggregate-combining, and that the latter could be
committed first if we had a reasonably final version of it.  That
seems like it would be easier from a review standpoint, and might
allow more development to proceed in parallel, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

[ rewinding to here from the detour I led us on ]

On Mon, Dec 21, 2015 at 4:02 AM, David Rowley
 wrote:
> Now, there has been talk of this previously, on various threads, but I don't
> believe any final decisions were made on how exactly it should be done. At
> the moment I plan to make changes as follows:
>
>  Add 3 new columns to pg_aggregate, aggserialfn, aggdeserialfn and
> aggserialtype These will only be required when aggtranstype is INTERNAL.

Check.

> Perhaps we should disallow CREATE AGGREAGET from accepting them for any
> other type...

Well, we should definitely not accept them and have them be
meaningless.  We should either reject them or accept them and then use
them.  I can't immediately think of a reason to serialize one
non-internal type as another, but maybe there is one.

> The return type of aggserialfn should be aggserialtype, and it
> should take a single parameter of aggtranstype. aggdeserialfn will be the
> reverse of that.

Check.

> Add a new bool field to nodeAgg's state named serialStates. If this is field
> is set to true then when we're in finalizeAgg = false mode, we'll call the
> serialfn on the agg state instead of the finalfn. This will allow the
> serialized state to be stored in the tuple and sent off to the main backend.
> The combine agg node should also be set to serialStates = true, so that it
> knows to deserialize instead of just assuming that the agg state is of type
> aggtranstype.

I'm not quite sure, but it sounds like you might be overloading
serialStates with two different meanings here.

> I believe this should allow us to not cause any performance regressions by
> moving away from INTERNAL agg states. It should also be very efficient for
> internal states such as Int8TransTypeData, as this struct merely has 2 int64
> fields which should be very simple to stuff into a bytea varlena type. We
> don't need to mess around converting the ->count and ->sum into a strings or
> anything.

I think it would be more user-friendly to emit these as 2-element
integer arrays rather than bytea.  Sure, bytea is fine when PostgreSQL
is talking to itself, but if you are communicating with an external
system, it's a different situation.  If the remote system is
PostgreSQL then you are again OK, except for the data going over the
wire being incomprehensible and maybe byte-order-dependent, but what
if you want some other database system to do partial aggregation and
then further aggregate the result?  You don't want the intermediate
state to be some kooky thing that only another PostgreSQL database has
a chance of generating correctly.

> Then in order for the planner to allow parallel aggregation all aggregates
> must:
>
> Not have a DISTINCT or ORDER BY clause
> Have a combinefn
> If aggtranstype = INTERNAL, must have a aggserialfn and aggdeserialfn.
>
> We can relax the requirement on 3 if we're using 2-stage aggregation, but
> not parallel aggregation.

When would we do that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Haribabu Kommi

On Mon, Jan 18, 2016 at 10:32 PM, David Rowley
 wrote:
> On 18 January 2016 at 16:44, Robert Haas  wrote:
>>
>> On Sun, Jan 17, 2016 at 9:26 PM, David Rowley
>>  wrote:
>> > hmm, so wouldn't that mean that the transition function would need to
>> > (for
>> > each input tuple):
>> >
>> > 1. Parse that StringInfo into tokens.
>> > 2. Create a new aggregate state object.
>> > 3. Populate the new aggregate state based on the tokenised StringInfo,
>> > this
>> > would perhaps require that various *_in() functions are called on each
>> > token.
>> > 4. Add the new tuple to the aggregate state.
>> > 5. Build a new StringInfo based on the aggregate state modified in 4.
>> >
>> > ?
>>
>> I don't really know what you mean by parse the StringInfo into tokens.
>> The whole point of the expanded-object interface is to be able to keep
>> things in an expanded internal form so that you *don't* have to
>> repeatedly construct and deconstruct internal data structures.
>
>
> That was a response to Haribabu proposal, although perhaps I misunderstood
> that. However I'm not sure how a PolyNumAggState could be converted to a
> string and back again without doing any string parsing.

I just thought like direct mapping of the structure with text pointer.
something like
the below.

result = PG_ARGISNULL(0) ? NULL : (text *) PG_GETARG_POINTER(0);
state = (PolyNumAggState *)VARDATA(result);

To handle the big-endian or little-endian, we may need some extra changes.

Instead of adding 3 new columns to the pg_aggregate catalog table to handle
the internal types, either something like the above to handle the internal types
or some other way is better IMO.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Tom Lane

Robert Haas  writes:
> Yeah.  The API contract for an expanded object took me quite a while
> to puzzle out, but it seems to be this: if somebody hands you an R/W
> pointer to an expanded object, you're entitled to assume that you can
> "take over" that object and mutate it however you like.  But the
> object might be in some other memory context, so you have to move it
> into your own memory context.

Only if you intend to keep it --- for example, a function that is mutating
and returning an object isn't required to move it somewhere else, if the
input is R/W, and I think it generally shouldn't.

In the context here, I'd think it is the responsibility of nodeAgg.c
not individual datatype functions to make sure that expanded objects
live where it wants them to.

> Well, that's pretty odd.  I guess the plan change must be a result of
> switching the transition type from internal to text, although I'm not
> immediately certain why that would make a difference.

I'm pretty sure there's something in the planner that special-cases
its estimate of space consumed by hashtable entries when the data type
is "internal".  You'd be wanting to fool with that estimate anyway
for something like this ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Robert Haas

On Mon, Jan 18, 2016 at 12:09 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Yeah.  The API contract for an expanded object took me quite a while
>> to puzzle out, but it seems to be this: if somebody hands you an R/W
>> pointer to an expanded object, you're entitled to assume that you can
>> "take over" that object and mutate it however you like.  But the
>> object might be in some other memory context, so you have to move it
>> into your own memory context.
>
> Only if you intend to keep it --- for example, a function that is mutating
> and returning an object isn't required to move it somewhere else, if the
> input is R/W, and I think it generally shouldn't.
>
> In the context here, I'd think it is the responsibility of nodeAgg.c
> not individual datatype functions to make sure that expanded objects
> live where it wants them to.

That's how I did it in my prototype, but the problem with that is that
spinning up a memory context for every group sucks when there are many
groups with only a small number of elements each - hence the 50%
regression that David Rowley observed.  If we're going to use expanded
objects here, which seems like a good idea in principle, that's going
to have to be fixed somehow.  We're flogging the heck out of malloc by
repeatedly creating a context, doing one or two allocations in it, and
then destroying the context.

I think that, in general, the memory context machinery wasn't really
designed to manage lots of small contexts.  The overhead of spinning
up a new context for just a few allocations is substantial.  That
matters in some other situations too, I think - I've commonly seen
AllocSetContextCreate taking several percent  of runtime in profiles.
But here it's much exacerbated.  I'm not sure whether it's better to
attack that problem at the root and try to make AllocSetContextCreate
cheaper, or whether we should try to figure out some change to the
expanded-object machinery to address the issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

On 18 January 2016 at 16:44, Robert Haas  wrote:

> On Sun, Jan 17, 2016 at 9:26 PM, David Rowley
>  wrote:
> > hmm, so wouldn't that mean that the transition function would need to
> (for
> > each input tuple):
> >
> > 1. Parse that StringInfo into tokens.
> > 2. Create a new aggregate state object.
> > 3. Populate the new aggregate state based on the tokenised StringInfo,
> this
> > would perhaps require that various *_in() functions are called on each
> > token.
> > 4. Add the new tuple to the aggregate state.
> > 5. Build a new StringInfo based on the aggregate state modified in 4.
> >
> > ?
>
> I don't really know what you mean by parse the StringInfo into tokens.
> The whole point of the expanded-object interface is to be able to keep
> things in an expanded internal form so that you *don't* have to
> repeatedly construct and deconstruct internal data structures.

That was a response to Haribabu proposal, although perhaps I misunderstood
that. However I'm not sure how a PolyNumAggState could be converted to a
string and back again without doing any string parsing.

I worked up an example of this approach using string_agg(), which I
> attach here.  This changes the transition type of string_agg() from
> internal to text.  The same code would work for bytea_string_agg(),
> which would allow removal of some other code, but this patch doesn't
>
do that, because the point of this is to elucidate the approach.
>
>
Many thanks for working up that patch. I was clearly missing what you meant
previously. I understand it much better now. Thank you for taking the time
on that.

> In my tests, this seems to be slightly slower than what we're doing
> today; worst of all, it adds a handful of cycles to
> advance_transition_function() even when the aggregate is not an
> expanded object or, indeed, not even pass-by-reference.  Some of this
> might be able to be fixed by a little massaging - in particular,
> DatumIsReadWriteExpandedObject() seems like it could be partly or
> entirely inlined, and maybe there's some other way to improve the
> coding here.
>

It also seems that an expanded object requires a new memory context which
must be malloc()d and free()d. This has added quite an overhead in my
testing. I assume that we must be doing that so that we can ensure that all
memory is properly free()d once we're done with the expanded-object.

create table ab(a int, b text);
insert into ab select x,'aaa' from
generate_Series(1,100) x(x);
set work_mem='1GB';
vacuum analyze ab;

explain analyze select a%100,length(string_agg(b,',')) from ab group by
1;

Patched
1521.045 ms
1515.905 ms
1530.920 ms

Master
932.457 ms
959.285 ms
991.021 ms

Although performance of the patched version is closer to master when we
have less groups, I felt it necessary to show the extreme case. I feel this
puts a bit of a dampener on the future of this idea, as I've previously had
patches rejected for adding 2-5% on planning time alone, adding over 50%
execution time seems like a dead-end.

I've run profiles on this and malloc() and free() are both top of the
profile with the patched version with about 30% CPU time each.

> Generally, I think finding a way to pass expanded objects through
> nodeAgg.c would be a good thing to pursue, if we can make it work.
> The immediate impetus for changing things this way would be that we
> wouldn't need to add a mechanism for serializing and deserializing
> internal functions just to pass around partial aggregates.  But
> there's another advantage, too: right now,
> advance_transition_function() does a lot of data copying to move data
> from per-call context to the per-aggregate context.  When an expanded
> object is in use, this can be skipped.

The part I quite liked about the serialize/deserialize is that there's no
need to add any overhead at all for serializing and deserializing the
states when the aggregation is done in a single backend process. We'd
simply just have the planner pass the make_agg()'s serializeStates as false
when we're working within a single backend. This does not appear possible
with your proposed implementation, since it makes changes to each
transition function. It is my understanding that we normally bend over
backwards with new code to try and stop any performance regression. I'm not
quite sure the expanded-object idea works well in this regard, but I do
agree your approach seems neater. I just don't want to waste my time
writing all the code required to replace all INTERNAL aggregate states when
I know the performance regression is going to be unacceptable.

I also witnessed another regression with your patch when testing on another
machine. It caused the plan to change to a HashAgg instead of GroupAgg
causing a significant slowdown.

Unpatched

# explain analyze select a%100,length(string_agg(b,',')) from ab group
by 1;
QUERY PLAN

Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Robert Haas

On Mon, Jan 18, 2016 at 6:32 AM, David Rowley
 wrote:
>> In my tests, this seems to be slightly slower than what we're doing
>> today; worst of all, it adds a handful of cycles to
>> advance_transition_function() even when the aggregate is not an
>> expanded object or, indeed, not even pass-by-reference.  Some of this
>> might be able to be fixed by a little massaging - in particular,
>> DatumIsReadWriteExpandedObject() seems like it could be partly or
>> entirely inlined, and maybe there's some other way to improve the
>> coding here.
>
> It also seems that an expanded object requires a new memory context which
> must be malloc()d and free()d. This has added quite an overhead in my
> testing. I assume that we must be doing that so that we can ensure that all
> memory is properly free()d once we're done with the expanded-object.

Yeah.  The API contract for an expanded object took me quite a while
to puzzle out, but it seems to be this: if somebody hands you an R/W
pointer to an expanded object, you're entitled to assume that you can
"take over" that object and mutate it however you like.  But the
object might be in some other memory context, so you have to move it
into your own memory context.  That's implementing by reparenting the
object's context under your context.  This is nice because it's O(1) -
whereas copying would be O(n) - but it's sort of aggravating, too.  In
this case, the transition function already knows that it wants
everything in the per-agg state, but it can't just create everything
there and be done with it, because nodeAgg.c has to content with the
possibility that some other piece of code will return an expanded
object that *isn't* parented to the aggcontext.  And in fact one of
the regression tests does exactly that, which caused me lots of
head-banging yesterday.

Making it worse, the transition function isn't required to have the
same behavior every time: the one in the regression tests sometimes
returns an expanded-object pointer, and sometimes doesn't, and if
nodeAgg.c reparents that pointer to the aggcontext, the next call to
the transition function reparents the pointer BACK to the per-tuple
context, so you can't even optimize away the repeated set-parent
calls.  Uggh.

> create table ab(a int, b text);
> insert into ab select x,'aaa' from
> generate_Series(1,100) x(x);
> set work_mem='1GB';
> vacuum analyze ab;
>
> explain analyze select a%100,length(string_agg(b,',')) from ab group by
> 1;
>
> Patched
> 1521.045 ms
> 1515.905 ms
> 1530.920 ms
>
> Master
> 932.457 ms
> 959.285 ms
> 991.021 ms
>
> Although performance of the patched version is closer to master when we have
> less groups, I felt it necessary to show the extreme case. I feel this puts
> a bit of a dampener on the future of this idea, as I've previously had
> patches rejected for adding 2-5% on planning time alone, adding over 50%
> execution time seems like a dead-end.

Thanks for working up this test case.  I certainly agree that adding
50% execution time is a dead-end, but I wonder if that problem can be
fixed somehow.  It would be a shame to find out that expanded-objects
can't be effectively used for anything other than the purposes for
which they were designed, namely speeding up PL/pgsql array
operations.

> seems neater. I just don't want to waste my time writing all the code
> required to replace all INTERNAL aggregate states when I know the
> performance regression is going to be unacceptable.

No argument there.  If the performance regression isn't fixable, this
approach is doomed.

> I also witnessed another regression with your patch when testing on another
> machine. It caused the plan to change to a HashAgg instead of GroupAgg
> causing a significant slowdown.
>
> Unpatched
>
> # explain analyze select a%100,length(string_agg(b,',')) from ab group
> by 1;
> QUERY PLAN
> ---
>  GroupAggregate  (cost=119510.84..144510.84 rows=100 width=32) (actual
> time=538.938..1015.278 rows=100 loops=1)
>Group Key: ((a % 100))
>->  Sort  (cost=119510.84..122010.84 rows=100 width=32) (actual
> time=538.917..594.194 rows=100 loops=1)
>  Sort Key: ((a % 100))
>  Sort Method: quicksort  Memory: 102702kB
>  ->  Seq Scan on ab  (cost=0.00..19853.00 rows=100 width=32)
> (actual time=0.016..138.964 rows=100 loops=1)
>  Planning time: 0.146 ms
>  Execution time: 1047.511 ms
>
>
> Patched
> # explain analyze select a%100,length(string_agg(b,',')) from ab group
> by 1;
>QUERY PLAN
> 
>  HashAggregate  (cost=24853.00..39853.00 rows=100

Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Pavel Stehule

>
> > # explain analyze select a%100,length(string_agg(b,',')) from ab
> group
> > by 1;
> > QUERY PLAN
> >
> ---
> >  GroupAggregate  (cost=119510.84..144510.84 rows=100 width=32)
> (actual
> > time=538.938..1015.278 rows=100 loops=1)
> >Group Key: ((a % 100))
> >->  Sort  (cost=119510.84..122010.84 rows=100 width=32) (actual
> > time=538.917..594.194 rows=100 loops=1)
> >  Sort Key: ((a % 100))
> >  Sort Method: quicksort  Memory: 102702kB
> >  ->  Seq Scan on ab  (cost=0.00..19853.00 rows=100 width=32)
> > (actual time=0.016..138.964 rows=100 loops=1)
> >  Planning time: 0.146 ms
> >  Execution time: 1047.511 ms
> >
> >
> > Patched
> > # explain analyze select a%100,length(string_agg(b,',')) from ab
> group
> > by 1;
> >QUERY PLAN
> >
> 
> >  HashAggregate  (cost=24853.00..39853.00 rows=100 width=32) (actual
> > time=8072.346..144424.872 rows=100 loops=1)
> >Group Key: (a % 100)
> >->  Seq Scan on ab  (cost=0.00..19853.00 rows=100 width=32)
> (actual
> > time=0.025..481.332 rows=100 loops=1)
> >  Planning time: 0.164 ms
> >  Execution time: 263288.332 ms
>
> Well, that's pretty odd.  I guess the plan change must be a result of
> switching the transition type from internal to text, although I'm not
> immediately certain why that would make a difference.
>

It is strange, why hashaggregate is too slow?

Pavel



>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: [HACKERS] Combining Aggregates

On 19 January 2016 at 06:03, Pavel Stehule  wrote:

>
>
> >
>> > # explain analyze select a%100,length(string_agg(b,',')) from ab
>> group
>> > by 1;
>> > QUERY PLAN
>> >
>> ---
>> >  GroupAggregate  (cost=119510.84..144510.84 rows=100 width=32)
>> (actual
>> > time=538.938..1015.278 rows=100 loops=1)
>> >Group Key: ((a % 100))
>> >->  Sort  (cost=119510.84..122010.84 rows=100 width=32) (actual
>> > time=538.917..594.194 rows=100 loops=1)
>> >  Sort Key: ((a % 100))
>> >  Sort Method: quicksort  Memory: 102702kB
>> >  ->  Seq Scan on ab  (cost=0.00..19853.00 rows=100 width=32)
>> > (actual time=0.016..138.964 rows=100 loops=1)
>> >  Planning time: 0.146 ms
>> >  Execution time: 1047.511 ms
>> >
>> >
>> > Patched
>> > # explain analyze select a%100,length(string_agg(b,',')) from ab
>> group
>> > by 1;
>> >QUERY PLAN
>> >
>> 
>> >  HashAggregate  (cost=24853.00..39853.00 rows=100 width=32) (actual
>> > time=8072.346..144424.872 rows=100 loops=1)
>> >Group Key: (a % 100)
>> >->  Seq Scan on ab  (cost=0.00..19853.00 rows=100 width=32)
>> (actual
>> > time=0.025..481.332 rows=100 loops=1)
>> >  Planning time: 0.164 ms
>> >  Execution time: 263288.332 ms
>>
>> Well, that's pretty odd.  I guess the plan change must be a result of
>> switching the transition type from internal to text, although I'm not
>> immediately certain why that would make a difference.
>>
>
> It is strange, why hashaggregate is too slow?
>

Good question. I looked at this and found my VM was swapping like crazy.
Upon investigation it appears that's because, since the patch creates a
memory context per aggregated group, and in this case I've got 1 million of
them, it means we create 1 million context, which are
ALLOCSET_SMALL_INITSIZE (1KB) in size, which means about 1GB of memory,
which is more than my VM likes.

set work_mem = '130MB' does coax the planner into a GroupAggregate plan,
which is faster, but due to the the hash agg executor code not giving any
regard to work_mem. If I set work_mem to 140MB (which is more realistic for
this VM), it does cause the same swapping problems to occur.  Probably
setting aggtransspace for this aggregate to 1024 would help the costing
problem, but it would also cause hashagg to be a less chosen option during
planning.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Robert Haas

On Mon, Jan 18, 2016 at 12:34 PM, Robert Haas  wrote:
> On Mon, Jan 18, 2016 at 12:09 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> Yeah.  The API contract for an expanded object took me quite a while
>>> to puzzle out, but it seems to be this: if somebody hands you an R/W
>>> pointer to an expanded object, you're entitled to assume that you can
>>> "take over" that object and mutate it however you like.  But the
>>> object might be in some other memory context, so you have to move it
>>> into your own memory context.
>>
>> Only if you intend to keep it --- for example, a function that is mutating
>> and returning an object isn't required to move it somewhere else, if the
>> input is R/W, and I think it generally shouldn't.
>>
>> In the context here, I'd think it is the responsibility of nodeAgg.c
>> not individual datatype functions to make sure that expanded objects
>> live where it wants them to.
>
> That's how I did it in my prototype, but the problem with that is that
> spinning up a memory context for every group sucks when there are many
> groups with only a small number of elements each - hence the 50%
> regression that David Rowley observed.  If we're going to use expanded
> objects here, which seems like a good idea in principle, that's going
> to have to be fixed somehow.  We're flogging the heck out of malloc by
> repeatedly creating a context, doing one or two allocations in it, and
> then destroying the context.
>
> I think that, in general, the memory context machinery wasn't really
> designed to manage lots of small contexts.  The overhead of spinning
> up a new context for just a few allocations is substantial.  That
> matters in some other situations too, I think - I've commonly seen
> AllocSetContextCreate taking several percent  of runtime in profiles.
> But here it's much exacerbated.  I'm not sure whether it's better to
> attack that problem at the root and try to make AllocSetContextCreate
> cheaper, or whether we should try to figure out some change to the
> expanded-object machinery to address the issue.

Here is a patch that helps a good deal.  I changed things so that when
we create a context, we always allocate at least 1kB.  If that's more
than we need for the node itself and the name, then we use the rest of
the space as a sort of keeper block.  I think there's more that can be
done to improve this, but I'm wimping out for now because it's late
here.

I suspect something like this is a good idea even if we don't end up
using it for aggregate transition functions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index d26991e..3730a21 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -164,6 +164,14 @@ typedef void *AllocPointer;
 /*
  * AllocSetContext is our standard implementation of MemoryContext.
  *
+ * Note: An AllocSetContext can include one or more "keeper" blocks which
+ * are not freed when the context is reset.  "keeper" should point to the
+ * first of these blocks.  Sometimes, there may be a block which shouldn't
+ * be freed even when the context is deleted (because that space isn't a
+ * separate allocation); if so, "stopper" can point to this block.  "keeper"
+ * must be set whenever "stopper" is set, and must point to the same block
+ * as "stopper" or an earlier one.
+ *
  * Note: header.isReset means there is nothing for AllocSetReset to do.
  * This is different from the aset being physically empty (empty blocks list)
  * because we may still have a keeper block.  It's also different from the set
@@ -181,7 +189,8 @@ typedef struct AllocSetContext
 	Size		maxBlockSize;	/* maximum block size */
 	Size		nextBlockSize;	/* next block size to allocate */
 	Size		allocChunkLimit;	/* effective chunk size limit */
-	AllocBlock	keeper;			/* if not NULL, keep this block over resets */
+	AllocBlock	keeper;		/* on reset, stop freeing when we reach this */
+	AllocBlock	stopper;	/* on delete, stop freeing when we reach this */
 } AllocSetContext;
 
 typedef AllocSetContext *AllocSet;
@@ -248,7 +257,6 @@ typedef struct AllocChunkData
 static void *AllocSetAlloc(MemoryContext context, Size size);
 static void AllocSetFree(MemoryContext context, void *pointer);
 static void *AllocSetRealloc(MemoryContext context, void *pointer, Size size);
-static void AllocSetInit(MemoryContext context);
 static void AllocSetReset(MemoryContext context);
 static void AllocSetDelete(MemoryContext context);
 static Size AllocSetGetChunkSpace(MemoryContext context, void *pointer);
@@ -267,7 +275,6 @@ static MemoryContextMethods AllocSetMethods = {
 	AllocSetAlloc,
 	AllocSetFree,
 	AllocSetRealloc,
-	AllocSetInit,
 	AllocSetReset,
 	AllocSetDelete,
 	AllocSetGetChunkSpace,
@@ -440,13 +447,45 @@ AllocSetContextCreate(MemoryContext parent,
 	  Size maxBlockSize)
 {
 	AllocSet

Re: [HACKERS] Combining Aggregates

On 19 January 2016 at 18:04, Tomas Vondra 
wrote:

> Hi,
>
> On 01/19/2016 05:00 AM, David Rowley wrote:
>
>> On 19 January 2016 at 06:03, Pavel Stehule > > wrote:
>>
>> ...
>
>>
>> It is strange, why hashaggregate is too slow?
>>
>>
>> Good question. I looked at this and found my VM was swapping like crazy.
>> Upon investigation it appears that's because, since the patch creates a
>> memory context per aggregated group, and in this case I've got 1 million
>> of them, it means we create 1 million context, which are
>> ALLOCSET_SMALL_INITSIZE (1KB) in size, which means about 1GB of memory,
>> which is more than my VM likes.
>>
>
> Really? Where do we create the memory context? IIRC string_agg uses the
> aggcontext directly, and indeed that's what I see in string_agg_transfn and
> makeStringAggState.
>
>
Yeah, all this is talk is relating to Robert's expandedstring-v1.patch
which changes string_agg to use text and expanded-objects. This also means
that a memory context is created per group, which is rather a big overhead.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Tomas Vondra



On 01/19/2016 05:14 AM, Robert Haas wrote:

On Mon, Jan 18, 2016 at 12:34 PM, Robert Haas  wrote:

On Mon, Jan 18, 2016 at 12:09 PM, Tom Lane  wrote:

Robert Haas  writes:

Yeah.  The API contract for an expanded object took me quite a while
to puzzle out, but it seems to be this: if somebody hands you an R/W
pointer to an expanded object, you're entitled to assume that you can
"take over" that object and mutate it however you like.  But the
object might be in some other memory context, so you have to move it
into your own memory context.


Only if you intend to keep it --- for example, a function that is
mutating and returning an object isn't required to move it
somewhere else, if the input is R/W, and I think it generally
shouldn't.

In the context here, I'd think it is the responsibility of
nodeAgg.c not individual datatype functions to make sure that
expanded objects live where it wants them to.


That's how I did it in my prototype, but the problem with that is that
spinning up a memory context for every group sucks when there are many
groups with only a small number of elements each - hence the 50%
regression that David Rowley observed.  If we're going to use expanded
objects here, which seems like a good idea in principle, that's going
to have to be fixed somehow.  We're flogging the heck out of malloc by
repeatedly creating a context, doing one or two allocations in it, and
then destroying the context.

I think that, in general, the memory context machinery wasn't really
designed to manage lots of small contexts.  The overhead of spinning
up a new context for just a few allocations is substantial.  That
matters in some other situations too, I think - I've commonly seen
AllocSetContextCreate taking several percent  of runtime in profiles.
But here it's much exacerbated.  I'm not sure whether it's better to
attack that problem at the root and try to make AllocSetContextCreate
cheaper, or whether we should try to figure out some change to the
expanded-object machinery to address the issue.


Here is a patch that helps a good deal.  I changed things so that when
we create a context, we always allocate at least 1kB.  If that's more
than we need for the node itself and the name, then we use the rest of
the space as a sort of keeper block.  I think there's more that can be
done to improve this, but I'm wimping out for now because it's late
here.

I suspect something like this is a good idea even if we don't end up
using it for aggregate transition functions.


I dare to claim that if expanded objects require a separate memory 
context per aggregate state (I don't see why would be the case), it's a 
dead end. Not so long ago we've fixed exactly this issue in array_agg(), 
which addressed a bunch of reported OOM issues and improved array_agg() 
performance quite a bit. It'd be rather crazy introducing the same 
problem to all aggregate functions.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Tom Lane

Robert Haas  writes:
> Here is a patch that helps a good deal.  I changed things so that when
> we create a context, we always allocate at least 1kB.

That's going to kill performance in some other cases; subtransactions
in particular rely on the subtransaction's TransactionContext not causing
any actual malloc unless something gets put into the TransactionContext.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Tomas Vondra


Hi,

On 01/19/2016 05:00 AM, David Rowley wrote:

On 19 January 2016 at 06:03, Pavel Stehule > wrote:


...


It is strange, why hashaggregate is too slow?


Good question. I looked at this and found my VM was swapping like crazy.
Upon investigation it appears that's because, since the patch creates a
memory context per aggregated group, and in this case I've got 1 million
of them, it means we create 1 million context, which are
ALLOCSET_SMALL_INITSIZE (1KB) in size, which means about 1GB of memory,
which is more than my VM likes.


Really? Where do we create the memory context? IIRC string_agg uses the 
aggcontext directly, and indeed that's what I see in string_agg_transfn 
and makeStringAggState.


Perhaps you mean that initStringInfo() allocates 1kB buffers by default?



set work_mem = '130MB' does coax the planner into a GroupAggregate plan,
which is faster, but due to the the hash agg executor code not giving
any regard to work_mem. If I set work_mem to 140MB (which is more
realistic for this VM), it does cause the same swapping problems to
occur.  Probably setting aggtransspace for this aggregate to 1024 would
help the costing problem, but it would also cause hashagg to be a less
chosen option during planning.


I'm not quite sure I understand - the current code ends up using 8192 
for the transition space (per count_agg_clauses_walker). Are you 
suggesting lowering the value, despite the danger of OOM issues?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Tomas Vondra




On 01/19/2016 06:04 AM, Tomas Vondra wrote:

Hi,

On 01/19/2016 05:00 AM, David Rowley wrote:

Good question. I looked at this and found my VM was swapping like crazy.
Upon investigation it appears that's because, since the patch creates a
memory context per aggregated group, and in this case I've got 1 million
of them, it means we create 1 million context, which are
ALLOCSET_SMALL_INITSIZE (1KB) in size, which means about 1GB of memory,
which is more than my VM likes.


Really? Where do we create the memory context? IIRC string_agg uses the
aggcontext directly, and indeed that's what I see in string_agg_transfn
and makeStringAggState.

Perhaps you mean that initStringInfo() allocates 1kB buffers by default?


...


I'm not quite sure I understand - the current code ends up using 8192
for the transition space (per count_agg_clauses_walker). Are you
suggesting lowering the value, despite the danger of OOM issues?


Meh, right after sending the message I realized that perhaps this 
relates to Robert's patch and that maybe it changes this. And indeed, it 
changes the type from internal to text, and thus the special case in 
count_agg_clauses_walker no longer applies.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Combining Aggregates

On 19 January 2016 at 02:44, Haribabu Kommi 
wrote:

> On Mon, Jan 18, 2016 at 10:32 PM, David Rowley
>  wrote:
>
> I just thought like direct mapping of the structure with text pointer.
> something like
> the below.
>
> result = PG_ARGISNULL(0) ? NULL : (text *) PG_GETARG_POINTER(0);
> state = (PolyNumAggState *)VARDATA(result);
>
> To handle the big-endian or little-endian, we may need some extra changes.
>
> Instead of adding 3 new columns to the pg_aggregate catalog table to handle
> the internal types, either something like the above to handle the internal
> types
> or some other way is better IMO.

The problem with that is that most of these internal structs for the
aggregate states have pointers to other memory, so even if we laid those
bytes down into a bytea or something, then doing so is not going to
dereference the pointers to the other memory, and when we dereference those
pointers in the other process, we'll have problems as these addresses
belong to the other process.

For example PolyNumAggState is defined as:

typedef NumericAggState PolyNumAggState;

and NumericAggState has:

NumericVar sumX; /* sum of processed numbers */
NumericVar sumX2; /* sum of squares of processed numbers */

And NumericVar has:

NumericDigit *buf; /* start of palloc'd space for digits[] */
NumericDigit *digits; /* base-NBASE digits */

Both of these point to other memory which won't be in the varlena type.

Serialization is the process of collecting all of these pointers up in to
some consecutive bytes.

Of course, that's not to say that there's never Aggregate State structs
which don't have any pointers, I've not checked, but in these cases we
could (perhaps) just make the serialize and deserialize function a simple
memcpy() into a bytea array, although in reality, as you mentioned, we'd
likely want to agree on some format that's cross platform for different
byte orders, as we'll probably, one day, want to forward these values over
to some other server to finish off the aggregation.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Combining Aggregates