Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Sun, Apr 13, 2014 at 2:23 PM, Tom Lane wrote:
> * I also left out the table documenting which aggregates have this
> optimization. That's not the kind of thing we ordinarily document,
> and it seems inevitable to me that such a table would be noteworthy
> mostly for wrong/incomplete/obsolete information in the future.

I tend to think that not documenting such things is an error. Sure, the documentation could become obsolete, but I don't see why it's particularly more likely with this table than anywhere else, and if it does happen, and people care, they'll submit patches to fix it.

More to the point, when we don't document things like this, it doesn't cause the knowledge not to be important to end-users; it just means that the knowledge lives on wiki pages, support fora, and in the minds of people who are "in the know" rather than being easily and generally accessible. I'd rather have documentation on this topic that was, say, 80% correct than have none at all.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
> -----Original Message-----
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: 14 April 2014 06:23
> To: Dean Rasheed
> Cc: Florian Pflug; David Rowley; Kevin Grittner; Josh Berkus; Greg Stark;
> PostgreSQL-development
> Subject: Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions
> (WIP)
>
> Dean Rasheed writes:
>> OK, I'm marking this ready for committer attention, on the
>> understanding that that doesn't include the invtrans_minmax patch.
>
> I've finished reviewing/revising/committing this submission.

That's great news!

Tom, thanks for taking the time to make all the modifications and commit this.

Dean, thank you for all your hard work reviewing the patch. The detail of the review was quite impressive.

I'm really happy to see this make it in for 9.4.

Regards

David Rowley
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr13, 2014, at 20:23 , Tom Lane wrote:
> Dean Rasheed writes:
>> OK, I'm marking this ready for committer attention, on the
>> understanding that that doesn't include the invtrans_minmax patch.
>
> I've finished reviewing/revising/committing this submission.

Cool! Thanks!

> Some notes:
>
> * I left out the EXPLAIN ANALYZE statistics, as I still feel that they're
> of little if any use to regular users, and neither very well defined nor
> well documented. We can revisit that later of course.
>
> * I also left out the table documenting which aggregates have this
> optimization. That's not the kind of thing we ordinarily document,
> and it seems inevitable to me that such a table would be noteworthy
> mostly for wrong/incomplete/obsolete information in the future.

I can live with leaving each of these out individually, but leaving them both out means there's now no way of determining how a particular sliding window aggregation will behave. That leaves our users a bit out in the cold IMHO. For example, with this patch you'd usually want to write SUM(numeric_value::numeric(n,m)) instead of SUM(numeric_value)::numeric(n,m) in a sliding window aggregation, and there should be *some* way for users to figure this out.

We have exhaustive tables of all the functions, operators and aggregates we support, and maintenance of those seems to work well for the most part. Why would a table of invertible aggregates be any different? If it's the non-locality that bothers you - I guess the same information could be added to the descriptions of the individual aggregates, instead of having one big table.

> * I rejected the invtrans_collecting sub-patch altogether. I didn't
> like anything about the idea of adding a poorly-documented field to
> ArrayBuildState and then teaching some random subset of the functions
> using that struct what to do with it. I think it would've been simpler,
> more reliable, and not that much less efficient to just do memmove's in
> shiftArrayResult. The string-aggregate changes were at least more
> localized, but they still seemed awfully messy. And there's a bigger
> issue: these aggregates have to do O(N) work per row for a frame of length
> N anyway, so it's not clear to me that there's any big-O gain to be had
> from making them into moving aggregates.

Yeah, those probably aren't super-useful. I initially added array_agg because with the nonstrict-forward/strict-reverse rule still in place, there wouldn't otherwise have been an in-core aggregate which exercises the non-strict case, which seemed like a bad idea. For the string case - I didn't expect that to turn out to be *quite* this messy when I started implementing it.

> Anyway, this is nice forward progress for 9.4, even if we get no further.

Yup! Thanks to everybody who made this happen!

best regards,
Florian Pflug
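[Editorial note: the sliding-window optimization discussed throughout this thread can be sketched outside PostgreSQL. The following Python toy model (illustrative only, not PostgreSQL code) shows why an inverse transition function helps: the forward transition adds the row entering the frame, the inverse transition removes the row leaving it, so each output row costs O(1) instead of re-aggregating the whole frame.]

```python
def moving_sums(values, frame):
    """Sum over a sliding window of the last `frame` values."""
    state = 0          # running total: the aggregate's transition state
    out = []
    for i, v in enumerate(values):
        state += v                      # forward transition (msfunc)
        if i >= frame:
            state -= values[i - frame]  # inverse transition (minvfunc)
        out.append(state)               # final function is the identity here
    return out

print(moving_sums([1, 2, 3, 4, 5], 3))  # [1, 3, 6, 9, 12]
```

Without the inverse transition, each window would have to be summed from scratch, which is the O(N)-per-row behaviour Tom describes for the collecting aggregates.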
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Dean Rasheed writes:
> OK, I'm marking this ready for committer attention, on the
> understanding that that doesn't include the invtrans_minmax patch.

I've finished reviewing/revising/committing this submission. Some notes:

* I left out the EXPLAIN ANALYZE statistics, as I still feel that they're of little if any use to regular users, and neither very well defined nor well documented. We can revisit that later of course.

* I also left out the table documenting which aggregates have this optimization. That's not the kind of thing we ordinarily document, and it seems inevitable to me that such a table would be noteworthy mostly for wrong/incomplete/obsolete information in the future.

* I rejected the invtrans_collecting sub-patch altogether. I didn't like anything about the idea of adding a poorly-documented field to ArrayBuildState and then teaching some random subset of the functions using that struct what to do with it. I think it would've been simpler, more reliable, and not that much less efficient to just do memmove's in shiftArrayResult. The string-aggregate changes were at least more localized, but they still seemed awfully messy. And there's a bigger issue: these aggregates have to do O(N) work per row for a frame of length N anyway, so it's not clear to me that there's any big-O gain to be had from making them into moving aggregates. I doubt people are going to be using these aggregates with very wide frames, just because they're going to be horribly expensive no matter what we do.

Anyway, this is nice forward progress for 9.4, even if we get no further.

regards, tom lane
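[Editorial note: Tom's big-O point about the collecting aggregates can be illustrated with a sketch (Python, purely for illustration). Even if the state could drop its oldest element cheaply, emitting an array result for every row still copies the whole frame, so a moving-aggregate array_agg stays O(frame) per row.]

```python
from collections import deque

def moving_array_agg(values, frame):
    state = deque()              # O(1) append / popleft
    out = []
    for i, v in enumerate(values):
        state.append(v)          # forward transition
        if i >= frame:
            state.popleft()      # the "inverse transition" is cheap...
        out.append(list(state))  # ...but the final function still copies
                                 # O(len(state)) elements for every row
    return out

print(moving_array_agg([1, 2, 3, 4], 2))
# [[1], [1, 2], [2, 3], [3, 4]]
```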
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr11, 2014, at 19:42 , Tom Lane wrote:
> Florian Pflug writes:
>> Yes, the idea had come up at some point during the review discussion. I
>> agree that it's only worthwhile if it works for state type internal - though
>> I think there ought to be a way to allow that. We could for example simply
>> decree that the initfunc's first parameter must be of type internal if its
>> return type is, and then just pass NULL for that parameter.
>
> I had thought about that, but it doesn't really work since it'd be
> violating the strictness spec of the function.

Only if we insist on passing SQL NULL, not if we passed a non-NULL value that happens to be (char*)0. Or we could simply require the initfunc to be non-strict in the case of state type internal.

best regards,
Florian Pflug
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug writes:
> Yes, the idea had come up at some point during the review discussion. I
> agree that it's only worthwhile if it works for state type internal - though
> I think there ought to be a way to allow that. We could for example simply
> decree that the initfunc's first parameter must be of type internal if its
> return type is, and then just pass NULL for that parameter.

I had thought about that, but it doesn't really work since it'd be violating the strictness spec of the function.

regards, tom lane
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Dean Rasheed writes:
> On 10 April 2014 19:54, Tom Lane wrote:
>> So if we go with that terminology, perhaps these names for the
>> new CREATE AGGREGATE parameters:
>>
>> initfunc                 applies to plain aggregation, mutually exclusive with initcond
>> msfunc (or just mfunc?)  forward transition for moving-agg mode
>> mifunc                   inverse transition for moving-agg mode
>> mstype                   state datatype for moving-agg mode
>> msspace                  space estimate for mstype
>> mfinalfunc               final function for moving-agg mode
>> minitfunc                "firsttrans" for moving-agg mode
>> minitcond                mutually exclusive with minitfunc

> Yeah, those work for me.

> I think I prefer "mfunc" to "msfunc", but perhaps that's just my
> natural aversion to the "ms" prefix :-)

Meh. We've got mstype, and I don't think leaving out the "s" there feels right.

> Also, perhaps "minvfunc" rather than "mifunc" because "i" by itself
> could mean "initial".

Good point. So with initfuncs out of the picture, we have new CREATE AGGREGATE parameter names

    msfunc      forward transition for moving-agg mode
    minvfunc    inverse transition for moving-agg mode
    mfinalfunc  final function for moving-agg mode
    mstype      state datatype for moving-agg mode
    msspace     space estimate for mstype
    minitcond   initial state value for moving-agg mode

and new pg_aggregate columns

    aggmtransfn    | regproc | not null
    aggminvtransfn | regproc | not null
    aggmfinalfn    | regproc | not null
    aggmtranstype  | oid     | not null
    aggmtransspace | integer | not null
    aggminitval    | text    |

It's a bit unfortunate that the catalog column names aren't quite on the same page as CREATE AGGREGATE, but it doesn't seem like a good idea to try to fix that now.

regards, tom lane
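[Editorial note: the roles of the settled-on parameter names can be sketched in Python (a toy model, not the C implementation) using a moving average: msfunc and minvfunc maintain the state, mfinalfunc produces each row's result, and minitcond seeds the state.]

```python
def msfunc(state, x):            # forward transition for moving-agg mode
    total, n = state
    return (total + x, n + 1)

def minvfunc(state, x):          # inverse transition for moving-agg mode
    total, n = state
    return (total - x, n - 1)

def mfinalfunc(state):           # final function for moving-agg mode
    total, n = state
    return total / n if n else None

minitcond = (0, 0)               # initial state value for moving-agg mode

def moving_avg(values, frame):
    state, out = minitcond, []
    for i, v in enumerate(values):
        state = msfunc(state, v)
        if i >= frame:
            state = minvfunc(state, values[i - frame])
        out.append(mfinalfunc(state))
    return out

print(moving_avg([2, 4, 6, 8], 2))  # [2.0, 3.0, 5.0, 7.0]
```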
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr11, 2014, at 19:04 , Tom Lane wrote:
> It strikes me that your second point
>
>> 2) It allows strict aggregates whose state type and input type aren't
>> binary coercible to return NULL if all inputs were NULL without any
>> special coding. As it stands today, this doesn't work without some
>> kind of counter in the state, because the final function otherwise
>> won't know if there were non-NULL inputs or not. Aggregates whose state
>> and input types are binary coercible get around that by setting the
>> initial value to NULL while *still* being strict, and relying on the
>> state replacement behaviour of the aggregate machinery.
>
> could be addressed by the initfunc idea, but I'm still not sufficiently
> excited by that one. Given the problem with internal-type transition
> values, I think this could only win if there are cases where it would let
> us use a regular SQL type instead of internal for the transition value;
> and I'm not sure that there are many/any such cases.

Yes, the idea had come up at some point during the review discussion. I agree that it's only worthwhile if it works for state type internal - though I think there ought to be a way to allow that. We could for example simply decree that the initfunc's first parameter must be of type internal if its return type is, and then just pass NULL for that parameter.

What I like about the initfunc idea is that it also naturally extends to ordered-set aggregates. I think it'd be very useful for some possible ordered-set aggregates to receive their direct arguments beforehand and not afterwards. But that all seems largely orthogonal to the invtrans patch.

best regards,
Florian Pflug
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug writes:
> On Apr11, 2014, at 17:09 , Tom Lane wrote:
>> Basically, this comes down to a design judgment call, and my judgment
>> is differing from yours. In the absence of opinions from others,
>> I'm going to do it my way.

> Ok. Are you going to do the necessary changes, or shall I? I'm happy to
> leave it to you, but if you lack the time I can try to find some.

Nah, I'm happy to do the work, since it's me insisting on changing it.

>> Tell me about the special case here again? Should we revisit the issue?

> ...
> I don't feel too strongly either way on this one - I initially implemented the
> exception because I noticed that the NULL handling of some aggregates was
> broken otherwise, and it seemed simpler to fix this in one place than going
> over all the aggregates separately. OTOH, when I wrote the docs, I noticed
> how hard it was to describe the behaviour accurately, which made me like it
> less and less. And Dean wasn't happy with it at all, so that finally settled
> it.

Yeah, if you can't document the design nicely, it's probably not a good idea :-(.

Thanks for the summary. It strikes me that your second point

> 2) It allows strict aggregates whose state type and input type aren't
> binary coercible to return NULL if all inputs were NULL without any
> special coding. As it stands today, this doesn't work without some
> kind of counter in the state, because the final function otherwise
> won't know if there were non-NULL inputs or not. Aggregates whose state
> and input types are binary coercible get around that by setting the
> initial value to NULL while *still* being strict, and relying on the
> state replacement behaviour of the aggregate machinery.

could be addressed by the initfunc idea, but I'm still not sufficiently excited by that one. Given the problem with internal-type transition values, I think this could only win if there are cases where it would let us use a regular SQL type instead of internal for the transition value; and I'm not sure that there are many/any such cases.

regards, tom lane
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr11, 2014, at 17:09 , Tom Lane wrote:
> Basically, this comes down to a design judgment call, and my judgment
> is differing from yours. In the absence of opinions from others,
> I'm going to do it my way.

Ok. Are you going to do the necessary changes, or shall I? I'm happy to leave it to you, but if you lack the time I can try to find some.

>> ... SUM(int4) wouldn't need
>> *any* extra state at all to be invertible, if it weren't for those pesky
>> issues surrounding NULL handling. In fact, an earlier version of the
>> invtrans patch *did* use int4_sum as SUM(int4)'s transfer functions! The
>> only reason this doesn't work nowadays is that Dean didn't like the
>> forward-nonstrict-but-inverse-strict special case that made this work.
>
> Tell me about the special case here again? Should we revisit the issue?

My original coding allowed the combination of a non-strict forward with a strict reverse transition function. The combination behaved like a strict aggregate regarding NULL handling - i.e., neither the forward nor the reverse transition function received NULL inputs. But if the initial state was NULL, the forward transition function *would* be called with that NULL state value upon seeing the first non-NULL input. In the window case, the aggregation machinery would also take care to reset the state to its initial value when it removed the last non-NULL input from the aggregation state (just like it does for strict aggregates today).

This had two advantages

1) First, it allows strict aggregates to use state type internal. As it stands now, aggregates with state type internal must implement their own NULL handling, even if they behave exactly like most standard aggregates do, namely ignore NULLs and return NULL only if there were no non-NULL inputs.

2) It allows strict aggregates whose state type and input type aren't binary coercible to return NULL if all inputs were NULL without any special coding. As it stands today, this doesn't work without some kind of counter in the state, because the final function otherwise won't know if there were non-NULL inputs or not. Aggregates whose state and input types are binary coercible get around that by setting the initial value to NULL while *still* being strict, and relying on the state replacement behaviour of the aggregate machinery.

It, however, also has a few disadvantages

3) It means that one needs to look at the inverse transition function's strictness setting even if that function is never used.

4) It feels a bit hacky.

(2) is what affects SUM(int4). The only reason it tracks the number of inputs is to be able to return NULL instead of 0 if no inputs remain.

One idea to fix (3) and (4) was *explicitly* flagging aggregates as NULL-handling or NULL-ignoring, and also as what one might call "weakly strict", i.e. as returning NULL exactly if there were no non-NULL inputs. There are various variations of that theme possible - one flag for each behaviour, or simply a single "common behaviour" flag. In the end, I decided not to pursue that, mostly because the aggregates affected by that issue turned out to be relatively easy to fix. For the ones with state type internal, I simply added a counter, and I made SUM(int4) use AVG's transition function.

I don't feel too strongly either way on this one - I initially implemented the exception because I noticed that the NULL handling of some aggregates was broken otherwise, and it seemed simpler to fix this in one place than going over all the aggregates separately. OTOH, when I wrote the docs, I noticed how hard it was to describe the behaviour accurately, which made me like it less and less. And Dean wasn't happy with it at all, so that finally settled it.

best regards,
Florian Pflug
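[Editorial note: point (2) above can be sketched in Python (illustrative only, not the actual C code). The counter of non-NULL inputs is exactly what lets the final function return NULL rather than 0 once the last non-NULL input has been removed from the frame - without it, an invertible SUM could not distinguish "all inputs NULL" from "inputs summing to zero".]

```python
def sum_sfunc(state, x):
    """Forward transition: ignore NULLs, count non-NULL inputs."""
    total, nonnulls = state
    if x is None:
        return state
    return (total + x, nonnulls + 1)

def sum_invfunc(state, x):
    """Inverse transition: undo one input leaving the frame."""
    total, nonnulls = state
    if x is None:
        return state
    return (total - x, nonnulls - 1)

def sum_final(state):
    total, nonnulls = state
    # without the counter, we could not tell "all inputs NULL"
    # (result NULL) apart from "inputs summed to zero" (result 0)
    return total if nonnulls else None

state = (0, 0)
for v in [None, 3, None]:
    state = sum_sfunc(state, v)
state = sum_invfunc(state, 3)  # the only non-NULL input leaves the frame
print(sum_final(state))        # None, not 0
```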
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug writes:
> Which is why I feel that having two separate sets of transition functions
> and state types solves the wrong problem. If we find a way to prevent
> transition functions from being called directly, we'll shave a few cycles
> off quite a few existing aggregates, invertible or not. If we find a way
> around the initfunc-for-internal-statetype problem you discovered, the
> implementation would get much simpler, since we could then make nearly
> all of them strict. And this would again shave off a few cycles - for lots
> of NULL inputs, the effect could even be large.

Since neither of those latter things seems likely to happen, I don't find this argument convincing at all. Even if they did happen, they would represent an incremental improvement that would be equally useful with either one or two sfuncs.

Basically, this comes down to a design judgment call, and my judgment is differing from yours. In the absence of opinions from others, I'm going to do it my way.

However, quite independently of how many sfuncs there are, I'm interested about this:

> ... SUM(int4) wouldn't need
> *any* extra state at all to be invertible, if it weren't for those pesky
> issues surrounding NULL handling. In fact, an earlier version of the
> invtrans patch *did* use int4_sum as SUM(int4)'s transfer functions! The
> only reason this doesn't work nowadays is that Dean didn't like the
> forward-nonstrict-but-inverse-strict special case that made this work.

Tell me about the special case here again? Should we revisit the issue?

regards, tom lane
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 10 April 2014 22:52, Tom Lane wrote:
> Dean Rasheed writes:
>> I was imagining that firsttrans would only be passed the first value
>> to be aggregated, not any previous state, and that it would be illegal
>> to specify both an initcond and a firsttrans function.
>
>> The forward transition function would only be called for values after
>> the first, by which point the state would be non-null, and so it could
>> be made strict in most cases. The same would apply to the invertible
>> transition functions, so they wouldn't have to do null counting, which
>> in turn would make their state types simpler.
>
> I put together a very fast proof-of-concept patch for this (attached).
> It has a valid execution path for an aggregate with initfunc, but I didn't
> bother writing the CREATE AGGREGATE support yet. I made sum(int4) work
> as you suggest, marking the transfn strict and ripping out int4_sum's
> internal support for null inputs. The result seems to be about a 4% or so
> improvement in the overall aggregation speed, for a simple "SELECT
> sum(int4col) FROM table" query. So from a performance standpoint this
> seems only marginally worth doing. The real problem is not that 4% isn't
> worth the trouble, it's that AFAICS the only built-in aggregates that
> can benefit are sum(int2) and sum(int4). So that looks like a rather
> narrow use-case.
>
> You had suggested upthread that we could use this idea to make the
> transition functions strict for aggregates using "internal" transition
> datatypes, but that does not work because the initfunc would violate
> the safety rule that a function returning internal must take at least
> one internal-type argument. That puts a pretty strong damper on the
> usefulness of the approach, given how many internal-transtype aggregates
> we have (and the moving-aggregate case is not going to be better, is it?)

Ah, that's disappointing. If it can't be made to work for aggregates with internal state types, then I think it loses most of its value, and I don't think it will be of much more use in the moving aggregate case either.

> So at this point I'm feeling unexcited about the initfunc idea.
> Unless it does something really good for the moving-aggregate case,
> I think we should drop it.

Agreed. Thanks for prototyping it though.

Regards,
Dean
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr11, 2014, at 01:30 , Tom Lane wrote:
> Florian Pflug writes:
>> As for evidence - have you looked at the patch I posted? I'd be very
>> interested to know if it removes the performance differences you saw.
>
> (1) You can't really prove the absence of a performance issue by showing
> that one specific aggregate doesn't have an issue.

I'm claiming that SUM(int4) is about as simple as it gets, so if the effect can be mitigated there, it can be mitigated everywhere. The more complex a forward-only transition function is, the less will an added if or two hurt.

> (2) These results
> (mine as well as yours) seem mighty platform-dependent, so the fact you
> don't see an issue doesn't mean someone else won't.

Yeah, though *any* change - mine, the one you propose, and any other - has the potential to hurt some platform due to weird interactions (say, cache thrashing).

> (3) A new
> FunctionCallInfoData field just for this? Surely not. There's got to be
> a distributed cost to that (although I notice you didn't bother
> initializing the field most places, which is cheating).

I think the field doesn't actually increase the size of the structure at all - at least not if the bool before it has size 1 and the short following it is 2-byte aligned. Or at least that was why I made it a char, and added it at the place I did. But I might be missing something... Also, InitFunctionCallInfoData *does* initialize the flags to zero. Though maybe not everybody uses that - I didn't check, this was just a quick hack.

> Now point 3 could be addressed by doing the signaling in some other way
> with the existing context field. But it's still the case that you're
> trying to band-aid a bad design. There's no good reason to make the sfunc
> do extra work to be invertible in contexts where we know, with certainty,
> that that work is useless.

This is the principal point where we disagree, I think. From my POV, the problem isn't invertibility here at all. Heck, SUM(int4) wouldn't need *any* extra state at all to be invertible, if it weren't for those pesky issues surrounding NULL handling. In fact, an earlier version of the invtrans patch *did* use int4_sum as SUM(int4)'s transfer functions! The only reason this doesn't work nowadays is that Dean didn't like the forward-nonstrict-but-inverse-strict special case that made this work.

The way I see it, the main problem is the drop in performance that comes from using a pass-by-ref state type. Which IMHO happens because this *already* is a heavily band-aided design - all the state validation and "if (AggCheckCallContext(,NULL))" stuff really works around the fact that transition functions *aren't* supposed to be user-called, yet they are, and must deal.

Which is why I feel that having two separate sets of transition functions and state types solves the wrong problem. If we find a way to prevent transition functions from being called directly, we'll shave a few cycles off quite a few existing aggregates, invertible or not. If we find a way around the initfunc-for-internal-statetype problem you discovered, the implementation would get much simpler, since we could then make nearly all of them strict. And this would again shave off a few cycles - for lots of NULL inputs, the effect could even be large.

Compared to that, the benefit of having a completely separate set of transition functions and state types for the invertible case is much less tangible, IMHO.

> Especially not when we know that even a few instructions of extra work
> can be performance-significant.

But where do we draw that line? nodeWindowAgg.c quite certainly wastes about as many cycles as int4_avg_accum does on various checks that are unnecessary unless in the non-sliding window case. Do we duplicate those functions too?

best regards,
Florian Pflug
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug writes:
> My argument is that it costs us more complexity to duplicate everything
> for the invertible case, *and* the result seems less flexible - not
> from the POV of aggregate implementations, but from the POV of future
> extensions.

[ shrug... ] You can argue against any feature whatsoever by claiming that it might possibly conflict with something we would wish to do sometime in the future. I think you need to have a much more concrete argument about specific issues this will cause in order to be convincing. For all we know about ROLLUP/CUBE implementation issues right now, doing this feature with separate implementations might make that easier, not harder. (I note that the crux of my complaint right now is that we're asking sfuncs to serve two masters --- how's it going to be better when they have to serve three or four?)

> As for evidence - have you looked at the patch I posted? I'd be very
> interested to know if it removes the performance differences you saw.

(1) You can't really prove the absence of a performance issue by showing that one specific aggregate doesn't have an issue. (2) These results (mine as well as yours) seem mighty platform-dependent, so the fact you don't see an issue doesn't mean someone else won't. (3) A new FunctionCallInfoData field just for this? Surely not. There's got to be a distributed cost to that (although I notice you didn't bother initializing the field most places, which is cheating).

Now point 3 could be addressed by doing the signaling in some other way with the existing context field. But it's still the case that you're trying to band-aid a bad design. There's no good reason to make the sfunc do extra work to be invertible in contexts where we know, with certainty, that that work is useless. Especially not when we know that even a few instructions of extra work can be performance-significant.

regards, tom lane
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr11, 2014, at 00:07 , Tom Lane wrote:
> Florian Pflug writes:
>> I still think you're getting ahead of yourselves here. The number of
>> aggregates which benefit from this is tiny: SUM(int2,int4) and maybe
>> BOOL_{AND,OR}. And in the SUM(int2,int4) case *only* on 64-bit archs -
>> for the others, the state type is already pass-by-ref.
>
> That argument is reasonable for the initfunc idea, but it doesn't apply
> otherwise.

Why not? AFAICS, the increase in cost comes from going from a by-value to a by-reference state type. Once you're using a by-reference type, you already pay the overhead of the additional dereferences, and for calling AggCheckCallContext() or some equivalent.

>> Another reason I'm so opposed to this is that inverse transition
>> functions might not be the last kind of transition functions we ever
>> add. For example, if we ever get ROLLUP/CUBE, we might want to have
>> a mergefunc which takes two aggregation states and combines them
>> into one. What do we do if we add those?
>
> Make more pg_aggregate columns. What exactly do you think is either
> easier or harder about such cases? Also, maybe I'm misremembering
> the spec, but ROLLUP/CUBE wouldn't apply to window functions would
> they? So if your argument is based on the assumption that these
> features need to combine, I'm not sure it's true.

Well, it was just an example - there might be other future extensions which *do* need to combine. And we've been known to go beyond the spec sometimes...

> Furthermore, I do not buy the argument that if we hack hard enough,
> we can make the performance cost of forcing the sfunc to do double duty
> negligible. In the first place, that argument is unsupported by much
> evidence, and in the second place, it will certainly cost us complexity
> to make the performance issue go away. Instead we can just design the
> problem out, for nothing that I see as a serious drawback.

My argument is that it costs us more complexity to duplicate everything for the invertible case, *and* the result seems less flexible - not from the POV of aggregate implementations, but from the POV of future extensions.

As for evidence - have you looked at the patch I posted? I'd be very interested to know if it removes the performance differences you saw.

best regards,
Florian Pflug
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug writes: > I still think you're getting ahead of yourselves here. The number of > aggregates which benefit from this is tiny: SUM(int2,int4) and maybe > BOOL_{AND,OR}. And in the SUM(int2,int4) case *only* on 64-bit archs - > for the others, the state type is already pass-by-ref. That argument is reasonable for the initfunc idea, but it doesn't apply otherwise. > Another reason I'm so opposed to this is that inverse transition > functions might not be the last kind of transition functions we ever > add. For example, if we ever get ROLLUP/CUBE, we might want to have > a mergefunc which takes two aggregation states and combines them > into one. What do we do if we add those? Make more pg_aggregate columns. What exactly do you think is either easier or harder about such cases? Also, maybe I'm misremembering the spec, but ROLLUP/CUBE wouldn't apply to window functions, would they? So if your argument is based on the assumption that these features need to combine, I'm not sure it's true. The bottom line for me is that it seems conceptually far cleaner to make the moving-aggregate support be independent than to insist that it share an stype and sfunc with the plain case. Furthermore, I do not buy the argument that if we hack hard enough, we can make the performance cost of forcing the sfunc to do double duty negligible. In the first place, that argument is unsupported by much evidence, and in the second place, it will certainly cost us complexity to make the performance issue go away. Instead we can just design the problem out, for nothing that I see as a serious drawback. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr10, 2014, at 21:34 , Dean Rasheed wrote: > On 10 April 2014 19:54, Tom Lane wrote: >> So if we go with that terminology, perhaps these names for the >> new CREATE AGGREGATE parameters: >> >> initfunc applies to plain aggregation, mutually exclusive with >> initcond >> msfunc (or just mfunc?) forward transition for moving-agg mode >> mifunc inverse transition for moving-agg mode >> mstype state datatype for moving-agg mode >> msspace space estimate for mstype >> mfinalfunc final function for moving-agg mode >> minitfunc "firsttrans" for moving-agg mode >> minitcond mutually exclusive with minitfunc > > I think I prefer "mfunc" to "msfunc", but perhaps that's just my > natural aversion to the "ms" prefix :-) > > Also, perhaps "minvfunc" rather than "mifunc" because "i" by itself > could mean "initial". I still think you're getting ahead of yourselves here. The number of aggregates which benefit from this is tiny: SUM(int2,int4) and maybe BOOL_{AND,OR}. And in the SUM(int2,int4) case *only* on 64-bit archs - for the others, the state type is already pass-by-ref. I don't think we should be adding that much additional machinery until it has been conclusively demonstrated that the performance regression cannot be fixed any other way. Which, quite frankly, I don't believe. Nothing in int4_avg_accum looks particularly expensive, and the things that *do* seem to cost something certainly can be made cheaper. c.f. the patch I just posted. Another reason I'm so opposed to this is that inverse transition functions might not be the last kind of transition functions we ever add. For example, if we ever get ROLLUP/CUBE, we might want to have a mergefunc which takes two aggregation states and combines them into one. What do we do if we add those? Add yet another set of "mergable" transition functions? What about the various combinations of invertible/non-invertible and mergable/non-mergable that could result? The opportunity cost seems pretty high here... 
best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Dean Rasheed writes: > I was imagining that firsttrans would only be passed the first value > to be aggregated, not any previous state, and that it would be illegal > to specify both an initcond and a firsttrans function. > The forward transition function would only be called for values after > the first, by which point the state would be non-null, and so it could > be made strict in most cases. The same would apply to the invertible > transition functions, so they wouldn't have to do null counting, which > in turn would make their state types simpler. I put together a very quick proof-of-concept patch for this (attached). It has a valid execution path for an aggregate with initfunc, but I didn't bother writing the CREATE AGGREGATE support yet. I made sum(int4) work as you suggest, marking the transfn strict and ripping out int4_sum's internal support for null inputs. The result seems to be about a 4% or so improvement in the overall aggregation speed, for a simple "SELECT sum(int4col) FROM table" query. So from a performance standpoint this seems only marginally worth doing. The real problem is not that 4% isn't worth the trouble, it's that AFAICS the only built-in aggregates that can benefit are sum(int2) and sum(int4). So that looks like a rather narrow use-case. You had suggested upthread that we could use this idea to make the transition functions strict for aggregates using "internal" transition datatypes, but that does not work because the initfunc would violate the safety rule that a function returning internal must take at least one internal-type argument. That puts a pretty strong damper on the usefulness of the approach, given how many internal-transtype aggregates we have (and the moving-aggregate case is not going to be better, is it?) So at this point I'm feeling unexcited about the initfunc idea. Unless it does something really good for the moving-aggregate case, I think we should drop it. 
regards, tom lane diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c index fe6dc8a..1ca5c8f 100644 *** a/src/backend/catalog/pg_aggregate.c --- b/src/backend/catalog/pg_aggregate.c *** AggregateCreate(const char *aggName, *** 390,395 --- 390,396 values[Anum_pg_aggregate_aggfnoid - 1] = ObjectIdGetDatum(procOid); values[Anum_pg_aggregate_aggkind - 1] = CharGetDatum(aggKind); values[Anum_pg_aggregate_aggnumdirectargs - 1] = Int16GetDatum(numDirectArgs); + values[Anum_pg_aggregate_agginitfn - 1] = ObjectIdGetDatum(InvalidOid); values[Anum_pg_aggregate_aggtransfn - 1] = ObjectIdGetDatum(transfn); values[Anum_pg_aggregate_aggfinalfn - 1] = ObjectIdGetDatum(finalfn); values[Anum_pg_aggregate_aggsortop - 1] = ObjectIdGetDatum(sortop); diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index 7e4bca5..2671c49 100644 *** a/src/backend/executor/nodeAgg.c --- b/src/backend/executor/nodeAgg.c *** typedef struct AggStatePerAggData *** 152,157 --- 152,158 int numTransInputs; /* Oids of transfer functions */ + Oid initfn_oid; /* may be InvalidOid */ Oid transfn_oid; Oid finalfn_oid; /* may be InvalidOid */ *** typedef struct AggStatePerAggData *** 160,165 --- 161,167 * corresponding oid is not InvalidOid. Note in particular that fn_strict * flags are kept here. */ + FmgrInfo initfn; FmgrInfo transfn; FmgrInfo finalfn; *** typedef struct AggHashEntryData *** 296,301 --- 298,306 static void initialize_aggregates(AggState *aggstate, AggStatePerAgg peragg, AggStatePerGroup pergroup); + static void initialize_transition_value(AggState *aggstate, + AggStatePerAgg peraggstate, + AggStatePerGroup pergroupstate); static void advance_transition_function(AggState *aggstate, AggStatePerAgg peraggstate, AggStatePerGroup pergroupstate); *** initialize_aggregates(AggState *aggstate *** 403,408 --- 408,498 } /* + * Initialize the transition value upon reaching the first non-NULL input(s). 
+ * + * We use this code when the initcond is NULL and the transfn is strict, + * which otherwise would mean the transition value can never become non-null. + * If an initfn has been provided, call it on the current input value(s); + * otherwise, take the current input value as the new transition value. + * (In the latter case, we already checked that this is okay datatype-wise.) + */ + static void + initialize_transition_value(AggState *aggstate, + AggStatePerAgg peraggstate, + AggStatePerGroup pergroupstate) + { + FunctionCallInfo tfcinfo = &peraggstate->transfn_fcinfo; + MemoryContext oldContext; + + if (OidIsValid(peraggstate->initfn_oid)) + { + FunctionCallInfoData fcinfo; + int numInitArgs; + Datum newVal; + + /* We run the transition functions in per-input-tuple memory context *
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr10, 2014, at 02:13 , Florian Pflug wrote: > On Apr9, 2014, at 23:17 , Florian Pflug wrote: >> On Apr9, 2014, at 21:35 , Tom Lane wrote: >>> A quick test says that avg(int4) >>> is about five percent slower than sum(int4), so that's the kind of hit >>> we'd be taking on non-windowed aggregations if we do it like this. >> >> That's rather surprising though. AFAICS, there isn't much difference >> between the two transfer functions int4_sum and int4_avg_accum at all. >> >> The differences come down to (+ denoting things which ought to make >> int4_avg_accum slower compared to int4_sum, - denoting the opposite) >> >> 1. +) int4_avg_accum calls AggCheckCallContext >> 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum) >> 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS >> to verify that the state is a 2-element array without NULL entries >> 4. -) int4_sum checks if the input is NULL > > I've done a bit of profiling on this (using Instruments.app on OSX). One > thing I missed is that int4_avg_accum also calls pg_detoast_datum - that > call comes from the PG_GETARG_ARRAYTYPE_P macro. Doing that is a bit silly, > since we know that the datum cannot possibly be toasted I think (or if it > was, nodeAgg.c should do the de-toasting). > > The profile also attributes a rather large percent of the total runtime of > int4_avg_accum (around 30%) to the call of AggCheckCallContext(). Getting > rid of that would help quite a few transition functions, invertible or not. > That certainly seems doable - we'd need a way to mark functions as internal > support functions, and prevent user-initiated calls of such functions. > Transition functions marked that way could then safely scribble over their > state arguments without having to consult AggCheckCallContext() first. > > ... > > However, I still believe the best approach at this point is to just work > on making int4_avg_accum faster. 
I still see no principled reason why it > has to be noticeably slower - the only additional work it absolutely *has* > to perform is *one* 64-bit increment. I played with this a bit. Currently, int4_avg_accum invokes AggCheckCallContext every time, and also repeatedly checks whether the array has the required dimension - which, incidentally, is the only big difference between int4_avg_accum and int4_sum. To avoid that, I added a flags field to fcinfo which nodeAgg uses to tell transition functions whether they're called for the first time, or are being called with whatever state they themselves returned last time, i.e. the n-th time. If the n-th time flag is set, int4_avg_accum simply retrieves the state with PG_GETARG_DATUM() instead of PG_GETARG_ARRAYTYPE_P(), relying on the fact that it never returns toasted datums itself, and also doesn't bother to validate the array size, for the same reason. If the flag is not set, it uses PG_GETARG_ARRAYTYPE_COPY_P and does validate the array size. In theory, it could further distinguish between a 1st call in an aggregation context (where the copy is unnecessary), and a user-initiated call (i.e. outside an aggregation). But that seems unnecessary - one additional copy per aggregation group seems unlikely to be a problem. With this in place, instruction-level profiling using Apple's Instruments.app shows that int4_avg_accum takes about 1.5% of the total runtime of a simple aggregation, while int4_sum takes about 1.2%. A (very rough) patch is attached. I haven't been able to repeat Tom's measurement which shows a 5% difference in performance between the invertible and the non-invertible versions of SUM(int4), so I cannot say if this removes that. But the profiling I've done would certainly indicate it should. 
best regards, Florian Pflug invtrans_sumint4_opt2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 10 April 2014 19:54, Tom Lane wrote: > Dean Rasheed writes: >> On 10 April 2014 19:04, Tom Lane wrote: >>> What about names for the invertible-aggregate infrastructure? >>> I'm tempted to prefix "inv" to all the existing names, but then >>> "invsfunc" means the alternate forward function ... can we use >>> "invifunc" for the inverse transition function? Or maybe the >>> prefix should be just "i". > >> Hmm, I'm not a fan of any of those names. Perhaps "win" as a prefix to >> denote a sliding window? Or just "m" for "moving aggregate". > > Hmm ... "moving aggregate" is actually a pretty good name for this > whole feature -- better than "invertible aggregate" anyway. I can > feel a global-search-and-replace coming on. > > So if we go with that terminology, perhaps these names for the > new CREATE AGGREGATE parameters: > > initfunc applies to plain aggregation, mutually exclusive with initcond > msfunc (or just mfunc?) forward transition for moving-agg mode > mifunc inverse transition for moving-agg mode > mstype state datatype for moving-agg mode > msspace space estimate for mstype > mfinalfunc final function for moving-agg mode > minitfunc "firsttrans" for moving-agg mode > minitcond mutually exclusive with minitfunc > Yeah, those work for me. I think I prefer "mfunc" to "msfunc", but perhaps that's just my natural aversion to the "ms" prefix :-) Also, perhaps "minvfunc" rather than "mifunc" because "i" by itself could mean "initial". Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Dean Rasheed writes: > On 10 April 2014 19:04, Tom Lane wrote: >> What about names for the invertible-aggregate infrastructure? >> I'm tempted to prefix "inv" to all the existing names, but then >> "invsfunc" means the alternate forward function ... can we use >> "invifunc" for the inverse transition function? Or maybe the >> prefix should be just "i". > Hmm, I'm not a fan of any of those names. Perhaps "win" as a prefix to > denote a sliding window? Or just "m" for "moving aggregate". Hmm ... "moving aggregate" is actually a pretty good name for this whole feature -- better than "invertible aggregate" anyway. I can feel a global-search-and-replace coming on. So if we go with that terminology, perhaps these names for the new CREATE AGGREGATE parameters:

initfunc      applies to plain aggregation, mutually exclusive with initcond
msfunc        (or just mfunc?) forward transition for moving-agg mode
mifunc        inverse transition for moving-agg mode
mstype        state datatype for moving-agg mode
msspace       space estimate for mstype
mfinalfunc    final function for moving-agg mode
minitfunc     "firsttrans" for moving-agg mode
minitcond     mutually exclusive with minitfunc

That takes us up to 16 columns in pg_aggregate, but it's still not going to be a very voluminous catalog --- there's only 171 rows there today. So I'm not particularly concerned about space, and if there's a chance of squeezing out cycles, I think we should seize it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
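Under the naming proposed above, a user-defined moving aggregate might be declared something like this. This is a hypothetical sketch only - at this point in the thread none of these CREATE AGGREGATE options exist yet, and the support functions (msum_sfunc, msum_invfunc) are invented for illustration:

```sql
CREATE FUNCTION msum_sfunc(int8, int4) RETURNS int8
    AS 'SELECT $1 + $2' LANGUAGE SQL STRICT;
CREATE FUNCTION msum_invfunc(int8, int4) RETURNS int8
    AS 'SELECT $1 - $2' LANGUAGE SQL STRICT;

CREATE AGGREGATE msum(int4) (
    sfunc     = msum_sfunc,        -- plain aggregation path
    stype     = int8,
    initcond  = '0',
    msfunc    = msum_sfunc,        -- forward transition, moving-agg mode
    mifunc    = msum_invfunc,      -- inverse transition, moving-agg mode
    mstype    = int8,
    minitcond = '0'
);
```

Since sum's forward transition is the same in both modes here, only the inverse function is genuinely new; aggregates with a heavier moving-mode state would supply distinct msfunc/mstype.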
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 10 April 2014 19:04, Tom Lane wrote: > Dean Rasheed writes: >> On 10 April 2014 15:18, Tom Lane wrote: >>> This idea of a separate firsttrans function is interesting but perhaps >>> orthogonal to the current patch. Also, I don't quite understand how >>> it would work for aggregates with null initvalues; don't you end up >>> with exactly the same conflict about how you can't mark the transfn >>> strict? Or is the idea that firsttrans would *only* apply to aggregates >>> with null initvalue, and so you wouldn't even pass the previous state >>> value to it? > >> I was imagining that firsttrans would only be passed the first value >> to be aggregated, not any previous state, and that it would be illegal >> to specify both an initcond and a firsttrans function. > > Got it. So the existing behavior where we insert the first non-null > value could be seen as a degenerate case in which the firsttrans function > is an identity. > Right. >> The forward transition function would only be called for values after >> the first, by which point the state would be non-null, and so it could >> be made strict in most cases. The same would apply to the invertible >> transition functions, so they wouldn't have to do null counting, which >> in turn would make their state types simpler. > > Makes sense to me. We need names for these things though. I don't > think abbreviating to "ffunc" is a good idea because of the likelihood > of confusion with the finalfunc (indeed I see the CREATE AGGREGATE > ref page is already using "ffunc" as a short form for finalfunc). > Maybe "initfunc", which would parallel "initcond"? > Yes, I was already thinking that "initfunc" is actually a much better name for that. > What about names for the invertible-aggregate infrastructure? > I'm tempted to prefix "inv" to all the existing names, but then > "invsfunc" means the alternate forward function ... can we use > "invifunc" for the inverse transition function? Or maybe the > prefix should be just "i". 
> Hmm, I'm not a fan of any of those names. Perhaps "win" as a prefix to denote a sliding window? Or just "m" for "moving aggregate". Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Dean Rasheed writes: > On 10 April 2014 15:18, Tom Lane wrote: >> This idea of a separate firsttrans function is interesting but perhaps >> orthogonal to the current patch. Also, I don't quite understand how >> it would work for aggregates with null initvalues; don't you end up >> with exactly the same conflict about how you can't mark the transfn >> strict? Or is the idea that firsttrans would *only* apply to aggregates >> with null initvalue, and so you wouldn't even pass the previous state >> value to it? > I was imagining that firsttrans would only be passed the first value > to be aggregated, not any previous state, and that it would be illegal > to specify both an initcond and a firsttrans function. Got it. So the existing behavior where we insert the first non-null value could be seen as a degenerate case in which the firsttrans function is an identity. > The forward transition function would only be called for values after > the first, by which point the state would be non-null, and so it could > be made strict in most cases. The same would apply to the invertible > transition functions, so they wouldn't have to do null counting, which > in turn would make their state types simpler. Makes sense to me. We need names for these things though. I don't think abbreviating to "ffunc" is a good idea because of the likelihood of confusion with the finalfunc (indeed I see the CREATE AGGREGATE ref page is already using "ffunc" as a short form for finalfunc). Maybe "initfunc", which would parallel "initcond"? What about names for the invertible-aggregate infrastructure? I'm tempted to prefix "inv" to all the existing names, but then "invsfunc" means the alternate forward function ... can we use "invifunc" for the inverse transition function? Or maybe the prefix should be just "i". regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 10 April 2014 15:18, Tom Lane wrote: > Dean Rasheed writes: >> On 10 April 2014 01:13, Florian Pflug wrote: >>> However, I still believe the best approach at this point is to just work >>> on making int4_avg_accum faster. I still see no principled reason why it >>> has to be noticeably slower - the only additional work it absolutely *has* >>> to perform is *one* 64-bit increment. > >> In the best case that would make sum() not noticeably slower than >> avg(), whereas using a firsttrans/initialfunction would potentially >> make both of them faster than they currently are, and not just in >> window queries. > > I'm still of the opinion that we should separate the transfn for > invertible cases from the normal one, and allow for two separate > state types. One of the things that helps with is the strictness > consideration: you no longer have to have the same strictness > setting for the plain and invertible forward transfns. > Yes, I can imagine that there would be some aggregates for which the plain forwards transition function would be simpler than the invertible one, with a simpler state type. You'd still be left with quite a large number of existing aggregates having non-strict plain transition functions, in addition to a bunch of new non-strict invertible transition functions that had to do null counting. > This idea of a separate firsttrans function is interesting but perhaps > orthogonal to the current patch. Also, I don't quite understand how > it would work for aggregates with null initvalues; don't you end up > with exactly the same conflict about how you can't mark the transfn > strict? Or is the idea that firsttrans would *only* apply to aggregates > with null initvalue, and so you wouldn't even pass the previous state > value to it? > I was imagining that firsttrans would only be passed the first value to be aggregated, not any previous state, and that it would be illegal to specify both an initcond and a firsttrans function. 
The forward transition function would only be called for values after the first, by which point the state would be non-null, and so it could be made strict in most cases. The same would apply to the invertible transition functions, so they wouldn't have to do null counting, which in turn would make their state types simpler. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Dean Rasheed writes: > On 10 April 2014 01:13, Florian Pflug wrote: >> However, I still believe the best approach at this point is to just work >> on making int4_avg_accum faster. I still see no principled reason why it >> has to be noticeably slower - the only additional work it absolutely *has* >> to perform is *one* 64-bit increment. > In the best case that would make sum() not noticeably slower than > avg(), whereas using a firsttrans/initialfunction would potentially > make both of them faster than they currently are, and not just in > window queries. I'm still of the opinion that we should separate the transfn for invertible cases from the normal one, and allow for two separate state types. One of the things that helps with is the strictness consideration: you no longer have to have the same strictness setting for the plain and invertible forward transfns. This idea of a separate firsttrans function is interesting but perhaps orthogonal to the current patch. Also, I don't quite understand how it would work for aggregates with null initvalues; don't you end up with exactly the same conflict about how you can't mark the transfn strict? Or is the idea that firsttrans would *only* apply to aggregates with null initvalue, and so you wouldn't even pass the previous state value to it? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Thu, Apr 10, 2014 at 9:55 AM, Tom Lane wrote: > Florian Pflug writes: > > I was (and still am) not in favour of duplicating the whole quadruple of > > (state, initialvalue, transferfunction, finalfunction) because it seems > > excessive. In fact, I believed that doing this would probably be grounds > for > > outright rejection of the patch, on the base of catalog bloat. And your > > initial response to this suggestion seemed to confirm this. > > Well, I think it's much more likely that causing a performance penalty for > cases unrelated to window aggregates would lead to outright rejection :-(. > The majority of our users probably don't ever use window functions, but > for sure they've heard of SUM(). We can't penalize the non-window case. > > Expanding pg_aggregate from 10 columns (as per patch) to 14 (as per this > suggestion) is a little annoying but it doesn't sound like a show stopper. > It seems reasonable to assume that the extra initval would be NULL in most > cases, so it's probably a net addition of 12 bytes per row. > > I also wouldn't imagine that the overhead of storing that would be too great... And are there really any databases out there that have 1000's of custom aggregate functions? I'm actually quite glad to see someone agrees with me on this. I think it opens up quite a few extra optimisation opportunities with things like MAX and MIN... In these cases we could track the number of occurrences of the current max, resetting the count when a bigger value arrives. That way we could report the inverse transition as successful if maxcount is still above 0 after the removal of a max value... Similar to how I implemented the inverse transition for sum(numeric). In fact doing it this way would mean that inverse transitions for sum(numeric) would never fail and retry. I just thought we had gotten to a stage of not requiring this due to the overheads being so low... I was quite surprised to see the count tracking account for 5% for sum(int). 
What I don't quite understand yet is why we can't just create a new function for int inverse transitions instead of borrowing the inverse transition functions for avg...? Regards David Rowley
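The MAX idea sketched in the previous message - track how many input values equal the current maximum, so an inverse transition can succeed whenever the removed value isn't the last copy of the max - can be modeled like this. Names and state representation are illustrative only, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>

/* State: current max, how many inputs equal it, and whether any input
 * has been seen at all. */
typedef struct MaxState { int max; int maxcount; bool empty; } MaxState;

/* Forward transition: a new, larger value resets the count to 1. */
static void max_trans(MaxState *s, int v)
{
    if (s->empty || v > s->max) { s->max = v; s->maxcount = 1; s->empty = false; }
    else if (v == s->max) s->maxcount++;
}

/* Inverse transition: returns false when the last copy of the max is
 * removed, meaning the aggregate must be restarted over the window. */
static bool max_inv_trans(MaxState *s, int v)
{
    if (v < s->max) return true;                         /* state unchanged */
    if (s->maxcount > 1) { s->maxcount--; return true; } /* drop one copy */
    return false;                                        /* restart needed */
}

/* Build up {3, 7, 7, 2}, then remove 3, 2, and one 7 - all handled
 * without a restart; removing the final 7 correctly reports failure. */
static int max_demo(void)
{
    MaxState s = {0, 0, true};
    int vals[] = {3, 7, 7, 2};
    for (int i = 0; i < 4; i++) max_trans(&s, vals[i]);
    if (!max_inv_trans(&s, 3)) return -1;
    if (!max_inv_trans(&s, 2)) return -2;
    if (!max_inv_trans(&s, 7)) return -3;   /* maxcount drops to 1 */
    if (max_inv_trans(&s, 7))  return -4;   /* last max: must restart */
    return s.max;
}
```

As with the sum(numeric) approach mentioned above, the inverse transition is "best effort": it succeeds cheaply in the common case and falls back to recomputing the window only when the maximum itself leaves the frame.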
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Thu, Apr 10, 2014 at 9:55 AM, Tom Lane wrote: > > > I'm pretty sure David Rowley did some benchmarking. The results should be > > in this thread somewhere I think, but they currently evade me... Maybe > David > > can re-post, if he's following this... > > I saw benchmarks addressing window aggregation, but none looking for > side-effects on plain aggregation. > > These ones maybe? http://www.postgresql.org/message-id/CAApHDvr_oSpvM-XXz43eCMX8n0EfshJ=9j+rxvgqcy91yr-...@mail.gmail.com I think it was only around SUM(numeric), and you'll need to scroll down a bit to find it as it's mixed in with a load of window agg tests. At the time it was the only forward transition function that I had added any overhead to, so I think it was the only one I bothered to test at the time. However, I do remember benchmarking the bool_and and bool_or changes after I rewrote the way they worked and also found the same as you... no difference, I just can't find the post with my results... Though it sounds like Tom found the same as me so I don't think it's worth me looking any more for them. Regards David Rowley
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 10 April 2014 01:13, Florian Pflug wrote: > Also, the *only* reason that SUM(int2|int4) cannot use int8 as its > transition type is that it needs to return NULL, not 0, if zero rows > were aggregated. It might seem that it could just use int8 as state > with initial value NULL, but that only works if the transition functions > are non-strict (since the special case of state type == input type doesn't > apply here). And non-strict transition functions need to deal with > NULL inputs themselves, which means counting the number of non-NULL inputs.. > > That problem would go away though if we had support for an initialfunction, > which would receive the first input value and initialize the state. In > the case of SUM(), the initial function would be strict, and thus would be > called on the first non-NULL input value, which it'd convert to int8 and > return as the new state. > Ah snap! > However, I still believe the best approach at this point is to just work > on making int4_avg_accum faster. I still see no principled reason why it > has to be noticeably slower - the only additional work it absolutely *has* > to perform is *one* 64-bit increment. > In the best case that would make sum() not noticeably slower than avg(), whereas using a firsttrans/initialfunction would potentially make both of them faster than they currently are, and not just in window queries. Also, IMO a first/initial function leads to a cleaner separation of logic, allowing some of the transition functions to be simplified rather than becoming more complex. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 9 April 2014 22:55, Tom Lane wrote: > Florian Pflug writes: >> I was (and still am) not in favour of duplicating the whole quadruple of >> (state, initialvalue, transferfunction, finalfunction) because it seems >> excessive. In fact, I believed that doing this would probably be grounds for >> outright rejection of the patch, on the base of catalog bloat. And your >> initial response to this suggestion seemed to confirm this. > > Well, I think it's much more likely that causing a performance penalty for > cases unrelated to window aggregates would lead to outright rejection :-(. > The majority of our users probably don't ever use window functions, but > for sure they've heard of SUM(). We can't penalize the non-window case. > > Expanding pg_aggregate from 10 columns (as per patch) to 14 (as per this > suggestion) is a little annoying but it doesn't sound like a show stopper. > It seems reasonable to assume that the extra initval would be NULL in most > cases, so it's probably a net addition of 12 bytes per row. > >> On Apr9, 2014, at 20:20 , Tom Lane wrote: >>> The patch has in fact already done that to a couple of basic aggregates like >>> sum(int4). Has anyone bothered to test what side-effects that has on >>> non-windowed aggregation performance? > >> I'm pretty sure David Rowley did some benchmarking. The results should be >> in this thread somewhere I think, but they currently evade me... Maybe David >> can re-post, if he's following this... > > I saw benchmarks addressing window aggregation, but none looking for > side-effects on plain aggregation. > >> If we really go down that road (and I'm far from convinced), then maybe >> instead of having a bunch of additional fields, we could have separate >> entries in pg_aggregate for the two cases, with links between them. > > That seems like a complete mess; in particular it would break the primary > key for pg_aggregate (aggfnoid), and probably break every existing query > that looks at pg_aggregate. 
Some extra fields would not break such > expectations (in fact we've added fields to pg_aggregate in the past). > This may initially sound unrelated, but I think it might address some of these issues. Suppose we added a 'firsttrans' function that took a single argument (the first value to be aggregated) and was responsible for creating the initial state from that first value. This would apply to aggregates that ignore null values, but whose transition function cannot currently be declared strict (either because the state type is internal, or because it is not the same as the aggregate's argument type). I think quite a lot of the existing aggregates fall into this category, and this would allow their transition functions to be made strict and simplified --- no more testing if the state is null, and then building it, and no more testing if the argument is null and ignoring it. That might give a noticeable performance boost in the regular aggregate case, especially over data containing nulls. But in addition, it would help with writing inverse transition functions because if the transition functions could be made strict, they wouldn't need to do null-counting, which would mean that their state types would not need to be expanded. So for example sum(int4) could continue to have int8 as its state type, it could use int8(int4) as its firsttrans function, and int4_sum() could become strict and lose all its null-handling logic. Then int4_sum_inv() would be trivial to write - just doing the same in reverse. I'm not sure it helps for all aggregates, but there are certainly some where it would seem to simplify things. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
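Dean's 'firsttrans' function is a proposal, not an existing CREATE AGGREGATE option, but the intended division of labour can be sketched outside the backend (Python standing in for the C machinery; all names here are illustrative). The driver itself enforces strictness, `firsttrans` builds the state from the first non-NULL input, and the transition function never has to test for a NULL state or NULL input:

```python
def aggregate_strict(values, firsttrans, transfn):
    """Toy driver for the proposed scheme: NULL (None) inputs are skipped
    by the driver, firsttrans builds the initial state from the first
    non-NULL value, and transfn never sees a NULL state or input."""
    state = None
    for v in values:
        if v is None:
            continue                 # strictness handled by the driver
        if state is None:
            state = firsttrans(v)    # e.g. int8(int4): a widening cast
        else:
            state = transfn(state, v)
    return state                     # still None => the result is NULL

# sum(int4)-style aggregate keeping a plain int8-style state
firsttrans = lambda v: v             # widen the first value into the state
transfn = lambda s, v: s + v         # strict transition, no NULL checks

print(aggregate_strict([None, 3, None, 4], firsttrans, transfn))  # 7
print(aggregate_strict([None, None], firsttrans, transfn))        # None
```

Note how the all-NULL case falls out for free: the state is never created, so the result is NULL without the transition function counting anything.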
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr9, 2014, at 23:17 , Florian Pflug wrote: > On Apr9, 2014, at 21:35 , Tom Lane wrote: >> A quick test says that avg(int4) >> is about five percent slower than sum(int4), so that's the kind of hit >> we'd be taking on non-windowed aggregations if we do it like this. > > That's rather surprising though. AFAICS, there isn't much difference > between the two transfer functions int4_sum and int4_avg_accum at all. > > The differences come down to (+ denoting things which ought to make > int4_avg_accum slower compared to int4_sum, - denoting the opposite) > > 1. +) int4_avg_accum calls AggCheckCallContext > 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum) > 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS > to verify that the state is a 2-element array without NULL entries > 4. -) int4_sum checks if the input is NULL I've done a bit of profiling on this (using Instruments.app on OSX). One thing I missed is that int4_avg_accum also calls pg_detoast_datum - that call comes from the PG_GETARG_ARRAYTYPE_P macro. Doing that is a bit silly, since we know that the datum cannot possibly be toasted I think (or if it was, nodeAgg.c should do the de-toasting). The profile also attributes a rather large percentage of the total runtime of int4_avg_accum (around 30%) to the call of AggCheckCallContext(). Getting rid of that would help quite a few transition functions, invertible or not. That certainly seems doable - we'd need a way to mark functions as internal support functions, and prevent user-initiated calls of such functions. Transition functions marked that way could then safely scribble over their state arguments without having to consult AggCheckCallContext() first. I've also compared the disassemblies of int4_sum and int4_avg_accum, and apart from these differences, they are pretty similar. 
Also, the *only* reason that SUM(int2|int4) cannot use int8 as its transition type is that it needs to return NULL, not 0, if zero rows were aggregated. It might seem that it could just use int8 as state with initial value NULL, but that only works if the transition functions are non-strict (since the special case of state type == input type doesn't apply here). And non-strict transition functions need to deal with NULL inputs themselves, which means counting the number of non-NULL inputs. That problem would go away though if we had support for an initial function, which would receive the first input value and initialize the state. In the case of SUM(), the initial function would be strict, and thus would be called on the first non-NULL input value, which it'd convert to int8 and return as the new state. However, I still believe the best approach at this point is to just work on making int4_avg_accum faster. I still see no principled reason why it has to be noticeably slower - the only additional work it absolutely *has* to perform is *one* 64-bit increment. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
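Florian's point about why an invertible, NULL-aware SUM needs an expanded state can be illustrated with a small sketch (Python stands in for the C transition functions; the function and class names are made up for illustration). A non-strict transition function must count non-NULL inputs itself, both so the final function can distinguish "no inputs" (NULL result) from "sum happens to be zero", and so the inverse transition can undo removals exactly:

```python
class SumState:
    """Expanded state for an invertible, NULL-aware SUM: the running
    total plus the number of non-NULL inputs currently aggregated."""
    def __init__(self):
        self.total = 0
        self.count = 0

def sum_trans(state, v):        # non-strict forward transition
    if v is not None:           # must skip NULL inputs itself
        state.total += v
        state.count += 1
    return state

def sum_inv(state, v):          # inverse transition: exact undo
    if v is not None:
        state.total -= v
        state.count -= 1
    return state

def sum_final(state):           # NULL if no non-NULL input remains
    return state.total if state.count > 0 else None

s = SumState()
for v in [1, None, 2]:
    sum_trans(s, v)
print(sum_final(s))             # 3
for v in [1, None, 2]:
    sum_inv(s, v)
print(sum_final(s))             # None: back to "zero rows aggregated"
```

With a strict transition plus an initial function, the `count` field (and the NULL checks inside the transition) would disappear, which is exactly the simplification being discussed.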
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug writes: > I was (and still am) not in favour of duplicating the whole quadruple of > (state, initialvalue, transferfunction, finalfunction) because it seems > excessive. In fact, I believed that doing this would probably be grounds for > outright rejection of the patch, on the base of catalog bloat. And your > initial response to this suggestion seemed to confirm this. Well, I think it's much more likely that causing a performance penalty for cases unrelated to window aggregates would lead to outright rejection :-(. The majority of our users probably don't ever use window functions, but for sure they've heard of SUM(). We can't penalize the non-window case. Expanding pg_aggregate from 10 columns (as per patch) to 14 (as per this suggestion) is a little annoying but it doesn't sound like a show stopper. It seems reasonable to assume that the extra initval would be NULL in most cases, so it's probably a net addition of 12 bytes per row. > On Apr9, 2014, at 20:20 , Tom Lane wrote: >> The patch has in fact already done that to a couple of basic aggregates like >> sum(int4). Has anyone bothered to test what side-effects that has on >> non-windowed aggregation performance? > I'm pretty sure David Rowley did some benchmarking. The results should be > in this thread somewhere I think, but they currently evade me... Maybe David > can re-post, if he's following this... I saw benchmarks addressing window aggregation, but none looking for side-effects on plain aggregation. > If we really go down that road (and I'm far from convinced), then maybe > instead of having a bunch of additional fields, we could have separate > entries in pg_aggregate for the two cases, with links between them. That seems like a complete mess; in particular it would break the primary key for pg_aggregate (aggfnoid), and probably break every existing query that looks at pg_aggregate. 
Some extra fields would not break such expectations (in fact we've added fields to pg_aggregate in the past). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr9, 2014, at 20:20 , Tom Lane wrote: > There was discussion upthread of providing > two separate forward transition functions, but Florian argued that that > would do nothing that you couldn't accomplish with a runtime check in > the forward function. I think that's nonsense though, because one of > the key points here is that an invertible aggregate may require a more > complex transition state data structure --- in particular, if you're > forced to go from a pass-by-value to a pass-by-reference data type, right > there you are going to take a big hit in aggregate performance, and there > is no way for the forward transition function to avoid it. To be precise, my point was that *only* having a separate non-invertible forward transition function is pointless, exactly because of the reason you gave - that won't allow a cheaper state representation for the non-invertible case. So all such a non-invertible forward transition function can do is to skip some bookkeeping that's required by the inverse transition function - and *that* can just as easily be accomplished by a runtime check. I was (and still am) not in favour of duplicating the whole quadruple of (state, initialvalue, transferfunction, finalfunction) because it seems excessive. In fact, I believed that doing this would probably be grounds for outright rejection of the patch, on the base of catalog bloat. And your initial response to this suggestion seemed to confirm this. > The patch has in fact already done that to a couple of basic aggregates like > sum(int4). Has anyone bothered to test what side-effects that has on > non-windowed aggregation performance? I'm pretty sure David Rowley did some benchmarking. The results should be in this thread somewhere I think, but they currently evade me... Maybe David can re-post, if he's following this... 
> I think what'd make sense is to have a separate forward function *and* > separate state datatype to be used when we want invertible aggregation > (hm, and I guess a separate final function too). So an aggregate > definition would include two entirely independent implementations. > If they chance to share sfunc and stype, fine, but they don't have to. > > This would mean we'd need some new names for the doppelgangers of the > CREATE AGGREGATE parameters sfunc, stype, sspace, finalfunc, and initcond > (but not sortop). I guess that'd open up a chance to use a more uniform > naming scheme, but I'm not too sure what would be good. If we really go down that road (and I'm far from convinced), then maybe instead of having a bunch of additional fields, we could have separate entries in pg_aggregate for the two cases, with links between them. The non-invertible aggregates would have something like "_noninv" or so appended to their name, and we'd automatically use them if we know we won't need to remove entries from the aggregation state. That would also allow the users to *force* the non-invertible aggregate to be used by simply saying "SUM_NONINV" instead of "SUM". Then all we'd need would be an additional OID field that links the invertible to the non-invertible definition. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr9, 2014, at 21:35 , Tom Lane wrote: > As a quick check, I compared aggregation performance in HEAD, non-assert > builds, with and without --disable-float8-byval on a 64-bit machine. > So this tests replacing a pass-by-val transition datatype with a > pass-by-ref one without any other changes. There's essentially no > difference in performance of sum(int4), AFAICT, but that's because > int4_sum goes out of its way to cheat and avoid palloc overhead. > I looked to the bit_and() aggregates to see what would happen to > an aggregate not thus optimized. As expected, int4 and int8 bit_and > are just about the same speed if int8 is pass by value ... but if it's > pass by ref, the int8 case is a good 60% slower. True, but that just means that aggregate transition functions really ought to update the state in-place, no? > So added palloc overhead, at least, is a no-go. I see that the patched > version of sum(int4) avoids that trap, but nonetheless it's replaced a > pretty cheap transition function with a less cheap function, namely the > function previously used for avg(int4). A quick test says that avg(int4) > is about five percent slower than sum(int4), so that's the kind of hit > we'd be taking on non-windowed aggregations if we do it like this. That's rather surprising though. AFAICS, there isn't much difference between the two transfer functions int4_sum and int4_avg_accum at all. The differences come down to (+ denoting things which ought to make int4_avg_accum slower compared to int4_sum, - denoting the opposite) 1. +) int4_avg_accum calls AggCheckCallContext 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum) 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS to verify that the state is a 2-element array without NULL entries 4. -) int4_sum checks if the input is NULL The number of conditional branches should be about the same (and all are seldom taken). The validity checks on the state array, i.e. 
(3), should be rather cheap I think - not quite as cheap as PG_ARGISNULL maybe, but not so much more expensive either. That leaves the AggCheckCallContext call. If that call costs us 5%, maybe we can find a way to make it faster, or get rid of it entirely? Still, that seems a lot for a call to a not-very-complex function, though... I'll go and check the disassembly - maybe something in int4_avg_accum turns out to be more complex than is immediately obvious. I'll also try to create a call profile, unless you already have one from your test runs. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
I wrote: > ... an invertible aggregate may require a more > complex transition state data structure --- in particular, if you're > forced to go from a pass-by-value to a pass-by-reference data type, right > there you are going to take a big hit in aggregate performance, and there > is no way for the forward transition function to avoid it. The patch > has in fact already done that to a couple of basic aggregates like > sum(int4). Has anyone bothered to test what side-effects that has on > non-windowed aggregation performance? As a quick check, I compared aggregation performance in HEAD, non-assert builds, with and without --disable-float8-byval on a 64-bit machine. So this tests replacing a pass-by-val transition datatype with a pass-by-ref one without any other changes. There's essentially no difference in performance of sum(int4), AFAICT, but that's because int4_sum goes out of its way to cheat and avoid palloc overhead. I looked to the bit_and() aggregates to see what would happen to an aggregate not thus optimized. As expected, int4 and int8 bit_and are just about the same speed if int8 is pass by value ... but if it's pass by ref, the int8 case is a good 60% slower. So added palloc overhead, at least, is a no-go. I see that the patched version of sum(int4) avoids that trap, but nonetheless it's replaced a pretty cheap transition function with a less cheap function, namely the function previously used for avg(int4). A quick test says that avg(int4) is about five percent slower than sum(int4), so that's the kind of hit we'd be taking on non-windowed aggregations if we do it like this. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Dean Rasheed writes: > OK, I'm marking this ready for committer attention, on the > understanding that that doesn't include the invtrans_minmax patch. I've started to look at this patch set. After rereading the thread, I'm thinking that it's a mistake to just add the inverse transition function and require it to work with the standard forward transition function for the aggregate. There was discussion upthread of providing two separate forward transition functions, but Florian argued that that would do nothing that you couldn't accomplish with a runtime check in the forward function. I think that's nonsense though, because one of the key points here is that an invertible aggregate may require a more complex transition state data structure --- in particular, if you're forced to go from a pass-by-value to a pass-by-reference data type, right there you are going to take a big hit in aggregate performance, and there is no way for the forward transition function to avoid it. The patch has in fact already done that to a couple of basic aggregates like sum(int4). Has anyone bothered to test what side-effects that has on non-windowed aggregation performance? I think what'd make sense is to have a separate forward function *and* separate state datatype to be used when we want invertible aggregation (hm, and I guess a separate final function too). So an aggregate definition would include two entirely independent implementations. If they chance to share sfunc and stype, fine, but they don't have to. This would mean we'd need some new names for the doppelgangers of the CREATE AGGREGATE parameters sfunc, stype, sspace, finalfunc, and initcond (but not sortop). I guess that'd open up a chance to use a more uniform naming scheme, but I'm not too sure what would be good. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 8 April 2014 21:48, Florian Pflug wrote: > On Apr7, 2014, at 17:41 , Dean Rasheed wrote: >> I've just finished reading through all the other patches, and they all >> look OK to me. It's mostly straightforward stuff, so despite the size >> it's hopefully all committable once the base patch goes in. > > Hm, I'm starting to have second thoughts about the minmax patch. The > inverse transition functions for MIN and MAX have a non-trivial probability > of failure - they trigger a rescan whenever the value that is removed isn't > strictly smaller (respectively strictly larger) than the current maximum > (respectively minimum). Thus, whenever that happens, we both call the > inverse transition function *and* (since it fails) restart the aggregation. > > For windows based on ROWS, this isn't too bad - even if we fail every second > time, we still avoid half the rescans, which should be a net win if the > average window size is > 2. > > But for RANGE-based windows, more than one call of the inverse transition > function must succeed for us to save anything, since we must successfully > remove *all* peers to avoid one restart. This greatly increases the chance > that using the inverse transition function hurts rather than helps - the > situation is worse the larger the average number of peers is. > Argh, I hadn't really considered that case. I suppose any imperfect inverse transition function has the potential to make performance worse rather than better. But working out the likelihood of that isn't necessarily straightforward. It might be possible to include some sort of heuristic based on the known information --- the number of rows P in the peer group about to be removed vs the total number N of rows aggregated so far. 
If the data were fairly random, then a quick back-of-the-envelope calculation suggests that trying the inverse min/max functions would be worthwhile on average if P were less than around 0.4N, but of course different data distributions could make that much worse. Even a perfect inverse transition function isn't going to be much use if P > N/2 (e.g., imagine a case where the peer groups decrease in size exponentially), so perhaps we should be including such a check anyway. That's also assuming that the cost of the inverse transition function is about the same as the cost of the forward function, which is not necessarily the case. Perhaps imperfect inverse transition functions should be assigned a higher cost, and that should be factored into the decision as to whether they are likely to be worth trying. All that feels very speculative though, and I think it's too late to be considering that for 9.4, so yes, let's leave out the min/max aggregates for now. > I've factored the BOOL_AND,BOOL_OR stuff out into a separate patch > invtrans_bool - it previously was part of invtrans_minmax. Given the > performance risk involved, I think that we probably shouldn't commit > invtrans_minmax at this time. I should have brought this up earlier, but > the issue had slipped my mind :-( Sorry for that. > >> I think that you're probably right that optimising >> window_gettupleslot() to eliminate the O(n^2) behaviour can be left to >> a later patch --- the existing performance benefits of this patch are >> enough to justify its inclusion IMO. It would be nice to include the >> trivial optimisation to window_gettupleslot() that I posted upthread, >> since it gives such a big improvement for only a few lines of code >> changed. > > Agreed, but since it's independent from the rest of the base patch, > I kept it as a separate patch (invtrans_optimize) instead of merging it > into the base patch. It should probably be committed separately too. 
> It would be good to commit at least the "base", "arith" and "optimize" patches for 9.4. I think the "collecting" and "bool" patches are also committable, but I also suspect that those aggregates are less well used, so they could be considered lower priority. >> If you post a new patch set, I'll mark it as ready for committer attention. > > New patch set is attached. The only difference to the previous one is that > "Forward Transitions" and "Inverse Transitions" are now scaled with nloops, > and that it includes your window_gettupleslot patch under the name > invtrans_optimize. > OK, I'm marking this ready for committer attention, on the understanding that that doesn't include the invtrans_minmax patch. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
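The "imperfect inverse" for MIN/MAX discussed above can be sketched in miniature (a Python illustration, not the patch's C code): removing a value strictly below the current maximum is free, while removing anything else forces a rescan of the remaining window contents — which is exactly why data distributions with large peer groups can make the optimization a net loss.

```python
def sliding_max(values, window, stats):
    """Sliding-window MAX with an imperfect inverse transition.

    Removing a value strictly below the current maximum succeeds cheaply;
    otherwise the removed value may have *been* the maximum, so we must
    rescan the rows still in the window (the 'restart' case)."""
    out = []
    state = None                       # None = no rows aggregated yet
    for i, v in enumerate(values):
        if i >= window:                # oldest row leaves the frame
            removed = values[i - window]
            if state is not None and removed < state:
                stats["inverse_ok"] += 1    # cheap path: max unaffected
            else:
                stats["rescans"] += 1       # inversion failed: recompute
                rest = values[i - window + 1:i]
                state = max(rest) if rest else None
        state = v if state is None else max(state, v)
        out.append(state)
    return out

stats = {"inverse_ok": 0, "rescans": 0}
print(sliding_max([5, 1, 2, 3, 4, 0], 3, stats))  # [5, 5, 5, 3, 4, 4]
print(stats)                                      # 2 inversions ok, 1 rescan
```

On descending or heavily-tied input the rescan branch fires on nearly every step, so the inverse transition is attempted *and* the window is rescanned — the worst of both worlds that motivates leaving invtrans_minmax out.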
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr9, 2014, at 02:55 , David Rowley wrote: > On Wed, Apr 9, 2014 at 8:48 AM, Florian Pflug wrote: > > As explained above, invtrans_bool is a bit problematic, since it carries > a real risk of performance regressions. It's included for completeness' > sake, and should probably not be committed at this time. > > Did you mean to write invtrans_minmax? Otherwise you didn't explain about > your concerns with bool. Grmpf. Should have re-read that once more before sending :-( Yes, I meant invtrans_minmax is problematic! invtrans_bool is fine, the inverse transition function never fails for BOOL_AND and BOOL_OR. This is why I factored it out into a separate patch, to make it easy to not apply the MIN/MAX stuff, while still applying the BOOL stuff. Sorry for the confusion. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Wed, Apr 9, 2014 at 8:48 AM, Florian Pflug wrote: > > As explained above, invtrans_bool is a bit problematic, since it carries > a real risk of performance regressions. It's included for completeness' > sake, and should probably not be committed at this time. > > Did you mean to write invtrans_minmax? Otherwise you didn't explain about your concerns with bool. Regards David Rowley
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 7 April 2014 14:09, Florian Pflug wrote: > On Apr5, 2014, at 09:55 , Dean Rasheed wrote: >> On 5 April 2014 08:38, Dean Rasheed wrote: >>> [snip] releasing it in this state feels a little half-baked >>> to me. >>> >> >> I regret writing that almost as soon as I sent it. The last of those >> queries is now over 10 times faster than HEAD, which is certainly >> worthwhile. What bugs me is that there is so much more potential in >> this patch. > > Well, the perfect is the enemy of the good as they say. By all means, > let's get rid of the O(n^2) behaviour in 9.5, but let's get a basic > version into 9.4. > >> I think it's pretty close to being committable, but I fear that time >> is running out for 9.4. > > I'm not aware of any open issues in the basic patch, other than > scaling the reported calling statistics by nloops, which should be > trivial. I'll post an updated patch later today. > > I don't really expect all the add-on patches to make it into 9.4 - > they don't seem to have gotten much attention yet - but at least > the inverse transition functions for the basic arithmetic aggregates > should be doable I hope. > I've just finished reading through all the other patches, and they all look OK to me. It's mostly straightforward stuff, so despite the size it's hopefully all committable once the base patch goes in. I think that you're probably right that optimising window_gettupleslot() to eliminate the O(n^2) behaviour can be left to a later patch --- the existing performance benefits of this patch are enough to justify its inclusion IMO. It would be nice to include the trivial optimisation to window_gettupleslot() that I posted upthread, since it gives such a big improvement for only a few lines of code changed. If you post a new patch set, I'll mark it as ready for committer attention. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr5, 2014, at 09:55 , Dean Rasheed wrote: > On 5 April 2014 08:38, Dean Rasheed wrote: >> [snip] releasing it in this state feels a little half-baked >> to me. >> > > I regret writing that almost as soon as I sent it. The last of those > queries is now over 10 times faster than HEAD, which is certainly > worthwhile. What bugs me is that there is so much more potential in > this patch. Well, the perfect is the enemy of the good as they say. By all means, let's get rid of the O(n^2) behaviour in 9.5, but let's get a basic version into 9.4. > I think it's pretty close to being committable, but I fear that time > is running out for 9.4. I'm not aware of any open issues in the basic patch, other than scaling the reported calling statistics by nloops, which should be trivial. I'll post an updated patch later today. I don't really expect all the add-on patches to make it into 9.4 - they don't seem to have gotten much attention yet - but at least the inverse transition functions for the basic arithmetic aggregates should be doable I hope. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 5 April 2014 08:38, Dean Rasheed wrote: > [snip] releasing it in this state feels a little half-baked > to me. > I regret writing that almost as soon as I sent it. The last of those queries is now over 10 times faster than HEAD, which is certainly worthwhile. What bugs me is that there is so much more potential in this patch. I think it's pretty close to being committable, but I fear that time is running out for 9.4. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 4 April 2014 11:56, Florian Pflug wrote: > >> On 04.04.2014, at 09:40, Dean Rasheed wrote: >> >> I'm not sure how much additional work is required to sort this out, >> but to me it looks more realistic to target 9.5 than 9.4, so at this >> point I tend to think that the patch ought to be marked as returned >> with feedback. > Just doing the first optimisation I recommended (which I pessimistically referred to as a "micro-optimisation") actually gives up to a 4x performance boost relative to the current patch for the queries above:

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING) FROM generate_series(1,2) g(n); -- 20ms (was 33ms)
SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 100 FOLLOWING) FROM generate_series(1,2) g(n); -- 54ms (was 173ms)
SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 1000 FOLLOWING) FROM generate_series(1,2) g(n); -- 368ms (was 1467ms)
SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) FROM generate_series(1,2) g(n); -- 1866ms (was 7709ms)

See the attached patch, which applies on top of yours but is actually independent of it. IMO, this doesn't go far enough though. We should be aiming to eliminate the O(n^2) behaviour completely and have all the above queries take roughly the same amount of time. > I think the patch is worthwhile, even without this additional optimization. > In fact, if the optimization was part of the patch, there would probably be > calls to factor it out, on the ground that the patch is already rather large. > > I don't see what bumping the whole thing to 9.5 buys, compared to applying > what we have now, and optimizing in 9.5 further. 
> The problem with the current state of the patch is that we're going to a lot of effort to add this new inverse aggregate function, documenting its benefits and revealing via EXPLAIN how it can reduce by several orders of magnitude the number of transition function calls, but then not giving a commensurate performance boost unless the window is UNBOUNDED FOLLOWING. That's going to be awkward to explain to users, and so releasing it in this state feels a little half-baked to me. Regards, Dean

*** src/backend/executor/nodeWindowAgg.c.orig	2014-04-05 07:56:15.0 +0100
--- src/backend/executor/nodeWindowAgg.c	2014-04-05 08:03:32.0 +0100
*** window_gettupleslot(WindowObject winobj,
*** 2412,2425 ****
  		winobj->seekpos++;
  	}
  
! 	while (winobj->seekpos > pos)
  	{
! 		if (!tuplestore_gettupleslot(winstate->buffer, false, true, slot))
  			elog(ERROR, "unexpected end of tuplestore");
  		winobj->seekpos--;
  	}
  
! 	while (winobj->seekpos < pos)
  	{
  		if (!tuplestore_gettupleslot(winstate->buffer, true, true, slot))
  			elog(ERROR, "unexpected end of tuplestore");
--- 2412,2443 ----
  		winobj->seekpos++;
  	}
  
! 	/* Advance or rewind until we are within one tuple of the one we want */
! 	while (winobj->seekpos < pos-1)
! 	{
! 		if (!tuplestore_advance(winstate->buffer, true))
! 			elog(ERROR, "unexpected end of tuplestore");
! 		winobj->seekpos++;
! 	}
!
! 	while (winobj->seekpos > pos+1)
! 	{
! 		if (!tuplestore_advance(winstate->buffer, false))
  			elog(ERROR, "unexpected end of tuplestore");
  		winobj->seekpos--;
  	}
  
! 	/*
! 	 * Now we should be on the tuple immediately before or after the one we
! 	 * want, so just fetch forwards or backwards as appropriate.
! 	 */
! 	if (winobj->seekpos > pos)
! 	{
! 		if (!tuplestore_gettupleslot(winstate->buffer, false, true, slot))
! 			elog(ERROR, "unexpected end of tuplestore");
! 		winobj->seekpos--;
! 	}
! 	else
  	{
  		if (!tuplestore_gettupleslot(winstate->buffer, true, true, slot))
  			elog(ERROR, "unexpected end of tuplestore");

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
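The effect of Dean's window_gettupleslot() change above can be modelled with a toy buffer (an illustration only, not the real tuplestore API): moving the read pointer is cheap, and only the final positioning needs to materialize a tuple into a slot.

```python
class ToyBuffer:
    """Toy stand-in for a tuplestore read pointer: advance() only moves
    the position, fetch() also 'materializes' a tuple (the costly part)."""
    def __init__(self, tuples):
        self.tuples = tuples
        self.pos = -1
        self.fetches = 0            # count of expensive tuple copies

    def advance(self, forward):
        self.pos += 1 if forward else -1

    def fetch(self, forward):
        self.advance(forward)
        self.fetches += 1
        return self.tuples[self.pos]

def gettupleslot(buf, pos):
    """Seek with cheap advances until adjacent to pos, then do exactly one
    materializing fetch (mirroring the patched window_gettupleslot)."""
    while buf.pos < pos - 1:
        buf.advance(True)
    while buf.pos > pos + 1:
        buf.advance(False)
    if buf.pos == pos:              # step off so the fetch lands on pos
        buf.advance(False)
    return buf.fetch(buf.pos < pos)

buf = ToyBuffer(list(range(100)))
print(gettupleslot(buf, 50))        # 50
print(gettupleslot(buf, 10))        # 10
print(buf.fetches)                  # 2: one materializing fetch per seek
```

The unpatched code, by contrast, materializes a tuple for every intermediate position it passes over, which is where the big constant-factor win for long backward seeks comes from.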
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 2014-04-04 12:56:55 +0200, Florian Pflug wrote: > > > On 04.04.2014, at 09:40, Dean Rasheed wrote: > > > > I'm not sure how much additional work is required to sort this out, > > but to me it looks more realistic to target 9.5 than 9.4, so at this > > point I tend to think that the patch ought to be marked as returned > > with feedback. > > I think the patch is worthwhile, even without this additional > optimization. In fact, if the optimization was part of the patch, > there would probably be calls to factor it out, on the ground that the > patch is already rather large. > > I don't see what bumping the whole thing to 9.5 buys, compared to > applying what we have now, and optimizing in 9.5 further. From my POV applying this patch can't be considered a very high priority for 9.4x. It came *really* late to the game for a relatively complex patch. A significant portion of the development only happened *after* the start of the last commitfest. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
> On 04.04.2014, at 09:40, Dean Rasheed wrote: > > I'm not sure how much additional work is required to sort this out, > but to me it looks more realistic to target 9.5 than 9.4, so at this > point I tend to think that the patch ought to be marked as returned > with feedback. I think the patch is worthwhile, even without this additional optimization. In fact, if the optimization was part of the patch, there would probably be calls to factor it out, on the grounds that the patch is already rather large. I don't see what bumping the whole thing to 9.5 buys, compared to applying what we have now, and optimizing further in 9.5. best regards, Florian Pflug PS: Sorry for the broken mail I sent earlier - mis-touched on my phone ;-( -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
>> ), which seem reasonable. But > then I started testing performance, and I found cases where the > improvement is not nearly what I expected. > > The example cited at the start of this thread is indeed orders of > magnitude faster than HEAD: > > SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) > F > > > > > > > I'm not sure how much additional work is required to sort this out, > but to me it looks more realistic to target 9.5 than 9.4, so at this > point I tend to think that the patch ought to be marked as returned > with feedback. > > Thoughts? > > Regards, > Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 1 April 2014 20:58, Florian Pflug wrote: > On Apr1, 2014, at 10:08 , Dean Rasheed wrote: >> On 31 March 2014 01:58, Florian Pflug wrote: >>> Attached are updated patches that include the EXPLAIN changes mentioned >>> above and updated docs. >>> >> >> These patches need re-basing --- they no longer apply to HEAD. > > Rebased to head (554bb3beba27bf4a49edecc40f6c0f249974bc7c). I had to > re-assign OIDs in the dependent patches again (those which actually > add the various inverse transition functions). > Looking at the new explain output, that is not exactly what I was suggesting upthread. In particular, in cases where the WindowAgg node is executed more than once, I think that the reported transition counts should be the averages per-execution of the node. That way the number of transition function calls reported for the node is directly comparable with its "rows" value. Thus I think the output should be divided by nloops, which would be more consistent with the way other similar values are reported in explain output (cf. show_instrumentation_count). I started looking at the "arith" patch, and I got as far as looking at the changes to count(*) and count(val), which seem reasonable. But then I started testing performance, and I found cases where the improvement is not nearly what I expected. The example cited at the start of this thread is indeed orders of magnitude faster than HEAD: SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM generate_series(1,2) g(n); This executes in 20ms on my box, vs 30sec on HEAD, which reflects the change from an O(n^2) to an O(n) algorithm.
However, if both ends of the frame move, the benefits are far less impressive: SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING) FROM generate_series(1,2) g(n); -- 33ms SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 100 FOLLOWING) FROM generate_series(1,2) g(n); -- 173ms SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 1000 FOLLOWING) FROM generate_series(1,2) g(n); -- 1467ms SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) FROM generate_series(1,2) g(n); -- 7709ms This is still exhibiting the behaviour of an O(n^2) algorithm. The problem lies in window_gettupleslot(), which steps one row at a time from the last position to the new required position. So if both ends of the frame are moving, it needs to step forwards and backwards through the entire window for each row processed - hence the O(n^2) behaviour. Looking at window_gettupleslot, there is an obvious potential micro-optimisation that can be made if there are multiple rows to be skipped --- instead of calling tuplestore_gettupleslot() multiple times, call tuplestore_advance() multiple times, then call tuplestore_gettupleslot() once to fetch the final required tuple, thus avoiding a lot of unnecessary tuple copying. That of course won't eliminate the O(n^2) behaviour, but it will reduce the overall factor, and so is probably worth doing. However, to fix the fundamental O(n^2) problem, I think there needs to be separate tuplestore read pointers for the head and tail of the frame. There may also be a case for another read pointer for the current row too, and possibly one for general purpose user-triggered fetches. One approach might be to have up to a small number N (say 3 or 4) of read pointers, with window_gettupleslot() automatically choosing the one nearest to the requested row. Possibly there are better approaches. I think a little more investigation is needed.
I'm not sure how much additional work is required to sort this out, but to me it looks more realistic to target 9.5 than 9.4, so at this point I tend to think that the patch ought to be marked as returned with feedback. Thoughts? Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 31 March 2014 01:58, Florian Pflug wrote: > Attached are updated patches that include the EXPLAIN changes mentioned > above and updated docs. > These patches need re-basing --- they no longer apply to HEAD. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 27 March 2014 21:01, Florian Pflug wrote: > First, sorry guys for letting this slide - I was overwhelmed by other works, > and this kind of slipped my mind :-( > > On Mar27, 2014, at 09:04 , Dean Rasheed wrote: >> On 26 March 2014 19:43, David Rowley wrote: >>> On Thu, Mar 27, 2014 at 7:33 AM, Tom Lane wrote: David Rowley writes: > I've attached an updated invtrans_strictstrict_base patch which has the > feature removed. What is the state of play on this patch? Is the latest version what's in http://www.postgresql.org/message-id/64f96fd9-64d1-40b9-8861-e61820292...@phlo.org plus this sub-patch? Is everybody reasonably happy with it? I don't see it marked "ready for committer" in the CF app, but time is running out. >>> >>> As far as I know the only concern left was around the extra stats in the >>> explain output, which I removed in the patch I attached in the previous >>> email. >>> >> >> Agreed. That was my last concern regarding the base patch, and I agree >> that removing the new explain output is probably the best course of >> action, given that we haven't reached consensus as to what the most >> useful output would be. > > After re-reading the thread, I'd prefer to go with Dean's suggestion, i.e. > simply reporting the total number of invocations of the forward transition > functions, and the total number of invocations of the reverse transition > function, over reporting nothing. The labels of the two counts would simply > be "Forward Transitions" and "Reverse Transitions". > That should be "Inverse" not "Reverse" according to the terminology agreed upthread. Personally, I'm not a big fan of that terminology because "forward" and "inverse" aren't natural antonyms. But actually I think that it's "forward" that is the wrong word to use, because they actually both move (different ends of) the frame forwards. 
The only alternatives I can think of are "direct" and "inverse", which are natural antonyms, but I don't want to hold up this patch bikeshedding over this. OTOH this is not the first time on this thread that someone has slipped into calling them "forward" and "reverse" transitions. > But I don't want this issue to prevent us from getting this patch into 9.4, > so if there are objections to this, I'll rip out the EXPLAIN stuff all > together. > >>> The invtrans_strictstrict_base.patch in my previous email replaces the >>> invtrans_strictstrict_base_038070.patch in that Florian sent here >>> http://www.postgresql.org/message-id/64f96fd9-64d1-40b9-8861-e61820292...@phlo.org >>> all of the other patches are unchanged so it's save to use Florian's latest >>> ones >>> >>> Perhaps Dean can confirm that there's nothing else outstanding? >>> >> >> Florian mentioned upthread that the docs hadn't been updated to >> reflect the latest changes, so I think they need a little attention. > > I'll see to updating the docs, and will post a final patch within the next > few days. > > Dean, have you by chance looked at the other patches yet? > No, sorry. I too have been swamped by other work. I will try to look at them over the next few days. I don't anticipate that they will be as complex as the base patch, so I hope that this can be finished in time for 9.4. If there are any other reviewers with spare cycles, feel free to jump in too. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
First, sorry guys for letting this slide - I was overwhelmed by other works, and this kind of slipped my mind :-( On Mar27, 2014, at 09:04 , Dean Rasheed wrote: > On 26 March 2014 19:43, David Rowley wrote: >> On Thu, Mar 27, 2014 at 7:33 AM, Tom Lane wrote: >>> >>> David Rowley writes: I've attached an updated invtrans_strictstrict_base patch which has the feature removed. >>> >>> What is the state of play on this patch? Is the latest version what's in >>> >>> http://www.postgresql.org/message-id/64f96fd9-64d1-40b9-8861-e61820292...@phlo.org >>> plus this sub-patch? Is everybody reasonably happy with it? I don't >>> see it marked "ready for committer" in the CF app, but time is running >>> out. >>> >> >> As far as I know the only concern left was around the extra stats in the >> explain output, which I removed in the patch I attached in the previous >> email. >> > > Agreed. That was my last concern regarding the base patch, and I agree > that removing the new explain output is probably the best course of > action, given that we haven't reached consensus as to what the most > useful output would be. After re-reading the thread, I'd prefer to go with Dean's suggestion, i.e. simply reporting the total number of invocations of the forward transition functions, and the total number of invocations of the reverse transition function, over reporting nothing. The labels of the two counts would simply be "Forward Transitions" and "Reverse Transitions". But I don't want this issue to prevent us from getting this patch into 9.4, so if there are objections to this, I'll rip out the EXPLAIN stuff all together. 
>> The invtrans_strictstrict_base.patch in my previous email replaces the >> invtrans_strictstrict_base_038070.patch in that Florian sent here >> http://www.postgresql.org/message-id/64f96fd9-64d1-40b9-8861-e61820292...@phlo.org >> all of the other patches are unchanged so it's save to use Florian's latest >> ones >> >> Perhaps Dean can confirm that there's nothing else outstanding? >> > > Florian mentioned upthread that the docs hadn't been updated to > reflect the latest changes, so I think they need a little attention. I'll see to updating the docs, and will post a final patch within the next few days. Dean, have you by chance looked at the other patches yet? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 26 March 2014 19:43, David Rowley wrote: > On Thu, Mar 27, 2014 at 7:33 AM, Tom Lane wrote: >> >> David Rowley writes: >> > I've attached an updated invtrans_strictstrict_base patch which has the >> > feature removed. >> >> What is the state of play on this patch? Is the latest version what's in >> >> http://www.postgresql.org/message-id/64f96fd9-64d1-40b9-8861-e61820292...@phlo.org >> plus this sub-patch? Is everybody reasonably happy with it? I don't >> see it marked "ready for committer" in the CF app, but time is running >> out. >> > > As far as I know the only concern left was around the extra stats in the > explain output, which I removed in the patch I attached in the previous > email. > Agreed. That was my last concern regarding the base patch, and I agree that removing the new explain output is probably the best course of action, given that we haven't reached consensus as to what the most useful output would be. > The invtrans_strictstrict_base.patch in my previous email replaces the > invtrans_strictstrict_base_038070.patch in that Florian sent here > http://www.postgresql.org/message-id/64f96fd9-64d1-40b9-8861-e61820292...@phlo.org > all of the other patches are unchanged so it's save to use Florian's latest > ones > > Perhaps Dean can confirm that there's nothing else outstanding? > Florian mentioned upthread that the docs hadn't been updated to reflect the latest changes, so I think they need a little attention. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Thu, Mar 27, 2014 at 7:33 AM, Tom Lane wrote: > David Rowley writes: > > I've attached an updated invtrans_strictstrict_base patch which has the > > feature removed. > > What is the state of play on this patch? Is the latest version what's in > > http://www.postgresql.org/message-id/64f96fd9-64d1-40b9-8861-e61820292...@phlo.org > plus this sub-patch? Is everybody reasonably happy with it? I don't > see it marked "ready for committer" in the CF app, but time is running > out. > > As far as I know the only concern left was around the extra stats in the explain output, which I removed in the patch I attached in the previous email. The invtrans_strictstrict_base.patch in my previous email replaces the invtrans_strictstrict_base_038070.patch that Florian sent here http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org — all of the other patches are unchanged, so it's safe to use Florian's latest ones. Perhaps Dean can confirm that there's nothing else outstanding? > regards, tom lane >
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
David Rowley writes: > I've attached an updated invtrans_strictstrict_base patch which has the > feature removed. What is the state of play on this patch? Is the latest version what's in http://www.postgresql.org/message-id/64f96fd9-64d1-40b9-8861-e61820292...@phlo.org plus this sub-patch? Is everybody reasonably happy with it? I don't see it marked "ready for committer" in the CF app, but time is running out. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Mar5, 2014, at 23:49 , Tom Lane wrote: > Florian Pflug writes: >> On Mar5, 2014, at 18:37 , Tom Lane wrote: >>> My advice is to lose the EXPLAIN output entirely. If the authors of >>> the patch can't agree on what it means, what hope have everyday users >>> got of making sense of it? > >> The question isn't what the current output means, but whether it's a >> good metric to report or not. > > If you can't agree, then it isn't. Probably, yes, so let's find something that *is* a good metric. (BTW, it's not the authors who disagree here. It was me who put the EXPLAIN feature in, and Dean reviewed it and found it confusing. The original author, David, seems to have run out of time to work on this, and AFAIK hasn't weighed in on that particular feature at all.) >> If we don't report anything, then how would a user check whether a query >> is slow because of O(n^2) behaviour of a windowed aggregate, or because >> of some other reasons? > > [ shrug... ] They can see whether the Window plan node is where the time > is going. It's not apparent to me that the extra numbers you propose to > report will edify anybody. By the same line of reasoning, every metric except execution time is superfluous. Comparing execution times really is a horrible way to measure this - not only does it include all kinds of things that have nothing to do with the restart behaviour of aggregates, you'd also have to construct a base-line query first which is guaranteed not to restart. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug writes: > On Mar5, 2014, at 18:37 , Tom Lane wrote: >> My advice is to lose the EXPLAIN output entirely. If the authors of >> the patch can't agree on what it means, what hope have everyday users >> got of making sense of it? > The question isn't what the current output means, but whether it's a > good metric to report or not. If you can't agree, then it isn't. > If we don't report anything, then how would a user check whether a query > is slow because of O(n^2) behaviour of a windowed aggregate, or because > of some other reasons? [ shrug... ] They can see whether the Window plan node is where the time is going. It's not apparent to me that the extra numbers you propose to report will edify anybody. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Wed, Mar 5, 2014 at 10:49 PM, Tom Lane wrote: > > [ shrug... ] They can see whether the Window plan node is where the time > is going. It's not apparent to me that the extra numbers you propose to > report will edify anybody. Perhaps just saying "Incremental Window Function" versus "Iterated Window Function" or something like that would be sufficient? At least that way query tuning guidelines have a keyword they can say to watch out for. And someone trying to figure out *why* the time is being spent in this node has something they might notice a correlation with. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Mar5, 2014, at 18:27 , Dean Rasheed wrote: > On 5 March 2014 14:35, Florian Pflug wrote: >> When I added the EXPLAIN stuff, I initially simply reported the number >> of times nodeWindowAgg has to restart the aggregation. The problem with >> that approach is that not all restarts carry the same cost. If the frames >> either don't overlap at all or are identical, restarts don't cause any >> additional work. And for fixed-length frames (BETWEEN n PRECEEDING AND >> m FOLLOWING), the performance effects of restarts depends on m-n. >> >> Which is why I made it count the number of aggregated input rows instead. >> >> Having said that, I' not really 100% happy with the name >> "Transitions Per Row" for this - it was simply the best I could come up with >> that was reasonably short. And I'm certainly open to reporting the absolute >> count instead of a factor relative to ntuples. >> >> If we made it an absolute count, would calling this "Aggregated Rows" work >> for you? > > I'm not sure about naming, but I think my preference would be to > output the correct absolute counts for both the forward and inverse > transitions (i.e. multiply by the right combinations of numaggs and > numaggs_restart). The EXPLAIN output already tells you how many > aggregates there are, and how many rows there are, so you'd be able to > work out from that how much extra work it's doing. Hm, if we do that we might as well go all the way and simply report these numbers per aggregate, instead of once per window aggregation node. That'd provide the maximum amount of information, and be quite unambiguous. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Mar5, 2014, at 18:37 , Tom Lane wrote: > Dean Rasheed writes: >> I think we really need a larger consensus on this though, so I'd be >> interested to hear what others think. > > My advice is to lose the EXPLAIN output entirely. If the authors of > the patch can't agree on what it means, what hope have everyday users > got of making sense of it? The question isn't what the current output means, but whether it's a good metric to report or not. If we don't report anything, then how would a user check whether a query is slow because of O(n^2) behaviour of a windowed aggregate, or because of some other reason? If invertibility were a purely static property, then maybe we could get away with that, and say "check whether your aggregates are invertible or not". But since we have partially invertible aggregates, the performance characteristics depend on the input data, so we IMHO need some way for users to check what's actually happening. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Tom, I did not follow the thread very closely, so I need to look at what the ambiguity is there; however, I would love to see window rescans in explain analyze. I have great experience in tuning Oracle queries. There are features in PostgreSQL's explain analyze that I miss badly in Oracle: 'rows removed by filter' is my favourite one. Improving explain analyze is great for performance analysis. I would say the target audience for 'explain analyze' is not all users, but someone closer to 'performance engineers'. Those beings are used to triple-checking results and building/validating hypotheses, since all the counters tend to lie, so 'a bit misleading counter' is not a showstopper. I did not think of 'non-being-able-to-use-negative-transition-since-floats-do-not-commute' before. Thanks to this discussion I see what kind of dragons live here. I would vote (if I had any vote at all) for the inclusion of performance statistics in explain analyze (e.g. number of rescans and number of rows fed to the aggregate), provided the performance impact is tolerable in both regular and explain analyze mode. I wonder how Oracle handles negative transitions (does it?), however that is a different discussion. Regards, Vladimir Sitnikov
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Dean Rasheed writes: > I think we really need a larger consensus on this though, so I'd be > interested to hear what others think. My advice is to lose the EXPLAIN output entirely. If the authors of the patch can't agree on what it means, what hope have everyday users got of making sense of it? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 5 March 2014 14:35, Florian Pflug wrote: > On Mar4, 2014, at 21:09 , Dean Rasheed wrote: >> On 3 March 2014 23:00, Florian Pflug wrote: * In show_windowagg_info(), this calculation looks suspicious to me: double tperrow = winaggstate->aggfwdtrans / (inst->nloops * inst->ntuples); If the node is executed multiple times, aggfwdtrans will be reset in each loop, so the transitions per row figure will be under-estimated. ISTM that if you want to report on this, you'd need aggfwdtrans to be reset once per query, but I'm not sure exactly how to do that. ... Actually, I think it's misleading to only count forward transition function calls, because a call to the inverse transition function still represents a state transition, and is likely to be around the same cost. For a window of size 2, there would not be much advantage to using inverse transition functions, because it would be around 2 transitions per row either way. >>> >>> True. In fact, I pondered whether to avoid using the inverse transition >>> function for windows of 2 rows. In the end, I didn't because I felt that >>> it makes custom aggregates harder to test. >>> >>> On the question of whether to count inverse transition function calls - >>> the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the >>> number of state transitions, but rather to show whether the aggregation >>> has O(n) or O(n^2) behaviour. The idea being that a value close to "1" >>> means "inverse transition function works as expected", and larger values >>> mean "not working so well". >>> >>> Regarding multiple evaluations - I think I based the behaviour on how >>> ntuples works, which also only reports the value of the last evaluation >>> I think. But maybe I'm confused about this. >>> >> >> No, it doesn't look like that's correct for multiple loops. Consider >> this example: >> >> ... 
>> >> It turns out that show_windowagg_info() is only called once at the >> end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing >> tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get >> 1, you'd have to use this formula: >> >>double tperrow = winaggstate->aggfwdtrans / inst->ntuples; > > Hm, so I *was* confused - seems I mixed up ntuples (which counts the > total number of tuples over all loops) with what we report as "rows" > (i.e. the average number of rows per loop). Thanks for clearing that up! > >> I'm still not convinced that's the most useful thing to report though. >> Personally, I'd prefer to just see the separate counts, e.g.: >> >> -> WindowAgg (cost=0.00..22.50 rows=1000 width=4) (actual >> time=0.019..0.094 rows=25 loops=4) >> Output: sum(i.i) OVER (?) >> Forward transitions: 25 Inverse transitions: 25 >> >> IMO that gives a clearer picture of what's going on. > > My problem with this is that if there are multiple aggregates, the > numbers don't really count the number of forward and reverse transfer > function invocations. What nodeWindowAgg.c really counts is the number > of *rows* it has to fetch from the tuple store to perform forward > aggregations - the relevant code snippet is > > if (numaggs_restart > 0) > winstate->aggfwdtrans += (winstate->aggregatedupto > - winstate->frameheadpos); > else > winstate->aggfwdtrans += (winstate->aggregatedupto > - aggregatedupto_nonrestarted); > > Now we could of course report these figures per aggregate instead of > only once per aggregation node. But as I said earlier, my guess is that > people usually won't be interested in the precise counts - most likely, > what you want to know is "how much do the restarts cost me" > The problem I have with the single "Transitions Per Row" figure, and the idea that a value close to 1.0 is supposed to be good, is that it's not really true. 
For example, with a window size of 2 and a "perfect" invertible aggregate, you'd get a value of 1.0, but with a non-invertible aggregate you'd get a value of 2.0, when actually it wouldn't do any better if it had an inverse. I think trying to represent this as a single number is too simplistic. > When I added the EXPLAIN stuff, I initially simply reported the number > of times nodeWindowAgg has to restart the aggregation. The problem with > that approach is that not all restarts carry the same cost. If the frames > either don't overlap at all or are identical, restarts don't cause any > additional work. And for fixed-length frames (BETWEEN n PRECEEDING AND > m FOLLOWING), the performance effects of restarts depends on m-n. > > Which is why I made it count the number of aggregated input rows instead. > > Having said that, I' not really 100% happy with the name > "Transitions Per Row" for this - it was simply the best I could come up with > that was reasonably short. And I'm certainly open to reporting the absolute >
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Mar4, 2014, at 21:09 , Dean Rasheed wrote: > On 3 March 2014 23:00, Florian Pflug wrote: >>> * In show_windowagg_info(), this calculation looks suspicious to me: >>> >>> double tperrow = winaggstate->aggfwdtrans / >>> (inst->nloops * inst->ntuples); >>> >>> If the node is executed multiple times, aggfwdtrans will be reset in >>> each loop, so the transitions per row figure will be under-estimated. >>> ISTM that if you want to report on this, you'd need aggfwdtrans to be >>> reset once per query, but I'm not sure exactly how to do that. >>> >>> ... >>> >>> Actually, I think it's misleading to only count forward transition >>> function calls, because a call to the inverse transition function >>> still represents a state transition, and is likely to be around the >>> same cost. For a window of size 2, there would not be much advantage >>> to using inverse transition functions, because it would be around 2 >>> transitions per row either way. >> >> True. In fact, I pondered whether to avoid using the inverse transition >> function for windows of 2 rows. In the end, I didn't because I felt that >> it makes custom aggregates harder to test. >> >> On the question of whether to count inverse transition function calls - >> the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the >> number of state transitions, but rather to show whether the aggregation >> has O(n) or O(n^2) behaviour. The idea being that a value close to "1" >> means "inverse transition function works as expected", and larger values >> mean "not working so well". >> >> Regarding multiple evaluations - I think I based the behaviour on how >> ntuples works, which also only reports the value of the last evaluation >> I think. But maybe I'm confused about this. >> > > No, it doesn't look like that's correct for multiple loops. Consider > this example: > > ... 
> > It turns out that show_windowagg_info() is only called once at the
> > end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing
> > tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get
> > 1, you'd have to use this formula:
> >
> >    double tperrow = winaggstate->aggfwdtrans / inst->ntuples;

Hm, so I *was* confused - seems I mixed up ntuples (which counts the total
number of tuples over all loops) with what we report as "rows" (i.e. the
average number of rows per loop). Thanks for clearing that up!

> I'm still not convinced that's the most useful thing to report though.
> Personally, I'd prefer to just see the separate counts, e.g.:
>
>   -> WindowAgg (cost=0.00..22.50 rows=1000 width=4)
>                (actual time=0.019..0.094 rows=25 loops=4)
>        Output: sum(i.i) OVER (?)
>        Forward transitions: 25  Inverse transitions: 25
>
> IMO that gives a clearer picture of what's going on.

My problem with this is that if there are multiple aggregates, the numbers
don't really count the number of forward and reverse transfer function
invocations. What nodeWindowAgg.c really counts is the number of *rows* it
has to fetch from the tuple store to perform forward aggregations - the
relevant code snippet is

    if (numaggs_restart > 0)
        winstate->aggfwdtrans += (winstate->aggregatedupto -
                                  winstate->frameheadpos);
    else
        winstate->aggfwdtrans += (winstate->aggregatedupto -
                                  aggregatedupto_nonrestarted);

Now we could of course report these figures per aggregate instead of only
once per aggregation node. But as I said earlier, my guess is that people
usually won't be interested in the precise counts - most likely, what you
want to know is "how much do the restarts cost me?"

When I added the EXPLAIN stuff, I initially simply reported the number of
times nodeWindowAgg has to restart the aggregation. The problem with that
approach is that not all restarts carry the same cost. If the frames either
don't overlap at all or are identical, restarts don't cause any additional
work.
And for fixed-length frames (BETWEEN n PRECEDING AND m FOLLOWING), the
performance effects of restarts depend on m-n.

Which is why I made it count the number of aggregated input rows instead.

Having said that, I'm not really 100% happy with the name "Transitions Per
Row" for this - it was simply the best I could come up with that was
reasonably short. And I'm certainly open to reporting the absolute count
instead of a factor relative to ntuples. If we made it an absolute count,
would calling this "Aggregated Rows" work for you?

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
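The counter Florian describes - aggregated input rows rather than restarts - can be sketched with a toy Python model (not the patch's C code; all names here are invented for illustration). With an inverse transition function the figure stays near one row per output row; without one, it approaches the frame size:

```python
# Toy model of the "transitions per row" figure for a frame of the form
# ROWS BETWEEN p PRECEDING AND CURRENT ROW. aggregated_rows mimics the
# aggfwdtrans counter: rows fetched for forward aggregation.

def window_sum(values, preceding, invertible):
    results, aggregated_rows = [], 0
    state, head = 0, 0                    # running sum, current frame head
    for i in range(len(values)):
        new_head = max(0, i - preceding)
        if invertible:
            for j in range(head, new_head):
                state -= values[j]        # inverse transitions (not counted)
            state += values[i]            # one forward transition per row
            aggregated_rows += 1
        else:
            # no inverse: restart and re-aggregate the whole frame
            state = sum(values[new_head:i + 1])
            aggregated_rows += i + 1 - new_head
        head = new_head
        results.append(state)
    return results, aggregated_rows

vals = list(range(1, 101))
r1, n1 = window_sum(vals, 4, invertible=True)
r2, n2 = window_sum(vals, 4, invertible=False)
assert r1 == r2                           # same answers either way
print(n1 / len(vals))                     # 1.0 - inverse works as expected
print(n2 / len(vals))                     # 4.9 - close to the frame size
```

This also shows why the value is bounded below by 1.0 for this kind of frame: even a perfect inverse still performs one forward transition per new row.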
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 3 March 2014 23:00, Florian Pflug wrote: >> * In show_windowagg_info(), this calculation looks suspicious to me: >> >>double tperrow = winaggstate->aggfwdtrans / >>(inst->nloops * inst->ntuples); >> >> If the node is executed multiple times, aggfwdtrans will be reset in >> each loop, so the transitions per row figure will be under-estimated. >> ISTM that if you want to report on this, you'd need aggfwdtrans to be >> reset once per query, but I'm not sure exactly how to do that. >> >> ... >> >> Actually, I think it's misleading to only count forward transition >> function calls, because a call to the inverse transition function >> still represents a state transition, and is likely to be around the >> same cost. For a window of size 2, there would not be much advantage >> to using inverse transition functions, because it would be around 2 >> transitions per row either way. > > True. In fact, I pondered whether to avoid using the inverse transition > function for windows of 2 rows. In the end, I didn't because I felt that > it makes custom aggregates harder to test. > > On the question of whether to count inverse transition function calls - > the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the > number of state transitions, but rather to show whether the aggregation > has O(n) or O(n^2) behaviour. The idea being that a value close to "1" > means "inverse transition function works as expected", and larger values > mean "not working so well". > > Regarding multiple evaluations - I think I based the behaviour on how > ntuples works, which also only reports the value of the last evaluation > I think. But maybe I'm confused about this. > No, it doesn't look like that's correct for multiple loops. 
Consider this example:

    explain (verbose, analyse)
      select * from (values (10), (20), (30), (40)) v(x),
        lateral (select sum(i) over (rows between 4 preceding and current row)
                 from generate_series(1, x) i) t;

                                   QUERY PLAN
    -------------------------------------------------------------------------
    Nested Loop (cost=0.00..170.06 rows=4000 width=12)
                (actual time=0.027..0.414 rows=100 loops=1)
      Output: "*VALUES*".column1, (sum(i.i) OVER (?))
      -> Values Scan on "*VALUES*" (cost=0.00..0.05 rows=4 width=4)
                                   (actual time=0.002..0.006 rows=4 loops=1)
           Output: "*VALUES*".column1
      -> WindowAgg (cost=0.00..22.50 rows=1000 width=4)
                   (actual time=0.019..0.094 rows=25 loops=4)
           Output: sum(i.i) OVER (?)
           Transitions Per Row: 0.2
           -> Function Scan on pg_catalog.generate_series i
                (cost=0.00..10.00 rows=1000 width=4)
                (actual time=0.010..0.015 rows=25 loops=4)
                Output: i.i
                Function Call: generate_series(1, "*VALUES*".column1)

It turns out that show_windowagg_info() is only called once at the end,
with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing
tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get 1,
you'd have to use this formula:

    double tperrow = winaggstate->aggfwdtrans / inst->ntuples;

I'm still not convinced that's the most useful thing to report though.
Personally, I'd prefer to just see the separate counts, e.g.:

    -> WindowAgg (cost=0.00..22.50 rows=1000 width=4)
                 (actual time=0.019..0.094 rows=25 loops=4)
         Output: sum(i.i) OVER (?)
         Forward transitions: 25  Inverse transitions: 25

IMO that gives a clearer picture of what's going on.

Thoughts?

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
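Plugging in the numbers from the plan above, the under-estimation is easy to check (a quick Python sanity check of the two formulas):

```python
# Numbers from the plan above: 4 loops of 25 rows each.
nloops, rows_per_loop = 4, 25
ntuples = nloops * rows_per_loop   # instrumentation totals tuples over all loops
aggfwdtrans = 100                  # one forward transition per input row overall

buggy = aggfwdtrans / (nloops * ntuples)   # divides by loops twice
fixed = aggfwdtrans / ntuples              # ntuples already spans all loops

print(buggy)   # 0.25 - displayed truncated as 0.2
print(fixed)   # 1.0
```

The key point is that `inst->ntuples` is already a total over all loops, so multiplying by `nloops` again double-counts the looping.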
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 25 February 2014 12:33, Florian Pflug wrote: > On Feb24, 2014, at 17:50 , Dean Rasheed wrote: >> On 20 February 2014 01:48, Florian Pflug wrote: >>> On Jan29, 2014, at 13:45 , Florian Pflug wrote: In fact, I'm currently leaning towards just forbidding non-strict forward transition function with strict inverses, and adding non-NULL counters to the aggregates that then require them. It's really only the SUM() aggregates that are affected by this, I think. >>> >>> I finally got around to doing that, and the results aren't too bad. The >>> attached patches required that the strictness settings of the forward and >>> reverse transition functions agree, and employ exactly the same >>> NULL-skipping >>> logic we always had. >>> >>> The only aggregates seriously affected by that change were SUM(int2) and >>> SUM(int4). >> >> I haven't looked at this in any detail yet, but that seems much neater >> to me. It seems perfectly sensible that the forward and inverse >> transition functions should have the same strictness settings, and >> enforcing that keeps the logic simple, as well as hopefully making it >> easier to document. > > Good to hear that you agree! I'll try to find some time to update the docs. > I finally got round to looking at this in more detail. Sorry for the delay. Here is my more detailed review of the base patch. Overall, I think that it is in reasonable shape, and as I said I think the approach of enforcing matching strictness settings on the forward and inverse transition functions is much simpler and neater. I have a few comments, some cosmetic, and a couple more substantive: * In a couple of places: errmsg("stricness of forward and reverse transition functions must match") - misspelling: "stricness". - "reverse" should be "inverse" to match the terminology used elsewhere. * Grammatical error in the comment for lookup_agg_function() - you should drop the word "both". 
* In show_windowagg_info(), this calculation looks suspicious to me:

    double tperrow = winaggstate->aggfwdtrans /
        (inst->nloops * inst->ntuples);

If the node is executed multiple times, aggfwdtrans will be reset in each
loop, so the transitions per row figure will be under-estimated. ISTM that
if you want to report on this, you'd need aggfwdtrans to be reset once per
query, but I'm not sure exactly how to do that.

Here's a test case:

    explain (verbose, analyse)
      select sum(i) over (rows between 4 preceding and current row)
        from generate_series(1, 10) i;

which outputs 10 rows with an average of 1 transition per row, but doing
the same window aggregate twice in a nested loop:

    explain (verbose, analyse)
      select * from (values (10), (10)) v(x),
        lateral (select sum(i) over (rows between 4 preceding and current row)
                 from generate_series(1, x) i) t;

outputs 20 rows, but only reports 0.5 transitions per row.

Actually, I think it's misleading to only count forward transition function
calls, because a call to the inverse transition function still represents a
state transition, and is likely to be around the same cost. For a window of
size 2, there would not be much advantage to using inverse transition
functions, because it would be around 2 transitions per row either way.

* The function comment for build_aggregate_fnexprs() needs to be updated to
reference the inverse transition function. I'd also be tempted to have it
allow invtransfnexpr to be a NULL pointer, if the inverse transition
function expression tree is not required. Then ExecInitAgg() could simply
pass NULL, instead of having the local variable invtransfnexpr with the
slightly cryptic comment "needed but never used".

* In struct WindowStatePerAggData, I think you should change the field
order to transfn_oid, invtransfn_oid and then finalfn_oid. It's only a
small thing, but that's the order those 3 functions are referred to
everywhere else.
* In struct WindowStatePerAggData, the comment for transValueCount should read "number of aggregated values". * If AggCheckCallContext() is called from a window function, and it asks for an aggcontext, it will fail because calledaggno will be -1. That can't currently happen for any of our built-in window functions, and I'm not sure if it's likely to happen in the future, but I think it would be better to defend against that possibility just in case. So I think it ought to return the shared context in that case, as the original code would have done. * In advance_windowaggregate(), this code if (peraggstate->transfn.fn_strict) { is against the project style, which is to have curly braces on new lines. But also, that test condition is the same as the preceding block, so the 2 blocks could just be merged. * I was wondering about the case of a forward transition function returning NULL in the presence of an inverse transition function. In this patch there are 3 pieces of code that test for that: 1). advance_windowaggregate() errors out if the forward t
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Feb24, 2014, at 17:50 , Dean Rasheed wrote: > On 20 February 2014 01:48, Florian Pflug wrote: >> On Jan29, 2014, at 13:45 , Florian Pflug wrote: >>> In fact, I'm >>> currently leaning towards just forbidding non-strict forward transition >>> function with strict inverses, and adding non-NULL counters to the >>> aggregates that then require them. It's really only the SUM() aggregates >>> that are affected by this, I think. >> >> I finally got around to doing that, and the results aren't too bad. The >> attached patches required that the strictness settings of the forward and >> reverse transition functions agree, and employ exactly the same NULL-skipping >> logic we always had. >> >> The only aggregates seriously affected by that change were SUM(int2) and >> SUM(int4). > > I haven't looked at this in any detail yet, but that seems much neater > to me. It seems perfectly sensible that the forward and inverse > transition functions should have the same strictness settings, and > enforcing that keeps the logic simple, as well as hopefully making it > easier to document. Good to hear that you agree! I'll try to find some time to update the docs. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 20 February 2014 01:48, Florian Pflug wrote:
> On Jan29, 2014, at 13:45 , Florian Pflug wrote:
>> In fact, I'm currently leaning towards just forbidding non-strict forward
>> transition function with strict inverses, and adding non-NULL counters to
>> the aggregates that then require them. It's really only the SUM()
>> aggregates that are affected by this, I think.
>
> I finally got around to doing that, and the results aren't too bad. The
> attached patches required that the strictness settings of the forward and
> reverse transition functions agree, and employ exactly the same
> NULL-skipping logic we always had.
>
> The only aggregates seriously affected by that change were SUM(int2) and
> SUM(int4).
>
> The SUM, AVG and STDDEV aggregates which use NumericAggState were already
> mostly prepared for this - all they required were a few adjustments to
> correctly handle the last non-NULL, non-NaN input being removed, and a few
> additional PG_ARGISNULL calls for the inverse transition functions since
> they're now non-strict. I've also modified them to unconditionally allocate
> the state at the first call, instead of upon seeing the first non-NULL
> value, but that isn't strictly required. But without that, the state can
> have three classes of values - SQL-NULL, NULL pointer and valid pointer,
> and that's just confusing...
>
> SUM(int2) and SUM(int4) now simply use the same transition functions as
> AVG(int2) and AVG(int4), which use an int8 array to track the sum of the
> inputs and the number of inputs, plus a new final function int2int4_sum().
> Previously, they used a single int8 as their state type.
>
> Since I was touching the code anyway, I removed some unnecessary inverse
> transition functions - namely, int8_avg_accum_inv and numeric_avg_accum_inv.
> These are completely identical to their non-avg cousins - the only
> difference between the corresponding forward transition functions is
> whether they request computation of sumX2 (i.e.
the sum of squares of the inputs) or not.
>
> I haven't yet updated the docs - I'll do that if and when there's consensus
> about whether this is the way to go or not.

I haven't looked at this in any detail yet, but that seems much neater to
me. It seems perfectly sensible that the forward and inverse transition
functions should have the same strictness settings, and enforcing that
keeps the logic simple, as well as hopefully making it easier to document.

It's a shame that more transition functions cannot be made strict, when
they actually ignore null values, but I think trying to solve that can be
regarded as outside the scope of this patch.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
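The SUM(int2)/SUM(int4) change described above - switching from a single int8 state to AVG's (sum, count) pair - can be sketched as a toy Python model (not the actual C code; the names mirror the catalog functions but the bodies are invented for illustration). The count is what lets the final function distinguish "no rows aggregated" (SQL NULL) from "rows summing to zero" once an inverse transition function can empty the state again:

```python
# None plays the role of SQL NULL here.

def int_sum_accum(state, value):
    """Forward transition: state is a (sum, count) pair, as in the AVG
    transition functions the patch now reuses."""
    s, n = state if state is not None else (0, 0)
    return (s + value, n + 1)

def int_sum_accum_inv(state, value):
    """Inverse transition: remove a value that left the window frame."""
    s, n = state
    return (s - value, n - 1)

def int2int4_sum(state):
    """Final function: NULL when nothing was aggregated. With a bare
    int8 state, an empty frame would be indistinguishable from sum 0."""
    if state is None or state[1] == 0:
        return None
    return state[0]

state = None
for v in (1, 2, 3):
    state = int_sum_accum(state, v)
assert int2int4_sum(state) == 6
for v in (1, 2, 3):                 # frame slides past all three rows
    state = int_sum_accum_inv(state, v)
assert int2int4_sum(state) is None  # empty frame -> SQL NULL again
```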
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan29, 2014, at 09:59 , Dean Rasheed wrote:
> On 28 January 2014 20:16, Florian Pflug wrote:
>> On Jan27, 2014, at 23:28 , Dean Rasheed wrote:
>>> This case is explicitly forbidden, both in CREATE AGGREGATE and in the
>>> executor. To me, that seems overly restrictive --- if transfn is
>>> strict, then you know for sure that no NULL values are added to the
>>> aggregate state, so surely it doesn't matter whether or not
>>> inv_transfn is strict. It will never be asked to remove NULL values.
>>>
>>> I think there are definite use-cases where a user might want to use a
>>> pre-existing function as the inverse transition function, so it seems
>>> harsh to force them to wrap it in a strict function in this case.
>>
>> I'm not sure that the likelihood of someone wanting to combine a strict
>> forward with a non-strict inverse function is very high, but neither
>>
>
> Me neither, but the checks to forbid it aren't adding anything, and I
> think it's best to make it as flexible as possible.

Ok.

>> Another idea would be to have an explicit nulls=ignore|process option
>> for CREATE AGGREGATE. If nulls=process and either of the transition
>> functions are strict, we could either error out, or simply do what
>> normal function calls do and pretend they return NULL for NULL inputs.
>> Not sure how the rule that forward transition functions may not return
>> NULL if there's an inverse transition function would fit in if we do
>> the latter, though.
>>
>> The question is - is it worth the effort to add that flag?
>
> One use-case I had in mind upthread was suppose you wanted to write a
> custom version of array_agg that only collected non-NULL values. With
> such a flag, that would be trivial, but with the current patch you'd
> have to (counter-intuitively) wrap the inverse transition function in a
> strict function.

I'd be more convinced by that if doing so was actually possible for
non-superusers.
But it isn't, because the aggregate uses "internal" as its state type and
it's thus entirely up to the user to not crash the database by mixing
transfer and final functions with incompatible state data. Plus, instead of
creating a custom aggregate, one can just use a FILTER clause to get rid of
the NULL values.

> Another use-case I can imagine is suppose you wanted a custom version
> of sum() that returned NULL if any of the input values were NULL. If
> you knew that most of your data was non-NULL, you might still wish to
> benefit from an inverse transition function. Right now the patch won't
> allow that because of the error in advance_windowaggregate(), but
> possibly that could be relaxed by forcing a restart in that case.

That's not really true - that patch only forbids that if you insist on
representing the state "I have seen a NULL input" with a NULL state value.
But if you instead just count the number of NULLs in your transition
functions, all you need to do is to have your final function return NULL
if that count is not zero.

> If I've understood correctly that should give similar to current
> performance if NULLs are present, and enhanced performance as the
> window moved over non-NULL data.

Exactly - and this makes defining a NULL-sensitive SUM() this way rather
silly - a simple counter has very nearly zero overhead, and avoids all
rescans.

> In that second case, it would also be nice if you could simply re-use
> the existing sum forward and inverse transition functions, with a
> different null-handling flag.

Even if we had a nulls=process|ignore flag, SUM's transition functions
would still need to take that use-case into account explicitly to make
this work - at the very least, the forward transition function would need
to return NULL if the input is NULL instead of just skipping it as it
does now.
But that would lead to completely unnecessary rescans, so what we actually
ought to do then is to make it track whether there have been NULL inputs
and make the finalfunc return NULL in this case. Which would border on
ironic, since the whole raison d'etre for this is to *avoid* spreading
NULL-counting logic around...

Plus, to make this actually useable, we'd have to document this, and tell
people how to define such a SUM aggregate. But I don't want to go there -
we really mustn't encourage people to mix-and-match built-in aggregates
with state type "internal", since whether they work or crash depends on
factors outside the control of the user.

And finally, you can get that behaviour quite easily by doing

    CASE WHEN bool_and(input IS NOT NULL) THEN whatever_agg(input) ELSE NULL END

> So I think an ignore-nulls flag would have real benefits, as well as
> being a cleaner design than relying on a strict inverse transition
> function.

The more I think about this, the less convinced I am. In fact, I'm
currently leaning towards just forbidding non-strict forward transition
function with strict inverses, and adding non-NULL counters to the
aggregates that then require them. It's really only the SUM() aggregates
that are affected by this, I think.
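Florian's NULL-counting alternative - tracking how many NULL inputs are currently in the frame instead of encoding "saw a NULL" as a NULL state - might look like this as a toy Python model (invented names, not PostgreSQL code):

```python
# A NULL-sensitive sum that still supports an inverse transition.
# State is (running_sum, nulls_in_frame); None plays the role of SQL NULL.

def trans(state, value):            # forward transition (non-strict)
    s, nulls = state
    return (s, nulls + 1) if value is None else (s + value, nulls)

def inv_trans(state, value):        # inverse transition (non-strict)
    s, nulls = state
    return (s, nulls - 1) if value is None else (s - value, nulls)

def final(state):
    s, nulls = state
    return None if nulls > 0 else s  # any NULL in the frame -> NULL result

state = (0, 0)
for v in (1, None, 3):
    state = trans(state, v)
assert final(state) is None          # a NULL input poisons the result...
state = inv_trans(state, None)       # ...but only until it leaves the frame
assert final(state) == 4
```

Because "I saw a NULL" lives in the counter rather than in a NULL state, the frame can keep sliding with O(1) work per row, with no restart needed when a NULL enters or leaves.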
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 28 January 2014 20:16, Florian Pflug wrote:
> On Jan27, 2014, at 23:28 , Dean Rasheed wrote:
>> strict transfn vs non-strict inv_transfn
>>
>>
>> This case is explicitly forbidden, both in CREATE AGGREGATE and in the
>> executor. To me, that seems overly restrictive --- if transfn is
>> strict, then you know for sure that no NULL values are added to the
>> aggregate state, so surely it doesn't matter whether or not
>> inv_transfn is strict. It will never be asked to remove NULL values.
>>
>> I think there are definite use-cases where a user might want to use a
>> pre-existing function as the inverse transition function, so it seems
>> harsh to force them to wrap it in a strict function in this case.
>>
>> AFAICS, the code in advance/retreat_windowaggregate should just work
>> if those checks are removed.
>
> True. It didn't use to in earlier versions of the patch because the advance
> and retreat functions looked at the strict settings directly, but now that
> there's an ignore_nulls flag in the executor node, the only requirement
> should be that ignore_nulls is set if either of the transition functions
> is strict.
>
> I'm not sure that the likelihood of someone wanting to combine a strict
> forward with a non-strict inverse function is very high, but neither
>

Me neither, but the checks to forbid it aren't adding anything, and I
think it's best to make it as flexible as possible.

>> non-strict transfn vs strict inv_transfn
>>
>>
>> At first this seems as though it must be an error --- the forwards
>> transition function allows NULL values into the aggregate state, and
>> the inverse transition function is strict, so it cannot remove them.
>>
>> But actually what the patch is doing in this case is treating the
>> forwards transition function as partially strict --- it won't be
>> passed NULL values (at least in a window context), but it may be
>> passed a NULL state in order to build the initial state when the first
>> non-NULL value arrives.
>
> Exactly.
>
>> It looks like this behaviour is intended to support aggregates that
>> ignore NULL values, but cannot actually be made to have a strict
>> transition function. I think typically this is because the aggregate's
>> initial state is NULL and its state type differs from the type of the
>> values being aggregated, and so can't be automatically created from
>> the first non-NULL value.
>
> Yes. I added this because the alternative would have been to count
> non-NULL values in most of the existing SUM() aggregates.
>
>> That all seems quite ugly though, because now you have a transition
>> function that is not strict, which is passed NULL values when used in
>> a normal aggregate context, and not passed NULL values when used in a
>> window context (whether sliding or not), except for the NULL state for
>> the first non-NULL value.
>
> Ugh, true. Clearly, nodeAgg.c needs to follow what nodeWindowAgg.c does
> and skip NULL inputs if the aggregate has a strict inverse transition
> function. The fact that we never actually *call* the inverse doesn't
> matter. Will fix that once we've decided on the various strictness issues.
>

Yuk!

>> I'm not sure if there is a better way to do it though. If we disallow
>> this case, these aggregates would have to use non-strict functions for
>> both the forward and inverse transition functions, which means they
>> would have to implement their own non-NULL value counting.
>> Alternatively, allowing strict transition functions for these
>> aggregates would require that we provide some other way to initialise
>> the state from the first non-NULL input, such as a new initfn.
>
> I'm not sure an initfn would really help. It would allow us to return
> to the initial requirement that the strict settings match - but you
> already deem the check that the forward transition function can only
> be strict if the inverse is also overly harsh, so would that really be
> an improvement? It'd also cost us an additional pg_proc entry per aggregate.
>
> Another idea would be to have an explicit nulls=ignore|process option
> for CREATE AGGREGATE. If nulls=process and either of the transition
> functions are strict, we could either error out, or simply do what
> normal function calls do and pretend they return NULL for NULL inputs.
> Not sure how the rule that forward transition functions may not return
> NULL if there's an inverse transition function would fit in if we do
> the latter, though.
>
> The question is - is it worth the effort to add that flag?
>

Yeah, I thought about a flag too. I think that could work quite nicely.
Basically where I'm coming from is trying to make this as flexible as
possible, without pre-judging the kinds of aggregates that users may want
to add.

One use-case I had in mind upthread was suppose you wanted to write a
custom version of array_agg that only collected non-NULL values. With such
a flag, that would be trivial
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan27, 2014, at 23:28 , Dean Rasheed wrote:
> strict transfn vs non-strict inv_transfn
>
>
> This case is explicitly forbidden, both in CREATE AGGREGATE and in the
> executor. To me, that seems overly restrictive --- if transfn is
> strict, then you know for sure that no NULL values are added to the
> aggregate state, so surely it doesn't matter whether or not
> inv_transfn is strict. It will never be asked to remove NULL values.
>
> I think there are definite use-cases where a user might want to use a
> pre-existing function as the inverse transition function, so it seems
> harsh to force them to wrap it in a strict function in this case.
>
> AFAICS, the code in advance/retreat_windowaggregate should just work
> if those checks are removed.

True. It didn't use to in earlier versions of the patch because the
advance and retreat functions looked at the strict settings directly, but
now that there's an ignore_nulls flag in the executor node, the only
requirement should be that ignore_nulls is set if either of the transition
functions is strict.

I'm not sure that the likelihood of someone wanting to combine a strict
forward with a non-strict inverse function is very high, but neither

> non-strict transfn vs strict inv_transfn
>
>
> At first this seems as though it must be an error --- the forwards
> transition function allows NULL values into the aggregate state, and
> the inverse transition function is strict, so it cannot remove them.
>
> But actually what the patch is doing in this case is treating the
> forwards transition function as partially strict --- it won't be
> passed NULL values (at least in a window context), but it may be
> passed a NULL state in order to build the initial state when the first
> non-NULL value arrives.

Exactly.

> It looks like this behaviour is intended to support aggregates that
> ignore NULL values, but cannot actually be made to have a strict
> transition function.
> I think typically this is because the aggregate's
> initial state is NULL and its state type differs from the type of the
> values being aggregated, and so can't be automatically created from
> the first non-NULL value.

Yes. I added this because the alternative would have been to count
non-NULL values in most of the existing SUM() aggregates.

> That all seems quite ugly though, because now you have a transition
> function that is not strict, which is passed NULL values when used in
> a normal aggregate context, and not passed NULL values when used in a
> window context (whether sliding or not), except for the NULL state for
> the first non-NULL value.

Ugh, true. Clearly, nodeAgg.c needs to follow what nodeWindowAgg.c does
and skip NULL inputs if the aggregate has a strict inverse transition
function. The fact that we never actually *call* the inverse doesn't
matter. Will fix that once we've decided on the various strictness issues.

> I'm not sure if there is a better way to do it though. If we disallow
> this case, these aggregates would have to use non-strict functions for
> both the forward and inverse transition functions, which means they
> would have to implement their own non-NULL value counting.
> Alternatively, allowing strict transition functions for these
> aggregates would require that we provide some other way to initialise
> the state from the first non-NULL input, such as a new initfn.

I'm not sure an initfn would really help. It would allow us to return to
the initial requirement that the strict settings match - but you already
deem the check that the forward transition function can only be strict if
the inverse is also overly harsh, so would that really be an improvement?
It'd also cost us an additional pg_proc entry per aggregate.

Another idea would be to have an explicit nulls=ignore|process option for
CREATE AGGREGATE.
If nulls=process and either of the transition functions are strict, we
could either error out, or simply do what normal function calls do and
pretend they return NULL for NULL inputs. Not sure how the rule that
forward transition functions may not return NULL if there's an inverse
transition function would fit in if we do the latter, though.

The question is - is it worth the effort to add that flag?

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
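The nulls=ignore|process idea can be modelled in a few lines of Python (a hypothetical sketch, not proposed code): with nulls=ignore the executor filters NULLs before the transition function ever runs, so strictness no longer has to double as a NULL-handling declaration:

```python
# Toy model of the proposed flag. None plays the role of SQL NULL.

def aggregate(values, transfn, init, ignore_nulls):
    state = init
    for v in values:
        if v is None and ignore_nulls:
            continue                  # nulls=ignore: NULL never reaches transfn
        state = transfn(state, v)     # nulls=process: transfn sees the NULL
    return state

count_nonnull = lambda s, v: s + (0 if v is None else 1)
count_all = lambda s, v: s + 1

data = [1, None, 2]
assert aggregate(data, count_nonnull, 0, ignore_nulls=True) == 2   # like COUNT(x)
assert aggregate(data, count_all, 0, ignore_nulls=False) == 3      # like COUNT(*)
```

In this model the ignore/process decision is an explicit aggregate property, rather than something inferred from whether the transition functions happen to be strict.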
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 25 January 2014 02:21, Florian Pflug wrote: > I've now split this up into > > invtrans_base: Basic machinery plus tests with SQL-language aggregates > invtrans_arith: COUNT(),SUM(),AVG(),STDDEV() and the like > invtrans_minmax: MIN(),MAX(),BOOL_AND(),BOOL_OR() > invtrans_collecting: ARRAY_AGG(), STRING_AGG() > invtrans_docs: Documentation > Thanks. That makes it a bit easier to review, and hopefully easier to commit. The thing that I'm currently trying to get my head round is the null/not null, strict/not strict handling in this patch, which I find a bit odd, particularly when transfn and inv_transfn have different strictness settings: strict transfn vs non-strict inv_transfn This case is explicitly forbidden, both in CREATE AGGREGATE and in the executor. To me, that seems overly restrictive --- if transfn is strict, then you know for sure that no NULL values are added to the aggregate state, so surely it doesn't matter whether or not inv_transfn is strict. It will never be asked to remove NULL values. I think there are definite use-cases where a user might want to use a pre-existing function as the inverse transition function, so it seems harsh to force them to wrap it in a strict function in this case. AFAICS, the code in advance/retreat_windowaggregate should just work if those checks are removed. non-strict transfn vs strict inv_transfn At first this seems as though it must be an error --- the forwards transition function allows NULL values into the aggregate state, and the inverse transition function is strict, so it cannot remove them. But actually what the patch is doing in this case is treating the forwards transition function as partially strict --- it won't be passed NULL values (at least in a window context), but it may be passed a NULL state in order to build the initial state when the first non-NULL value arrives. 
It looks like this behaviour is intended to support aggregates that ignore
NULL values, but cannot actually be made to have a strict transition
function. I think typically this is because the aggregate's initial state
is NULL and its state type differs from the type of the values being
aggregated, and so can't be automatically created from the first non-NULL
value.

That all seems quite ugly though, because now you have a transition
function that is not strict, which is passed NULL values when used in a
normal aggregate context, and not passed NULL values when used in a window
context (whether sliding or not), except for the NULL state for the first
non-NULL value.

I'm not sure if there is a better way to do it though. If we disallow this
case, these aggregates would have to use non-strict functions for both the
forward and inverse transition functions, which means they would have to
implement their own non-NULL value counting. Alternatively, allowing
strict transition functions for these aggregates would require that we
provide some other way to initialise the state from the first non-NULL
input, such as a new initfn.

Thoughts?

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan26, 2014, at 00:24 , David Rowley wrote: >> On Sat, Jan 25, 2014 at 3:21 PM, Florian Pflug wrote: >> On Jan24, 2014, at 08:47 , Dean Rasheed wrote: >> The invtrans_minmax patch doesn't contain any tests yet - David, could >> you provide some for these functions, and also for bool_and and bool_or? >> We don't need to test every datatype, but a few would be nice. > > I've added a few regression tests for min, max, bool_or and bool_and. > I've pushed these up to here: > > https://github.com/david-rowley/postgres/commits/invtrans_minmax

OK, I've pushed this to github. I haven't produced new patches yet - I think it's probably better to wait for a few things to pile up, to make this less of a moving target. But ultimately, this is up to Dean - Dean, I'll do whatever makes your life as a reviewer easier.

> I did wonder if you'd want to see uses of FILTER in there as I'm thinking > that should really be covered by the core patch and the tests here really > should just be checking the inverse transition functions for min and max.

I don't mind the FILTER - when this gets committed, the distinction between what was in which patch goes away anyway. What I did realize when merging this is that we currently don't test the case of a window which lies fully after the current row (i.e. BETWEEN n FOLLOWING AND m FOLLOWING). We do test BETWEEN n PRECEDING AND m PRECEDING, because the array_agg and string_agg tests do that. So maybe some of the MIN/MAX or arithmetic tests could exercise that case?

> As for the data types tested, I ended up just adding tests for int and text > for min and max. Let me know if you think that more should be tested.

That's sufficient for the regression tests, I think.

> As for bool_or and bool_and. I didn't think there was much extra that would > need to be tested after I added 1 test. It's pretty simple code and adding anything > extra seems like it would be testing something else.

Sounds fine to me.
best regards, Florian Pflug
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Sat, Jan 25, 2014 at 3:21 PM, Florian Pflug wrote: > On Jan24, 2014, at 08:47 , Dean Rasheed wrote: > The invtrans_minmax patch doesn't contain any tests yet - David, could > you provide some for these functions, and also for bool_and and bool_or? > We don't need to test every datatype, but a few would be nice.

I've added a few regression tests for min, max, bool_or and bool_and. I've pushed these up to here: https://github.com/david-rowley/postgres/commits/invtrans_minmax

I did wonder if you'd want to see uses of FILTER in there as I'm thinking that should really be covered by the core patch and the tests here really should just be checking the inverse transition functions for min and max.

As for the data types tested, I ended up just adding tests for int and text for min and max. Let me know if you think that more should be tested.

As for bool_or and bool_and, I didn't think there was much extra that would need to be tested after I added 1 test. It's pretty simple code and adding anything extra seems like it would be testing something else.

Regards David Rowley
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan25, 2014, at 09:50 , David Rowley wrote: > On Thu, Jan 23, 2014 at 1:57 PM, Florian Pflug wrote: >> On Jan23, 2014, at 01:17 , David Rowley wrote: >> > On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug wrote: >> >> If you want to play with >> >> this, I think the first step has to be to find a set of guarantees that >> >> SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if >> >> the >> >> summands all have the same sign, the error is bound by C * N, where C is >> >> some >> >> (machine-specific?) constant (about 1e-15 or so), and N is the number of >> >> input >> >> rows. Or at least so I think from looking at SUMs over floats in >> >> descending >> >> order, which I guess is the worst case. You could, for example, try to >> >> see if >> >> you can find a invertibility conditions which guarantees the same, but >> >> allows >> >> C to be larger. That would put a bound on the number of digits lost by >> >> the new >> >> SUM(float) compared to the old one. >> > >> > I guess if the case is that normal (non-window) sum(float) results are >> > undefined >> > unless you add an order by to the aggregate then I guess there is no >> > possible >> > logic to put in for inverse transitions that will make them behave the >> > same as >> > an undefined behaviour. >> >> Actually, if sum(float) was really undefined, it'd be *very* easy to provide >> an >> inverse transition function - just make it a no-op. Heck, you could then even >> make the forward transition function a no-op, since the very definition of >> "undefined behaviour" is "result can be anything, including setting your >> house >> on fire". The point is, it's *not* undefined - it's just imprecise. And the >> imprecision can even be quantified, it just depends on the number of >> input rows (the equal-sign requirement is mostly there to make "imprecision" >> equivalent to "relative error"). > > My apologies, I meant to use the term nondeterministic rather than undefined. 
> There's really not any need that I can see to turn things silly here.

I wasn't trying to be absurd, I was trying to get a point across.

> My point was more that since sum(float) can give different results if it used > an index scan rather than a seq scan, trying to get the inverse transition to > match something that gives varying results sounds like a tricky task to take > on.

You don't have to match it digit-by-digit! In that I fully agree with Kevin - floats are *always* an approximation, and thus so is SUM(float). Summarization order is BTW not the only source of non-determinism for SUM(float) - the exact result can vary between architectures, and for some architectures between compilers. (Intel is one of these, due to their 80-bit extended precision format that gets truncated to 64-bit when stored to main memory). But in a large number of cases, they won't vary by *much*, which is *the* reason why SUM(float) is *not* totally useless. And reasoning about SUM(float) which ignores that by simply calling it "non-deterministic", "undefined" or whatever, without any quantification of the possible error, has thus zero chance of leading to interesting conclusions.

>> Secondly, you'd really need a second aggregate definition - usually, the >> non-invertible aggregate will get away with a much smaller state >> representation. >> Aggregates with state type "internal" could hack their way around that by >> simply not using the additional parts of their state structure, but e.g. >> plpgsql aggregates would still incur overhead due to repeated copying of >> a unnecessary large state value. If it's possible to represent the >> forward-only but not the invertible state state with a copy-by-value type, >> that overhead could easily be a large part of the total overhead you paid >> for having an inverse transition function.
> > I had imagined they keep the same state type and don't use any extra variables > that are defined for the forward transition function that is invertible.

Yeah, and the above explains that at least for non-C-language aggregates, passing around that unnecessarily large state may very well prove to be the source of a large part, if not almost all, of the overhead. So having just a separate forward transition function will buy you almost nothing in some cases.

I just tried this. I defined two aggregates mymax(int4) and myfastmax(int4), both with just a forward transition function, both SQL-language functions. mymax uses a composite type for the state containing an int4 field holding the current maximum, and a dummy int8 field. myfastmax uses a plain int4 for the state, holding the current maximum. Both forward transition functions essentially do

  case when current_max > v then current_max else v end

On my laptop, computing the maximum of 1e6 rows takes about 4.5 seconds with myfastmax and 7.8 seconds with mymax. If I make mymax's transition function increment the dummy field on every transition, the time increases from 7.8 to 8.
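The mymax/myfastmax experiment above can be sketched outside SQL; this is a hypothetical Python re-creation (the aggregate names and fields mirror the mail, and the per-call dict copy stands in for the state copying that non-C aggregates pay for):

```python
# Two "aggregates" computing MAX: one with a plain scalar state (like
# myfastmax's int4), one dragging a composite state with a dummy field
# (like mymax's row type). The dict is copied on every transition to
# mimic by-value copying of a needlessly large state.

def transition_fast(state, v):
    # scalar state: just keep the running maximum
    return v if state is None or v > state else state

def transition_slow(state, v):
    state = dict(state)                   # copy the whole composite state
    if v > state["current_max"]:
        state["current_max"] = v
    state["dummy"] += 1                   # touching the dummy widens the gap
    return state

values = list(range(1000))

fast = None
slow = {"current_max": values[0], "dummy": 0}
for v in values:
    fast = transition_fast(fast, v)
    slow = transition_slow(slow, v)

# Both agree on the answer; only the state size (and hence copying cost) differs.
assert fast == slow["current_max"] == 999
```

Timing either loop shows the same shape of result as the SQL experiment: the answer is identical, the cost of hauling the larger state around is not.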
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
David Rowley writes: > My point was more that since sum(float) can give different results if it > used an index scan rather than a seq scan, trying to get the inverse > transition to match something that gives varying results sounds like a > tricky task to take on.

This is just a variant of the same excuse we heard before. The question is not whether sum(float8) can give bad results; the question is whether we are going to break applications that are using it carefully (ie with an appropriate ORDER BY) and expecting to get good results.

>> Secondly, you'd really need a second aggregate definition - usually, the >> non-invertible aggregate will get away with a much smaller state >> representation.

Yeah. I think the idea of multiple transition functions in a single aggregate definition is pretty broken to start with, but the likelihood that they couldn't share aggregate state types puts it completely beyond sanity. We're not going to start inventing "stype2" etc.

regards, tom lane
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Thu, Jan 23, 2014 at 1:57 PM, Florian Pflug wrote: > On Jan23, 2014, at 01:17 , David Rowley wrote: > > On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug wrote: > >> If you want to play with > >> this, I think the first step has to be to find a set of guarantees that > >> SUM(float) is supposed to meet. Currently, SUM(float) guarantees that > if the > >> summands all have the same sign, the error is bound by C * N, where C > is some > >> (machine-specific?) constant (about 1e-15 or so), and N is the number > of input > >> rows. Or at least so I think from looking at SUMs over floats in > descending > >> order, which I guess is the worst case. You could, for example, try to > see if > >> you can find a invertibility conditions which guarantees the same, but > allows > >> C to be larger. That would put a bound on the number of digits lost by > the new > >> SUM(float) compared to the old one. > > > > I guess if the case is that normal (non-window) sum(float) results are > undefined > > unless you add an order by to the aggregate then I guess there is no > possible > > logic to put in for inverse transitions that will make them behave the > same as > > an undefined behaviour. > > Actually, if sum(float) was really undefined, it'd be *very* easy to > provide an > inverse transition function - just make it a no-op. Heck, you could then > even > make the forward transition function a no-op, since the very definition of > "undefined behaviour" is "result can be anything, including setting your > house > on fire". The point is, it's *not* undefined - it's just imprecise. And the > imprecision can even be quantified, it just depends on the number of > input rows (the equal-sign requirement is mostly there to make > "imprecision" > equivalent to "relative error"). > > My apologies, I meant to use the term nondeterministic rather than undefined. There's really not any need that I can see to turn things silly here. 
My point was more that since sum(float) can give different results if it used an index scan rather than a seq scan, trying to get the inverse transition to match something that gives varying results sounds like a tricky task to take on.

> >> > If it seems sound enough, then I may implement it in C to see how much
> >> > overhead it adds to forward aggregation for floating point types, but even
> >> > if it did add too much overhead to forward aggregation it might be worth
> >> > allowing aggregates to have 2 forward transition functions and if the 2nd
> >> > one exists then it could be used in windowing functions where the frame
> >> > does not have "unbounded following".
> >>
> >> I don't think adding yet another type of aggregation function is the
> >> solution here.
> >
> > Why not?
> >
> > I thought, if in the cases where we need to burden the forward transition
> > functions with extra code to make inverse transitions possible, then why
> > not have an extra transition function so that it does not slow down normal
> > aggregation?
> >
> > My testing of sum(numeric) last night does not show any slow down by that
> > extra dscale tracking code that was added, but if it did then I'd be thinking
> > seriously about 2 forward transition functions to get around the problem.
> > I don't understand what would be wrong with that. The core code could just
> > make use of the 2nd function if it existed and inverse transitions were
> > thought to be required. i.e in a window context and does not
> > have UNBOUNDED PRECEDING.
>
> First, every additional function increases the maintenance burden, and at
> some point the expected benefit simply isn't worth it. IMHO at least.

There's little point in arguing for this, as we've managed to narrow any forward transition overhead to background noise so far; my point was more that if there were no other way to maintain performance and have an inverse transition function, this might be a solution to that problem.
> Secondly, you'd really need a second aggregate definition - usually, the > non-invertible aggregate will get away with a much smaller state > representation. > Aggregates with state type "internal" could hack their way around that by > simply not using the additional parts of their state structure, but e.g. > plpgsql aggregates would still incur overhead due to repeated copying of > a unnecessary large state value. If it's possible to represent the > forward-only but not the invertible state state with a copy-by-value type, > that overhead could easily be a large part of the total overhead you paid > for having an inverse transition function. > > I had imagined they keep the same state type and don't use any extra variables that are defined for the forward transition function that is invertible. > And lastly, we could achieve much of the benefit of a second transition > function by simply telling the forward transition function whether to > expect inverse transition function calls. We already expose things like > the aggregation memory con
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan24, 2014, at 08:47 , Dean Rasheed wrote:
> I think it should probably be broken up. It might be overly ambitious
> to try to get all of this committed during this commitfest, and in any
> case, I suspect that the committer would probably choose to commit it
> in stages. Perhaps something like:
>
> Patch 1
> - Basic support for inverse transition functions, CREATE AGGREGATE
> support and doc updates. This should include test cases to validate
> that the underlying executor changes are correct, by defining custom
> aggregates such as sum(int) and array_agg() using inverse transition
> functions.
>
> Patch 2
> - Add built-in inverse transition functions for count, sum(int), and friends.
>
> Patch 3, 4...
> - Other related groups of built-in aggregates. By this point, it
> should be a fairly mechanical process.
>
> Splitting it up this way now should help to focus on getting patch 1
> correct, without being distracted by all the other aggregates that may
> or may not usefully be made to have inverse transition functions. I
> think the value of the feature has been proved, and it is good to see
> that it can be applied to so many aggregates, but let's not try to do
> it all at once.

Working on that now, will post individual patches later today.

best regards, Florian Pflug
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 22 January 2014 09:54, David Rowley wrote: > I've performed some more benchmarks on this patch tonight. The results and > full recreation scripts are attached along with the patch it was tested > against.

I noticed that the rate of changes to this patch has dropped off, which I took as a sign that it is ready for review, so I started doing that. My first thought was wow, this is a big patch!

I think it should probably be broken up. It might be overly ambitious to try to get all of this committed during this commitfest, and in any case, I suspect that the committer would probably choose to commit it in stages. Perhaps something like:

Patch 1 - Basic support for inverse transition functions, CREATE AGGREGATE support and doc updates. This should include test cases to validate that the underlying executor changes are correct, by defining custom aggregates such as sum(int) and array_agg() using inverse transition functions.

Patch 2 - Add built-in inverse transition functions for count, sum(int), and friends.

Patch 3, 4... - Other related groups of built-in aggregates. By this point, it should be a fairly mechanical process.

Splitting it up this way now should help to focus on getting patch 1 correct, without being distracted by all the other aggregates that may or may not usefully be made to have inverse transition functions. I think the value of the feature has been proved, and it is good to see that it can be applied to so many aggregates, but let's not try to do it all at once.

Regarding what would be in patch 1, I've only given it a cursory look, but I did immediately notice a problem with the nodeWindowAgg.c changes --- for aggregates with non-strict transition functions, something equivalent to the not null counting code is needed to make it work correctly with FILTER.
For example, the following query gives the wrong results with this patch:

SELECT array_agg(i) FILTER (WHERE i%3=0)
       OVER (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
FROM generate_series(1,10) g(i);

Perhaps it should just always count the non-skipped values added after removal of filtered and null values. Then you might be able to simplify things by getting rid of ignore_nulls from WindowStatePerAggData and replacing notnullcount with a more general valueCount to track the number of values actually added to transValue.

I haven't read through nodeWindowAgg.c in any detail yet (I think it will take me a while to fully get my head round that code) but I think it needs some more test cases to verify that the various corner cases are covered.

Regards, Dean
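For reference, the results that query is supposed to produce can be worked out by hand; this small Python sketch (plain Python, not SQL, with an illustrative function name) collects each two-row frame and applies the filter, as the window aggregate should:

```python
# What array_agg(i) FILTER (WHERE i%3=0)
#      OVER (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
# over generate_series(1,10) should return: per row, take the frame
# (previous row and current row) and keep only values passing the filter.

def filtered_array_agg(rows, preceding=1):
    out = []
    for i in range(len(rows)):
        frame = rows[max(0, i - preceding): i + 1]
        agg = [v for v in frame if v % 3 == 0]
        out.append(agg or None)   # an empty aggregation yields NULL
    return out

print(filtered_array_agg(list(range(1, 11))))
```

Only rows whose frame contains a multiple of 3 get a non-NULL array, which is exactly the case the non-strict transition function must track with a value counter.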
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan23, 2014, at 01:07 , David Rowley wrote: > On Tue, Jan 21, 2014 at 3:20 AM, Florian Pflug wrote: >> On Jan20, 2014, at 08:42 , David Rowley wrote: >> >> On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug wrote: >> >> * I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, >> >> and >> >> it's the last commit, so if you object to that, then you can merge up to >> >> eafa72330f23f7c970019156fcc26b18dd55be27 instead of >> >> de3d9148be9732c4870b76af96c309eaf1d613d7. >> > >> > >> > Seems like sfunc really should be tfunc then we could have invtfunc. I'd >> > probably >> > understand this better if I knew what the 's' was for in sfunc. I've not >> > applied >> > this just yet. Do you have a reason why you think it's better? >> >> My issue with just "invfunc" is mainly that it's too generic - it doesn't >> tell >> you what it's supposed to be the inverse of. >> >> I've always assumed that 's' in 'sfunc' and 'stype' stands for 'state', and >> that >> the naming is inspired by control theory, where the function which acts on >> the >> state space is often called S. > > Ok, that makes more sense now and it seems like a reasonable idea. I'm not > not quite > sure yet as when someone said upthread that these "negative transition > functions" as > I was calling them at the time should really be called "inverse transition > functions", > I then posted that I was going to call the create aggregate option "invfunc" > which > nobody seemed to object to. I just don't want to go and change that now. It > is very > possible this will come up again when the committer is looking at the patch. > It would > be a waste if it ended up back at invfunc after we changed it to invsfunc. Since we already settled on "inverse transition function", I kinda doubt that calling the parameter invsfunc is going to meet a lot of resistance. But we can put that off a little longer still... I've pushed a few additional things to https://github.com/fgp/postgres/tree/invtrans. 
* I updated the CREATE AGGREGATE documentation, trying to include the description of the various modes of inverse transition functions into the paragraphs which already explained about STRICT for transition functions and such.

* I've also updated the list of window functions to include a list of those aggregates which potentially need to restart the computation, i.e. MIN/MAX and the like.

* I've changed nodeWindowAgg.c to use per-aggregate aggregation contexts for the invertible aggregates. Without that, the aggregate context is potentially never reset, because that previously required *all* the aggregates to restart at the same time. That would be OK if we were sure not to leak unbounded amounts of stuff stored in that context, but unfortunately we sometimes do. For example, whenever a strict, invertible aggregate ends up with only NULL inputs, we re-initialize the aggregation, which leaks the old state value. We could pfree() that of course, but that state value might reference other stuff that we don't know about and thus cannot free. Separating the aggregation contexts is the only solution I came up with, so I did that.

* I've also tweaked an if to flag aggregates as invertible only if the frame head can actually move, i.e. if the frame start clause is something other than UNBOUNDED PRECEDING. Since the choice of whether to use a private aggregation context is driven by that flag, that also makes the above apply only to aggregates where the inverse transition function is actually used.

I hope to find some time tomorrow or so to complete my pass through the documentation - what's still missing is an explanation of the EXPLAIN VERBOSE ANALYZE field and maybe some cleanup of xaggr.sgml. Do you have any additional things pending?

best regards, Florian Pflug
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan23, 2014, at 01:17 , David Rowley wrote: > On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug wrote: >> If you want to play with >> this, I think the first step has to be to find a set of guarantees that >> SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the >> summands all have the same sign, the error is bound by C * N, where C is some >> (machine-specific?) constant (about 1e-15 or so), and N is the number of >> input >> rows. Or at least so I think from looking at SUMs over floats in descending >> order, which I guess is the worst case. You could, for example, try to see if >> you can find a invertibility conditions which guarantees the same, but allows >> C to be larger. That would put a bound on the number of digits lost by the >> new >> SUM(float) compared to the old one. > > I guess if the case is that normal (non-window) sum(float) results are > undefined > unless you add an order by to the aggregate then I guess there is no possible > logic to put in for inverse transitions that will make them behave the same as > an undefined behaviour. Actually, if sum(float) was really undefined, it'd be *very* easy to provide an inverse transition function - just make it a no-op. Heck, you could then even make the forward transition function a no-op, since the very definition of "undefined behaviour" is "result can be anything, including setting your house on fire". The point is, it's *not* undefined - it's just imprecise. And the imprecision can even be quantified, it just depends on the number of input rows (the equal-sign requirement is mostly there to make "imprecision" equivalent to "relative error"). 
>> > If it seems sound enough, then I may implement it in C to see how much
>> > overhead it adds to forward aggregation for floating point types, but even
>> > if it did add too much overhead to forward aggregation it might be worth
>> > allowing aggregates to have 2 forward transition functions and if the 2nd
>> > one exists then it could be used in windowing functions where the frame
>> > does not have "unbounded following".
>>
>> I don't think adding yet another type of aggregation function is the
>> solution here.
>
> Why not?
>
> I thought, if in the cases where we need to burden the forward transition
> functions with extra code to make inverse transitions possible, then why
> not have an extra transition function so that it does not slow down normal
> aggregation?
>
> My testing of sum(numeric) last night does not show any slow down by that
> extra dscale tracking code that was added, but if it did then I'd be thinking
> seriously about 2 forward transition functions to get around the problem.
> I don't understand what would be wrong with that. The core code could just
> make use of the 2nd function if it existed and inverse transitions were
> thought to be required. i.e in a window context and does not
> have UNBOUNDED PRECEDING.

First, every additional function increases the maintenance burden, and at some point the expected benefit simply isn't worth it. IMHO at least.

Secondly, you'd really need a second aggregate definition - usually, the non-invertible aggregate will get away with a much smaller state representation. Aggregates with state type "internal" could hack their way around that by simply not using the additional parts of their state structure, but e.g. plpgsql aggregates would still incur overhead due to repeated copying of an unnecessarily large state value.
If it's possible to represent the forward-only but not the invertible state with a copy-by-value type, that overhead could easily be a large part of the total overhead you paid for having an inverse transition function.

And lastly, we could achieve much of the benefit of a second transition function by simply telling the forward transition function whether to expect inverse transition function calls. We already expose things like the aggregation memory context to C-language transition functions, so the basic mechanism is already there. In fact, I pondered whether to do this - but then figured I'd leave it to whoever needs that facility first. Though it wouldn't be much code - basically, setting a flag in the WindowAggState, and exporting a function to be used by aggregates to read it, similar to what AggCheckCallContext does.

best regards, Florian Pflug
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug wrote: > On Jan21, 2014, at 10:53 , David Rowley wrote: > > It came to me that it might be possible to implement inverse transitions > > for floating point aggregates by just detecting if precision has been > > lost during forward transitions. > > > > I've written the test to do this as: > > > > IF state.value + value = state.value AND value <> 0 THEN > > newstate.precision_lost := true; > > newstate.value := state.value; > > ELSE > > newstate.precision_lost := false; > > newstate.value := state.value + value; > > END IF; > > > > > > The inverse transition function checks the precision_lost and if it's > true it > > returns NULL. The core code is now implemented (thanks to Florian) to > > re-aggregate when NULL is returned from the inverse transition function. > > That's not sufficient, I fear. You can lose all significant digits of the > value > and still have precision_lost = false afterwards. Try summing over 1e16, > 1.01. > "SELECT 1e16::float8 + 1.01::float8 = 1e16::float8" returns FALSE, yet > "SELECT 1e16::float8 + 1.01::float8 - 1e16::float8" returns "2" where > "1.01" > would have been correct. That's still too much precision loss. > > I'm quite certain that the general idea has merit, but the actual > invertibility condition is going to be more involved. If you want to play > with > this, I think the first step has to be to find a set of guarantees that > SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if > the > summands all have the same sign, the error is bound by C * N, where C is > some > (machine-specific?) constant (about 1e-15 or so), and N is the number of > input > rows. Or at least so I think from looking at SUMs over floats in descending > order, which I guess is the worst case. You could, for example, try to see > if > you can find a invertibility conditions which guarantees the same, but > allows > C to be larger. 
That would put a bound on the number of digits lost by the > new > SUM(float) compared to the old one. > > I guess if the case is that normal (non-window) sum(float) results are undefined unless you add an order by to the aggregate then I guess there is no possible logic to put in for inverse transitions that will make them behave the same as an undefined behaviour. > I don't have high hopes for this getting int 9.4, though. > > Yeah I'm going to drop it. I need to focus on documents and performance testing. > > If it seems sound enough, then I may implement it in C to see how much > > overhead it adds to forward aggregation for floating point types, but > even > > if it did add too much overhead to forward aggregation it might be worth > > allowing aggregates to have 2 forward transition functions and if the 2nd > > one exists then it could be used in windowing functions where the frame > > does not have "unbounded following". > > I don't think adding yet another type of aggregation function is the > solution here. > > Why not? I thought, if in the cases where we need to burden the forward transition functions with extra code to make inverse transitions possible, then why not have an extra transition function so that it does not slow down normal aggregation? My testing of sum(numeric) last night does not show any slow down by that extra dscale tracking code that was added, but if it did then I'd be thinking seriously about 2 forward transition functions to get around the problem. I don't understand what would be wrong with that. The core code could just make use of the 2nd function if it existed and inverse transitions were thought to be required. i.e in a window context and does not have UNBOUNDED PRECEDING. Regards David Rowley
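Florian's 1e16/1.01 counterexample from earlier in the thread is easy to reproduce with Python floats (IEEE double, the same representation as float8); the point is that the proposed "state + value = state" detector fires no alarm even though nearly all of the small summand's digits are already gone:

```python
# The precision hazard with an invertible SUM(float): adding a small
# value to a huge state does change the state, so a naive
# "state + value = state" precision_lost test reports no loss -- yet
# inverting the addition later cannot recover the small value.

big, small = 1e16, 1.01

s = big + small          # forward transition
assert s != big          # the detector sees a change, so "no loss"...

recovered = s - big      # ...but the inverse transition
print(recovered)         # yields 2.0 where 1.01 would be correct
```

At 1e16 the spacing between adjacent doubles is 2, so 1.01 can only be added as 2; the detector would need to compare against the value actually absorbed, not merely check for equality.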
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Tue, Jan 21, 2014 at 3:20 AM, Florian Pflug wrote: > On Jan20, 2014, at 08:42 , David Rowley wrote: > >> On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug wrote: > >> * I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and > >> it's the last commit, so if you object to that, then you can merge up to > >> eafa72330f23f7c970019156fcc26b18dd55be27 instead of > >> de3d9148be9732c4870b76af96c309eaf1d613d7. > > > > Seems like sfunc really should be tfunc then we could have invtfunc. I'd probably > > understand this better if I knew what the 's' was for in sfunc. I've not applied > > this just yet. Do you have a reason why you think it's better? > > My issue with just "invfunc" is mainly that it's too generic - it doesn't tell > you what it's supposed to be the inverse of. > > I've always assumed that 's' in 'sfunc' and 'stype' stands for 'state', and that > the naming is inspired by control theory, where the function which acts on the > state space is often called S.

Ok, that makes more sense now and it seems like a reasonable idea. I'm not quite sure yet, as when someone said upthread that these "negative transition functions" as I was calling them at the time should really be called "inverse transition functions", I then posted that I was going to call the create aggregate option "invfunc" which nobody seemed to object to. I just don't want to go and change that now. It is very possible this will come up again when the committer is looking at the patch. It would be a waste if it ended up back at invfunc after we changed it to invsfunc.

Regards David Rowley
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan20, 2014, at 15:20 , Florian Pflug wrote:
> * In CREATE AGGREGATE, we should state the precise axioms we assume about
> forward and inverse transition functions. The last time I read the text there,
> it was a bit ambiguous about whether inverse transition functions assume
> commutativity, i.e. whether we assume that we can remove inputs other than the
> first one from the aggregation. Currently, all the inverse transition functions
> we have are, in fact, commutative, because all the forward transition functions
> are also. But we could e.g. add an inverse to array_agg and string_agg, and
> those would obviously need this ordering guarantee. I'd also like us to explain
> that the "inverse" in "inverse transition function" shouldn't be taken too
> literally. For forward transition function F, inverse transition function G,
> and inputs a,b,... we *don't* require that G(F(s,a),a) == s. What we do require
> is that if I is the initial state, then
>
> G(F(...F(F(I,a),b)...,z),a) == F(...F(I,b)...,z).
>
> (Well, actually we don't strictly require even that; the two states don't
> need to be identical, they just need to produce identical outputs when passed
> to the final function)
>
> * In CREATE AGGREGATE, we should also explain the NULL semantics you get
> with various combinations of strict and non-strict forward and inverse
> transition functions. I think a table listing all the combinations is in order
> there, with a column explaining the semantics you get. We should also clearly
> state that once you provide an inverse transition function, NULL isn't a valid
> state value anymore, except as an initial state, i.e. that the forward
> transition function must never return NULL in this case.

I gave that a shot, the results are at https://github.com/fgp/postgres/tree/invtrans

> * The window function page should explain the performance hazards when
> a moving frame head is involved. Ideally, we'd list which aggregates never
> have to restart, which sometimes might, and what you can do to avoid that.
> We can also tell people to use EXPLAIN VERBOSE ANALYZE to check whether
> restarts are occurring. This is where we'd tell people to cast their numeric
> operands to a type with defined scale to avoid restarts, and where we'd state
> that SUM(float) *does* always restart due to precision loss issues.
>
> BTW, something that we haven't addressed at all is how inverse transition
> functions interact with what is called "ordered-set aggregates". I haven't
> wrapped my head around these fully, I think, so I'm still not sure if there's
> anything to do there or not. Can those even be used as window functions?
> Should we forbid ordered-set aggregates from specifying an inverse transition
> function?

It seems that these aren't valid window functions anyway, so there's nothing to do for now, I think.

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
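The invertibility axiom above is easy to check mechanically for a candidate forward/inverse pair. A minimal sketch in Python (purely illustrative, not part of the patch; `check_invertibility` is an invented helper), with plain functions standing in for F and G:

```python
# Check the axiom: G(F(...F(F(I,a),b)...,z), a) == F(...F(I,b)...,z),
# i.e. removing the oldest input via the inverse G must give the same
# state as re-aggregating the remaining inputs from scratch.

def check_invertibility(F, G, I, inputs):
    state = I
    for x in inputs:
        state = F(state, x)
    removed = G(state, inputs[0])   # remove the oldest input

    rescanned = I                   # brute-force re-aggregation
    for x in inputs[1:]:
        rescanned = F(rescanned, x)
    return removed == rescanned

F = lambda s, x: s + x              # forward transition for SUM
G = lambda s, x: s - x              # inverse transition for SUM

print(check_invertibility(F, G, 0, [3, 1, 4, 1, 5]))   # exact for integers
print(check_invertibility(F, G, 0.0, [1e16, 1.01]))    # fails for doubles
```

Exact types satisfy the axiom unconditionally; for IEEE-754 doubles the check fails on the 1e16/1.01 pair discussed elsewhere in the thread, which is exactly why SUM(float) got no inverse.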
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan21, 2014, at 10:53 , David Rowley wrote:
> It came to me that it might be possible to implement inverse transitions
> for floating point aggregates by just detecting if precision has been
> lost during forward transitions.
>
> I've written the test to do this as:
>
> IF state.value + value = state.value AND value <> 0 THEN
>   newstate.precision_lost := true;
>   newstate.value := state.value;
> ELSE
>   newstate.precision_lost := false;
>   newstate.value := state.value + value;
> END IF;
>
> The inverse transition function checks precision_lost and if it's true it
> returns NULL. The core code is now implemented (thanks to Florian) to
> re-aggregate when NULL is returned from the inverse transition function.

That's not sufficient, I fear. You can lose all significant digits of the value and still have precision_lost = false afterwards. Try summing over 1e16, 1.01. "SELECT 1e16::float8 + 1.01::float8 = 1e16::float8" returns FALSE, yet "SELECT 1e16::float8 + 1.01::float8 - 1e16::float8" returns "2" where "1.01" would have been correct. That's still too much precision loss.

I'm quite certain that the general idea has merit, but the actual invertibility condition is going to be more involved. If you want to play with this, I think the first step has to be to find a set of guarantees that SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the summands all have the same sign, the error is bounded by C * N, where C is some (machine-specific?) constant (about 1e-15 or so), and N is the number of input rows. Or at least so I think from looking at SUMs over floats in descending order, which I guess is the worst case.

You could, for example, try to find an invertibility condition which guarantees the same, but allows C to be larger. That would put a bound on the number of digits lost by the new SUM(float) compared to the old one.

I don't have high hopes for this getting into 9.4, though.
> If it seems sound enough, then I may implement it in C to see how much
> overhead it adds to forward aggregation for floating point types, but even
> if it did add too much overhead to forward aggregation it might be worth
> allowing aggregates to have 2 forward transition functions and if the 2nd
> one exists then it could be used in windowing functions where the frame
> does not have "unbounded following".

I don't think adding yet another type of aggregation function is the solution here.

best regards,
Florian Pflug
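Florian's 1e16 example can be reproduced outside SQL, since Python floats are the same IEEE-754 doubles as float8 (a quick illustrative check, not from the thread):

```python
# At magnitude 1e16 adjacent doubles are spaced 2.0 apart, so adding 1.01
# does change the stored sum (the naive "did the sum change?" test passes),
# yet the amount actually added is 2, not 1.01.
a = 1e16
b = 1.01

assert a + b != a           # addition is visible, so precision_lost stays false...
print((a + b) - a)          # ...but subtracting a back yields 2.0, not 1.01
```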
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Sun, Dec 15, 2013 at 2:00 PM, Tom Lane wrote:
> Greg Stark writes:
>> On 14 Dec 2013 15:40, "Tom Lane" wrote:
>>> I think you *can't* cover them for the float types; roundoff error
>>> would mean you don't get the same answers as before.
>>
>> I was going to say the same thing. But then I started to wonder: What's
>> so special about the answers we used to give? They are also subject to
>> round off and the results are already quite questionable in those cases.
>
> Well, we can't easily do better than the old answers, and the new ones
> might be arbitrarily worse. Example: sum or average across single-row
> windows ought to be exact in any case, but it might be arbitrarily wrong
> with the negative-transition technique.
>
> More generally, this is supposed to be a performance enhancement only;
> it's not supposed to change the results.

It came to me that it might be possible to implement inverse transitions for floating point aggregates by just detecting if precision has been lost during forward transitions.

I've written the test to do this as:

IF state.value + value = state.value AND value <> 0 THEN
  newstate.precision_lost := true;
  newstate.value := state.value;
ELSE
  newstate.precision_lost := false;
  newstate.value := state.value + value;
END IF;

The inverse transition function checks precision_lost and if it's true it returns NULL. The core code is now implemented (thanks to Florian) to re-aggregate when NULL is returned from the inverse transition function.

I've attached an implementation of this with the transition functions written in plpgsql.
I don't really know for sure yet if it can handle all cases and give the exact same results as it would without inverse transitions, but it certainly fixes the error case which was presented.

Using the attached on HEAD of https://github.com/david-rowley/postgres/commits/invtrans

explain (analyze, verbose)
select mysum(v) over (order by i rows between current row and unbounded following)
from (values(1,1e20),(2,1)) b(i,v);

gives me the expected results of 1e20 and 1, instead of my original attempt which gave 1e20 and 0.

I guess the extra tracking on forward transition might mean this would not be practical to implement in C for sum(float), but I just wanted to run the idea through a few heads to see if anyone can present a case where it can still produce wrong results.

If it seems sound enough, then I may implement it in C to see how much overhead it adds to forward aggregation for floating point types, but even if it did add too much overhead to forward aggregation it might be worth allowing aggregates to have 2 forward transition functions, and if the 2nd one exists then it could be used in windowing functions where the frame does not have "unbounded following".

Any thoughts?
Regards
David Rowley

BEGIN WORK;

CREATE TYPE float_state AS (precision_lost bool, value float);

CREATE OR REPLACE FUNCTION float_sum(state float_state, value float)
RETURNS float_state AS $$
DECLARE
  newstate float_state;
BEGIN
  IF state IS NULL THEN
    IF value IS NULL THEN
      RETURN NULL;
    ELSE
      newstate.value := value;
      newstate.precision_lost := false;
      RETURN newstate;
    END IF;
  END IF;

  IF state.value + value = state.value AND value <> 0 THEN
    newstate.precision_lost := true;
    newstate.value := state.value;
  ELSE
    newstate.precision_lost := false;
    newstate.value := state.value + value;
  END IF;
  RETURN newstate;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE FUNCTION float_sum_inv(state float_state, value float)
RETURNS float_state AS $$
DECLARE
  newstate float_state;
BEGIN
  IF state.precision_lost = true THEN
    RETURN NULL;
  ELSE
    newstate.value := state.value - value;
    RETURN newstate;
  END IF;
END;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;

CREATE OR REPLACE FUNCTION float_sum_final(state float_state)
RETURNS float AS $$
BEGIN
  IF NOT (state IS NULL) THEN
    RETURN state.value;
  ELSE
    RETURN NULL;
  END IF;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE AGGREGATE mysum (float) (
  stype = float_state,
  sfunc = float_sum,
  invfunc = float_sum_inv,
  finalfunc = float_sum_final
);

select mysum(v) from (values(1,1e20),(2,1)) b(i,v);

-- forces re-aggregate due to precision loss
--explain (analyze, verbose) select mysum(v) over (order by i rows between current row and unbounded following) from (values(1,1e20),(2,1)) b(i,v);

-- does not force reaggregate.
--explain (analyze, verbose) select mysum(v) over (order by i rows between current row and unbounded following) from (values(1,1),(2,2),(3,3)) b(i,v);

rollback;
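For readers without a server handy, the interplay between an aggregate like the one above and the core code's rescan logic can be simulated. A rough Python sketch (the names `forward`, `inverse` and `window_sums` are invented for illustration) of the same precision_lost mechanism, over frames running from the current row to the end:

```python
def forward(state, value):
    # state is (precision_lost, sum); mirrors float_sum above
    if state is None:
        return (False, value)
    _, s = state
    if s + value == s and value != 0:
        return (True, s)            # adding value changed nothing
    return (False, s + value)

def inverse(state, value):
    lost, s = state
    if lost:
        return None                 # bail out: caller must re-aggregate
    return (False, s - value)

def window_sums(values):
    """Frames are 'rows between current row and unbounded following'."""
    out, state = [], None
    for i in range(len(values)):
        if state is not None and i > 0:
            state = inverse(state, values[i - 1])   # shrink the frame head
        if state is None:           # first frame, or the inverse bailed
            for v in values[i:]:
                state = forward(state, v)           # brute-force rescan
        out.append(state[1])
    return out

print(window_sums([1e20, 1]))       # the case from the thread: [1e+20, 1.0]
```

The second frame triggers a rescan (1e20 + 1 == 1e20 sets precision_lost), which is how the simulation arrives at 1 instead of 0.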
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan20, 2014, at 08:42 , David Rowley wrote:
>> On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug wrote:
>> * An assert that the frame end doesn't move backwards - I realized that
>> it is after all easy to do that, if it's done after the loop which adds
>> the new values, not before.
>
> I've applied this too, but I'm wondering why an elog for if the head moves
> back, but an assert if the tail moves back?

When I put the frame head check in, I was concerned that the code might crash or loop endlessly if aggregatedbase was ever larger than frameheadpos, so I made it elog(), just for safety. The frame end check, OTOH, is done at the very end, so it doesn't protect against much; it just documents that it's not supposed to happen. But yeah, it's a bit weird. Feel free to turn the frame end check into an elog() too if you want to.

>> * I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and
>> it's the last commit, so if you object to that, then you can merge up to
>> eafa72330f23f7c970019156fcc26b18dd55be27 instead of
>> de3d9148be9732c4870b76af96c309eaf1d613d7.
>
> Seems like sfunc really should be tfunc, then we could have invtfunc. I'd
> probably understand this better if I knew what the 's' was for in sfunc.
> I've not applied this just yet. Do you have a reason why you think it's better?

My issue with just "invfunc" is mainly that it's too generic - it doesn't tell you what it's supposed to be the inverse of.

I've always assumed that 's' in 'sfunc' and 'stype' stands for 'state', and that the naming is inspired by control theory, where the function which acts on the state space is often called S.

> Thanks, yeah those really do need a review. I've lost a bit of direction with
> them and I'm not quite sure just how much detail to go into with it. I'd like
> to explain a bit that users who need to use their numeric columns in window
> aggregates might want to think about having a defined scale to the numeric
> rather than an undefined scale, and explain that this is because the inverse
> transition function for numeric bails out if it loses the maximum seen dscale.
> Though it does seem generally a good idea to have a defined scale, but then I
> guess you've got to have a bit of knowledge about the numbers you're storing
> in that case. I'm not quite sure how to put that into words friendly enough
> for the documents just yet, or exactly where to put the words. So any ideas
> or patches you have around that would be great.

Here's what I imagine the docs should look like, very roughly:

* In CREATE AGGREGATE, we should state the precise axioms we assume about forward and inverse transition functions. The last time I read the text there, it was a bit ambiguous about whether inverse transition functions assume commutativity, i.e. whether we assume that we can remove inputs other than the first one from the aggregation. Currently, all the inverse transition functions we have are, in fact, commutative, because all the forward transition functions are also. But we could e.g. add an inverse to array_agg and string_agg, and those would obviously need this ordering guarantee. I'd also like us to explain that the "inverse" in "inverse transition function" shouldn't be taken too literally. For forward transition function F, inverse transition function G, and inputs a,b,... we *don't* require that G(F(s,a),a) == s. What we do require is that if I is the initial state, then

G(F(...F(F(I,a),b)...,z),a) == F(...F(I,b)...,z).

(Well, actually we don't strictly require even that; the two states don't need to be identical, they just need to produce identical outputs when passed to the final function)

* In CREATE AGGREGATE, we should also explain the NULL semantics you get with various combinations of strict and non-strict forward and inverse transition functions. I think a table listing all the combinations is in order there, with a column explaining the semantics you get. We should also clearly state that once you provide an inverse transition function, NULL isn't a valid state value anymore, except as an initial state, i.e. that the forward transition function must never return NULL in this case.

* The window function page should explain the performance hazards when a moving frame head is involved. Ideally, we'd list which aggregates never have to restart, which sometimes might, and what you can do to avoid that. We can also tell people to use EXPLAIN VERBOSE ANALYZE to check whether restarts are occurring. This is where we'd tell people to cast their numeric operands to a type with defined scale to avoid restarts, and where we'd state that SUM(float) *does* always restart due to precision loss issues.

BTW, something that we haven't addressed at all is how inverse transition functions interact with what is called "ordered-set aggregates". I haven't wrapped my head around these fully, I think, so I'm still not sure if there's anything to do there or not. Can those even be used as window functions? Should we forbid ordered-set aggregates from specifying an inverse transition function?
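The dscale bail-out mentioned above can be illustrated with Python's decimal module. A simplified sketch of the idea (the patch's C code in do_numeric_discard() differs in detail; the names here are invented): the state tracks the maximum display scale seen and how many inputs carry it, and the inverse gives up when the last such input leaves the frame, because the remaining maximum is then unknown.

```python
from decimal import Decimal

def dscale(d):
    # digits after the decimal point, like numeric's display scale
    return max(0, -d.as_tuple().exponent)

def forward(state, value):
    total, max_scale, count = state
    s = dscale(value)
    if s > max_scale:
        max_scale, count = s, 1
    elif s == max_scale:
        count += 1
    return (total + value, max_scale, count)

def inverse(state, value):
    total, max_scale, count = state
    if dscale(value) == max_scale:
        count -= 1
        if count == 0:
            return None         # lost the maximum dscale: force a rescan
    return (total - value, max_scale, count)

init = (Decimal(0), 0, 0)

# Mixed scales: removing the only 3-digit value forces a rescan.
st = forward(forward(init, Decimal("1.005")), Decimal("2.10"))
print(inverse(st, Decimal("1.005")))        # None

# With a fixed scale, e.g. numeric(10,2), every input shares the maximum
# dscale and the inverse always succeeds.
st = forward(forward(init, Decimal("1.00")), Decimal("2.10"))
print(inverse(st, Decimal("1.00")))
```

This is why casting to a numeric with a defined scale avoids restarts: the max-dscale counter only ever decrements to zero when the frame empties.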
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug wrote:
> On Jan19, 2014, at 20:00 , David Rowley wrote:
>> I've applied that patch again and put in the sort operators.
>
> I've pushed a new version to https://github.com/fgp/postgres/tree/invtrans
> which includes
>
> * A bunch of missing declarations for *_inv functions

Thanks, I've applied that.

> * An assert that the frame end doesn't move backwards - I realized that
> it is after all easy to do that, if it's done after the loop which adds
> the new values, not before.

I've applied this too, but I'm wondering why an elog for if the head moves back, but an assert if the tail moves back?

> * EXPLAIN VERBOSE ANALYZE now shows the max. number of forward aggregate
> transitions per row and aggregate. It's a bit imprecise, because it doesn't
> track the count per aggregate, but it's still a good metric for how well
> the inverse transition functions work. If the number is close to one, you
> know that very few rescans are happening.

I've not looked at this yet and I don't think I'll have time tonight, but it sounds interesting. I guess it might be quite nice to have a way to see this; especially with the way the numeric stuff works, it might otherwise be pretty hard to know how many inverse transition "failures" there had been. Do you think it's also worth tracking the inverse transition failures too?

> * I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and
> it's the last commit, so if you object to that, then you can merge up to
> eafa72330f23f7c970019156fcc26b18dd55be27 instead of
> de3d9148be9732c4870b76af96c309eaf1d613d7.

Seems like sfunc really should be tfunc, then we could have invtfunc. I'd probably understand this better if I knew what the 's' was for in sfunc. I've not applied this just yet. Do you have a reason why you think it's better?

> A few more things I noticed, all minor stuff
>
> * do_numeric_discard()'s inverseTransValid flag is unnecessary. First, if the
> inverse transition function returns NULL once, we never call it again, so the
> flag won't have any practical effect. And second, assume we ever called the
> forward transition function after the inverse failed, and then retried the
> inverse. In the case of do_numeric_discard(), that actually *could* allow the
> inverse to suddenly succeed - if the call to the forward function increased
> the dscale beyond that of the element we tried to remove, removal would
> suddenly be possible again. We never do that, of course, and it seems
> unlikely we ever will. But it's still weird to have code which serves no
> other purpose than to pessimize a case which would otherwise just work fine.

hmm, yeah of course, you are right. I've removed this now.

> * The state == NULL checks in all the strict inverse transition functions
> are redundant.

ok, I've removed these and added comments to note that these functions should be declared strict.

> I haven't taken a close look at the documentation yet, I hope to be able to
> do that tomorrow. Otherwise, things look good as far as I'm concerned.

Thanks, yeah those really do need a review. I've lost a bit of direction with them and I'm not quite sure just how much detail to go into with it. I'd like to explain a bit that users who need to use their numeric columns in window aggregates might want to think about having a defined scale to the numeric rather than an undefined scale, and explain that this is because the inverse transition function for numeric bails out if it loses the maximum seen dscale. Though it does seem generally a good idea to have a defined scale, but then I guess you've got to have a bit of knowledge about the numbers you're storing in that case. I'm not quite sure how to put that into words friendly enough for the documents just yet, or exactly where to put the words. So any ideas or patches you have around that would be great.

Once again thanks for all your work on this.

Regards
David Rowley
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan19, 2014, at 20:00 , David Rowley wrote:
> I've applied that patch again and put in the sort operators.

I've pushed a new version to https://github.com/fgp/postgres/tree/invtrans which includes

* A bunch of missing declarations for *_inv functions

* An assert that the frame end doesn't move backwards - I realized that it is after all easy to do that, if it's done after the loop which adds the new values, not before.

* EXPLAIN VERBOSE ANALYZE now shows the max. number of forward aggregate transitions per row and aggregate. It's a bit imprecise, because it doesn't track the count per aggregate, but it's still a good metric for how well the inverse transition functions work. If the number is close to one, you know that very few rescans are happening.

* I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and it's the last commit, so if you object to that, then you can merge up to eafa72330f23f7c970019156fcc26b18dd55be27 instead of de3d9148be9732c4870b76af96c309eaf1d613d7.

A few more things I noticed, all minor stuff:

* do_numeric_discard()'s inverseTransValid flag is unnecessary. First, if the inverse transition function returns NULL once, we never call it again, so the flag won't have any practical effect. And second, assume we ever called the forward transition function after the inverse failed, and then retried the inverse. In the case of do_numeric_discard(), that actually *could* allow the inverse to suddenly succeed - if the call to the forward function increased the dscale beyond that of the element we tried to remove, removal would suddenly be possible again. We never do that, of course, and it seems unlikely we ever will. But it's still weird to have code which serves no other purpose than to pessimize a case which would otherwise just work fine.

* The state == NULL checks in all the strict inverse transition functions are redundant.

I haven't taken a close look at the documentation yet; I hope to be able to do that tomorrow. Otherwise, things look good as far as I'm concerned.

best regards,
Florian Pflug
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Mon, Jan 20, 2014 at 8:42 PM, David Rowley wrote: > On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug wrote: > >> * EXPLAIN VERBOSE ANALYZE now shows the max. number of forward aggregate >> transitions per row and aggregate. It's a bit imprecise, because it >> doesn't >> track the count per aggregate, but it's still a good metric for how well >> the inverse transition functions work. If the number is close to one, >> you >> know that very few rescans are happening. >> >> > I've not looked at this yet and I don't think I'll have time tonight, but > it sounds interesting. I guess it might be quite nice to have a way to see > this especially with the way the numeric stuff works, it might be actually > pretty hard to otherwise know how many inverse transition "failures" there > had been. Do you think it's also worth tracking the inverse transition > failures too? > > I've merged this patch but I attempted to get it into a bit more of a ready state by moving the code out into a helper function the same way as the other explain stuff is done. I've not touched explain before so do let me know if I've made it worse. https://github.com/david-rowley/postgres/commits/invtrans Regards David Rowley
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan19, 2014, at 20:00 , David Rowley wrote:
> I've applied that patch again and put in the sort operators.

I've pushed a new version to https://github.com/fgp/postgres/tree/invtrans

This branch includes the following changes:

* A bunch of missing declarations for *_inv functions

* An assert that the frame end doesn't move backwards - I realized that it is after all easy to do that, if it's done after the loop which adds the new values, not before.

* EXPLAIN VERBOSE ANALYZE now shows the max. number of forward aggregate transitions per row and aggregate. It's a bit imprecise, because it doesn't track the count per aggregate, but it's still a good metric for how well the inverse transition functions work. If the number is close to one, you know that very few rescans are happening.

* I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and it's the last commit, so if you object to that, then you can merge up to eafa72330f23f7c970019156fcc26b18dd55be27 instead of de3d9148be9732c4870b76af96c309eaf1d613d7.

A few more things I noticed, all minor stuff:

* do_numeric_discard()'s inverseTransValid flag is unnecessary. First, if the inverse transition function returns NULL once, we never call it again, so the flag won't have any practical effect. And second, assume we ever called the forward transition function after the inverse failed, and then retried the inverse. In the case of do_numeric_discard(), that actually *could* allow the inverse to suddenly succeed - if the call to the forward function increased the dscale beyond that of the element we tried to remove, removal would suddenly be possible again. We never do that, of course, and it seems unlikely we ever will. But it's still weird to have code which serves no other purpose than to pessimize a case which would otherwise just work fine.

* The state == NULL checks in all the strict inverse transition functions are redundant.

I haven't taken a close look at the documentation yet; I hope to be able to do that tomorrow. Otherwise, things look good as far as I'm concerned.

best regards,
Florian Pflug
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Mon, Jan 20, 2014 at 5:53 AM, Florian Pflug wrote: > On Jan19, 2014, at 05:27 , David Rowley wrote: > >> I just finished implementing the inverse transition functions for > bool_and > >> and bool_or, these aggregates had a sort operator which I assume would > have > >> allowed an index scan to be performed, but since I had to change the > first > >> argument of these aggregates to internal and that meant I had to get > rid of > >> the sort operator... > > Why does having transition type "internal" prevent you from specifying a > sort operator? The sort operator's argument types must match the *input* > type of the aggregate, not the transition type. > > Here's a pure SQL implementation of an optimized bool_and called myand_agg > that uses state type bigint[] and specifies a sort operator. > > create or replace function myboolagg_fwd(counts bigint[], value bool) > returns bigint[] as $$ > select array[ > counts[1] + case value when true then 0 else 1 end, > counts[2] + case value when true then 1 else 0 end > ] > $$ language sql strict immutable; > > create or replace function myboolagg_inv(counts bigint[], value bool) > returns bigint[] as $$ > select array[ > counts[1] - case value when true then 0 else 1 end, > counts[2] - case value when true then 1 else 0 end > ] > $$ language sql strict immutable; > > create or replace function myboolagg_and(counts bigint[]) > returns bool as $$ > select case counts[1] when 0 then true else false end > $$ language sql strict immutable; > > create aggregate myand_agg (bool) ( > stype = bigint[], > sfunc = myboolagg_fwd, > invfunc = myboolagg_inv, > finalfunc = myboolagg_and, > sortop = <, > initcond = '{0,0}' > ); > > With this, doing > > create table boolvals as > select i, random() < 0.5 as v from generate_series(1,1) i; > create index on boolvals(v); > > explain analyze select myand_agg(v) from boolvals; > > yields > > Result (cost=0.33..0.34 rows=1 width=0) (actual time=0.067..0.067 rows=1 > loops=1) >InitPlan 1 (returns 
$0) > -> Limit (cost=0.29..0.33 rows=1 width=1) (actual time=0.061..0.061 > rows=1 loops=1) >-> Index Only Scan using boolvals_v_idx on boolvals > (cost=0.29..474.41 rows=9950 width=1) (actual time=0.061..0.061 rows=1 > loops=1) > Index Cond: (v IS NOT NULL) > Heap Fetches: 1 > Total runtime: 0.100 ms > > which looks fine, no? > > hmm, yeah you're right. I guess I didn't quite think through what the sort comparison was comparing with, for some reason I had it in my head that it was the aggregate state and not another value in a btree index. I've applied that patch again and put in the sort operators. Thanks for looking at that. Regards David Rowley > best regards, > Florian Pflug > >
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan19, 2014, at 05:27 , David Rowley wrote:
> I just finished implementing the inverse transition functions for bool_and
> and bool_or, these aggregates had a sort operator which I assume would have
> allowed an index scan to be performed, but since I had to change the first
> argument of these aggregates to internal and that meant I had to get rid of
> the sort operator...

Why does having transition type "internal" prevent you from specifying a sort operator? The sort operator's argument types must match the *input* type of the aggregate, not the transition type.

Here's a pure SQL implementation of an optimized bool_and called myand_agg that uses state type bigint[] and specifies a sort operator.

create or replace function myboolagg_fwd(counts bigint[], value bool)
returns bigint[] as $$
  select array[
    counts[1] + case value when true then 0 else 1 end,
    counts[2] + case value when true then 1 else 0 end
  ]
$$ language sql strict immutable;

create or replace function myboolagg_inv(counts bigint[], value bool)
returns bigint[] as $$
  select array[
    counts[1] - case value when true then 0 else 1 end,
    counts[2] - case value when true then 1 else 0 end
  ]
$$ language sql strict immutable;

create or replace function myboolagg_and(counts bigint[])
returns bool as $$
  select case counts[1] when 0 then true else false end
$$ language sql strict immutable;

create aggregate myand_agg (bool) (
  stype = bigint[],
  sfunc = myboolagg_fwd,
  invfunc = myboolagg_inv,
  finalfunc = myboolagg_and,
  sortop = <,
  initcond = '{0,0}'
);

With this, doing

create table boolvals as
  select i, random() < 0.5 as v from generate_series(1,1) i;
create index on boolvals(v);

explain analyze select myand_agg(v) from boolvals;

yields

Result (cost=0.33..0.34 rows=1 width=0) (actual time=0.067..0.067 rows=1 loops=1)
  InitPlan 1 (returns $0)
    -> Limit (cost=0.29..0.33 rows=1 width=1) (actual time=0.061..0.061 rows=1 loops=1)
       -> Index Only Scan using boolvals_v_idx on boolvals (cost=0.29..474.41 rows=9950 width=1) (actual time=0.061..0.061 rows=1 loops=1)
          Index Cond: (v IS NOT NULL)
          Heap Fetches: 1
Total runtime: 0.100 ms

which looks fine, no?

best regards,
Florian Pflug
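The counting trick in myand_agg translates directly to any language. A small illustrative Python sketch of the same state shape, sliding a width-2 window over some booleans without ever rescanning:

```python
def fwd(counts, value):
    # counts = [number of false inputs, number of true inputs]
    f, t = counts
    return [f + (0 if value else 1), t + (1 if value else 0)]

def inv(counts, value):
    f, t = counts
    return [f - (0 if value else 1), t - (1 if value else 0)]

def final_and(counts):
    return counts[0] == 0      # AND is true iff no false input was seen

def final_or(counts):
    return counts[1] > 0       # OR is true iff some true input was seen

vals = [True, True, False, True]
state, out = [0, 0], []
for i, v in enumerate(vals):
    state = fwd(state, v)
    if i >= 2:
        state = inv(state, vals[i - 2])   # drop the value leaving the frame
    if i >= 1:
        out.append(final_and(state))      # one result per width-2 frame

print(out)   # [True, False, False]
```

Because the counts lose no information, the inverse here never has to bail out, unlike the precision-loss cases for floats.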
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Sat, Jan 18, 2014 at 11:34 AM, David Rowley wrote:
> The latest version of the patch is attached.

I've attached an updated version of the patch. I'm now using github to track the changes on the patch, so I've included the commit sha of the latest commit that this patch includes in the file name, but I've also included the date. Please see https://github.com/david-rowley/postgres/commits/invtrans for what's been changed.

Right now I don't think there is very much left to do. Perhaps the documentation needs some examples of creating inverse transition functions; I was not sure, so I left them out for now.

Regards
David Rowley

inverse_transition_functions_d00df99_2014-01-20.gz
Description: GNU Zip compressed data
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Sun, Jan 19, 2014 at 5:27 PM, David Rowley wrote:
>>> It's probably far more worth it for the bool and/or aggregates. We could
>>> just keep track of the values aggregated and the count of values as "true"
>>> and return true if those are the same in the case of "AND", then check the
>>> true count is > 0 in the case of "OR". I'd feel more strongly to go and do
>>> that if I'd actually ever used those aggregates for anything.
>>
>> That, OTOH, would be worthwhile I think. I'll go do that, though probably
>> not today. I hope to get to it sometime tomorrow.
>
> I've committed a patch to the github repo to do this.
> https://github.com/david-rowley/postgres/commit/121b0823753cedf33bb94f646df3176b77f28500
> but I'm not sure if we can keep it as I had to remove the sort op as I
> explained above.

I think I'm going to have to revert the patch which implements the inverse transition functions for bool_and and bool_or. I tested on an instance of 9.3.2 and the following queries use index scans.

create table booltest (b boolean not null);
insert into booltest (b) select false from generate_series(1,2) g(n);
insert into booltest (b) values(true);
create index booltest_b_idx ON booltest(b);
vacuum analyze booltest;

explain select bool_or(b) from booltest;
explain select bool_and(b) from booltest;

I'm guessing there is no way to have an internal state type on the aggregate and a sort operator on the aggregate. I wonder if it is worth creating naive inverse transition functions similar to max()'s and min()'s inverse transition functions. I guess on average they've got about a 50% chance of being used, and likely for some workloads it would be a win. What's your thoughts?

Regards
David Rowley
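A "naive" inverse of the kind mentioned above - like the ones MIN() and MAX() got in the patch - can be sketched in a few lines of Python (illustrative only; the real versions are C functions): it succeeds only when the removed value cannot have been the one determining the result, and bails out otherwise.

```python
def min_fwd(state, value):
    # state None means "no inputs yet"
    return value if state is None or value < state else state

def min_inv(state, value):
    # If the departing value equals (or, defensively, undercuts) the
    # current minimum, the new minimum is unknown: return None so the
    # core code re-aggregates the frame. Otherwise the state stands.
    if value <= state:
        return None
    return state

print(min_inv(3, 7))   # 7 was never the minimum, the state 3 survives
print(min_inv(3, 3))   # None: a rescan is required
```

Roughly the "50% chance of being used" David mentions: the inverse pays off exactly when the departing row was not the extreme one.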
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Sun, Jan 19, 2014 at 3:22 AM, Florian Pflug wrote: > > BTW, this made me realize that MIN and MAX currently have the same issue - > they'll rescan if the inputs are all equal. We could avoid that by doing > what > you did with dscale - track the number of times we've seen the maximum. I > wonder if that would be worth it - it would, unfortunately, require us to > use state type "internal" there too, and hence to add final functions for > all > the MIN/MAX aggregates. But that seems excessive. So for now, let's just > live with that. > > Yeah, it's an idea... I had actually talked a bit about it before when I first posted about inverse transition functions. http://www.postgresql.org/message-id/caaphdvqu+ygw0vbpbb+yxhrpg5vcy_kifyi8xmxfo8kyocz...@mail.gmail.com But I've only realised today that it might be a no-go. Let me explain... I just finished implementing the inverse transition functions for bool_and and bool_or. These aggregates had a sort operator, which I assume is what allowed an index scan to be performed, but since I had to change the first argument of these aggregates to internal, I had to get rid of the sort operator... So I'm not actually sure that we really should implement inverse transition functions for bool_and and bool_or because of this. Nevertheless I committed a patch to the github repo which implements them. I guess this sort operator problem completely writes off doing something similar for MAX and MIN, as that would mean no index scan would be possible for these aggregates! > If we really *do* want to optimize this case, we could > come to it from a completely different angle. Aggregates already > special-case > MIN and MAX to be able to use an index to evaluate SELECT MAX(c) FROM t. 
> If we provided a way for the transition function to call the sort operator > specified by SORTOP in CREATE AGGREGATE, one generic triple of forward and > inverse transition function and final function would work for all the > MIN and MAX aggregates. But that's material for a separate patch for 9.5 > > > Note that after this fix the results for my quick benchmark now look > like: > > > > create table num (num numeric(10,2) not null); > > insert into num (num) select * from generate_series(1,2); > > select sum(num) over(order by num rows between current row and unbounded > following) from num; -- 113 ms > > select sum(num / 10) over(order by num rows between current row and > unbounded following) from num; -- 156ms > > select sum(num / 1) over(order by num rows between current row and > unbounded following) from num; -- 144 ms > > > > So it seems to be much less prone to falling back to brute force forward > > transitions. > > It also seems the / 10 version must have had to previously do 1 brute > > force rescan but now it looks like it can do it in 1 scan. > > > >>> I'm not set on it, and I'm willing to try the lazy zeroing of the scale > >>> tracker array, but I'm just not quite sure what extra real use cases it > >>> covers that the one above does not. Perhaps my way won't perform > inverse > >>> transitions if the query did sum(numericCol / 10) or so. > >> > >> Dunno how many people SUM over numerics with different dscales. Easily > >> possible that it's few enough to not be worth fussing over. > > > > Going by Tom's comments in the post above this is possible just by > having an > > unconstrained numeric column, but I guess there's still a good chance > that > > even those unconstrained numbers have the same scale or at least the > scale > > will likely not vary wildly enough to make us have to perform brute force > > forward transitions for each row. > > Yeah, I'm convinced by now that your approach is the right trade-off there. 
> Those who do have values with wildly different dscales in their columns can > always add a cast to normalize them, if they experience a lot of restarts. > > So let's just add a sentence or two to the SUM(numeric) documentation about > this, and be done. > > I had a quick look and I couldn't decide the best place to write about specific details on inverse transition functions. The best place I could see was to add a note under the aggregates table here: http://www.postgresql.org/docs/devel/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-TABLE > > > > I did think of this when I wrote them. I thought that the removing-0 > case might > > be quite common and worth it, but the ~0 case seemed less > common; > > it just felt weird to do one without the other. > > To do more tracking on these it looks like we'd need to change those > aggregates > > to use a state type that is internal, and I think the extra tracking > would mean > > looping over an 8, 32 or 64 element array of int64's for each value. I > just don't > > think that would be a winner performance-wise, since the code that's > there is > > pretty much a handful of CPU cycles. > > Yeah, this is similar to the SUM(numeric) problem in that we *could* avoid >
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan18, 2014, at 06:15 , David Rowley wrote: >> On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug wrote: >> On Jan17, 2014, at 23:34 , David Rowley wrote: >>> The test turned out to become: >>> if (state->expectedScale == -1) >>> state->expectedScale = X.dscale; >>> else if (state->expectedScale != X.dscale) >>> state->expectedScale = -2; >>> >>> In do_numeric_accum, then when we do the inverse I just check if >>> expectedScale < 0 then return NULL. >> >> Ok, so this will rescan unless the dscales of all inputs match. >> I still think that's overly pessimistic - we've only got a problem when we >> removed the input with the largest dscale, no? So my suggestion would be >> >> >> >> I'd think that this avoids more restarts with about the same effort, >> but I haven't tried this though, so maybe I'm missing something. > > This is not quite right as it means if all the values are the same then > we reject inverse transitions since state->maxScale will always be equal > to X.dscale. > But you are right about the overly strict code I've put in; we should allow > values with less than the maximum dscale to be unaggregated without > complaint. To implement this I needed a maxScaleCounter variable too so we > only reject when the maxScaleCounter gets back to 0 again. Oops, sorry, yeah. Sounds sensible. BTW, this made me realize that MIN and MAX currently have the same issue - they'll rescan if the inputs are all equal. We could avoid that by doing what you did with dscale - track the number of times we've seen the maximum. I wonder if that would be worth it - it would, unfortunately, require us to use state type "internal" there too, and hence to add final functions for all the MIN/MAX aggregates. But that seems excessive. So for now, let's just live with that. If we really *do* want to optimize this case, we could come to it from a completely different angle. Aggregates already special-case MIN and MAX to be able to use an index to evaluate SELECT MAX(c) FROM t. 
If we provided a way for the transition function to call the sort operator specified by SORTOP in CREATE AGGREGATE, one generic triple of forward and inverse transition function and final function would work for all the MIN and MAX aggregates. But that's material for a separate patch for 9.5 > Note that after this fix the results for my quick benchmark now look like: > > create table num (num numeric(10,2) not null); > insert into num (num) select * from generate_series(1,2); > select sum(num) over(order by num rows between current row and unbounded > following) from num; -- 113 ms > select sum(num / 10) over(order by num rows between current row and unbounded > following) from num; -- 156ms > select sum(num / 1) over(order by num rows between current row and unbounded > following) from num; -- 144 ms > > So it seems to be much less prone to falling back to brute force forward > transitions. > It also seems the / 10 version must have had to previously do 1 brute > force rescan but now it looks like it can do it in 1 scan. > >>> I'm not set on it, and I'm willing to try the lazy zeroing of the scale >>> tracker array, but I'm just not quite sure what extra real use cases it >>> covers that the one above does not. Perhaps my way won't perform inverse >>> transitions if the query did sum(numericCol / 10) or so. >> >> Dunno how many people SUM over numerics with different dscales. Easily >> possible that it's few enough to not be worth fussing over. > > Going by Tom's comments in the post above this is possible just by having an > unconstrained numeric column, but I guess there's still a good chance that > even those unconstrained numbers have the same scale or at least the scale > will likely not vary wildly enough to make us have to perform brute force > forward transitions for each row. Yeah, I'm convinced by now that your approach is the right trade-off there. 
Those who do have values with wildly different dscales in their columns can always add a cast to normalize them, if they experience a lot of restarts. So let's just add a sentence or two to the SUM(numeric) documentation about this, and be done. >> * build_aggregate_fnexprs() should allow NULL to be passed for >> invtransfn_oid, >> I think. I don't quite like that we construct that even for plain >> aggregates, >> and avoiding it requires just an additional if. >> >> I'm not quite sure what you mean by this. It passes InvalidOid in normal >> aggregate calls (search for: "InvalidOid, /* invtrans is not needed here */") >> and only looks up the function in build_aggregate_fnexprs if >> (OidIsValid(invtransfn_oid)) is true. I'm not sure how this can be improved >> since that function is used for window aggregates and normal aggregates. I was thinking about checking for **invtransfnexpr = NULL, and not assigning if it is. But on second thought, you're right - the additional variable doesn't really hurt. So let's leave it as it is.
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Sat, Jan 18, 2014 at 6:15 PM, David Rowley wrote: > On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug wrote: > >> * Don't we need to check for volatile functions in the filter expression >> too? >> > >> > I did manual testing on this before and the volatility test for the > aggregate arguments seems to cover this. I didn't look into why but it just > did. I've not tested this again since your refactoring. I could test this > easily before when my numeric case was changing the results because of the > dscale problem; I noticed that if I did FILTER(WHERE random() > 0) that the > extra trailing zeros would disappear. > The problem now is that it's pretty hard to determine if an inverse > transition took place; the only way we can really tell is performance. I'll > see if I can invent a new test case for this by creating a user-defined > aggregate as you described. I'm thinking just append '+' to a string for > transitions and '-' to a string for inverse transitions, then just make > sure we only have a string of '+'s when doing something like filter(where > random() >= 0). > > > I've added a test case for this and it seems to work as expected: https://github.com/david-rowley/postgres/commit/43a5021e8f8ae1af272e7e21a842d1b0d5cbe577 Regards David Rowley
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug wrote: > First, I've got the feeling that I should somehow update the commitfest app, > but I don't really know in which way. Should I put myself in as a reviewer, > or as a second author? Or neither? Suggestions welcome... > > Well, I guess you're both now, but it's a bit weird to be author and reviewer, so I've put your name against author too; hopefully Dean can review our combined results and we can review each other's work at the same time. > On Jan17, 2014, at 23:34 , David Rowley wrote: > > The test turned out to become: > > if (state->expectedScale == -1) > > state->expectedScale = X.dscale; > > else if (state->expectedScale != X.dscale) > > state->expectedScale = -2; > > > > In do_numeric_accum, then when we do the inverse I just check if > > expectedScale < 0 then return NULL. > > Ok, so this will rescan unless the dscales of all inputs match. > I still think that's overly pessimistic - we've only got a problem when we > removed the input with the largest dscale, no? So my suggestion would be > > state->maxDScale = MAX(X.dscale, state->maxDScale); > > in do_numeric_accum, and in the inverse > > if (state->maxDScale == X.dscale) > return PG_RETURN_NULL; > > I'd think that this avoids more restarts with about the same effort, > but I haven't tried this though, so maybe I'm missing something. > > This is not quite right as it means if all the values are the same then we reject inverse transitions since state->maxScale will always be equal to X.dscale. But you are right about the overly strict code I've put in; we should allow values with less than the maximum dscale to be unaggregated without complaint. To implement this I needed a maxScaleCounter variable too so we only reject when the maxScaleCounter gets back to 0 again. 
Note that after this fix the results for my quick benchmark now look like: create table num (num numeric(10,2) not null); insert into num (num) select * from generate_series(1,2); select sum(num) over(order by num rows between current row and unbounded following) from num; -- 113 ms select sum(num / 10) over(order by num rows between current row and unbounded following) from num; -- 156ms select sum(num / 1) over(order by num rows between current row and unbounded following) from num; -- 144 ms So it seems to be much less prone to falling back to brute force forward transitions. It also seems the / 10 version must have had to previously do 1 brute force rescan but now it looks like it can do it in 1 scan. > I'm not set on it, and I'm willing to try the lazy zeroing of the scale > > tracker array, but I'm just not quite sure what extra real use cases it > > covers that the one above does not. Perhaps my way won't perform inverse > > transitions if the query did sum(numericCol / 10) or so. > > Dunno how many people SUM over numerics with different dscales. Easily > possible that it's few enough to not be worth fussing over. > > Going by Tom's comments in the post above this is possible just by having an unconstrained numeric column, but I guess there's still a good chance that even those unconstrained numbers have the same scale or at least the scale will likely not vary wildly enough to make us have to perform brute force forward transitions for each row. 
> > create table num (num numeric(10,2) not null); > > insert into num (num) select * from generate_series(1,2); > > select sum(num) over(order by num rows between current row and unbounded > following) from num; -- 124ms > > select sum(num / 10) over(order by num rows between current row and > unbounded following) from num; -- 254ms > > select sum(num / 1) over(order by num rows between current row and > unbounded following) from num; -- 108156.917 ms > > > > The divide by 1 case is slow because of those weird 20 trailing zeros > > instead of 16 when dividing a numeric by 1, and that causes the inverse > > transition function to return NULL because the scale changes. > > > > I've not tested an unpatched version yet to see how that divide by 1 > query > > performs on that but I'll get to that soon. > > So without the patch, all three queries should perform similarly, i.e. take > about 10 seconds, right? If so, the improvement is fantastic! > > Well, it's actually 100 seconds, not 10. I tested the worst-case performance against an unpatched head and got 107 seconds instead of the 108. So I'm guessing that's pretty good, as the worst case is not really any worse and it is pretty hard to get to. I guess the results would have to all have a different scale, with the biggest scale on the first aggregated values... Reaching that worst case just seems impossible in a real-world workload. > > I'm thinking that the idea about detecting the numeric range with > floating > point types and performing an inverse transition provided the range has > not gone beyond some defined danger zone is not material for this > patch
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
First, I've got the feeling that I should somehow update the commitfest app, but I don't really know in which way. Should I put myself in as a reviewer, or as a second author? Or neither? Suggestions welcome... On Jan17, 2014, at 23:34 , David Rowley wrote: > The test turned out to become: > if (state->expectedScale == -1) > state->expectedScale = X.dscale; > else if (state->expectedScale != X.dscale) > state->expectedScale = -2; > > In do_numeric_accum, then when we do the inverse I just check if > expectedScale < 0 then return NULL. Ok, so this will rescan unless the dscales of all inputs match. I still think that's overly pessimistic - we've only got a problem when we removed the input with the largest dscale, no? So my suggestion would be state->maxDScale = MAX(X.dscale, state->maxDScale); in do_numeric_accum, and in the inverse if (state->maxDScale == X.dscale) return PG_RETURN_NULL; I'd think that this avoids more restarts with about the same effort, but I haven't tried this though, so maybe I'm missing something. > I'm not set on it, and I'm willing to try the lazy zeroing of the scale > tracker array, but I'm just not quite sure what extra real use cases it > covers that the one above does not. Perhaps my way won't perform inverse > transitions if the query did sum(numericCol / 10) or so. Dunno how many people SUM over numerics with different dscales. Easily possible that it's few enough to not be worth fussing over. 
> create table num (num numeric(10,2) not null); > insert into num (num) select * from generate_series(1,2); > select sum(num) over(order by num rows between current row and unbounded > following) from num; -- 124ms > select sum(num / 10) over(order by num rows between current row and unbounded > following) from num; -- 254ms > select sum(num / 1) over(order by num rows between current row and unbounded > following) from num; -- 108156.917 ms > > The divide by 1 case is slow because of those weird 20 trailing zeros > instead of 16 when dividing a numeric by 1, and that causes the inverse > transition function to return NULL because the scale changes. > > I've not tested an unpatched version yet to see how that divide by 1 query > performs on that but I'll get to that soon. So without the patch, all three queries should perform similarly, i.e. take about 10 seconds, right? If so, the improvement is fantastic! > I'm thinking that the idea about detecting the numeric range with floating > point types and performing an inverse transition provided the range has > not gone beyond some defined danger zone is not material for this patch... > I think it would not be a great deal of work to code, but the testing involved > would probably make this patch not possible for 9.4 Yeah, I never imagined that this would happen for 9.4. > The latest version of the patch is attached. OK, there are a few more comments * build_aggregate_fnexprs() should allow NULL to be passed for invtransfn_oid, I think. I don't quite like that we construct that even for plain aggregates, and avoiding it requires just an additional if. * Don't we need to check for volatile functions in the filter expression too? * As it stands, I don't think intXand_inv and intXor_inv are worth it, since the case where they return non-NULL is awfully slim (only for inputs containing all ones or all zeros, respectively). 
We should either track the number of zeros and ones per bit, which would allow us to always perform inverse transitions, or just drop them. * Quite a few of the inverse transition functions are marked strict, yet contain code to handle NULL inputs. You can just remove that code - the system makes sure that strict functions never receive NULL arguments. Affected are, AFAICS, numeric_accum_inv, numeric_avg_accum_inv, int2_accum_inv, int4_accum_inv, int8_accum_inv, int8_avg_accum_inv, int2_sum_inv, int4_sum_inv, int8_sum_inv. Not sure that list is exhaustive... * For any of the new inverse transition functions, I'd be inclined to just elog() if they're called directly and not as an aggregate. In particular those which check for that anyway, plus the *smaller_inv and *larger_inv ones. I don't see why anyone would ever want to call these directly - and if we elog() now, we can easily change these functions later, because no external code can depend on them. E.g., maybe someone wants to keep the top N elements in the MIN/MAX aggregates one day... * The number of new regression tests seems a bit excessive. I don't think there's really a policy on what to test and what not, but in this case I think it suffices if we test the basic machinery, plus the more complex functions. But e.g. most of the SUM and AVG aggregates use numeric_accum or numeric_avg_accum internally, and the wrapper code basically just does datatype conversion, so testing a few cases seems enough there. What I think we *should* test, but don't do curre
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
David Rowley writes: > I just thought that my idea was good enough and very cheap too, won't all > numerics that are actually stored in a column have the same scale anyway? No; unconstrained numeric columns (ie, if you just say "numeric") don't force their contents to any particular scale. It might be that we don't have to optimize for that case, since it's not in the SQL spec, but it is definitely supported by PG. regards, tom lane
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Sat, Jan 18, 2014 at 10:42 AM, David Rowley wrote: > >> You could do better than that - the numeric problem amounts to tracking >> the maximum >> scale AFAICS, so it'd be sufficient to restart if we remove a value whose >> scale equals >> the current maximum. But if we do SUM(numeric) at all, I think we should >> do so >> without requiring restarts - I still think the overhead of tracking the >> maximum scale >> within the aggregated values isn't that bad. If we zero the dscale >> counters lazily, >> the number of entries we have to zero is bounded by the maximum dscale we >> encounter. >> Since actually summing N digits is probably more expensive than zeroing >> them, and since >> we pay the zeroing overhead only once per aggregation but save >> potentially many >> summations, I'm pretty sure we come out ahead by quite some margin. >> >> > We'll work that out, I don't think it will take very long to code up your > idea either. > I just thought that my idea was good enough and very cheap too, won't all > numerics that are actually stored in a column have the same scale anyway? > Has it not only been a problem because we've been testing with random > numeric literals the whole time? > > The test turned out to become: > if (state->expectedScale == -1) > state->expectedScale = X.dscale; > else if (state->expectedScale != X.dscale) > state->expectedScale = -2; > > In do_numeric_accum, then when we do the inverse I just check if > expectedScale < 0 then return NULL. > > I'm not set on it, and I'm willing to try the lazy zeroing of the scale > tracker array, but I'm just not quite sure what extra real use cases it > covers that the one above does not. Perhaps my way won't perform inverse > transitions if the query did sum(numericCol / 10) or so. > > I'll be committing this to the github repo very soon, so we can hack away > at the scale tracking code once it's back in. > > Ok, we're back up to 86 aggregate function / type combinations with inverse transition functions. 
I've committed my latest work up to github and here's a fresh patch which I will need to do more tests on. The test (below) that used to fail a few versions ago is back in there and it's now passing. SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n); In this case it won't use inverse transitions because the forward transition function detects that the scale is not fixed. I tested throwing some numerics into a table and I'm pretty happy with the results. create table num (num numeric(10,2) not null); insert into num (num) select * from generate_series(1,2); select sum(num) over(order by num rows between current row and unbounded following) from num; -- 124ms select sum(num / 10) over(order by num rows between current row and unbounded following) from num; -- 254ms select sum(num / 1) over(order by num rows between current row and unbounded following) from num; -- 108156.917 ms The divide by 1 case is slow because of those weird 20 trailing zeros instead of 16 when dividing a numeric by 1, and that causes the inverse transition function to return NULL because the scale changes. I've not tested an unpatched version yet to see how that divide by 1 query performs on that but I'll get to that soon. I'm thinking that the idea about detecting the numeric range with floating point types and performing an inverse transition provided the range has not gone beyond some defined danger zone is not material for this patch... I think it would not be a great deal of work to code, but the testing involved would probably make this patch not possible for 9.4. The latest version of the patch is attached. Regards David Rowley inverse_transition_functions_v2.7.patch.gz Description: GNU Zip compressed data