Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On Sat, Jul 06, 2013 at 11:49:21AM +0100, Dean Rasheed wrote: On 5 July 2013 18:23, David Fetter da...@fetter.org wrote: Please find attached changes based on the above. This looks good. The grammar changes are smaller and neater now on top of the makeFuncCall() patch. Overall I think this patch offers useful additional functionality, in compliance with the SQL spec, which should be handy to simplify complex grouping queries. There's a minor typo in syntax.sgml: for each input row, each row matching same. --- fix attached. That is actually correct grammar, but may not be the easiest to translate. I think this is ready for committer. Thanks :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On 5 July 2013 18:23, David Fetter da...@fetter.org wrote: Please find attached changes based on the above. This looks good. The grammar changes are smaller and neater now on top of the makeFuncCall() patch. Overall I think this patch offers useful additional functionality, in compliance with the SQL spec, which should be handy to simplify complex grouping queries. There's a minor typo in syntax.sgml: for each input row, each row matching same. --- fix attached. I think this is ready for committer. Regards, Dean filter_008b.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On Mon, Jul 01, 2013 at 05:30:38PM +0100, Dean Rasheed wrote: On 1 July 2013 01:44, David Fetter da...@fetter.org wrote: On Fri, Jun 28, 2013 at 09:22:52PM +0100, Dean Rasheed wrote: On 21 June 2013 06:16, David Fetter da...@fetter.org wrote: Please find attached a patch which allows subqueries in the FILTER clause and adds regression testing for same. This needs re-basing/merging following Robert's recent commit to make OVER unreserved. Please find attached. Thanks, Andrew Gierth! In this one, FILTER is no longer a reserved word. Looking at this patch again, it appears to be in pretty good shape. - Applies cleanly to head. - Compiles with no warnings. - Includes regression test cases and doc updates. - Compatible with the relevant part of T612, Advanced OLAP operations. - Includes pg_dump support. - Code changes all look reasonable, and I can't find any corner cases that have been missed. - Appears to work as expected. I tested everything I could think of and couldn't break it. AFAICT all the bases have been covered. As mentioned upthread, I would have preferred a few more regression test cases, and a couple of the tests don't appear to return anything interesting, but I'll leave that for the committer to decide whether they're sufficient for regression tests. I have a few suggestions to improve the docs: 1). In syntax.sgml: The aggregate_name can also be suffixed with FILTER as described below. It's not really a suffix to the aggregate name, since it follows the function arguments and optional order by clause. Perhaps it would be more consistent with the surrounding text to say something like replaceableexpression/replaceable is any value expression that does not itself contain an aggregate expression or a window function call, and !replaceableorder_by_clause/replaceable and !replaceablefilter_clause/replaceable are optional !literalORDER BY/ and literalFILTER/ clauses as described below. 2). In syntax.sgml: ... or when a FILTER clause is present, each row matching same. In the context of that paragraph this suggests that the filter clause only applies to the first form, since that paragraph is a description of the 4 forms of the aggregate function. I don't think it's worth mentioning FILTER in this paragraph at all --- it's adequately described below that. 3). In syntax.sgml: Adding a FILTER clause to an aggregate specifies which values of the expression being aggregated to evaluate. How about something a little more specific, along the lines of If literalFILTER/ is specified, then only input rows for which the replaceablefilter_clause/replaceable evaluates to true are fed to the aggregate function; input rows for which the replaceablefilter_clause/replaceable evaluates to false or the null value are discarded. For example... 4). In select.sgml: In the absence of a FILTER clause, aggregate functions It doesn't seem right to refer to the FILTER clause at the top level here because it's not part of the SELECT syntax being described on this page. Also I think this should include a cross-reference to the aggregate function syntax section, perhaps something like: Aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without literalGROUP BY/literal, an aggregate produces a single value computed across all the selected rows). +The set of rows fed to the aggregate function can be further filtered +by attaching a literalFILTER/literal clause to the aggregate +function call, see xref ... for more information. When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column. Please find attached changes based on the above. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate diff --git a/doc/src/sgml/keywords.sgml b/doc/src/sgml/keywords.sgml index 5e3b33a..ecfde99 100644 --- a/doc/src/sgml/keywords.sgml +++ b/doc/src/sgml/keywords.sgml @@ -1786,7 +1786,7 @@ /row row entrytokenFILTER/token/entry -entry/entry +entrynon-reserved/entry entryreserved/entry entryreserved/entry entry/entry @@
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On 1 July 2013 01:44, David Fetter da...@fetter.org wrote: On Fri, Jun 28, 2013 at 09:22:52PM +0100, Dean Rasheed wrote: On 21 June 2013 06:16, David Fetter da...@fetter.org wrote: Please find attached a patch which allows subqueries in the FILTER clause and adds regression testing for same. This needs re-basing/merging following Robert's recent commit to make OVER unreserved. Please find attached. Thanks, Andrew Gierth! In this one, FILTER is no longer a reserved word. Looking at this patch again, it appears to be in pretty good shape. - Applies cleanly to head. - Compiles with no warnings. - Includes regression test cases and doc updates. - Compatible with the relevant part of T612, Advanced OLAP operations. - Includes pg_dump support. - Code changes all look reasonable, and I can't find any corner cases that have been missed. - Appears to work as expected. I tested everything I could think of and couldn't break it. AFAICT all the bases have been covered. As mentioned upthread, I would have preferred a few more regression test cases, and a couple of the tests don't appear to return anything interesting, but I'll leave that for the committer to decide whether they're sufficient for regression tests. I have a few suggestions to improve the docs: 1). In syntax.sgml: The aggregate_name can also be suffixed with FILTER as described below. It's not really a suffix to the aggregate name, since it follows the function arguments and optional order by clause. Perhaps it would be more consistent with the surrounding text to say something like replaceableexpression/replaceable is any value expression that does not itself contain an aggregate expression or a window function call, and !replaceableorder_by_clause/replaceable and !replaceablefilter_clause/replaceable are optional !literalORDER BY/ and literalFILTER/ clauses as described below. 2). In syntax.sgml: ... or when a FILTER clause is present, each row matching same. In the context of that paragraph this suggests that the filter clause only applies to the first form, since that paragraph is a description of the 4 forms of the aggregate function. I don't think it's worth mentioning FILTER in this paragraph at all --- it's adequately described below that. 3). In syntax.sgml: Adding a FILTER clause to an aggregate specifies which values of the expression being aggregated to evaluate. How about something a little more specific, along the lines of If literalFILTER/ is specified, then only input rows for which the replaceablefilter_clause/replaceable evaluates to true are fed to the aggregate function; input rows for which the replaceablefilter_clause/replaceable evaluates to false or the null value are discarded. For example... 4). In select.sgml: In the absence of a FILTER clause, aggregate functions It doesn't seem right to refer to the FILTER clause at the top level here because it's not part of the SELECT syntax being described on this page. Also I think this should include a cross-reference to the aggregate function syntax section, perhaps something like: Aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without literalGROUP BY/literal, an aggregate produces a single value computed across all the selected rows). +The set of rows fed to the aggregate function can be further filtered +by attaching a literalFILTER/literal clause to the aggregate +function call, see xref ... for more information. When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On Fri, Jun 28, 2013 at 09:22:52PM +0100, Dean Rasheed wrote: On 21 June 2013 06:16, David Fetter da...@fetter.org wrote: Please find attached a patch which allows subqueries in the FILTER clause and adds regression testing for same. This needs re-basing/merging following Robert's recent commit to make OVER unreserved. Please find attached. Thanks, Andrew Gierth! In this one, FILTER is no longer a reserved word. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate diff --git a/doc/src/sgml/keywords.sgml b/doc/src/sgml/keywords.sgml index 5e3b33a..ecfde99 100644 --- a/doc/src/sgml/keywords.sgml +++ b/doc/src/sgml/keywords.sgml @@ -1786,7 +1786,7 @@ /row row entrytokenFILTER/token/entry -entry/entry +entrynon-reserved/entry entryreserved/entry entryreserved/entry entry/entry @@ -3200,7 +3200,7 @@ /row row entrytokenOVER/token/entry -entryreserved (can be function or type)/entry +entrynon-reserved/entry entryreserved/entry entryreserved/entry entry/entry diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 68309ba..b289a3a 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -594,10 +594,13 @@ GROUP BY replaceable class=parameterexpression/replaceable [, ...] /para para -Aggregate functions, if any are used, are computed across all rows +In the absence of a literalFILTER/literal clause, +aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without literalGROUP BY/literal, an aggregate produces a single value computed across all the selected rows). +When a literalFILTER/literal clause is present, only those +rows matching the FILTER clause are included. When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to ungrouped columns except within aggregate functions or if the diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index b139212..c4d5f33 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -1562,24 +1562,26 @@ sqrt(2) syntax of an aggregate expression is one of the following: synopsis -replaceableaggregate_name/replaceable (replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) -replaceableaggregate_name/replaceable (ALL replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) -replaceableaggregate_name/replaceable (DISTINCT replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) -replaceableaggregate_name/replaceable ( * ) +replaceableaggregate_name/replaceable (replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) [ FILTER ( WHERE replaceablefilter_clause/replaceable ) ] +replaceableaggregate_name/replaceable (ALL replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) [ FILTER ( WHERE replaceablefilter_clause/replaceable ) ] +replaceableaggregate_name/replaceable (DISTINCT replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) [ FILTER ( WHERE replaceablefilter_clause/replaceable ) ] +replaceableaggregate_name/replaceable ( * ) [ FILTER ( WHERE replaceablefilter_clause/replaceable ) ] /synopsis where replaceableaggregate_name/replaceable is a previously defined aggregate (possibly qualified with a schema name), -replaceableexpression/replaceable is -any value expression that does not itself contain an aggregate -expression or a window function call, and -replaceableorder_by_clause/replaceable is a optional -literalORDER BY/ clause as described below. +replaceableexpression/replaceable is any value expression that +does not itself contain an aggregate expression or a window +function call, replaceableorder_by_clause/replaceable is a +optional literalORDER BY/ clause as described below. The +replaceableaggregate_name/replaceable can also be suffixed +with literalFILTER/literal as described below. /para para -The first form of aggregate expression invokes the aggregate -once for each input row. +The first form of aggregate expression invokes the aggregate once +for each input row, or when a FILTER clause is present, each row +matching same. The second form is the same as the first, since literalALL/literal is the default. The third form invokes the aggregate once for each distinct value @@ -1607,6 +1609,21 @@ sqrt(2) /para
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On 21 June 2013 06:16, David Fetter da...@fetter.org wrote: Please find attached a patch which allows subqueries in the FILTER clause and adds regression testing for same. This needs re-basing/merging following Robert's recent commit to make OVER unreserved. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On 6/23/13 10:50 PM, Tom Lane wrote: It'd sure be interesting to know what the SQL committee's target parsing algorithm is. It's whatever Oracle and IBM implement. Or maybe they really don't give a damn about breaking applications every time they invent a new reserved word? Well, yes, I think that policy was built into the language at the very beginning. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On 26 June 2013 01:01, Josh Berkus j...@agliodbs.com wrote: I know it's heresy in these parts, but maybe we should consider adopting a non-spec syntax for this feature? In particular, it's really un-obvious why the FILTER clause shouldn't be inside rather than outside the aggregate's parens, like ORDER BY. Well, what other DBMSes support this feature? Will being non-spec introduce migration pain? I can't find any, but that doesn't mean they don't exist, or aren't being worked on. To recap, the options currently on offer are: 1). Make FILTER a new partially reserved keyword, accepting that that might break some users' application code. 2). Make FILTER unreserved, accepting that that will lead to syntax errors rather than more specific error messages if the user tries to use an aggregate/window function with FILTER or OVER in the FROM clause of a query, or as an index expression. 3). Adopt a non-standard syntax for this feature, accepting that that might conflict with other databases, and that we can never then claim to have implemented T612, Advanced OLAP operations. 4). Some other parser hack that will offer a better compromise? My preference is for (2) as the lesser of several evils --- it's a fairly narrow case where the quality of the error message is reduced. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
2013/6/26 Dean Rasheed dean.a.rash...@gmail.com: On 26 June 2013 01:01, Josh Berkus j...@agliodbs.com wrote: I know it's heresy in these parts, but maybe we should consider adopting a non-spec syntax for this feature? In particular, it's really un-obvious why the FILTER clause shouldn't be inside rather than outside the aggregate's parens, like ORDER BY. Well, what other DBMSes support this feature? Will being non-spec introduce migration pain? I can't find any, but that doesn't mean they don't exist, or aren't being worked on. To recap, the options currently on offer are: 1). Make FILTER a new partially reserved keyword, accepting that that might break some users' application code. 2). Make FILTER unreserved, accepting that that will lead to syntax errors rather than more specific error messages if the user tries to use an aggregate/window function with FILTER or OVER in the FROM clause of a query, or as an index expression. 3). Adopt a non-standard syntax for this feature, accepting that that might conflict with other databases, and that we can never then claim to have implemented T612, Advanced OLAP operations. 4). Some other parser hack that will offer a better compromise? My preference is for (2) as the lesser of several evils --- it's a fairly narrow case where the quality of the error message is reduced. @2 looks well Pavel Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On 24 June 2013 03:50, Tom Lane t...@sss.pgh.pa.us wrote: David Fetter da...@fetter.org writes: On Sun, Jun 23, 2013 at 07:44:26AM -0700, Kevin Grittner wrote: I think it is OK if that gets a syntax error. If that's the worst case I like this approach. I think reducing the usefulness of error messages is something we need to think extremely hard about before we do. Is there maybe a way to keep the error messages even if by some magic we manage to unreserve the words? The flip side is that the error message you get if you don't realise a word is now reserved is not exactly useful: Syntax error at or near xxx. I don't know of any way to improve that though. Of the alternatives discussed so far, I don't really like anything better than adding another special case to base_yylex(). Robert opined in the other thread about RESPECT NULLS that the potential failure cases with that approach are harder to reason about, which is true, but that doesn't mean that we should accept failures we know of because there might be failures we don't know of. One thing that struck me while thinking about this is that it seems like we've been going about it wrong in base_yylex() in any case. For example, because we fold WITH followed by TIME into a special token WITH_TIME, we will fail on a statement beginning WITH time AS ... even though time is a legal ColId. But suppose that instead of merging the two tokens into one, we just changed the first token into something special; that is, base_yylex would return a special token WITH_FOLLOWED_BY_TIME and then TIME. We could then fix the above problem by allowing either WITH or WITH_FOLLOWED_BY_TIME as the leading keyword of a statement; and similarly for the few other places where WITH can be followed by an arbitrary identifier. Going on the same principle, we could probably let FILTER be an unreserved keyword while FILTER_FOLLOWED_BY_PAREN could be a type_func_name_keyword. (I've not tried this though.) I've not tried either, but wouldn't that mean that SELECT * FROM list_filters() filter would be legal, whereas SELECT * FROM list_filters() filter(id, val) would be a syntax error? If so, I don't think that would be an improvement. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
Dean Rasheed dean.a.rash...@gmail.com writes: On 24 June 2013 03:50, Tom Lane t...@sss.pgh.pa.us wrote: Going on the same principle, we could probably let FILTER be an unreserved keyword while FILTER_FOLLOWED_BY_PAREN could be a type_func_name_keyword. (I've not tried this though.) I've not tried either, but wouldn't that mean that SELECT * FROM list_filters() filter would be legal, whereas SELECT * FROM list_filters() filter(id, val) would be a syntax error? If so, I don't think that would be an improvement. Hm, good point. The SQL committee really managed to choose some unfortunate syntax here, didn't they. I know it's heresy in these parts, but maybe we should consider adopting a non-spec syntax for this feature? In particular, it's really un-obvious why the FILTER clause shouldn't be inside rather than outside the aggregate's parens, like ORDER BY. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
2013/6/25 Tom Lane t...@sss.pgh.pa.us: Dean Rasheed dean.a.rash...@gmail.com writes: On 24 June 2013 03:50, Tom Lane t...@sss.pgh.pa.us wrote: Going on the same principle, we could probably let FILTER be an unreserved keyword while FILTER_FOLLOWED_BY_PAREN could be a type_func_name_keyword. (I've not tried this though.) I've not tried either, but wouldn't that mean that SELECT * FROM list_filters() filter would be legal, whereas SELECT * FROM list_filters() filter(id, val) would be a syntax error? If so, I don't think that would be an improvement. Hm, good point. The SQL committee really managed to choose some unfortunate syntax here, didn't they. I know it's heresy in these parts, but maybe we should consider adopting a non-spec syntax for this feature? In particular, it's really un-obvious why the FILTER clause shouldn't be inside rather than outside the aggregate's parens, like ORDER BY. the gram can be more free and final decision should be done in later stages ??? This technique was enough when I wrote prototype for CUBE ROLLUP without CUBE ROLLUP reserwed keywords. Regards Pavel regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On Sun, Jun 23, 2013 at 10:50 PM, Tom Lane t...@sss.pgh.pa.us wrote: David Fetter da...@fetter.org writes: On Sun, Jun 23, 2013 at 07:44:26AM -0700, Kevin Grittner wrote: I think it is OK if that gets a syntax error. If that's the worst case I like this approach. I think reducing the usefulness of error messages is something we need to think extremely hard about before we do. Is there maybe a way to keep the error messages even if by some magic we manage to unreserve the words? Of the alternatives discussed so far, I don't really like anything better than adding another special case to base_yylex(). Robert opined in the other thread about RESPECT NULLS that the potential failure cases with that approach are harder to reason about, which is true, but that doesn't mean that we should accept failures we know of because there might be failures we don't know of. Sure, that's true; but the proposal on the other thread is just to disallow invalid syntax early enough that it benefits the parser. The error message is different, but I don't think it's a BAD error message. One thing that struck me while thinking about this is that it seems like we've been going about it wrong in base_yylex() in any case. For example, because we fold WITH followed by TIME into a special token WITH_TIME, we will fail on a statement beginning WITH time AS ... even though time is a legal ColId. But suppose that instead of merging the two tokens into one, we just changed the first token into something special; that is, base_yylex would return a special token WITH_FOLLOWED_BY_TIME and then TIME. We could then fix the above problem by allowing either WITH or WITH_FOLLOWED_BY_TIME as the leading keyword of a statement; and similarly for the few other places where WITH can be followed by an arbitrary identifier. Going on the same principle, we could probably let FILTER be an unreserved keyword while FILTER_FOLLOWED_BY_PAREN could be a type_func_name_keyword. (I've not tried this though.) I think this whole direction is going to collapse under its own weight VERY quickly. The problems you're describing are essentially shift/reduce conflicts that are invisible because they're hidden behind lexer magic. Part of the value of using a parser generator is that it TELLS you when you've added ambiguous syntax. But it doesn't know about lexer hacks, so stuff will just silently break. I think this type of lexer hacks works reasonably well keyword-like things that are used in just one place in the grammar. As soon as you get up to two, the wheels come off - as with RESPECT NULLS vs. NULLS FIRST. This idea doesn't help much for OVER because one of the alternatives for over_clause is OVER ColId, and I doubt we want to have base_yylex know all the alternatives for ColId. I also had no great success with the NULLS FIRST/LAST case: AFAICT the substitute token for NULLS still has to be fully reserved, meaning that something like select nulls last still doesn't work without quoting. We could maybe fix that with enough denormalization of the index_elem productions, but it'd be ugly. I don't think that particular example is very compelling - there's a general rule that column aliases can't be keywords of any type. That's not wonderful, and EnterpriseDB has had bug reports filed about it, but the real-world impact is pretty minimal, certainly compared to what we used to do which is not allow column aliases AT ALL. It'd sure be interesting to know what the SQL committee's target parsing algorithm is. I find it hard to believe they're uninformed enough to not know that these random syntaxes they keep inventing are hard to deal with in LALR(1). Or maybe they really don't give a damn about breaking applications every time they invent a new reserved word? Does the SQL committee contemplate that SELECT * FROM somefunc() filter (id, val) should act as a table alias and that SELECT * FROM somefunc() filter (where x 1) is an aggregate filter? This all gets much easier to understand if one of those constructs isn't allowed in that particular context. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
I know it's heresy in these parts, but maybe we should consider adopting a non-spec syntax for this feature? In particular, it's really un-obvious why the FILTER clause shouldn't be inside rather than outside the aggregate's parens, like ORDER BY. Well, what other DBMSes support this feature? Will being non-spec introduce migration pain? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On 21 June 2013 10:02, Dean Rasheed dean.a.rash...@gmail.com wrote: On 21 June 2013 06:16, David Fetter da...@fetter.org wrote: On Fri, Jun 21, 2013 at 12:10:25AM -0400, Alvaro Herrera wrote: David Fetter escribió: On Thu, Jun 20, 2013 at 08:59:27PM +0100, Dean Rasheed wrote: In my testing of sub-queries in the FILTER clause (an extension to the spec), I was able to produce the following error: Per the spec, B) A filter clause shall not contain a query expression, a window function, or an outer reference. If this is not allowed, I think there should be a clearer error message. What Dean showed is an internal (not user-facing) error message. Please find attached a patch which allows subqueries in the FILTER clause and adds regression testing for same. Nice! I should have pointed out that it was already working for some sub-queries, just not all. It's good to see that it was basically just a one-line fix, because I think it would have been a shame to not support sub-queries. I hope to take another look at it over the weekend. I'm still not happy that this patch is making FILTER a new reserved keyword, because I think it is a common enough English word (and an obscure enough SQL keyword) that people may well have used it for table names or aliases, and so their code will break. There is another thread discussing a similar issue with lead/lag ignore nulls: http://www.postgresql.org/message-id/CAOdE5WQnfR737OkxNXuLWnwpL7=OUssC-63ijP2=2rnrtw8...@mail.gmail.com Playing around with the idea proposed there, it seems that it is possible to make FILTER (and also OVER) unreserved keywords, by splitting out the production of aggregate/window functions from normal functions in gram.y. Aggregate/window functions are not allowed in index_elem or func_table constructs, but they may appear in c_expr's. That resolves the shift/reduce conflicts that would otherwise occur from subsequent alias clauses, allowing FILTER and OVER to be unreserved. There is a drawback --- certain error messages become less specific, for example: SELECT * FROM rank() OVER(ORDER BY random()); is now a syntax error, rather than the more useful error saying that window functions aren't allowed in the FROM clause. That doesn't seem like such a common case though. What do you think? Regards, Dean make-filter-unreserved.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
Dean Rasheed dean.a.rash...@gmail.com wrote: I'm still not happy that this patch is making FILTER a new reserved keyword, because I think it is a common enough English word (and an obscure enough SQL keyword) that people may well have used it for table names or aliases, and so their code will break. Well, if it is a useful capability with a standard syntax, I think we should go with the standard syntax even if it might break application code that uses, as unquoted identifiers, words reserved by the SQL standard. Of course, non-reserved is better than reserved if we can manage it. Playing around with the idea proposed there, it seems that it is possible to make FILTER (and also OVER) unreserved keywords, by splitting out the production of aggregate/window functions from normal functions in gram.y. Aggregate/window functions are not allowed in index_elem or func_table constructs, but they may appear in c_expr's. That resolves the shift/reduce conflicts that would otherwise occur from subsequent alias clauses, allowing FILTER and OVER to be unreserved. There is a drawback --- certain error messages become less specific, for example: SELECT * FROM rank() OVER(ORDER BY random()); is now a syntax error, rather than the more useful error saying that window functions aren't allowed in the FROM clause. That doesn't seem like such a common case though. What do you think? I think it is OK if that gets a syntax error. If that's the worst case I like this approach. This also helps keep down the size of the generated parse tables, doesn't it? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On Sun, Jun 23, 2013 at 07:44:26AM -0700, Kevin Grittner wrote: Dean Rasheed dean.a.rash...@gmail.com wrote: I'm still not happy that this patch is making FILTER a new reserved keyword, because I think it is a common enough English word (and an obscure enough SQL keyword) that people may well have used it for table names or aliases, and so their code will break. Well, if it is a useful capability with a standard syntax, I think we should go with the standard syntax even if it might break application code that uses, as unquoted identifiers, words reserved by the SQL standard. Of course, non-reserved is better than reserved if we can manage it. I'm not entirely sure that I agree with this. The SQL standard doesn't go adding keywords willy-nilly, or at least hasn't over a fairly long stretch of time. I can get precise numbers on this if needed. So far, only Tom and Greg have weighed in, Greg's response being here: http://www.postgresql.org/message-id/CAM-w4HOd3N_ozMygs==lm5+hu8yqkkayutgjinp6z2hazdr...@mail.gmail.com Playing around with the idea proposed there, it seems that it is possible to make FILTER (and also OVER) unreserved keywords, by splitting out the production of aggregate/window functions from normal functions in gram.y. Aggregate/window functions are not allowed in index_elem or func_table constructs, but they may appear in c_expr's. That resolves the shift/reduce conflicts that would otherwise occur from subsequent alias clauses, allowing FILTER and OVER to be unreserved. There is a drawback --- certain error messages become less specific, for example: SELECT * FROM rank() OVER(ORDER BY random()); is now a syntax error, rather than the more useful error saying that window functions aren't allowed in the FROM clause. That doesn't seem like such a common case though. What do you think? I think it is OK if that gets a syntax error. If that's the worst case I like this approach. I think reducing the usefulness of error messages is something we need to think extremely hard about before we do. Is there maybe a way to keep the error messages even if by some magic we manage to unreserve the words? This also helps keep down the size of the generated parse tables, doesn't it? Could well. I suspect we may need to rethink the whole way we do grammar at some point, but that's for a later discussion when I (or someone else) has something choate to say about it. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
David Fetter da...@fetter.org writes: On Sun, Jun 23, 2013 at 07:44:26AM -0700, Kevin Grittner wrote: I think it is OK if that gets a syntax error. If that's the worst case I like this approach. I think reducing the usefulness of error messages is something we need to think extremely hard about before we do. Is there maybe a way to keep the error messages even if by some magic we manage to unreserve the words? Of the alternatives discussed so far, I don't really like anything better than adding another special case to base_yylex(). Robert opined in the other thread about RESPECT NULLS that the potential failure cases with that approach are harder to reason about, which is true, but that doesn't mean that we should accept failures we know of because there might be failures we don't know of. One thing that struck me while thinking about this is that it seems like we've been going about it wrong in base_yylex() in any case. For example, because we fold WITH followed by TIME into a special token WITH_TIME, we will fail on a statement beginning WITH time AS ... even though time is a legal ColId. But suppose that instead of merging the two tokens into one, we just changed the first token into something special; that is, base_yylex would return a special token WITH_FOLLOWED_BY_TIME and then TIME. We could then fix the above problem by allowing either WITH or WITH_FOLLOWED_BY_TIME as the leading keyword of a statement; and similarly for the few other places where WITH can be followed by an arbitrary identifier. Going on the same principle, we could probably let FILTER be an unreserved keyword while FILTER_FOLLOWED_BY_PAREN could be a type_func_name_keyword. (I've not tried this though.) This idea doesn't help much for OVER because one of the alternatives for over_clause is OVER ColId, and I doubt we want to have base_yylex know all the alternatives for ColId. I also had no great success with the NULLS FIRST/LAST case: AFAICT the substitute token for NULLS still has to be fully reserved, meaning that something like select nulls last still doesn't work without quoting. We could maybe fix that with enough denormalization of the index_elem productions, but it'd be ugly. Could well. I suspect we may need to rethink the whole way we do grammar at some point, but that's for a later discussion when I (or someone else) has something choate to say about it. It'd sure be interesting to know what the SQL committee's target parsing algorithm is. I find it hard to believe they're uninformed enough to not know that these random syntaxes they keep inventing are hard to deal with in LALR(1). Or maybe they really don't give a damn about breaking applications every time they invent a new reserved word? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On 21 June 2013 05:01, David Fetter da...@fetter.org wrote: What tests do you think should be there that aren't? I think I expected to see more tests related to some of the specific code changes, such as CREATE TABLE t AS SELECT * FROM generate_series(1,10) t(x); -- Should fail (filter can't be used for non-aggregates) SELECT abs(x) FILTER (WHERE x 5) FROM t; -- Should fail (filter clause can't contain aggregates) SELECT array_agg(x) FILTER (WHERE x AVG(x)) t; -- OK (aggregate in filter sub-query) SELECT array_agg(x) FILTER (WHERE x (SELECT AVG(x) FROM t)) FROM t; -- OK (trivial sub-query) SELECT array_agg(x) FILTER (WHERE (SELECT x 5)) FROM t; -- OK SELECT array_agg(x) FILTER (WHERE (SELECT x (SELECT AVG(x) FROM t))) FROM t; -- OK SELECT array_agg(x) FILTER (WHERE (SELECT (SELECT t.x AVG(t2.x) FROM t as t2))) FROM t; -- Should fail SELECT array_agg(x) FILTER (WHERE (SELECT (SELECT 5 AVG(t.x) FROM t as t2))) FROM t; -- OK (filtered aggregate in window context) SELECT x, AVG(x) OVER(ORDER BY x), AVG(x) FILTER (WHERE x 5) OVER(ORDER BY x) FROM t; -- Should fail (non-aggregate window function with filter) SELECT x, rank() OVER(ORDER BY x), rank() FILTER (WHERE x 5) OVER(ORDER BY x) FROM t; -- View definition test CREATE VIEW v AS SELECT array_agg(x) FILTER (WHERE (SELECT (SELECT t.x AVG(t2.x) FROM t as t2))) FROM t; \d+ v etc... You could probably dream up better examples --- I just felt that the current tests don't give much coverage of the code changes. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On 21 June 2013 06:16, David Fetter da...@fetter.org wrote: On Fri, Jun 21, 2013 at 12:10:25AM -0400, Alvaro Herrera wrote: David Fetter escribió: On Thu, Jun 20, 2013 at 08:59:27PM +0100, Dean Rasheed wrote: In my testing of sub-queries in the FILTER clause (an extension to the spec), I was able to produce the following error: Per the spec, B) A filter clause shall not contain a query expression, a window function, or an outer reference. If this is not allowed, I think there should be a clearer error message. What Dean showed is an internal (not user-facing) error message. Please find attached a patch which allows subqueries in the FILTER clause and adds regression testing for same. Nice! I should have pointed out that it was already working for some sub-queries, just not all. It's good to see that it was basically just a one-line fix, because I think it would have been a shame to not support sub-queries. I hope to take another look at it over the weekend. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On 17 June 2013 06:36, David Fetter da...@fetter.org wrote: Please find attached two versions of a patch which provides optional FILTER clause for aggregates (T612, Advanced OLAP operations). The first is intended to be applied on top of the previous patch, the second without it. The first is, I believe, clearer in what it's doing. Rather than simply mechanically visiting every place a function call might be constructed, it visits a central one to change the default, then goes only to the places where it's relevant. The patches are both early WIP as they contain no docs or regression tests yet. Docs and regression tests added, makeFuncArgs approached dropped for now, will re-visit later. Regression tests added to reflect bug fixes in COLLATE. Rebased vs. master. Hi, I've been looking at this patch, which adds support for the SQL standard feature of applying a filter to rows used in an aggregate. The syntax matches the spec: agg_fn ( args [ ORDER BY sort_clause ] ) [ FILTER ( WHERE qual ) ] and this patch makes FILTER a new reserved keyword, usable as a function or type, but not in other contexts, e.g., as a table name or alias. I'm not sure if that might be a problem for some people, but I can't see any way round it, since otherwise there would be no way for the parser to distinguish a filter clause from an alias expression. The feature appears to work per-spec, and the code and doc changes look reasonable. However, it looks a little light on regression tests, and so I think some necessary changes have been overlooked. In my testing of sub-queries in the FILTER clause (an extension to the spec), I was able to produce the following error: CREATE TABLE t1(a1 int); CREATE TABLE t2(a2 int); INSERT INTO t1 SELECT * FROM generate_series(1,10); INSERT INTO t2 SELECT * FROM generate_series(3,6); SELECT array_agg(a1) FILTER (WHERE a1 IN (SELECT a2 FROM t2)) FROM t1; ERROR: plan should not reference subplan's variable which looks to be related to subselect.c's handling of sub-queries in aggregates. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On Thu, Jun 20, 2013 at 08:59:27PM +0100, Dean Rasheed wrote: On 17 June 2013 06:36, David Fetter da...@fetter.org wrote: Please find attached two versions of a patch which provides optional FILTER clause for aggregates (T612, Advanced OLAP operations). The first is intended to be applied on top of the previous patch, the second without it. The first is, I believe, clearer in what it's doing. Rather than simply mechanically visiting every place a function call might be constructed, it visits a central one to change the default, then goes only to the places where it's relevant. The patches are both early WIP as they contain no docs or regression tests yet. Docs and regression tests added, makeFuncArgs approached dropped for now, will re-visit later. Regression tests added to reflect bug fixes in COLLATE. Rebased vs. master. Hi, I've been looking at this patch, which adds support for the SQL standard feature of applying a filter to rows used in an aggregate. The syntax matches the spec: agg_fn ( args [ ORDER BY sort_clause ] ) [ FILTER ( WHERE qual ) ] and this patch makes FILTER a new reserved keyword, usable as a function or type, but not in other contexts, e.g., as a table name or alias. I'm not sure if that might be a problem for some people, but I can't see any way round it, since otherwise there would be no way for the parser to distinguish a filter clause from an alias expression. The feature appears to work per-spec, and the code and doc changes look reasonable. However, it looks a little light on regression tests, What tests do you think should be there that aren't? and so I think some necessary changes have been overlooked. In my testing of sub-queries in the FILTER clause (an extension to the spec), I was able to produce the following error: Per the spec, B) A filter clause shall not contain a query expression, a window function, or an outer reference. I'd be happy to see about adding one despite this, though. That they didn't figure out how doesn't mean we shouldn't eventually, ideally in time for 9.4. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
David Fetter escribió: On Thu, Jun 20, 2013 at 08:59:27PM +0100, Dean Rasheed wrote: In my testing of sub-queries in the FILTER clause (an extension to the spec), I was able to produce the following error: Per the spec, B) A filter clause shall not contain a query expression, a window function, or an outer reference. If this is not allowed, I think there should be a clearer error message. What Dean showed is an internal (not user-facing) error message. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On Fri, Jun 21, 2013 at 12:10:25AM -0400, Alvaro Herrera wrote: David Fetter escribió: On Thu, Jun 20, 2013 at 08:59:27PM +0100, Dean Rasheed wrote: In my testing of sub-queries in the FILTER clause (an extension to the spec), I was able to produce the following error: Per the spec, B) A filter clause shall not contain a query expression, a window function, or an outer reference. If this is not allowed, I think there should be a clearer error message. What Dean showed is an internal (not user-facing) error message. Folding to lower isn't allowed by the spec either, and then there's the matter of arrays... Before going to lots of trouble to figure out how to throw an error that says, only the spec and no further, I'd like to investigate how to make this work. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On Fri, Jun 21, 2013 at 12:10:25AM -0400, Alvaro Herrera wrote: David Fetter escribió: On Thu, Jun 20, 2013 at 08:59:27PM +0100, Dean Rasheed wrote: In my testing of sub-queries in the FILTER clause (an extension to the spec), I was able to produce the following error: Per the spec, B) A filter clause shall not contain a query expression, a window function, or an outer reference. If this is not allowed, I think there should be a clearer error message. What Dean showed is an internal (not user-facing) error message. Please find attached a patch which allows subqueries in the FILTER clause and adds regression testing for same. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 68309ba..b289a3a 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -594,10 +594,13 @@ GROUP BY replaceable class=parameterexpression/replaceable [, ...] /para para -Aggregate functions, if any are used, are computed across all rows +In the absence of a literalFILTER/literal clause, +aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without literalGROUP BY/literal, an aggregate produces a single value computed across all the selected rows). +When a literalFILTER/literal clause is present, only those +rows matching the FILTER clause are included. When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to ungrouped columns except within aggregate functions or if the diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index b139212..c4d5f33 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -1562,24 +1562,26 @@ sqrt(2) syntax of an aggregate expression is one of the following: synopsis -replaceableaggregate_name/replaceable (replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) -replaceableaggregate_name/replaceable (ALL replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) -replaceableaggregate_name/replaceable (DISTINCT replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) -replaceableaggregate_name/replaceable ( * ) +replaceableaggregate_name/replaceable (replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) [ FILTER ( WHERE replaceablefilter_clause/replaceable ) ] +replaceableaggregate_name/replaceable (ALL replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) [ FILTER ( WHERE replaceablefilter_clause/replaceable ) ] +replaceableaggregate_name/replaceable (DISTINCT replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) [ FILTER ( WHERE replaceablefilter_clause/replaceable ) ] +replaceableaggregate_name/replaceable ( * ) [ FILTER ( WHERE replaceablefilter_clause/replaceable ) ] /synopsis where replaceableaggregate_name/replaceable is a previously defined aggregate (possibly qualified with a schema name), -replaceableexpression/replaceable is -any value expression that does not itself contain an aggregate -expression or a window function call, and -replaceableorder_by_clause/replaceable is a optional -literalORDER BY/ clause as described below. +replaceableexpression/replaceable is any value expression that +does not itself contain an aggregate expression or a window +function call, replaceableorder_by_clause/replaceable is a +optional literalORDER BY/ clause as described below. The +replaceableaggregate_name/replaceable can also be suffixed +with literalFILTER/literal as described below. /para para -The first form of aggregate expression invokes the aggregate -once for each input row. +The first form of aggregate expression invokes the aggregate once +for each input row, or when a FILTER clause is present, each row +matching same. The second form is the same as the first, since literalALL/literal is the default. The third form invokes the aggregate once for each distinct value @@ -1607,6 +1609,21 @@ sqrt(2) /para para +Adding a FILTER clause to an aggregate specifies which values of +the expression being aggregated to evaluate. For example: +programlisting +SELECT +count(*) AS unfiltered, +count(*) FILTER (WHERE i 5) AS filtered +FROM generate_series(1,10) AS s(i); + unfiltered | filtered ++-- + 10
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On Sun, Apr 28, 2013 at 01:29:41PM -0700, David Fetter wrote: On Tue, Feb 26, 2013 at 01:09:30PM -0800, David Fetter wrote: On Wed, Feb 13, 2013 at 06:45:31AM -0800, David Fetter wrote: On Sat, Feb 09, 2013 at 11:59:22PM -0800, David Fetter wrote: Folks, Per suggestions and lots of help from Andrew Gierth, please find attached a patch to clean up the call sites for FuncCall nodes, which I'd like to expand centrally rather than in each of the 37 (or 38, but I only redid 37) places where it's called. The remaining one is in src/backend/nodes/copyfuncs.c, which has to be modified for any changes in the that struct anyhow. The immediate purpose is two-fold: to reduce some redundancies, which I believe is worth doing in and of itself, and to prepare for adding FILTER on aggregates from the spec, and possibly other things in the aggregate function part of the spec. Cheers, David. Folks, Please find attached two versions of a patch which provides optional FILTER clause for aggregates (T612, Advanced OLAP operations). The first is intended to be applied on top of the previous patch, the second without it. The first is, I believe, clearer in what it's doing. Rather than simply mechanically visiting every place a function call might be constructed, it visits a central one to change the default, then goes only to the places where it's relevant. The patches are both early WIP as they contain no docs or regression tests yet. Docs and regression tests added, makeFuncArgs approached dropped for now, will re-visit later. Regression tests added to reflect bug fixes in COLLATE. Rebased vs. master. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate *** a/doc/src/sgml/ref/select.sgml --- b/doc/src/sgml/ref/select.sgml *** *** 594,603 GROUP BY replaceable class=parameterexpression/replaceable [, ...] /para para ! Aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without literalGROUP BY/literal, an aggregate produces a single value computed across all the selected rows). When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to ungrouped columns except within aggregate functions or if the --- 594,606 /para para ! In the absence of a literalFILTER/literal clause, ! aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without literalGROUP BY/literal, an aggregate produces a single value computed across all the selected rows). + When a literalFILTER/literal clause is present, only those + rows matching the FILTER clause are included. When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to ungrouped columns except within aggregate functions or if the *** a/doc/src/sgml/syntax.sgml --- b/doc/src/sgml/syntax.sgml *** *** 1562,1585 sqrt(2) syntax of an aggregate expression is one of the following: synopsis ! replaceableaggregate_name/replaceable (replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) ! replaceableaggregate_name/replaceable (ALL replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) ! replaceableaggregate_name/replaceable (DISTINCT replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) ! replaceableaggregate_name/replaceable ( * ) /synopsis where replaceableaggregate_name/replaceable is a previously defined aggregate (possibly qualified with a schema name), ! replaceableexpression/replaceable is ! any value expression that does not itself contain an aggregate ! expression or a window function call, and ! replaceableorder_by_clause/replaceable is a optional ! literalORDER BY/ clause as described below. /para para ! The first form of aggregate expression invokes the aggregate ! once for each input row. The second form is the same as the first, since literalALL/literal is the default. The third form invokes the aggregate once for each distinct value --- 1562,1587 syntax of an aggregate expression is one of the following: synopsis ! replaceableaggregate_name/replaceable (replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) [
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On Tue, Feb 26, 2013 at 01:09:30PM -0800, David Fetter wrote: On Wed, Feb 13, 2013 at 06:45:31AM -0800, David Fetter wrote: On Sat, Feb 09, 2013 at 11:59:22PM -0800, David Fetter wrote: Folks, Per suggestions and lots of help from Andrew Gierth, please find attached a patch to clean up the call sites for FuncCall nodes, which I'd like to expand centrally rather than in each of the 37 (or 38, but I only redid 37) places where it's called. The remaining one is in src/backend/nodes/copyfuncs.c, which has to be modified for any changes in the that struct anyhow. The immediate purpose is two-fold: to reduce some redundancies, which I believe is worth doing in and of itself, and to prepare for adding FILTER on aggregates from the spec, and possibly other things in the aggregate function part of the spec. Cheers, David. Folks, Please find attached two versions of a patch which provides optional FILTER clause for aggregates (T612, Advanced OLAP operations). The first is intended to be applied on top of the previous patch, the second without it. The first is, I believe, clearer in what it's doing. Rather than simply mechanically visiting every place a function call might be constructed, it visits a central one to change the default, then goes only to the places where it's relevant. The patches are both early WIP as they contain no docs or regression tests yet. Docs and regression tests added, makeFuncArgs approached dropped for now, will re-visit later. Regression tests added to reflect bug fixes in COLLATE. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate *** a/doc/src/sgml/ref/select.sgml --- b/doc/src/sgml/ref/select.sgml *** *** 595,604 GROUP BY replaceable class=parameterexpression/replaceable [, ...] /para para ! Aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without literalGROUP BY/literal, an aggregate produces a single value computed across all the selected rows). When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to ungrouped columns except within aggregate functions or if the --- 595,607 /para para ! In the absence of a literalFILTER/literal clause, ! aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without literalGROUP BY/literal, an aggregate produces a single value computed across all the selected rows). + When a literalFILTER/literal clause is present, only those + rows matching the FILTER clause are included. When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to ungrouped columns except within aggregate functions or if the *** a/doc/src/sgml/syntax.sgml --- b/doc/src/sgml/syntax.sgml *** *** 1562,1585 sqrt(2) syntax of an aggregate expression is one of the following: synopsis ! replaceableaggregate_name/replaceable (replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) ! replaceableaggregate_name/replaceable (ALL replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) ! replaceableaggregate_name/replaceable (DISTINCT replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) ! replaceableaggregate_name/replaceable ( * ) /synopsis where replaceableaggregate_name/replaceable is a previously defined aggregate (possibly qualified with a schema name), ! replaceableexpression/replaceable is ! any value expression that does not itself contain an aggregate ! expression or a window function call, and ! replaceableorder_by_clause/replaceable is a optional ! literalORDER BY/ clause as described below. /para para ! The first form of aggregate expression invokes the aggregate ! once for each input row. The second form is the same as the first, since literalALL/literal is the default. The third form invokes the aggregate once for each distinct value --- 1562,1587 syntax of an aggregate expression is one of the following: synopsis ! replaceableaggregate_name/replaceable (replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) [ FILTER ( WHERE replaceablefilter_clause/replaceable ) ] ! replaceableaggregate_name/replaceable (ALL
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On Wed, Feb 13, 2013 at 06:45:31AM -0800, David Fetter wrote: On Sat, Feb 09, 2013 at 11:59:22PM -0800, David Fetter wrote: Folks, Per suggestions and lots of help from Andrew Gierth, please find attached a patch to clean up the call sites for FuncCall nodes, which I'd like to expand centrally rather than in each of the 37 (or 38, but I only redid 37) places where it's called. The remaining one is in src/backend/nodes/copyfuncs.c, which has to be modified for any changes in the that struct anyhow. The immediate purpose is two-fold: to reduce some redundancies, which I believe is worth doing in and of itself, and to prepare for adding FILTER on aggregates from the spec, and possibly other things in the aggregate function part of the spec. Cheers, David. Folks, Please find attached two versions of a patch which provides optional FILTER clause for aggregates (T612, Advanced OLAP operations). The first is intended to be applied on top of the previous patch, the second without it. The first is, I believe, clearer in what it's doing. Rather than simply mechanically visiting every place a function call might be constructed, it visits a central one to change the default, then goes only to the places where it's relevant. The patches are both early WIP as they contain no docs or regression tests yet. Docs and regression tests added, makeFuncArgs approached dropped for now, will re-visit later. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate *** a/doc/src/sgml/ref/select.sgml --- b/doc/src/sgml/ref/select.sgml *** *** 595,604 GROUP BY replaceable class=parameterexpression/replaceable [, ...] /para para ! Aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without literalGROUP BY/literal, an aggregate produces a single value computed across all the selected rows). When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to ungrouped columns except within aggregate functions or if the --- 595,607 /para para ! In the absence of a literalFILTER/literal clause, ! aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without literalGROUP BY/literal, an aggregate produces a single value computed across all the selected rows). + When a literalFILTER/literal clause is present, only those + rows matching the FILTER clause are included. When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to ungrouped columns except within aggregate functions or if the *** a/doc/src/sgml/syntax.sgml --- b/doc/src/sgml/syntax.sgml *** *** 1562,1585 sqrt(2) syntax of an aggregate expression is one of the following: synopsis ! replaceableaggregate_name/replaceable (replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) ! replaceableaggregate_name/replaceable (ALL replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) ! replaceableaggregate_name/replaceable (DISTINCT replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) ! replaceableaggregate_name/replaceable ( * ) /synopsis where replaceableaggregate_name/replaceable is a previously defined aggregate (possibly qualified with a schema name), ! replaceableexpression/replaceable is ! any value expression that does not itself contain an aggregate ! expression or a window function call, and ! replaceableorder_by_clause/replaceable is a optional ! literalORDER BY/ clause as described below. /para para ! The first form of aggregate expression invokes the aggregate ! once for each input row. The second form is the same as the first, since literalALL/literal is the default. The third form invokes the aggregate once for each distinct value --- 1562,1587 syntax of an aggregate expression is one of the following: synopsis ! replaceableaggregate_name/replaceable (replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) [ FILTER ( WHERE replaceablefilter_clause/replaceable ) ] ! replaceableaggregate_name/replaceable (ALL replaceableexpression/replaceable [ , ... ] [ replaceableorder_by_clause/replaceable ] ) [ FILTER ( WHERE replaceablefilter_clause/replaceable ) ] !
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On Wed, Feb 13, 2013 at 06:45:31AM -0800, David Fetter wrote: On Sat, Feb 09, 2013 at 11:59:22PM -0800, David Fetter wrote: Folks, Per suggestions and lots of help from Andrew Gierth, please find attached a patch to clean up the call sites for FuncCall nodes, which I'd like to expand centrally rather than in each of the 37 (or 38, but I only redid 37) places where it's called. The remaining one is in src/backend/nodes/copyfuncs.c, which has to be modified for any changes in the that struct anyhow. The immediate purpose is two-fold: to reduce some redundancies, which I believe is worth doing in and of itself, and to prepare for adding FILTER on aggregates from the spec, and possibly other things in the aggregate function part of the spec. Cheers, David. Folks, Please find attached two versions of a patch which provides optional FILTER clause for aggregates (T612, Advanced OLAP operations). The first is intended to be applied on top of the previous patch, the second without it. I'll find a brown paper back to wear over my head at some point, but meanwhile, here's a cleaned-up version of the patch that doesn't use makeFuncArgs, now without merge artifacts and with the ability to actually compile. It's still WIP in the sense previously mentioned. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate *** a/src/backend/executor/execQual.c --- b/src/backend/executor/execQual.c *** *** 4395,4400 ExecInitExpr(Expr *node, PlanState *parent) --- 4395,4401 astate-args = (List *) ExecInitExpr((Expr *) aggref-args, parent); + astate-agg_filter = ExecInitExpr(aggref-agg_filter, parent); /* * Complain if the aggregate's arguments contain any *** *** 4433,4438 ExecInitExpr(Expr *node, PlanState *parent) --- 4434,4440 wfstate-args = (List *) ExecInitExpr((Expr *) wfunc-args, parent); + wfstate-agg_filter = ExecInitExpr(wfunc-agg_filter, parent); /* * Complain if the windowfunc's arguments contain any *** a/src/backend/executor/functions.c --- b/src/backend/executor/functions.c *** *** 364,370 sql_fn_post_column_ref(ParseState *pstate, ColumnRef *cref, Node *var) list_make1(subfield), list_make1(param), NIL, false, false, false, ! NULL, true, cref-location); } return param; --- 364,370 list_make1(subfield), list_make1(param), NIL, false, false, false, ! NULL, NULL, true, cref-location); } return param; *** a/src/backend/executor/nodeAgg.c --- b/src/backend/executor/nodeAgg.c *** *** 487,492 advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup) --- 487,504 int i; TupleTableSlot *slot; + /* Skip anything FILTERed out */ + ExprState *filter = peraggstate-aggrefstate-agg_filter; + if (filter) + { + MemoryContext oldcontext = MemoryContextSwitchTo(aggstate-tmpcontext-ecxt_per_tuple_memory); + bool isnull; + Datum res = ExecEvalExpr(filter, aggstate-tmpcontext, isnull, NULL); + MemoryContextSwitchTo(oldcontext); + if (isnull || !DatumGetBool(res)) + continue; + } + /* Evaluate the current input expressions for this aggregate */ slot = ExecProject(peraggstate-evalproj, NULL); *** a/src/backend/executor/nodeWindowAgg.c --- b/src/backend/executor/nodeWindowAgg.c *** *** 226,234