Re: [HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
On Mon, Nov 30, 2015 at 02:41:02PM +0100, Shulgin, Oleksandr wrote: > On Wed, Nov 25, 2015 at 9:13 AM, Lukas Fittlwrote: > > On Mon, Nov 23, 2015 at 11:53 PM, Peter Geoghegan wrote: > > One specific justification he gave for not using pg_stat_statements > was: > > "Doesn’t merge bind vars in IN()" (See slide #11) > > > I wonder: > > * How do other people feel about this? Personally, I've seen enough > problems of this kind in the field that "slippery slope" arguments > against this don't seem very compelling. > > > As someone who runs a little monitoring service thats solely based on > pg_stat_statements data, ignoring IN list length would certainly be a good > change. > > We currently do this in post-processing, together with other data cleanup > (e.g. ignoring the length of a VALUES list in an INSERT statement). > > Given the fact that pgss data is normalized & you don't know which plan > was > chosen, I don't think distinguishing based on the length of the list helps > anyone really. > > I see pg_stat_statements as a high-level overview of which queries have > run, and which ones you might want to look into closer using e.g. > auto_explain. > > > I still have the plans to try to marry pg_stat_statements and auto_explain for > the next iteration of "online query plans" extension I was proposing a few > months ago, and the first thing I was going to look into is rectifying this > problem with IN() jumbling. So, have a +1 from me. Is this a TODO? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
On Wed, Dec 30, 2015 at 5:20 PM, Bruce Momjianwrote: > Is this a TODO? It's on my (very long) personal TODO list. It would be great if someone else took it, though. So, yes. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
On Wed, Dec 30, 2015 at 05:21:05PM -0800, Peter Geoghegan wrote: > On Wed, Dec 30, 2015 at 5:20 PM, Bruce Momjianwrote: > > Is this a TODO? > > It's on my (very long) personal TODO list. It would be great if > someone else took it, though. So, yes. TODO added. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
On Wed, Nov 25, 2015 at 9:13 AM, Lukas Fittlwrote: > On Mon, Nov 23, 2015 at 11:53 PM, Peter Geoghegan wrote: > >> One specific justification he gave for not using pg_stat_statements was: >> >> "Doesn’t merge bind vars in IN()" (See slide #11) >> >> I wonder: >> >> * How do other people feel about this? Personally, I've seen enough >> problems of this kind in the field that "slippery slope" arguments >> against this don't seem very compelling. >> > > As someone who runs a little monitoring service thats solely based on > pg_stat_statements data, ignoring IN list length would certainly be a good > change. > > We currently do this in post-processing, together with other data cleanup > (e.g. ignoring the length of a VALUES list in an INSERT statement). > > Given the fact that pgss data is normalized & you don't know which plan > was chosen, I don't think distinguishing based on the length of the list > helps anyone really. > > I see pg_stat_statements as a high-level overview of which queries have > run, and which ones you might want to look into closer using e.g. > auto_explain. > I still have the plans to try to marry pg_stat_statements and auto_explain for the next iteration of "online query plans" extension I was proposing a few months ago, and the first thing I was going to look into is rectifying this problem with IN() jumbling. So, have a +1 from me. -- Alex
Re: [HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
On Mon, Nov 23, 2015 at 11:53 PM, Peter Geogheganwrote: > One specific justification he gave for not using pg_stat_statements was: > > "Doesn’t merge bind vars in IN()" (See slide #11) > > I wonder: > > * How do other people feel about this? Personally, I've seen enough > problems of this kind in the field that "slippery slope" arguments > against this don't seem very compelling. > As someone who runs a little monitoring service thats solely based on pg_stat_statements data, ignoring IN list length would certainly be a good change. We currently do this in post-processing, together with other data cleanup (e.g. ignoring the length of a VALUES list in an INSERT statement). Given the fact that pgss data is normalized & you don't know which plan was chosen, I don't think distinguishing based on the length of the list helps anyone really. I see pg_stat_statements as a high-level overview of which queries have run, and which ones you might want to look into closer using e.g. auto_explain. Best, Lukas -- Lukas Fittl Skype: lfittl Phone: +1 415 321 0630
Re: [HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
On Tue, Nov 24, 2015 at 10:55 PM, Jim Nasbywrote: > I'm not following your point. Obviously you can't compare int to text that > doesn't convert back to an int, but that's not what I was talking about. I didn't see what else you could have meant. In any case, the type text has no involvement in your example. pg_stat_statements sees the following SQL queries as equivalent: select integer '5'; select 6; select 7::int4; -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
Peter Geogheganwrites: > On Tue, Dec 10, 2013 at 1:30 AM, Peter Geoghegan wrote: >> pg_stat_statements' fingerprinting logic considers the following two >> statements as distinct: >> >> select 1 in (1, 2, 3); >> select 1 in (1, 2, 3, 4); >> >> This is because the ArrayExpr jumble case jumbles any ArrayExpr's list >> of elements recursively. In this case it's a list of Const nodes, and >> the fingerprinting logic jumbles those nodes indifferently. I think this is a vastly oversimplified explanation of the problem. In particular, because the planner will flatten an ArrayExpr containing only Const nodes to an array constant (see eval_const_expressions), I don't believe the case ever arises in exactly the form you posit here. A portion of the problem is possibly due to the heuristics in parse_expr.c's transformAExprIn(): * We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only * possible if there is a suitable array type available. If not, we fall * back to a boolean condition tree with multiple copies of the lefthand * expression. Also, any IN-list items that contain Vars are handled as * separate boolean conditions, because that gives the planner more scope * for optimization on such clauses. If the original text actually involves a variable number of Vars, then you will end up with a boolean expression with a varying number of OR arms, even if the Vars later get flattened to constants. However, it's not clear to me that anyone would expect such cases to be treated as identical. Another possibility is a type clash, for example "x IN (42, 44.1)" will end up as a boolean tree for lack of a common type for the would-be array elements. That case might possibly be an issue in practice. But what seems more likely to be annoying people is cases in which the original text contains a varying number of Param markers. Those might or might not get folded to constants during planning depending on context, so that they might or might not look different to pg_stat_statements. So I suspect the real problem here is that we might want all of these things to look identical to pg_stat_statements: ARRAY[$1, $2, 42] ARRAY[$1, $2, $3, 47] '{1,2,3,47}'::int[] Don't see a very clean way to do that ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
On 11/24/15 1:29 PM, Tom Lane wrote: So I suspect the real problem here is that we might want all of these things to look identical to pg_stat_statements: ARRAY[$1, $2, 42] ARRAY[$1, $2, $3, 47] '{1,2,3,47}'::int[] Don't see a very clean way to do that ... Another not-uncommon case is IN ( '1', '2', ... , '2342' ); in other words, treating an integer as text. A lot of frameworks like to do that and just push the problem onto the database. I'm not sure what pg_stat_statements would ultimately see in that case.. Since there's a few different things people might want, maybe a good first step is to allow extending/changing the jumbling decision at the C level. That would make it easy for a knowledgeable enough person to come up with an alternative as a plugin that regular users could use. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
On Tue, Nov 24, 2015 at 5:39 PM, Jim Nasbywrote: > Another not-uncommon case is IN ( '1', '2', ... , '2342' ); in other words, > treating an integer as text. A lot of frameworks like to do that and just > push the problem onto the database. I'm not sure what pg_stat_statements > would ultimately see in that case.. They do? postgres=# select 5::int4 in ('5'); ?column? ── t (1 row) postgres=# select 5::int4 in ('5a'); ERROR: 22P02: invalid input syntax for integer: "5a" LINE 1: select 5::int4 in ('5a'); ^ -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
I wrote: > Peter Geogheganwrites: >> This is because the ArrayExpr jumble case jumbles any ArrayExpr's list >> of elements recursively. In this case it's a list of Const nodes, and >> the fingerprinting logic jumbles those nodes indifferently. > I think this is a vastly oversimplified explanation of the problem. > In particular, because the planner will flatten an ArrayExpr containing > only Const nodes to an array constant (see eval_const_expressions), > I don't believe the case ever arises in exactly the form you posit here. Um ... disregard that. For some reason I was thinking that pg_stat_statements looks at plan trees, but of course what it looks at is the query tree immediately post-parse-analysis. So the behavior of the const-folding pass is not relevant. The heuristics in transformAExprIn() are still relevant, though, and I suspect that the question of whether Params should be considered the same as Consts is also highly relevant. I wonder whether we could improve this by arranging things so that both Consts and Params contribute zero to the jumble hash, and a list of these things also contributes zero, regardless of the length of the list. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
On 11/24/15 7:46 PM, Peter Geoghegan wrote: On Tue, Nov 24, 2015 at 5:39 PM, Jim Nasbywrote: Another not-uncommon case is IN ( '1', '2', ... , '2342' ); in other words, treating an integer as text. A lot of frameworks like to do that and just push the problem onto the database. I'm not sure what pg_stat_statements would ultimately see in that case.. They do? postgres=# select 5::int4 in ('5'); ?column? ── t (1 row) postgres=# select 5::int4 in ('5a'); ERROR: 22P02: invalid input syntax for integer: "5a" LINE 1: select 5::int4 in ('5a'); ^ I'm not following your point. Obviously you can't compare int to text that doesn't convert back to an int, but that's not what I was talking about. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
On Tue, Dec 10, 2013 at 1:30 AM, Peter Geogheganwrote: > pg_stat_statements' fingerprinting logic considers the following two > statements as distinct: > > select 1 in (1, 2, 3); > select 1 in (1, 2, 3, 4); > > This is because the ArrayExpr jumble case jumbles any ArrayExpr's list > of elements recursively. In this case it's a list of Const nodes, and > the fingerprinting logic jumbles those nodes indifferently. > > Somebody told me that they think that pg_stat_statements should not do > that. This person felt that it would be preferable for such > expressions to be normalized without regard to the number of distinct > Const elements. I suppose that that would work by determing if the > ArrayExpr elements list was a list of Const nodes and only const > nodes. Iff that turned out to be the case, something else would be > jumbled (something other than the list) that would essentially be a > representation of "some list of zero or more (or maybe one or more) > Const nodes with consttype of, in this example, 23". I think that this > would make at least one person happy, because of course the two > statements above would have their costs aggregated within a single > pg_stat_statements entry. Baron Schwartz recently gave a talk at PGConf Silicon Valley about the proprietary query instrumentation tool, VividCortex. The slides are available from: http://info.citusdata.com/rs/235-CNE-301/images/Analyzing_PostgreSQL_Network_Traffic_with_vc-pgsql-sniffer_-_Baron_Schwartz.pdf One specific justification he gave for not using pg_stat_statements was: "Doesn’t merge bind vars in IN()" (See slide #11) His theory is that you should allow a proprietary binary to run with root permissions, a binary that sniffs the wire protocol, mostly because pg_stat_statements has this limitation (all other pg_stat_statements limitations listed are pretty unconvincing, IMV). That doesn't seem like a great plan to me, but I think he has a point about pg_stat_statements. It's about time that we fixed this -- it isn't realistic to imagine that people are going to know to use an array constant like "= any ('{1,2,3}')" -- even a major contributor to Django that I talked to about this issue a couple of years ago didn't know about that. It also isn't portable across database systems. I wonder: * How do other people feel about this? Personally, I've seen enough problems of this kind in the field that "slippery slope" arguments against this don't seem very compelling. * How might altering the jumbling logic to make it recognize a variable number of constants as equivalent work in practice? Perhaps we should do something to flatten the representation based on which powers of two the number of constants is between. There are still some details to work out there, but that's my first idea. That seems like a good compromise between the current behavior, and completely disregarding the number of constants. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers