Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes:
> Having taken another look at the code, I wonder if we wouldn't have
> been better off just fastpathing out of pgss_store in the first call
> (in a pair of calls made by a backend as part of an execution of some
> non-prepared query) iff there is already an entry in the hashtable -
> after all, we're now going to the trouble of acquiring the spinlock
> just to increment the usage for the entry by 0 (likewise, every other
> field), which is obviously superfluous. I apologise for not having
> spotted this before submitting my last patch.

On reflection, we can actually make the code a good bit simpler if we push the responsibility for initializing the usage count correctly into entry_alloc(), instead of having to fix it up later. Then we can just skip the entire adjust-the-stats step in pgss_store when building a sticky entry. See my commit just now.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
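[ For the archives: a minimal sketch of the idea - entry_alloc() taking a "sticky" flag and choosing the starting usage itself. Names such as STICKY_USAGE_INIT are illustrative assumptions, not necessarily what was committed. ]

static pgssEntry *
entry_alloc(pgssHashKey *key, const char *query, int query_len, bool sticky)
{
	pgssEntry  *entry;
	bool		found;

	/* find or create an entry with the desired hash key */
	entry = (pgssEntry *) hash_search(pgss_hash, key, HASH_ENTER, &found);

	if (!found)
	{
		/* new entry: counters start at zero ... */
		memset(&entry->counters, 0, sizeof(Counters));

		/*
		 * ... except usage.  A sticky entry (created only to hold the
		 * normalized query text before the query has actually executed)
		 * starts out with an elevated usage, so callers no longer need
		 * to fix it up afterwards.
		 */
		entry->counters.usage = sticky ? STICKY_USAGE_INIT : USAGE_INIT;

		SpinLockInit(&entry->mutex);
		entry->query_len = query_len;
		memcpy(entry->query, query, query_len);
		entry->query[query_len] = '\0';
	}

	return entry;
}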
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 8 April 2012 20:51, Tom Lane wrote:
> Applied with some cosmetic adjustments.

Thanks.

Having taken another look at the code, I wonder if we wouldn't have been better off just fastpathing out of pgss_store in the first call (in a pair of calls made by a backend as part of an execution of some non-prepared query) iff there is already an entry in the hashtable - after all, we're now going to the trouble of acquiring the spinlock just to increment the usage for the entry by 0 (likewise, every other field), which is obviously superfluous. I apologise for not having spotted this before submitting my last patch.

I have attached a patch with the modifications described. This is more than a micro-optimisation, since it will cut the number of spinlock acquisitions approximately in half for non-prepared queries.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services

pg_stat_statements_optimization_2012_04_08.patch
Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
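[ For the archives: a sketch of the short-circuit being proposed here, as it might appear inside pgss_store(). The test on jstate is an assumption standing in for however the first (post-parse-analysis) call is distinguished from the later executor call. ]

	/* Look up the entry under shared lock, as pgss_store() already does. */
	LWLockAcquire(pgss->lock, LW_SHARED);
	entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);

	/*
	 * First call for this execution (we only came here to record the
	 * normalized query text): if the entry already exists there is nothing
	 * to add, so skip the spinlock-protected counter update entirely.
	 */
	if (entry && jstate != NULL)
	{
		LWLockRelease(pgss->lock);
		return;
	}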
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes: > On 29 March 2012 21:05, Tom Lane wrote: >> Barring objections I'll go fix this, and then this patch can be >> considered closed except for possible future tweaking of the >> sticky-entry decay rule. > Attached patch fixes a bug, and tweaks sticky-entry decay. Applied with some cosmetic adjustments. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 29 March 2012 21:05, Tom Lane wrote:
> Barring objections I'll go fix this, and then this patch can be
> considered closed except for possible future tweaking of the
> sticky-entry decay rule.

Attached patch fixes a bug, and tweaks sticky-entry decay.

The extant code bumps usage (though not call counts) in two hooks (pgss_post_parse_analyze() and pgss_ExecutorEnd()), so prepared queries will always have about half the usage of an equivalent simple query, which is clearly not desirable. With the proposed patch, "usage" should be similar to "calls" until the first call of entry_dealloc(), rather than usually having a value that's about twice as high. With the patch, a run of pgbench with and without "-M prepared" results in a usage of calls + 1 for each query from both runs.

The approach I've taken with decay is to maintain a server-wide median usage value (well, a convenient approximation), which is assigned to sticky entries. This makes it hard to evict the entries in the first couple of calls to entry_dealloc(). On the other hand, if there really is contention for entries, it will soon become really easy to evict sticky entries, because we use a much more aggressive multiplier of 0.5 for their decay. I rather conservatively assume an initial median usage of 10, which is a very low value considering the use of the multiplier trick. In any case, in the real world it won't take too long for entry_dealloc() to be called and set the median value, if in fact it actually matters.

You described entries as precious. This isn't quite the full picture: pg_stat_statements might appear to malthusianistically burn through pretty much as many entries as you care to give it, but I believe that in the real world the rate at which the module burns through them would frequently look logarithmic. In other words, after an entry_dealloc() call the hashtable is 95% full, but it might take rather a long time to reach 100% again - the first 5% is consumed dramatically faster than the last. The user might not actually care if you need to cache a sticky value for a few hours in one of their slots, as you run an epic reporting query, even though the hashtable is over 95% full. The idea is to avoid evicting a sticky entry just because there happened to be an infrequent entry_dealloc() at the wrong time, and the least marginal of the most marginal 5% of non-sticky entries (that is, the 5% up for eviction) happened to have a call count/usage higher than the magic value of 3 - which I find quite plausible.

If I apply your test for dead sticky entries after a run of the regression tests (serial schedule), my approach compares very favourably (granted, usage values were presumably double-counted in your test, making our results less than completely comparable). For the purposes of this experiment, I've just commented out "if (calls == 0) continue;" within the pg_stat_statements() function, obviously:

postgres=# select calls = 0, count(*) from pg_stat_statements() group by calls = 0;
-[ RECORD 1 ]-
?column? | f
count    | 959
-[ RECORD 2 ]-
?column? | t
count    | 3    <--- this includes the above query itself

postgres=# select calls = 0, count(*) from pg_stat_statements() group by calls = 0;
-[ RECORD 1 ]-
?column? | f
count    | 960
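[ For the archives: roughly what the decay rule described above might look like inside entry_dealloc(). This is a sketch only; the constant, the median bookkeeping, and the field cur_median_usage are approximations. Sticky entries are identifiable by calls == 0. ]

	/* Decay usage, punishing sticky (never-executed) entries much harder. */
	hash_seq_init(&hash_seq, pgss_hash);
	while ((entry = hash_seq_search(&hash_seq)) != NULL)
	{
		entries[i++] = entry;
		if (entry->counters.calls == 0)
			entry->counters.usage *= 0.5;		/* aggressive sticky decay */
		else
			entry->counters.usage *= USAGE_DECREASE_FACTOR;
	}

	/*
	 * Sort by usage and remember an approximate median, which is what a
	 * newly created sticky entry will start out with.
	 */
	qsort(entries, i, sizeof(pgssEntry *), entry_cmp);
	if (i > 0)
		pgss->cur_median_usage = entries[i / 2]->counters.usage;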
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Thu, Mar 29, 2012 at 4:05 PM, Tom Lane wrote: > I wrote: >> ... PREPARE/EXECUTE work a bit funny though: if you have >> track = all then you get EXECUTE cycles reported against both the >> EXECUTE statement and the underlying PREPARE. This is because when >> PREPARE calls parse_analyze_varparams the post-analyze hook doesn't know >> that this isn't a top-level statement, so it marks the query with a >> queryId. I don't see any way around that part without something like >> what I suggested before. However, this behavior seems to me to be >> considerably less of a POLA violation than the cases involving two >> identical-looking entries for self-contained statements, and it might >> even be thought to be a feature not a bug (since the PREPARE entry will >> accumulate totals for all uses of the prepared statement). So I'm >> satisfied with it for now. > > Actually, there's an easy hack for that too: we can teach the > ProcessUtility hook to do nothing (and in particular not increment the > nesting level) when the statement is an ExecuteStmt. This will result > in the executor time being blamed on the original PREPARE, whether or > not you have enabled tracking of nested statements. That seems like a > substantial win to me, because right now you get a distinct EXECUTE > entry for each textually-different set of parameter values, which seems > pretty useless. This change would make use of PREPARE/EXECUTE behave > very nearly the same in pg_stat_statement as use of protocol-level > prepared statements. About the only downside I can see is that the > cycles expended on evaluating the EXECUTE's parameters will not be > charged to any pg_stat_statement entry. Since those can be expressions, > in principle this might be a non-negligible amount of execution time, > but in practice it hardly seems likely that anyone would care about it. > > Barring objections I'll go fix this, and then this patch can be > considered closed except for possible future tweaking of the > sticky-entry decay rule. After reading your last commit message, I was wondering if something like this might be possible, so +1 from me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
I wrote: > ... PREPARE/EXECUTE work a bit funny though: if you have > track = all then you get EXECUTE cycles reported against both the > EXECUTE statement and the underlying PREPARE. This is because when > PREPARE calls parse_analyze_varparams the post-analyze hook doesn't know > that this isn't a top-level statement, so it marks the query with a > queryId. I don't see any way around that part without something like > what I suggested before. However, this behavior seems to me to be > considerably less of a POLA violation than the cases involving two > identical-looking entries for self-contained statements, and it might > even be thought to be a feature not a bug (since the PREPARE entry will > accumulate totals for all uses of the prepared statement). So I'm > satisfied with it for now. Actually, there's an easy hack for that too: we can teach the ProcessUtility hook to do nothing (and in particular not increment the nesting level) when the statement is an ExecuteStmt. This will result in the executor time being blamed on the original PREPARE, whether or not you have enabled tracking of nested statements. That seems like a substantial win to me, because right now you get a distinct EXECUTE entry for each textually-different set of parameter values, which seems pretty useless. This change would make use of PREPARE/EXECUTE behave very nearly the same in pg_stat_statement as use of protocol-level prepared statements. About the only downside I can see is that the cycles expended on evaluating the EXECUTE's parameters will not be charged to any pg_stat_statement entry. Since those can be expressions, in principle this might be a non-negligible amount of execution time, but in practice it hardly seems likely that anyone would care about it. Barring objections I'll go fix this, and then this patch can be considered closed except for possible future tweaking of the sticky-entry decay rule. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
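[ For the archives: the hack being described, in outline - if the utility statement is an EXECUTE, the ProcessUtility hook just passes it through without bumping the nesting level or storing anything, so the executor hooks charge the work to the PREPARE's entry. A sketch only, with the 9.2-era hook argument list. ]

static void
pgss_ProcessUtility(Node *parsetree, const char *queryString,
					ParamListInfo params, bool isTopLevel,
					DestReceiver *dest, char *completionTag)
{
	if (pgss_track_utility && pgss_enabled() &&
		!IsA(parsetree, ExecuteStmt))
	{
		/* ... existing path: bump nesting level, time it, pgss_store() ... */
	}
	else
	{
		/* EXECUTE (and anything else untracked) just falls through */
		if (prev_ProcessUtility)
			prev_ProcessUtility(parsetree, queryString, params,
								isTopLevel, dest, completionTag);
		else
			standard_ProcessUtility(parsetree, queryString, params,
									isTopLevel, dest, completionTag);
	}
}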
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
I wrote:
> Hm ... I just had a different idea. I need to go look at the code
> again, but I believe that in the problematic cases, the post-analyze
> hook does not compute a queryId for the optimizable statement. This
> means that it will arrive at the executor with queryId zero. What if
> we simply made the executor hooks do nothing when queryId is zero?
> (Note that this also means that in the problematic cases, the behavior
> is already pretty wrong because executor costs for *all* statements of
> this sort are getting merged into one hashtable entry for hash zero.)

The attached proposed patch does it that way. It makes the EXPLAIN, SELECT INTO, and DECLARE CURSOR cases behave as expected for utility statements.

PREPARE/EXECUTE work a bit funny though: if you have track = all then you get EXECUTE cycles reported against both the EXECUTE statement and the underlying PREPARE. This is because when PREPARE calls parse_analyze_varparams the post-analyze hook doesn't know that this isn't a top-level statement, so it marks the query with a queryId. I don't see any way around that part without something like what I suggested before. However, this behavior seems to me to be considerably less of a POLA violation than the cases involving two identical-looking entries for self-contained statements, and it might even be thought to be a feature not a bug (since the PREPARE entry will accumulate totals for all uses of the prepared statement). So I'm satisfied with it for now.

regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 137b242..bebfff1 100644
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** pgss_post_parse_analyze(ParseState *psta
*** 602,610
  	if (!pgss || !pgss_hash)
  		return;

! 	/* We do nothing with utility statements at this stage */
  	if (query->utilityStmt)
  		return;

  	/* Set up workspace for query jumbling */
  	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
--- 602,620
  	if (!pgss || !pgss_hash)
  		return;

! 	/*
! 	 * Utility statements get queryId zero.  We do this even in cases where
! 	 * the statement contains an optimizable statement for which a queryId
! 	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
! 	 * runtime control will first go through ProcessUtility and then the
! 	 * executor, and we don't want the executor hooks to do anything, since
! 	 * we are already measuring the statement's costs at the utility level.
! 	 */
  	if (query->utilityStmt)
+ 	{
+ 		query->queryId = 0;
  		return;
+ 	}

  	/* Set up workspace for query jumbling */
  	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
*** pgss_post_parse_analyze(ParseState *psta
*** 619,624
--- 629,641
  	query->queryId = hash_any(jstate.jumble, jstate.jumble_len);

  	/*
+ 	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+ 	 * prevent confusion with the utility-statement case.
+ 	 */
+ 	if (query->queryId == 0)
+ 		query->queryId = 1;
+
+ 	/*
  	 * If we were able to identify any ignorable constants, we immediately
  	 * create a hash table entry for the query, so that we can record the
  	 * normalized form of the query string.  If there were no such constants,
*** pgss_ExecutorStart(QueryDesc *queryDesc,
*** 649,655
  	else
  		standard_ExecutorStart(queryDesc, eflags);

! 	if (pgss_enabled())
  	{
  		/*
  		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
--- 666,677
  	else
  		standard_ExecutorStart(queryDesc, eflags);

! 	/*
! 	 * If query has queryId zero, don't track it.  This prevents double
! 	 * counting of optimizable statements that are directly contained in
! 	 * utility statements.
! 	 */
! 	if (pgss_enabled() && queryDesc->plannedstmt->queryId != 0)
  	{
  		/*
  		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
*** pgss_ExecutorFinish(QueryDesc *queryDesc
*** 719,731
  static void
  pgss_ExecutorEnd(QueryDesc *queryDesc)
  {
! 	if (queryDesc->totaltime && pgss_enabled())
! 	{
! 		uint32		queryId;
!
! 		/* Query's ID should have been filled in by post-analyze hook */
! 		queryId = queryDesc->plannedstmt->queryId;

  		/*
  		 * Make sure stats accumulation is done.  (Note: it's okay if several
  		 * levels of hook all do this.)
--- 741,750
  static void
  pgss_ExecutorEnd(QueryDesc *queryDesc)
  {
! 	uint32		queryId = queryDesc->plannedstmt->queryId;

+ 	if (queryId != 0 && queryDesc->totaltime && pgss_enabled())
+ 	{
  		/*
  		 * Make sure stats accumulation is done.  (Note: it's okay if several
  		 * levels of hook all do this.)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
[ forgot to respond to this bit ] Robert Haas writes: > Another thought is: if we simply treated these as nested queries for > all purposes, would that really be so bad? That was actually what I suggested first, and now that I look at the code, that's exactly what's happening right now. However, Peter pointed out --- I think rightly --- that this fails to satisfy the principle of least astonishment, because the user doesn't think of these statements as being utility statements with other statements wrapped inside. Users are going to expect these cases to be treated as single statements. The issue existed before this patch, BTW, but was partially masked by the fact that we grouped pg_stat_statements view entries strictly by query text, so that both the utility statement and the contained optimizable statement got matched to the same table entry. The execution costs were getting double-counted, but apparently nobody noticed that. As things now stand, the utility statement and contained statement show up as distinct table entries (if you have nested-statement tracking enabled) because of the different hashing methods used. And it's those multiple table entries that seem confusing, especially since they are counting mostly the same costs. [ time passes ... ] Hm ... I just had a different idea. I need to go look at the code again, but I believe that in the problematic cases, the post-analyze hook does not compute a queryId for the optimizable statement. This means that it will arrive at the executor with queryId zero. What if we simply made the executor hooks do nothing when queryId is zero? (Note that this also means that in the problematic cases, the behavior is already pretty wrong because executor costs for *all* statements of this sort are getting merged into one hashtable entry for hash zero.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Robert Haas writes: > What I'm imagining is that instead of just having a global for > nested_level, you'd have a global variable pointing to a linked list. This is more or less what I have in mind, too, except I do not believe that a mere boolean flag is sufficient to tell the difference between an executor call that you want to suppress logging for and one that you do not. You need some more positive way of identifying the target statement than that, and what I propose that be is the statement's query string. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Thu, Mar 29, 2012 at 11:42 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Thu, Mar 29, 2012 at 11:23 AM, Tom Lane wrote:
>>> However, I think there is a solution for that, though it may sound a bit
>>> ugly. Rather than just stacking a flag, let's stack the query source
>>> text pointer for the utility statement. Then in the executor hooks,
>>> if that pointer is *pointer* equal (not strcmp equal) to the optimizable
>>> statement's source-text pointer, we know we are executing the "same"
>>> statement as the surrounding utility command, and we do nothing.
>
>> Without wishing to tick you off, that sounds both ugly and fragile.
>
> What do you object to --- the pointer-equality part? We could do strcmp
> comparison instead, on the assumption that a utility command could not
> look the same as an optimizable statement except in the case we care
> about. I think that's probably unnecessary though.

The pointer equality part seems like the worst ugliness, yes.

>> Can't we find a way to set the stacked flag (on the top stack frame)
>> after planning and before execution?
>
> That would require a way for pg_stat_statements to get control at rather
> random places in several different types of utility statements. And
> if we did add hook functions in those places, you'd still need to have
> sufficient stacked context for those hooks to know what to do, which
> leads you right back to this I think.

What I'm imagining is that instead of just having a global for nested_level, you'd have a global variable pointing to a linked list. The length of the list would be equal to what we currently call nested_level + 1. Something like this:

typedef struct pgss_nesting_info
{
	struct pgss_nesting_info *next;
	int		flag;			/* bad name */
} pgss_nesting_info;

static pgss_nesting_info *pgss_stack_top;

So any test for nested_level == 0 would instead test pgss_stack_top->next == NULL. Then, when you get control at the relevant spots, you set pgss_stack_top->flag = 1 and that's it.

Now, maybe it's too ugly to think about passing control at those spots; I'm surprised there's not a central point they all go through...

Another thought is: if we simply treated these as nested queries for all purposes, would that really be so bad?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Robert Haas writes: > On Thu, Mar 29, 2012 at 11:23 AM, Tom Lane wrote: >> However, I think there is a solution for that, though it may sound a bit >> ugly. Rather than just stacking a flag, let's stack the query source >> text pointer for the utility statement. Then in the executor hooks, >> if that pointer is *pointer* equal (not strcmp equal) to the optimizable >> statement's source-text pointer, we know we are executing the "same" >> statement as the surrounding utility command, and we do nothing. > Without wishing to tick you off, that sounds both ugly and fragile. What do you object to --- the pointer-equality part? We could do strcmp comparison instead, on the assumption that a utility command could not look the same as an optimizable statement except in the case we care about. I think that's probably unnecessary though. > Can't we find a way to set the stacked flag (on the top stack frame) > after planning and before execution? That would require a way for pg_stat_statements to get control at rather random places in several different types of utility statements. And if we did add hook functions in those places, you'd still need to have sufficient stacked context for those hooks to know what to do, which leads you right back to this I think. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Thu, Mar 29, 2012 at 11:23 AM, Tom Lane wrote: > It would make more sense to me to go the other way, that is suppress > creation of a separate entry for the contained optimizable statement. > The stats will still be correctly accumulated into the surrounding > statement (or at least, if they are not, that's a separate pre-existing > bug). If we do it in the direction you suggest, we'll fail to capture > costs incurred outside execution of the contained statement. All things being equal, I completely agree. However, ISTM that the difficulty of implementation might be higher for your proposal, for the reasons you go on to state. If getting it right means that other significant features are not going to get committed at all for 9.2, I think we could leave this as a TODO. > Right now, we already have logic in there to track nesting of statements > in a primitive way, that is just count the nesting depth. My first idea > about fixing this was to tweak that logic so that it stacks a flag > saying "we're in a utility statement that contains an optimizable > statement", and then the first layer of Executor hooks that sees that > flag set would know to not do anything. However this isn't quite good > enough because that first layer might not be for the "same" statement. > As an example, in an EXPLAIN ANALYZE the planner might pre-execute > immutable or stable SQL functions before we reach the executor. We > would prefer that any statements embedded in such a function still be > seen as independent nested statements. > > However, I think there is a solution for that, though it may sound a bit > ugly. Rather than just stacking a flag, let's stack the query source > text pointer for the utility statement. Then in the executor hooks, > if that pointer is *pointer* equal (not strcmp equal) to the optimizable > statement's source-text pointer, we know we are executing the "same" > statement as the surrounding utility command, and we do nothing. Without wishing to tick you off, that sounds both ugly and fragile. Can't we find a way to set the stacked flag (on the top stack frame) after planning and before execution? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Robert Haas writes: > On Wed, Mar 28, 2012 at 10:39 PM, Tom Lane wrote: >> The SELECT INTO tests all fail, but we know the reason why (the testbed >> isn't expecting them to result in creating separate entries for the >> utility statement and the underlying plannable SELECT). > This might be a dumb idea, but for a quick hack, could we just rig > SELECT INTO, CREATE TABLE AS, and EXPLAIN not to create entries for > themselves at all, without suppressing creation of an entry for the > underlying query? It would make more sense to me to go the other way, that is suppress creation of a separate entry for the contained optimizable statement. The stats will still be correctly accumulated into the surrounding statement (or at least, if they are not, that's a separate pre-existing bug). If we do it in the direction you suggest, we'll fail to capture costs incurred outside execution of the contained statement. Right now, we already have logic in there to track nesting of statements in a primitive way, that is just count the nesting depth. My first idea about fixing this was to tweak that logic so that it stacks a flag saying "we're in a utility statement that contains an optimizable statement", and then the first layer of Executor hooks that sees that flag set would know to not do anything. However this isn't quite good enough because that first layer might not be for the "same" statement. As an example, in an EXPLAIN ANALYZE the planner might pre-execute immutable or stable SQL functions before we reach the executor. We would prefer that any statements embedded in such a function still be seen as independent nested statements. However, I think there is a solution for that, though it may sound a bit ugly. Rather than just stacking a flag, let's stack the query source text pointer for the utility statement. Then in the executor hooks, if that pointer is *pointer* equal (not strcmp equal) to the optimizable statement's source-text pointer, we know we are executing the "same" statement as the surrounding utility command, and we do nothing. This looks like it would work for the SELECT INTO and EXPLAIN cases, and for DECLARE CURSOR whenever that gets changed to a less bizarre structure. It would not work for EXECUTE, because in that case we pass the query string saved from PREPARE to the executor. However, we could possibly do a special-case hack for EXECUTE; maybe ask prepare.c for the statement's string and stack that instead of the outer EXECUTE query string. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
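[ For the archives: an outline of the pointer-equality scheme proposed here. This is a sketch of the idea only, not committed code; the stacked variable and where it is saved and restored are illustrative assumptions. The ProcessUtility hook would remember the utility statement's query string, and the executor hooks would compare it against the QueryDesc's sourceText. ]

/* hypothetical: text of the utility statement currently being tracked */
static const char *pgss_wrapping_query_text = NULL;

/* in pgss_ProcessUtility(), around execution of a tracked statement: */
	const char *save_text = pgss_wrapping_query_text;

	pgss_wrapping_query_text = queryString;
	/* ... run the statement as the hook already does ... */
	pgss_wrapping_query_text = save_text;

/* in pgss_ExecutorStart() and the other executor hooks: */
	if (queryDesc->sourceText == pgss_wrapping_query_text)
	{
		/* same statement as the surrounding utility command: do nothing */
	}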
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 29 March 2012 02:09, Tom Lane wrote: > Thanks. I've committed the patch along with the docs, after rather > heavy editorialization. Thank you. > 1. What to do with EXPLAIN, SELECT INTO, etc. We had talked about > tweaking the behavior of statement nesting and some other possibilities. > I think clearly this could use improvement but I'm not sure just how. > (Note: I left out the part of your docs patch that attempted to explain > the current behavior, since I think we should fix it not document it.) Yeah, this is kind of unsatisfactory. Nobody would expect the module to behave this way. On the other hand, users probably aren't hugely interested in this information. I'm still kind of attached to the idea of exposing the hash value in the view. It could be handy in replication situations, to be able to aggregate statistics across the cluster (assuming you're using streaming replication and not a trigger based system). You need a relatively stable identifier to be able to do that. You've already sort-of promised to not break the format in point releases, because it is serialised to disk, and may have to persist for months or years. Also, it will drive home the reality of what's going on in situations like this (from the docs): "In some cases, queries with visibly different texts might get merged into a single pg_stat_statements entry. Normally this will happen only for semantically equivalent queries, but there is a small chance of hash collisions causing unrelated queries to be merged into one entry. (This cannot happen for queries belonging to different users or databases, however.) Since the hash value is computed on the post-parse-analysis representation of the queries, the opposite is also possible: queries with identical texts might appear as separate entries, if they have different meanings as a result of factors such as different search_path settings." > 2. Whether and how to adjust the aging-out of sticky entries. This > seems like a research project, but the code impact should be quite > localized. As I said, I'll try and give it some thought, and do some experiments. > BTW, I eventually concluded that the parameterization testing we were > worried about before was a red herring. As committed, the patch tries > to store a normalized string if it found any deletable constants, full > stop. This seems to me to be correct behavior because the presence of > constants is exactly what makes the string normalizable, and such > constants *will* be ignored in the hash calculation no matter whether > there are other parameters or not. Yeah, that aspect of not canonicalising parametrised entries had bothered me. I guess we're better off gracefully handling the problem across the board, rather than attempting to band-aid the problem up by just not having speculative hashtable entries in cases where they arguably are not so useful. Avoiding canonicalising those constants was somewhat misleading. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Wed, Mar 28, 2012 at 10:39 PM, Tom Lane wrote: > The SELECT INTO tests all fail, but we know the reason why (the testbed > isn't expecting them to result in creating separate entries for the > utility statement and the underlying plannable SELECT). This might be a dumb idea, but for a quick hack, could we just rig SELECT INTO, CREATE TABLE AS, and EXPLAIN not to create entries for themselves at all, without suppressing creation of an entry for the underlying query? The output might be slightly misleading but it wouldn't be broken, and I'm disinclined to want to spend a lot of time doing fine-tuning right now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
BTW, I forgot to mention that I did experiment with your python-based test script for pg_stat_statements, but decided not to commit it. There are just too many external dependencies for my taste:

1. python
2. psycopg2
3. dellstore2 test database

That, coupled with the apparent impossibility of running the script without manual preconfiguration, makes it look not terribly useful.

Also, as of the committed patch there are several individual tests that fail or need adjustment:

The SELECT INTO tests all fail, but we know the reason why (the testbed isn't expecting them to result in creating separate entries for the utility statement and the underlying plannable SELECT).

verify_statement_equivalency("select a.orderid from orders a join orders b on a.orderid = b.orderid",
	"select b.orderid from orders a join orders b on a.orderid = b.orderid", conn)

These are not equivalent statements, or at least would not be if the join condition were anything else than what it is, so the fact that the original coding failed to distinguish the targetlist entries is a bug.

The test "# temporary column name within recursive CTEs doesn't differentiate" fails, not because of the change of column name, but because of the change of CTE name. This is a consequence of my having used the CTE name here:

	case RTE_CTE:
		/*
		 * Depending on the CTE name here isn't ideal, but it's the
		 * only info we have to identify the referenced WITH item.
		 */
		APP_JUMB_STRING(rte->ctename);
		APP_JUMB(rte->ctelevelsup);
		break;

We could avoid the name dependency by omitting ctename from the jumble, but I think that cure is worse than the disease.

Anyway, not too important, but I just thought I'd document this in case you were wondering about the discrepancies.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes: > doc-patch is attached. I'm not sure if I got the balance right - it > may be on the verbose side. Thanks. I've committed the patch along with the docs, after rather heavy editorialization. There remain some loose ends that should be worked on but didn't seem like commit-blockers: 1. What to do with EXPLAIN, SELECT INTO, etc. We had talked about tweaking the behavior of statement nesting and some other possibilities. I think clearly this could use improvement but I'm not sure just how. (Note: I left out the part of your docs patch that attempted to explain the current behavior, since I think we should fix it not document it.) 2. Whether and how to adjust the aging-out of sticky entries. This seems like a research project, but the code impact should be quite localized. BTW, I eventually concluded that the parameterization testing we were worried about before was a red herring. As committed, the patch tries to store a normalized string if it found any deletable constants, full stop. This seems to me to be correct behavior because the presence of constants is exactly what makes the string normalizable, and such constants *will* be ignored in the hash calculation no matter whether there are other parameters or not. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 29 March 2012 00:14, Tom Lane wrote: > I'm planning to commit the patch with a USAGE_NON_EXEC_STICK value > of 3.0, which is the largest value that stays below 10% wastage. > We can twiddle that logic later, so if you want to experiment with an > alternate decay rule, feel free. I think I may well end up doing so when I get a chance. This seems like the kind of problem that will be solved only when we get some practical experience (i.e. use the tool on something closer to a production system than the regression tests). doc-patch is attached. I'm not sure if I got the balance right - it may be on the verbose side. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services pg_stat_statements_norm_docs.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes:
> On 28 March 2012 15:57, Tom Lane wrote:
>> Is there any actual benefit in providing the
>> "pg_stat_statements.string_key" GUC? It looks to me more like something
>> that was thrown in because it was easy than because anybody would want
>> it. I'd just as soon leave it out and avoid the incremental API
>> complexity increase. (While on that subject, I see no documentation
>> updates in the patch...)

> Personally, I don't care for it, and I'm sure most users wouldn't
> either, but I thought that someone somewhere might be relying on the
> existing behaviour.

Hearing no squawks, I will remove it from the committed patch; one less thing to document. Easy enough to put back later, if someone makes a case for it.

>> Also, I'm not terribly happy with the "sticky entries" hack.

> I was troubled by that too, and had considered various ways of at
> least polishing the kludge. Maybe a better approach would be to start
> with a usage of 1e10 (or something rather high, anyway), but apply a
> much more aggressive multiplier than USAGE_DECREASE_FACTOR for sticky
> entries only? That way, in earlier calls of entry_dealloc() the sticky
> entries, easily identifiable as having 0 calls, are almost impossible
> to evict, but after a relatively small number of calls they soon
> become more readily evictable.

I did some simple experiments with the regression tests. Now, those tests are by far a worst case for this sort of thing, since (a) they probably generate many more unique queries than a typical production application, and (b) they almost certainly provoke many more errors and hence more dead sticky entries than a typical production app. Nonetheless, the results look pretty bad. Using various values of USAGE_NON_EXEC_STICK, the numbers of useful and dead entries in the hash table after completing one round of regression tests were:

	STICK	live entries	dead sticky entries
	10.0	780		190
	 5.0	858		112
	 4.0	874		 96
	 3.0	911		 62
	 2.0	918		 43

I did not bother measuring 1e10 ;-). It's clear that sticky entries are forcing useful entries out of the table in this scenario. I think wasting more than about 10% of the table in this way is not acceptable.

I'm planning to commit the patch with a USAGE_NON_EXEC_STICK value of 3.0, which is the largest value that stays below 10% wastage. We can twiddle that logic later, so if you want to experiment with an alternate decay rule, feel free.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 28 March 2012 15:57, Tom Lane wrote: > Is there any actual benefit in providing the > "pg_stat_statements.string_key" GUC? It looks to me more like something > that was thrown in because it was easy than because anybody would want > it. I'd just as soon leave it out and avoid the incremental API > complexity increase. (While on that subject, I see no documentation > updates in the patch...) Personally, I don't care for it, and I'm sure most users wouldn't either, but I thought that someone somewhere might be relying on the existing behaviour. I will produce a doc-patch. It would have been premature to do so until quite recently. > Also, I'm not terribly happy with the "sticky entries" hack. The way > you had it set up with a 1e10 bias for a sticky entry was completely > unreasonable IMO, because if the later pgss_store call never happens > (which is quite possible if the statement contains an error detected > during planning or execution), that entry is basically never going to > age out, and will just uselessly consume a precious table slot for a > long time. In the extreme, somebody could effectively disable query > tracking by filling the hashtable with variants of "SELECT 1/0". > The best quick answer I can think of is to reduce USAGE_NON_EXEC_STICK > to maybe 10 or so, but I wonder whether there's some less klugy way to > get the result in the first place. I thought about keeping the > canonicalized query string around locally in the backend rather than > having the early pgss_store call at all, but am not sure it's worth > the complexity of an additional local hashtable or some such to hold > such pending entries. I was troubled by that too, and had considered various ways of at least polishing the kludge. Maybe a better approach would be to start with a usage of 1e10 (or something rather high, anyway), but apply a much more aggressive multiplier than USAGE_DECREASE_FACTOR for sticky entries only? That way, in earlier calls of entry_dealloc() the sticky entries, easily identifiable as having 0 calls, are almost impossible to evict, but after a relatively small number of calls they soon become more readily evictable. This seems better than simply having some much lower usage that is only a few times the value of USAGE_INIT. Let's suppose we set sticky entries to have a usage value of 10. If all other queries have more than 10 calls, which is not unlikely (under the current usage format, 1.0 usage = 1 call, at least until entry_dealloc() dampens usage) then when we entry_dealloc(), the sticky entry might as well have a usage of 1, and has no way of increasing its usage short of becoming a "real" entry. On the other hand, with the multiplier trick, how close the sticky entry is to eviction is, importantly, far more strongly influenced by the number of entry_dealloc() calls, which in turn is influenced by churn in the system, rather than being largely influenced by how the magic sticky usage value happens to compare to those usage values of some random set of "real" entries on some random database. If entries really are precious, then the sticky entry is freed much sooner. If not, then why not allow the sticky entry to stick around pending its execution/ promotion to a "real" entry? It would probably be pretty inexpensive to maintain what is currently the largest usage value in the hash table at entry_dealloc() time - that would likely be far more suitable than 1e10, and might even work well. We could perhaps cut that in half every entry_dealloc(). 
-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
A couple other issues about this patch ... Is there any actual benefit in providing the "pg_stat_statements.string_key" GUC? It looks to me more like something that was thrown in because it was easy than because anybody would want it. I'd just as soon leave it out and avoid the incremental API complexity increase. (While on that subject, I see no documentation updates in the patch...) Also, I'm not terribly happy with the "sticky entries" hack. The way you had it set up with a 1e10 bias for a sticky entry was completely unreasonable IMO, because if the later pgss_store call never happens (which is quite possible if the statement contains an error detected during planning or execution), that entry is basically never going to age out, and will just uselessly consume a precious table slot for a long time. In the extreme, somebody could effectively disable query tracking by filling the hashtable with variants of "SELECT 1/0". The best quick answer I can think of is to reduce USAGE_NON_EXEC_STICK to maybe 10 or so, but I wonder whether there's some less klugy way to get the result in the first place. I thought about keeping the canonicalized query string around locally in the backend rather than having the early pgss_store call at all, but am not sure it's worth the complexity of an additional local hashtable or some such to hold such pending entries. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 28 March 2012 15:25, Tom Lane wrote:
> That's been an issue right along for cases such as EXPLAIN and EXECUTE,
> I believe.

Possible, since I didn't have test coverage for either of those 2 commands.

> Perhaps the right thing is to consider such executor calls
> as nested statements --- that is, the ProcessUtility hook ought to
> bump the nesting depth too.

That makes a lot of sense, but it might spoil things for the pg_stat_statements.track = 'top' + pg_stat_statements.track_utility = 'on' case. At the very least, it's a POLA violation, to the extent that if you were going to do this, you might mandate that nested statements be tracked along with utility statements (probably while defaulting to having both off, which would be a change).

Since you've already removed the intoClause chunk, I'm not sure how far underway the review effort is - would you like me to produce a new revision, or is that unnecessary?

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes: > I merged upstream changes with the intention of providing a new patch > for you to review. I found a problem that I'd guess was introduced by > commit 9dbf2b7d75de5af38d087cbe2b1147dd0fd10f0a, "Restructure SELECT > INTO's parsetree representation into CreateTableAsStmt". This has > nothing to do with my patch in particular. Yeah, I already deleted the intoClause chunk from the patch. I think treating SELECT INTO as a utility statement is probably fine, at least for now. > In the existing pg_stat_statements code in HEAD, there are 2 > pgss_store call sites - one in pgss_ProcessUtility, and the other in > pgss_ExecutorFinish. There is an implicit assumption in the extant > code (and my patch too) that there will be exactly one pgss_store call > per query execution. However, that assumption appears to now fall > down, as illustrated by the GDB session below. What's more, our new > hook is called twice, which is arguably redundant. That's been an issue right along for cases such as EXPLAIN and EXECUTE, I believe. Perhaps the right thing is to consider such executor calls as nested statements --- that is, the ProcessUtility hook ought to bump the nesting depth too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 27 March 2012 20:26, Tom Lane wrote:
> Have yet to look at the pg_stat_statements code itself.

I merged upstream changes with the intention of providing a new patch for you to review. I found a problem that I'd guess was introduced by commit 9dbf2b7d75de5af38d087cbe2b1147dd0fd10f0a, "Restructure SELECT INTO's parsetree representation into CreateTableAsStmt". This has nothing to do with my patch in particular.

I noticed that my tests broke on queries like "select * into orders_recent FROM orders WHERE orderdate >= '2002-01-01';". Since commands like that are now utility statements, I thought it best to just hash the query string directly, along with all other utility statements - richer functionality would be unlikely to be missed all that much, and that's a fairly clean demarcation that I don't want to make blurry.

In the existing pg_stat_statements code in HEAD, there are 2 pgss_store call sites - one in pgss_ProcessUtility, and the other in pgss_ExecutorEnd. There is an implicit assumption in the extant code (and my patch too) that there will be exactly one pgss_store call per query execution. However, that assumption appears to now fall down, as illustrated by the GDB session below. What's more, our new hook is called twice, which is arguably redundant.

(gdb) break pgss_parse_analyze
Breakpoint 1 at 0x7fbd17b96790: file pg_stat_statements.c, line 640.
(gdb) break pgss_ProcessUtility
Breakpoint 2 at 0x7fbd17b962b4: file pg_stat_statements.c, line 1710.
(gdb) break pgss_ExecutorEnd
Breakpoint 3 at 0x7fbd17b9618c: file pg_stat_statements.c, line 1674.
(gdb) c
Continuing.

< I execute the command "select * into orders_recent FROM orders WHERE orderdate >= '2002-01-01';" >

Breakpoint 1, pgss_parse_analyze (pstate=0x2473bc0, post_analysis_tree=0x2474010) at pg_stat_statements.c:640
640		if (post_analysis_tree->commandType != CMD_UTILITY)
(gdb) c
Continuing.

Breakpoint 1, pgss_parse_analyze (pstate=0x2473bc0, post_analysis_tree=0x2474010) at pg_stat_statements.c:640
640		if (post_analysis_tree->commandType != CMD_UTILITY)
(gdb) c
Continuing.

Breakpoint 2, pgss_ProcessUtility (parsetree=0x2473cd8, queryString=0x2472a88 "select * into orders_recent FROM orders WHERE orderdate >= '2002-01-01';", params=0x0, isTopLevel=1 '\001', dest=0x2474278, completionTag=0x7fff74e481e0 "") at pg_stat_statements.c:1710
1710		if (pgss_track_utility && pgss_enabled())
(gdb) c
Continuing.

Breakpoint 3, pgss_ExecutorEnd (queryDesc=0x24c9660) at pg_stat_statements.c:1674
1674		if (queryDesc->totaltime && pgss_enabled())
(gdb) c
Continuing.

What do you think we should do about this?

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
[ also for the archives' sake ] On 27 March 2012 22:05, Tom Lane wrote: > Well, testing function pointers for null is certainly OK --- note that > all our hook function call sites do that. It's true that testing for > equality to a particular function's name can fail on some platforms > because of jump table hacks. I was actually talking about stylistic iffiness. This seems contrary to the standard, which states: (ISO C 99 section 6.5.9.6) "Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (...) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space." However, the fly in the ointment is IA-64 (Itanic), which apparently at least at one stage had broken function pointer comparisons, at least when code was built using some version(s) of GCC. I found it a bit difficult to square your contention that performing function pointer comparisons against function addresses was what sounded like undefined behaviour, and yet neither GCC nor Clang complained. However, in light of what I've learned about IA-64, I can certainly see why we as a project would avoid the practice. Source: http://gcc.gnu.org/ml/gcc/2003-06/msg01283.html -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
[ just for the archives' sake ] Peter Geoghegan writes: > On 27 March 2012 18:15, Tom Lane wrote: >> Now, if what it wants to know about is the parameterization status >> of the query, things aren't ideal because most of the info is hidden >> in parse-callback fields that aren't of globally exposed types. However >> we could at least duplicate the behavior you have here, because you're >> only passing canonicalize = true in cases where no parse callback will >> be registered at all, so pg_stat_statements could equivalently test for >> pstate->p_paramref_hook == NULL. > It has been suggested to me before that comparisons with function > pointers - using them as a flag, in effect - is generally iffy, but > that particular usage seems reasonable to me. Well, testing function pointers for null is certainly OK --- note that all our hook function call sites do that. It's true that testing for equality to a particular function's name can fail on some platforms because of jump table hacks. Thus for example, if you had a need to know that parse_variable_parameters parameter management was in use, it wouldn't do to test whether p_coerce_param_hook == variable_coerce_param_hook. (Not that you could anyway, what with that being a static function, but exposing it as global wouldn't offer a safe solution.) If we had a need to make this information available, I think what we'd want to do is insist that p_ref_hook_state entries be subclasses of Node, so that plugins could apply IsA tests on the node tag to figure out what style of parameter management was in use. This would also mean exposing the struct definitions globally, which you'd need anyway else the plugins couldn't safely access the struct contents. I don't particularly want to go there without very compelling reasons, but that would be the direction to head in if we had to. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
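[ For the archives: what the Node-subclass idea might look like - purely hypothetical, since no such node tag exists; the fields mirror what parse_variable_parameters() keeps privately today. ]

typedef struct VarParamState
{
	NodeTag		type;		/* hypothetical new tag, say T_VarParamState */
	Oid		  **paramTypes;
	int		   *numParams;
} VarParamState;

	/* a plugin's hook could then safely test, for example: */
	if (pstate->p_ref_hook_state &&
		IsA((Node *) pstate->p_ref_hook_state, VarParamState))
	{
		/* variable-parameter management (parse_analyze_varparams) is in use */
	}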
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 27 March 2012 20:26, Tom Lane wrote: > I've committed the core-backend parts of this, just to get them out of > the way. Have yet to look at the pg_stat_statements code itself. Thanks. I'm glad that we have that out of the way. > I ended up choosing not to apply that bit. I remain of the opinion that > this behavior is fundamentally inconsistent with the general rules for > assigning parse locations to analyzed constructs, and I see no reason to > propagate that inconsistency further than we absolutely have to. Fair enough. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes: > I've attached a patch with the required modifications. I've committed the core-backend parts of this, just to get them out of the way. Have yet to look at the pg_stat_statements code itself. > I restored the location field to the ParamCoerceHook signature, but > the removal of code to modify the param location remains (again, not > because I need it, but because I happen to think that it ought to be > consistent with Const). I ended up choosing not to apply that bit. I remain of the opinion that this behavior is fundamentally inconsistent with the general rules for assigning parse locations to analyzed constructs, and I see no reason to propagate that inconsistency further than we absolutely have to. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes: >> On 22 March 2012 17:19, Tom Lane wrote: >>> Either way, I think we'd be a lot better advised to define a single >>> hook "post_parse_analysis_hook" and make the core code responsible >>> for calling it at the appropriate places, rather than supposing that >>> the contrib module knows exactly which core functions ought to be >>> the places to do it. > I have done this too. The "canonicalize" argument to the proposed hook seems like a bit of a crock. You've got the core code magically setting that in a way that responds to extremely pg_stat_statements-specific concerns, and I am not very sure it's right even for those concerns. I am thinking that perhaps a reasonable signature for the hook function would be void post_parse_analyze (ParseState *pstate, Query *query); with the expectation that it could dig whatever it wants to know out of the ParseState (in particular the sourceText is available there, and in general this should provide everything that's known at parse time). Now, if what it wants to know about is the parameterization status of the query, things aren't ideal because most of the info is hidden in parse-callback fields that aren't of globally exposed types. However we could at least duplicate the behavior you have here, because you're only passing canonicalize = true in cases where no parse callback will be registered at all, so pg_stat_statements could equivalently test for pstate->p_paramref_hook == NULL. Thoughts, other ideas? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
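Consuming a hook with that signature from pg_stat_statements might look something like the sketch below. This is illustrative only, assuming a typedef along the lines of the proposed signature (call it post_parse_analyze_hook_type) and the ParseState fields discussed above; it is not the committed module code:

#include "postgres.h"
#include "parser/analyze.h"
#include "parser/parse_node.h"

static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;

static void
pgss_post_parse_analyze(ParseState *pstate, Query *query)
{
    /* chain to any previously-installed hook, per the usual extension convention */
    if (prev_post_parse_analyze_hook)
        prev_post_parse_analyze_hook(pstate, query);

    /*
     * Equivalent of the old "canonicalize" flag: no ParamRef callback
     * registered means there is no external parameter management for this
     * query text, so its constants can safely be canonicalised.
     */
    if (pstate->p_paramref_hook == NULL)
    {
        /* fingerprint the query tree, using pstate->p_sourcetext as needed */
    }
}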
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 22 March 2012 19:07, Tom Lane wrote: > Will you adjust the patch for the other issues? Sure. I'll take a look at it now. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes: > Since you haven't mentioned the removal of parse-analysis Const > location alterations, I take it that you do not object to that, which > is something I'm glad of. I remain un-thrilled about it, but apparently nobody else cares, so I'll yield the point. (I do however object to your removal of the cast location value from the param_coerce_hook signature. The fact that one current user of the hook won't need it anymore doesn't mean no others would. Consider throwing a "can't coerce" error from within the hook function, for instance.) Will you adjust the patch for the other issues? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 22 March 2012 17:19, Tom Lane wrote: > I'm inclined to think that the most useful behavior is to teach the > rewriter to copy queryId from the original query into all the Queries > generated by rewrite. Then, all rules fired by a source query would > be lumped into that query for tracking purposes. This might not be > the ideal behavior either, but I don't see a better solution. +1. This behaviour seems fairly sane. The lumping together of DO ALSO and DO INSTEAD operations was a simple oversight. > This seems to boil down to what you want to have happen with queries > created/executed inside functions, which is something I don't recall > being discussed. Uh, well, pg_stat_statements is clearly supposed to monitor execution of queries from within functions - there is a GUC, "pg_stat_statements.track", which can be set to 'all' to track nested queries. That being the case, we should clearly be fingerprinting those query trees too. The fact that we'll fingerprint these queries even though we usually don't care about them doesn't seem like a problem, since in practice the vast majority will be prepared. > Either way, I think we'd be a lot better advised to define a single > hook "post_parse_analysis_hook" and make the core code responsible > for calling it at the appropriate places, rather than supposing that > the contrib module knows exactly which core functions ought to be > the places to do it. I agree. Since you haven't mentioned the removal of parse-analysis Const location alterations, I take it that you do not object to that, which is something I'm glad of. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes: > [ pg_stat_statements_norm_2012_02_29.patch ] I started to look at this patch (just the core-code modifications so far). There are some things that seem not terribly well thought out: * It doesn't look to me like it will behave very sanely with rules. The patch doesn't store queryId in a stored rule tree, so a Query retrieved from a stored rule will have a zero queryId, and that's what will get pushed through to the resulting plan tree as well. So basically all DO ALSO or DO INSTEAD operations are going to get lumped together by pg_stat_statements, and separated from the queries that triggered them, which seems pretty darn unhelpful. I don't know that storing queryId would be better, since after a restart that'd mean there are query IDs running around in the system that the current instance of pg_stat_statements has never heard of. Permanently stored query IDs would also be a headache if you needed to change the fingerprint algorithm, or if there were more than one add-on trying to use the query ID support. I'm inclined to think that the most useful behavior is to teach the rewriter to copy queryId from the original query into all the Queries generated by rewrite. Then, all rules fired by a source query would be lumped into that query for tracking purposes. This might not be the ideal behavior either, but I don't see a better solution. * The patch injects the query ID calculation code by redefining parse_analyze and parse_analyze_varparams as hookable functions and then getting into those hooks. I don't find this terribly sane either. pg_stat_statements has no interest in the distinction between those two methods of getting into parse analysis. Perhaps more to the point, those are not the only two ways of getting into parse analysis: some places call transformTopLevelStmt directly, for instance pg_analyze_and_rewrite_params. While it might be that the code paths that do that are not of interest for fingerprinting queries, it's far from obvious that these two are the correct and only places to do such fingerprinting. I think that if we are going to take the attitude that we only care about fingerprinting queries that come in from the client, then we ought to call the fingerprinting code in the client-message-processing routines in postgres.c. But in that case we need to be a little clearer about what we are doing with unfingerprinted queries. Alternatively, we might take the position that we want to fingerprint every Query struct, but in that case the existing hooks are clearly insufficient. This seems to boil down to what you want to have happen with queries created/executed inside functions, which is something I don't recall being discussed. Either way, I think we'd be a lot better advised to define a single hook "post_parse_analysis_hook" and make the core code responsible for calling it at the appropriate places, rather than supposing that the contrib module knows exactly which core functions ought to be the places to do it. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
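The rewriter behaviour proposed above amounts to something like the following sketch, assuming the queryId field the patch adds to Query (the function name is invented for illustration; this is not the actual committed code):

#include "postgres.h"
#include "nodes/parsenodes.h"
#include "nodes/pg_list.h"

/*
 * After rewriting "parsetree" into the product list "rewritten", stamp every
 * product query with the source query's fingerprint, so DO ALSO / DO INSTEAD
 * rule actions get lumped in with the query that fired them.
 */
static void
copy_query_id_to_products(Query *parsetree, List *rewritten)
{
    ListCell   *lc;

    foreach(lc, rewritten)
    {
        Query      *q = (Query *) lfirst(lc);

        q->queryId = parsetree->queryId;
    }
}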
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Mon, Mar 19, 2012 at 08:48:07PM +, Peter Geoghegan wrote: > On 19 March 2012 19:55, Peter Eisentraut wrote: > > If someone wanted to bite the bullet and do the work, I think we could > > move to a Perl/TAP-based test suite (not pgTAP, but Perl and some fairly > > standard Test::* modules) and reduce that useless reformatting work and > > test more interesting things. Just a thought ... > > I think that that is a good idea. However, I am not a Perl hacker, > though this is the second time that that has left me at a disadvantage > when working on Postgres, so I think it's probably time to learn a > certain amount. My blog entry on this topic might be helpful: http://momjian.us/main/blogs/pgblog/2008.html#October_4_2008_2 -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Sun, Mar 18, 2012 at 7:35 PM, Peter Geoghegan wrote: > On 19 March 2012 01:50, Tom Lane wrote: >> I am *not* a fan of regression tests that try to microscopically test >> every feature in the system. > > I see your point of view. I suppose I can privately hold onto the test > suite, since it might prove useful again. > > I will work on a pg_regress based approach with a reasonably-sized > random subset of about 20 of my existing tests, to provide some basic > smoke testing. This may sound rather tortured, but in the main regression suite there is a .c file that links some stuff into the backend that is then accessed via CREATE FUNCTION to do some special fiddly bits. Could a creative hook be used here to avoid the repetition you are avoiding via Python? (e.g. constant resetting of pg_stat_statements or whatnot). It might sound too much like changing the system under test, but I think it would still retain most of the value. I also do like the pg_regress workflow in general, although clearly it cannot do absolutely everything. Running and interpreting the results of your tests was not hard, but it was definitely *different* which could be a headache if one-off testing frameworks proliferate. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 19 March 2012 19:55, Peter Eisentraut wrote: > If someone wanted to bite the bullet and do the work, I think we could > move to a Perl/TAP-based test suite (not pgTAP, but Perl and some fairly > standard Test::* modules) and reduce that useless reformatting work and > test more interesting things. Just a thought ... I think that that is a good idea. However, I am not a Perl hacker, though this is the second time that that has left me at a disadvantage when working on Postgres, so I think it's probably time to learn a certain amount. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On mån, 2012-03-19 at 08:59 +, Greg Stark wrote: > The other problem with this approach is that it's hard to keep a huge > test suite 100% clean. Changes inevitably introduce behaviour changes > that cause some of the tests to fail. I think we are used to that because of the way pg_regress works. When you have a better test infrastructure that tests actual functionality rather than output formatting, this shouldn't be the case (nearly as much). If someone wanted to bite the bullet and do the work, I think we could move to a Perl/TAP-based test suite (not pgTAP, but Perl and some fairly standard Test::* modules) and reduce that useless reformatting work and test more interesting things. Just a thought ... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On mån, 2012-03-19 at 02:35 +, Peter Geoghegan wrote: > I see your point of view. I suppose I can privately hold onto the test > suite, since it might prove useful again. I would still like to have those tests checked in, but not run by default, in case someone wants to hack on this particular feature again. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 19 March 2012 01:50, Tom Lane wrote: > I am *not* a fan of regression tests that try to microscopically test > every feature in the system. I see your point of view. I suppose I can privately hold onto the test suite, since it might prove useful again. I will work on a pg_regress based approach with a reasonably-sized random subset of about 20 of my existing tests, to provide some basic smoke testing. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes: > On 19 March 2012 00:10, Andrew Dunstan wrote: >> Why exactly does this feature need particularly to have script-driven >> regression test generation when others don't? > It's not that it needs it, so much as that it is possible to provide > coverage for much of the code with black-box testing. In the case of > most of the hundreds of tests, I can point to a particular piece of > code that is being tested, that was written *after* the test was. Well, basically what you're saying is that you did test-driven development, which is fine. However, that does not mean that those same tests are ideal for ongoing regression testing. What we want from a regression test these days is primarily (a) portability testing, ie does the feature work on platforms other than yours?, and (b) early warning if someone breaks it down the road. In most cases, fairly coarse testing is enough to catch drive-by breakage; and when it's not enough, like as not the breakage is due to something you never thought about originally and thus never tested for, so you'd not have caught it anyway. I am *not* a fan of regression tests that try to microscopically test every feature in the system. Sure you should do that when initially developing a feature, but it serves little purpose to do it over again every time any other developer runs the regression tests for the foreseeable future. That road leads to a regression suite that's so voluminous that it takes too long to run and developers start to avoid running it, which is counterproductive. For an example in our own problem space look at mysql, whose regression tests take well over an hour to run on a fast box. So they must be damn near bug-free right? Uh, not so much, and I think the fact that developers can't easily run their test suite is not unrelated to that. So what I'd rather see is a small set of tests that are designed to do a smoke-test of functionality and then exercise any interfaces to the rest of the system that seem likely to break. Over time we might augment that, when we find particular soft spots as a result of previously undetected bugs. But sheer volume of tests is not a positive IMO. As for the scripted vs raw-SQL-in-pg_regress question, I'm making the same point as Andrew: only the pg_regress method is likely to get run nearly everywhere, which means that the scripted approach is a FAIL so far as the portability-testing aspect is concerned. Lastly, even given that we were willing to accept a scripted set of tests, I'd want to see it in perl not python. Perl is the project standard; I see no reason to expect developers to learn two different scripting languages to work on PG. (There might be a case for accepting python-scripted infrastructure for pl/python, say, but not for components that are 100% unrelated to python.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 19 March 2012 00:10, Andrew Dunstan wrote: > If your tests are that voluminous then maybe they are not what we're looking > for anyway. As Tom noted: > > > IMO the objective of a regression test is not to memorialize every single > case the code author thought about during development. ISTM it would not > take very many cases to have reasonable code coverage. Fair enough. > Why exactly does this feature need particularly to have script-driven > regression test generation when others don't? It's not that it needs it, so much as that it is possible to provide coverage for much of the code with black-box testing. In the case of most of the hundreds of tests, I can point to a particular piece of code that is being tested, that was written *after* the test was. Doing this with pg_regress the old-fashioned way is going to be incredibly verbose. I'm all for doing script-generation of pg_regress tests in a well-principled way, and I'm happy to take direction from others as to what that should look like. I know that for the most part the tests provide coverage for discrete units of functionality, and so add value. If they add value, why not include them? Tests are supposed to be comprehensive. If that inconveniences you, by slowing down the buildfarm for questionable benefits, maybe it would be okay to have some tests not run automatically, even if that did make them "next door to useless" in Tom's estimation. There could be a more limited set of conventional pg_regress tests that are run automatically, plus more comprehensive tests that are run less frequently, typically only as it becomes necessary to alter pg_stat_statements to take account of those infrequent changes (typically additions) to the query tree. We have tests that ensure that header files don't contain C++ keywords, and nominally promise to not do so, and they are not run automatically. I don't see the sense in requiring that tests should be easy to run, while also aspiring to have tests that are as useful and comprehensive as possible. It seems like the code should dictate the testing infrastructure, and not the other way around. Part of the reason why I'm resistant to reducing the number of tests is that it seems to me that excluding some tests but not others would be quite arbitrary. It is not the case that some tests are clearly more useful than others (except for the fuzz testing stuff, which probably isn't all that useful). -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 03/18/2012 07:46 PM, Peter Geoghegan wrote: On 18 March 2012 22:46, Andrew Dunstan wrote: If you want to generate the tests using some tool, then use whatever works for you, be it Python or Perl or Valgol, but ideally what is committed (and this what should be in your patch) will be the SQL output of that, not the generator plus input. The reason that I'd prefer to use Perl or even Python to generate pg_regress input, and then have that infrastructure committed is because it's a lot more natural and succint to deal with the problem that way. I would have imagined that a patch that repeats the same boilerplate again and again, to test almost every minor facet of normalisation would be frowned upon. However, if you prefer that, it can easily be accommodated. If your tests are that voluminous then maybe they are not what we're looking for anyway. As Tom noted: IMO the objective of a regression test is not to memorialize every single case the code author thought about during development. ISTM it would not take very many cases to have reasonable code coverage. Why exactly does this feature need particularly to have script-driven regression test generation when others don't? If this is a general pattern that people want to follow, then maybe we need to plan and support it rather than just add a random test generation script here and there. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 18 March 2012 22:46, Andrew Dunstan wrote: > If you want to generate the tests using some tool, then use whatever works > for you, be it Python or Perl or Valgol, but ideally what is committed (and > this what should be in your patch) will be the SQL output of that, not the > generator plus input. The reason that I'd prefer to use Perl or even Python to generate pg_regress input, and then have that infrastructure committed is because it's a lot more natural and succinct to deal with the problem that way. I would have imagined that a patch that repeats the same boilerplate again and again, to test almost every minor facet of normalisation would be frowned upon. However, if you prefer that, it can easily be accommodated. The best approach might be to commit the output of the Python script as well as the Python script itself, with some clean-up work. That way, no one is actually required to run the Python script themselves as part of a standard build, and so they have no basis to complain about additional dependencies. We can run the regression tests from the buildfarm without any additional infrastructure to invoke the Python script to generate the pg_regress tests each time. When the time comes to change the representation of the query tree, which is not going to be that frequent an event, but will occur every once in a while, the author of the relevant patch should think to add some tests to my existing set, and verify that they pass. That's going to be made a lot easier by having them edit a file that expresses the problem in terms of whether two queries should be equivalent or distinct, or what a given query's final canonicalised representation should look like, all with minimal boilerplate. I'm only concerned with making the patch as easy as possible to maintain. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 03/18/2012 06:12 PM, Peter Geoghegan wrote: On 18 March 2012 16:13, Tom Lane wrote: Is there a really strong reason why adequate regression testing isn't possible in a plain-vanilla pg_regress script? A quick look at the script says that it's just doing some SQL commands and then checking the results of queries on the pg_stat_statements views. Admittedly the output would be bulkier in pg_regress, which would mean that we'd not likely want several hundred test cases. But IMO the objective of a regression test is not to memorialize every single case the code author thought about during development. ISTM it would not take very many cases to have reasonable code coverage. Hmm. It's difficult to have much confidence that a greatly reduced number of test cases ought to provide sufficient coverage. I don't disagree with your contention, I just don't know how to judge this matter. Given that there isn't really a maintenance burden with regression tests, I imagine that that makes it compelling to be much more inclusive. The fact that we rely on there being no concurrent queries might have to be worked around for parallel scheduled regression tests, such as by doing everything using a separate database, with that database oid always in the predicate of the query checking the pg_stat_statements view. I probably would have written the tests in Perl in the first place, but I don't know Perl. These tests existed in some form from day 1, as I followed a test-driven development methodology, and needed to use a language that I could be productive in immediately. There is probably no reason why they cannot be re-written in Perl, but spit out pg_regress tests, compacting the otherwise-verbose pg_regress input. Should I cut my teeth on Perl by writing the tests to do so? How might this be integrated with the standard regression tests, if that's something that is important? A pg_regress script doesn't require any perl. It's pure SQL. It is perfectly possible to make a single test its own group in a parallel schedule, and this is done now for a number of cases. See src/test/regress/parallel_schedule. Regression tests run in their own database set up for the purpose. You should be able to restrict the regression queries to only the current database. If you want to generate the tests using some tool, then use whatever works for you, be it Python or Perl or Valgol, but ideally what is committed (and this what should be in your patch) will be the SQL output of that, not the generator plus input. Tests built that way get automatically run by the buildfarm. Tests that don't use the standard testing framework don't. You need a *really* good reason, therefore, not to do it that way. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 18 March 2012 16:13, Tom Lane wrote: > Is there a really strong reason why adequate regression testing isn't > possible in a plain-vanilla pg_regress script? A quick look at the > script says that it's just doing some SQL commands and then checking the > results of queries on the pg_stat_statements views. Admittedly the > output would be bulkier in pg_regress, which would mean that we'd not > likely want several hundred test cases. But IMO the objective of a > regression test is not to memorialize every single case the code author > thought about during development. ISTM it would not take very many > cases to have reasonable code coverage. Hmm. It's difficult to have much confidence that a greatly reduced number of test cases ought to provide sufficient coverage. I don't disagree with your contention, I just don't know how to judge this matter. Given that there isn't really a maintenance burden with regression tests, I imagine that that makes it compelling to be much more inclusive. The fact that we rely on there being no concurrent queries might have to be worked around for parallel scheduled regression tests, such as by doing everything using a separate database, with that database oid always in the predicate of the query checking the pg_stat_statements view. I probably would have written the tests in Perl in the first place, but I don't know Perl. These tests existed in some form from day 1, as I followed a test-driven development methodology, and needed to use a language that I could be productive in immediately. There is probably no reason why they cannot be re-written in Perl, but spit out pg_regress tests, compacting the otherwise-verbose pg_regress input. Should I cut my teeth on Perl by writing the tests to do so? How might this be integrated with the standard regression tests, if that's something that is important? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes: > Is there anything that I could be doing to help bring this patch > closer to a committable state? Sorry, I've not actually looked at that patch yet. I felt I should push on Andres' CTAS patch first, since that's blocking progress on the command triggers patch. > I'm thinking of the tests in particular > - do you suppose it's acceptable to commit them more or less as-is? If they rely on having python, that's a 100% guaranteed rejection in my opinion. It's difficult enough to sell people on incremental additions of perl dependencies to the build/test process. Bringing in an entire new scripting language seems like a nonstarter. I suppose we could commit such a thing as an appendage that doesn't get run in standard builds, but then I see little point in it at all. Tests that don't get run regularly are next door to useless. Is there a really strong reason why adequate regression testing isn't possible in a plain-vanilla pg_regress script? A quick look at the script says that it's just doing some SQL commands and then checking the results of queries on the pg_stat_statements views. Admittedly the output would be bulkier in pg_regress, which would mean that we'd not likely want several hundred test cases. But IMO the objective of a regression test is not to memorialize every single case the code author thought about during development. ISTM it would not take very many cases to have reasonable code coverage. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Is there anything that I could be doing to help bring this patch closer to a committable state? I'm thinking of the tests in particular - do you suppose it's acceptable to commit them more or less as-is? The standard for testing contrib modules seems to be a bit different, as there are a number of other cases where an impedance mismatch with pg_regress necessitates doing things differently. So, the sepgsql tests, which I understand are mainly to test the environment that the module is being built for rather than the code itself, are written as a shell script that uses various SELinux tools. There is also a Perl script that uses DBD::Pg to benchmark intarray, for example. Now that we have a de facto standard Python driver, something that we didn't have a couple of years ago, it probably isn't terribly unreasonable to keep the tests in Python. They'll still probably need some level of clean-up, to cut back on some of the tests that are redundant. Some of the tests are merely fuzz tests, which are perhaps a bit questionable. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes: > I probably should have exposed the query_id directly in the > pg_stat_statements view, perhaps as "query_hash". FWIW, I think that's a pretty bad idea; the hash seems to me to be strictly an internal matter. Given the sponginess of its definition I don't really want it exposed to users. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 2 March 2012 20:10, Tom Lane wrote: > I do intend to take this one up in due course I probably should have exposed the query_id directly in the pg_stat_statements view, perhaps as "query_hash". The idea of that would be to advertise the potential non-uniqueness of the value - a collision is *extremely* unlikely (as I've previously calculated), but we cannot preclude the possibility, and as such it isn't *really* usable as a primary key. BTW, even if there is a collision, we at least know that there can't be a situation where one user's query entry gets spurious statistics from the execution of some other user's, or one database gets statistics from another, since their corresponding oid values separately form part of the dynahash key, alongside query_id. The other reason why I'd like to do this is that I'd like to build on this work for 9.3, and add a new column - plan_hash. When a new mode, pg_stat_statements.plan_hash (or somesuch) is disabled (as it is by default), this is always null, and we get the same 9.2 behaviour. When it is enabled, however, all existing entries are invalidated, for a clean slate. We then start hashing both the query tree *and* the query plan. It's a whole lot less useful if we only hash the latter. Now, entries within the view use the plan_hash as their key (or maybe a composite of query_hash and plan_hash). This often results in entries with duplicate query_hash values, as the planner generates different plans for equivalent queries, but that doesn't matter; you can easily write an aggregate query with a "GROUP BY query_hash" clause if that's what you happen to want to see. When this optional mode is enabled, at that point we'd probably also separately instrument planning time, as recently proposed by Fujii. Does that seem like an interesting idea? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 03/05/2012 05:12 AM, Simon Riggs wrote: On Sat, Mar 3, 2012 at 12:01 AM, Robert Haas wrote: On Fri, Mar 2, 2012 at 4:56 PM, Simon Riggs wrote: Checksums patch isn't sucking much attention at all but admittedly there are some people opposed to the patch that want to draw out the conversation until the patch is rejected, Wow. Sounds like a really shitty thing for those people to do - torpedoing a perfectly good patch for no reason. You've explained to me how you think I do that elsewhere and how that annoyed you, so I think that topic deserves discussion at the developers meeting to help us understand one another rather than perpetuate this. No matter how much we occasionally annoy each other, I think we all need to accept that we're all dealing in good faith. Suggestions to the contrary are ugly, have no foundation in fact that I'm aware of, and reflect badly on our community. Postgres has a well deserved reputation for not having the sort of public bickering that has caused people to avoid certain other projects. Please keep it that way. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Sat, Mar 3, 2012 at 12:01 AM, Robert Haas wrote: > On Fri, Mar 2, 2012 at 4:56 PM, Simon Riggs wrote: >> Checksums patch isn't sucking much attention at all but admittedly >> there are some people opposed to the patch that want to draw out the >> conversation until the patch is rejected, > > Wow. Sounds like a really shitty thing for those people to do - > torpedoing a perfectly good patch for no reason. You've explained to me how you think I do that elsewhere and how that annoyed you, so I think that topic deserves discussion at the developers meeting to help us understand one another rather than perpetuate this. > I have an alternative theory, though: they have sincere objections and > don't accept your reasons for discounting those objections. That's exactly the problem though and the discussion on it is relevant here. Nobody thinks objections on this patch, checksums or others are made insincerely. It's what happens next that matters. The question should be about acceptance criteria. What do we need to do to get something useful committed? Without a clear set of criteria for resolution we cannot move forward swiftly enough to do useful things. My thoughts are always about salvaging what we can, trying to find a way through the maze of objections and constraints not just black/white decisions based upon the existence of an objection, as if that single point trumps any other consideration and blocks all possibilities. So there is a clear difference between an objection to any progress on a topic ("I sincerely object to the checksum patch"), and a technical objection to taking a particular course of action ("We shouldn't use bits x1..x3 because"). The first is not viable, however sincerely it is made, because it leaves the author with no way of resolving things and it also presumes that the patch only exists in one version and that the author is somehow refusing to make agreed changes. Discussion started *here* because it was said "Person X is trying to force patch Y thru", which is true - but that doesn't necessarily mean the version of the patch that current objections apply to, only that the author has an equally sincere wish to do something useful. The way forwards here and elsewhere is to list out the things we can't do and list out the things that must change - a clear list of acceptance criteria. If we do that as early as possible we give the author a good shot at being able to make those changes in time to commit something useful. Again, only *something* useful: the full original vision is not always possible. In summary: "What can be done in this release, given the constraints discussed?" So for Peter's patch - what do we need to do to allow some/all of this to be committed? And for the checksum patch please go back to the checksum thread and list out all the things you consider unresolved. In some cases, resolutions have been suggested but not yet implemented so it would help if those are either discounted now before they are written, or accepted in principle to allow work to proceed. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Fri, Mar 2, 2012 at 4:56 PM, Simon Riggs wrote: > Checksums patch isn't sucking much attention at all but admittedly > there are some people opposed to the patch that want to draw out the > conversation until the patch is rejected, Wow. Sounds like a really shitty thing for those people to do - torpedoing a perfectly good patch for no reason. I have an alternative theory, though: they have sincere objections and don't accept your reasons for discounting those objections. > I'm not sure how this topic is even raised here, since the patches are > wholly and completely separate, apart from the minor and irrelevant > point that the patch authors both work for 2ndQuadrant. If that > matters at all, I'll be asking how and why. It came up because Josh pointed out that this patch is, in his opinion, in better shape than the checksum patch. I don't believe anyone's employment situation comes into it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Fri, Mar 2, 2012 at 8:13 PM, Tom Lane wrote: > Josh Berkus writes: >> This is exactly why I'm not keen on checksums for 9.2. We've reached >> the point where the attention on the checksum patch is pushing aside >> other patches which are more ready and have had more work. > > IMO the reason why it's sucking so much attention is precisely that it's > not close to being ready to commit. But this is well off topic for the > thread we're on. If you want to propose booting checksums from > consideration for 9.2, let's have that discussion on the checksum > thread. Checksums patch isn't sucking much attention at all but admittedly there are some people opposed to the patch that want to draw out the conversation until the patch is rejected, but that's not the same thing. The main elements of the patch have been working for around 7 weeks by now. I'm not sure how this topic is even raised here, since the patches are wholly and completely separate, apart from the minor and irrelevant point that the patch authors both work for 2ndQuadrant. If that matters at all, I'll be asking how and why. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Josh Berkus writes: > This is exactly why I'm not keen on checksums for 9.2. We've reached > the point where the attention on the checksum patch is pushing aside > other patches which are more ready and have had more work. IMO the reason why it's sucking so much attention is precisely that it's not close to being ready to commit. But this is well off topic for the thread we're on. If you want to propose booting checksums from consideration for 9.2, let's have that discussion on the checksum thread. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Robert Haas writes: > On Fri, Mar 2, 2012 at 12:48 PM, Tom Lane wrote: >> We'll get to it in due time. In case you haven't noticed, there's a lot >> of stuff in this commitfest. And I don't follow the logic that says >> that because Simon is trying to push through a not-ready-for-commit >> patch we should drop our standards for other patches. > I don't follow that logic either, but I also feel like this CommitFest > is dragging on and on. Unless you -- or someone -- are prepared to > devote a lot more time to this, "due time" is not going to arrive any > time in the forseeable future. We're currently making progress at a > rate of maybe 4 patches a week, at which rate we're going to finish > this CommitFest in May. And that might be generous, because we've > been disproportionately knocking off the easy ones. Do we have any > kind of a plan for, I don't know, bringing this to closure on some > reasonable time frame? Well, personally I was paying approximately zero attention to the commitfest for most of February, because I was occupied with trying to get back-branch releases out, as well as some non-Postgres matters. CF items are now back to the head of my to-do queue; you may have noticed that I'm busy with Korotkov's array stats patch. I do intend to take this one up in due course (although considering it's not marked Ready For Committer yet, I don't see that it deserves time ahead of those that are). As for when we'll be done with the CF, I dunno, but since it's the last one for this release cycle I didn't think that we'd be arbitrarily closing it on any particular schedule. It'll be done when it's done. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
> We'll get to it in due time. In case you haven't noticed, there's a lot > of stuff in this commitfest. And I don't follow the logic that says > that because Simon is trying to push through a not-ready-for-commit > patch we should drop our standards for other patches. What I'm pointing out is that Peter shouldn't even be talking about cutting functionality from an apparently-ready-for-committer patch in order to yield way to a patch about which people are still arguing specification. This is exactly why I'm not keen on checksums for 9.2. We've reached the point where the attention on the checksum patch is pushing aside other patches which are more ready and have had more work. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Fri, Mar 2, 2012 at 5:48 PM, Tom Lane wrote: > Josh Berkus writes: >>> It would probably be prudent to concentrate on getting the core >>> infrastructure committed first. That way, we at least know that if >>> this doesn't get into 9.2, we can work on getting it into 9.3 knowing >>> that once committed, people won't have to wait over a year at the very > >> I don't see why we can't commit the whole thing. This is way more ready >> for prime-time than checksums. > > We'll get to it in due time. In case you haven't noticed, there's a lot > of stuff in this commitfest. And I don't follow the logic that says > that because Simon is trying to push through a not-ready-for-commit > patch we should drop our standards for other patches. Hmm, not deaf you know. I would never push through a patch that isn't ready for commit. If I back something it is because it is ready for use in production by PostgreSQL users, in my opinion. I get burned just as much, if not more, than others if that's a bad decision, so its not given lightly. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Fri, Mar 2, 2012 at 12:48 PM, Tom Lane wrote: > Josh Berkus writes: >>> It would probably be prudent to concentrate on getting the core >>> infrastructure committed first. That way, we at least know that if >>> this doesn't get into 9.2, we can work on getting it into 9.3 knowing >>> that once committed, people won't have to wait over a year at the very > >> I don't see why we can't commit the whole thing. This is way more ready >> for prime-time than checksums. > > We'll get to it in due time. In case you haven't noticed, there's a lot > of stuff in this commitfest. And I don't follow the logic that says > that because Simon is trying to push through a not-ready-for-commit > patch we should drop our standards for other patches. I don't follow that logic either, but I also feel like this CommitFest is dragging on and on. Unless you -- or someone -- are prepared to devote a lot more time to this, "due time" is not going to arrive any time in the forseeable future. We're currently making progress at a rate of maybe 4 patches a week, at which rate we're going to finish this CommitFest in May. And that might be generous, because we've been disproportionately knocking off the easy ones. Do we have any kind of a plan for, I don't know, bringing this to closure on some reasonable time frame? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Josh Berkus writes: >> It would probably be prudent to concentrate on getting the core >> infrastructure committed first. That way, we at least know that if >> this doesn't get into 9.2, we can work on getting it into 9.3 knowing >> that once committed, people won't have to wait over a year at the very > I don't see why we can't commit the whole thing. This is way more ready > for prime-time than checksums. We'll get to it in due time. In case you haven't noticed, there's a lot of stuff in this commitfest. And I don't follow the logic that says that because Simon is trying to push through a not-ready-for-commit patch we should drop our standards for other patches. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
> It would probably be prudent to concentrate on getting the core > infrastructure committed first. That way, we at least know that if > this doesn't get into 9.2, we can work on getting it into 9.3 knowing > that once committed, people won't have to wait over a year at the very I don't see why we can't commit the whole thing. This is way more ready for prime-time than checksums. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 1 March 2012 22:09, Josh Berkus wrote: > On 3/1/12 1:57 PM, Daniel Farina wrote: >> On Thu, Mar 1, 2012 at 1:27 PM, Peter Geoghegan >> wrote: >>> My expectation is that this feature will make life a lot >>> easier for a lot of Postgres users. >> >> Yes. It's hard to overstate the apparent utility of this feature in >> the general category of visibility and profiling. > > More importantly, this is what pg_stat_statements *should* have been in > 8.4, and wasn't. It would probably be prudent to concentrate on getting the core infrastructure committed first. That way, we at least know that if this doesn't get into 9.2, we can work on getting it into 9.3 knowing that once committed, people won't have to wait over a year at the very least to be able to use the feature. The footprint of such a change is quite small: src/backend/nodes/copyfuncs.c |2 + src/backend/nodes/equalfuncs.c |4 + src/backend/nodes/outfuncs.c |6 + src/backend/nodes/readfuncs.c |5 + src/backend/optimizer/plan/planner.c |1 + src/backend/parser/analyze.c | 37 +- src/backend/parser/parse_coerce.c | 12 +- src/backend/parser/parse_param.c | 11 +- src/include/nodes/parsenodes.h |3 + src/include/nodes/plannodes.h |2 + src/include/parser/analyze.h | 12 + src/include/parser/parse_node.h|3 +- That said, I believe that the patch is pretty close to a commitable state as things stand, and that all that is really needed is for a committer familiar with the problem space to conclude the work started by Daniel and others, adding: contrib/pg_stat_statements/pg_stat_statements.c| 1420 ++- -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 3/1/12 1:57 PM, Daniel Farina wrote: > On Thu, Mar 1, 2012 at 1:27 PM, Peter Geoghegan wrote: >> My expectation is that this feature will make life a lot >> easier for a lot of Postgres users. > > Yes. It's hard to overstate the apparent utility of this feature in > the general category of visibility and profiling. More importantly, this is what pg_stat_statements *should* have been in 8.4, and wasn't. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Thu, Mar 1, 2012 at 1:27 PM, Peter Geoghegan wrote: > My expectation is that this feature will make life a lot > easier for a lot of Postgres users. Yes. It's hard to overstate the apparent utility of this feature in the general category of visibility and profiling. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 1 March 2012 00:48, Alvaro Herrera wrote: > I'm curious about the LeafNode stuff. Is this something that could be > done by expression_tree_walker? I'm not completely familiar with it so > maybe there's some showstopper such as some node tags not being > supported, or maybe it just doesn't help. But I'm curious. Good question. The LeafNode function (which is a bit of a misnomer, as noted in a comment) looks rather like a walker function, as appears in the example in a comment in nodeFuncs.c: * expression_tree_walker() is designed to support routines that traverse * a tree in a read-only fashion (although it will also work for routines * that modify nodes in-place but never add/delete/replace nodes). * A walker routine should look like this: * * bool my_walker (Node *node, my_struct *context) * { * if (node == NULL) * return false; * // check for nodes that special work is required for, eg: * if (IsA(node, Var)) * { * ... do special actions for Var nodes * } * else if (IsA(node, ...)) * { * ... do special actions for other node types * } * // for any node type not specially processed, do: * return expression_tree_walker(node, my_walker, (void *) context); * } My understanding is that the expression-tree walking support is mostly useful for the majority of walker code, which only cares about a small subset of nodes, and hopes to avoid including boilerplate code just to walk those other nodes that it's actually uninterested in. This code, unlike most clients of expression_tree_walker(), is pretty much interested in everything, since its express purpose is to fingerprint all possible query trees. Another benefit of expression_tree_walker is that if you miss a certain node being added (say a FuncExpr-like node), you get to automatically have that node walked over to walk to the nodes that you do in fact care about (such as those within this new node's args List). That makes perfect sense in the majority of cases, but here you've already missed the fields within this new node that FuncExpr itself lacks, so you're already fingerprinting inaccurately. I suppose you could still at least get the nodetag and still have a warning about the fingerprinting being inadequate by going down the expression_tree_walker path, but I'm inclined to wonder if you aren't just better off directly walking the tree, if only to encourage the idea that this code needs to be maintained over time, and to cut down on the little extra bit of indirection that that imposes. It's not going to be any sort of burden to maintain it - it currently stands at a relatively meagre 800 lines of code for everything to do with tree walking - and the code that will have to be added with new nodes or refactored along with the existing tree structure is going to be totally trivial. All of that said, I wouldn't mind making LeafNode into a walker, if that approach is judged to be better, and you don't mind documenting the order in which the tree is walked as deterministic, because the order now matters in a way apparently not really anticipated by expression_tree_walker, though that's probably not a problem. My real concern now is that it's March 1st, and the last commitfest may end soon. Even though this patch has extensive regression tests, has been floating around for months, and, I believe, will end up being a timely and important feature, a committer has yet to step forward to work towards this patch getting committed. Can someone volunteer, please?
My expectation is that this feature will make life a lot easier for a lot of Postgres users. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
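To make the LeafNode-versus-walker discussion above concrete, here is a
minimal, self-contained sketch of fingerprinting by walking a tree
directly with an explicit switch over node tags. The node definitions,
field names and hashing scheme are invented for the sketch; this is not
the patch's LeafNode code or PostgreSQL's Node machinery, only the shape
of the approach being argued for:

/*
 * Sketch only: fingerprint a toy expression tree by walking it directly,
 * with an explicit switch over node tags.  ToyNode and friends are made
 * up for this illustration; they are not PostgreSQL's Node types.
 */
#include <stdio.h>
#include <string.h>

typedef enum ToyTag
{
    T_ToyConst,
    T_ToyVar,
    T_ToyOpExpr
} ToyTag;

typedef struct ToyNode
{
    ToyTag      tag;
    int         value;          /* constant value, varno, or operator id */
    struct ToyNode *left;       /* child expressions, if any */
    struct ToyNode *right;
} ToyNode;

/* Append raw bytes to the fingerprint buffer (no overflow handling here). */
static void
jumble_bytes(unsigned char *buf, size_t *len, const void *item, size_t size)
{
    memcpy(buf + *len, item, size);
    *len += size;
}

/*
 * Recursively fingerprint a node.  Every tag is handled explicitly; an
 * unrecognised tag is loudly reported rather than silently traversed,
 * which is the "keep this code maintained" property discussed above.
 */
static void
jumble_node(unsigned char *buf, size_t *len, const ToyNode *node)
{
    if (node == NULL)
        return;

    jumble_bytes(buf, len, &node->tag, sizeof(node->tag));

    switch (node->tag)
    {
        case T_ToyConst:
            /* Record only that a constant appears here, not its value,
             * so that equivalent queries with different constants match. */
            break;
        case T_ToyVar:
        case T_ToyOpExpr:
            jumble_bytes(buf, len, &node->value, sizeof(node->value));
            jumble_node(buf, len, node->left);
            jumble_node(buf, len, node->right);
            break;
        default:
            fprintf(stderr, "unrecognised node tag: %d\n", (int) node->tag);
            break;
    }
}

int
main(void)
{
    /* "var1 = 42" and "var1 = 99" should produce the same fingerprint. */
    ToyNode     c1 = {T_ToyConst, 42, NULL, NULL};
    ToyNode     c2 = {T_ToyConst, 99, NULL, NULL};
    ToyNode     v1 = {T_ToyVar, 1, NULL, NULL};
    ToyNode     e1 = {T_ToyOpExpr, 96, &v1, &c1};   /* 96: made-up "=" id */
    ToyNode     e2 = {T_ToyOpExpr, 96, &v1, &c2};
    unsigned char buf1[256], buf2[256];
    size_t      len1 = 0, len2 = 0;

    jumble_node(buf1, &len1, &e1);
    jumble_node(buf2, &len2, &e2);
    printf("fingerprints %s\n",
           (len1 == len2 && memcmp(buf1, buf2, len1) == 0) ? "match" : "differ");
    return 0;
}

Relative to expression_tree_walker(), what is traded away is the automatic
traversal of nodes you forgot to handle; what is gained is that forgetting
one is hard to do silently.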
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
I'm curious about the LeafNode stuff. Is this something that could be done by expression_tree_walker? I'm not completely familiar with it so maybe there's some showstopper such as some node tags not being supported, or maybe it just doesn't help. But I'm curious. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Mon, Feb 27, 2012 at 4:26 AM, Peter Geoghegan wrote:
> This does not appear to have any user-visible effect on caret position
> for all variations in coercion syntax, while giving me everything that
> I need. I had assumed that we were relying on things being this way,
> but apparently this is not the case. The system is correctly blaming
> the coercion token when it finds the coercion is at fault, and the
> const token when it finds the Const node at fault, just as it did
> before. So this looks like a case of removing what amounts to dead
> code.

To shed some light on that hypothesis, attached is a patch whereby I use
'semantic analysis by compiler error' to show the extent of the reach of
the changes, by renaming (codebase-wide) the Const node's location symbol.
The scope within which the error token will change position is small and
amenable to analysis. I don't see a problem, nor wide-reaching
consequences. As Peter says: probably dead code.

Note that the cancellation of the error position happens very soon, after
an invocation of stringTypeDatum (on two sides of a branch). Both pre- and
post-patch, it puts the caret at the beginning of the constant string,
even in the event that there is a failure to parse it properly to the
destined type.

-- fdr

Straw-man-to-show-the-effects-of-the-change-to-const.patch
Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
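For readers unfamiliar with the trick described above, here is a tiny,
self-contained illustration of 'semantic analysis by compiler error'. The
struct and field names are invented stand-ins, not the real Const node or
the actual rename performed by the attached patch:

/*
 * Sketch only: rename the field of interest and recompile; every site the
 * compiler then rejects is part of the change's reach.  ToyConst is a toy
 * stand-in, not PostgreSQL's Const node.
 */
#include <stdio.h>

typedef struct ToyConst
{
    int         consttype;
    int         constlocation; /* renamed from "location" to flush out users */
} ToyConst;

static int
toy_const_location(const ToyConst *c)
{
#if 0
    return c->location;        /* a pre-rename reference: now a compile error */
#else
    return c->constlocation;   /* each error found this way is fixed and audited */
#endif
}

int
main(void)
{
    ToyConst    c = {23, 7};

    printf("location: %d\n", toy_const_location(&c));
    return 0;
}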
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 27 February 2012 06:23, Tom Lane wrote:
> I think that what Peter is on about in
> http://archives.postgresql.org/pgsql-hackers/2012-02/msg01152.php
> is the question of what location to use for the *result* of
> 'literal string'::typename, assuming that the type's input function
> doesn't complain. Generally we consider that we should use the
> leftmost token's location for the location of any expression composed
> of more than one input token. This is of course the same place for
> 'literal string'::typename, but not for the alternate syntaxes
> typename 'literal string' and cast('literal string' as typename).
> I'm not terribly impressed by the proposal to put in an arbitrary
> exception to that general rule for the convenience of this patch.

Now, you don't have to be. It was a mistake on my part to bring the
current user-visible behaviour into this. I don't see that there is
necessarily a tension between your position that we should blame the
leftmost token's location, and my contention that the Const "location"
field shouldn't misrepresent the location of certain Consts involved in
coercion post-analysis.

Let me put that in concrete terms. In my working copy of the patch, I have
made some more changes to the core system (mostly reverting things that
turned out to be unnecessary), but I have also made the following change:

*** a/src/backend/parser/parse_coerce.c
--- b/src/backend/parser/parse_coerce.c
*** coerce_type(ParseState *pstate, Node *no
*** 280,293 ****
      newcon->constlen = typeLen(targetType);
      newcon->constbyval = typeByVal(targetType);
      newcon->constisnull = con->constisnull;
!     /* Use the leftmost of the constant's and coercion's locations */
!     if (location < 0)
!         newcon->location = con->location;
!     else if (con->location >= 0 && con->location < location)
!         newcon->location = con->location;
!     else
!         newcon->location = location;
!
      /*
       * Set up to point at the constant's text if the input routine throws
       * an error.
--- 280,286 ----
      newcon->constlen = typeLen(targetType);
      newcon->constbyval = typeByVal(targetType);
      newcon->constisnull = con->constisnull;
!     newcon->location = con->location;
      /*
       * Set up to point at the constant's text if the input routine throws
       * an error.

This does not appear to have any user-visible effect on caret position for
all variations in coercion syntax, while giving me everything that I need.
I had assumed that we were relying on things being this way, but
apparently this is not the case. The system is correctly blaming the
coercion token when it finds the coercion is at fault, and the const token
when it finds the Const node at fault, just as it did before. So this
looks like a case of removing what amounts to dead code.

> Especially not when the only reason it's needed is that Peter is
> doing the fingerprinting at what is IMO the wrong place anyway.
> If he were working on the raw grammar output it wouldn't matter
> what parse_coerce chooses to do afterwards.

Well, I believe that your reason for preferring to do it at that stage was
that we could not capture all of the system's "normalisation smarts" -
like the fact that the omission of noise words isn't a differentiator -
because much of that (such as the recognition of the equivalence of
explicit joins and queries with join conditions in the WHERE clause)
occurs within the planner. We can't have it all, so we might as well not
have any.
My solution here is to be sufficiently vague about the behaviour of
normalisation that the user has no reasonable basis for counting on that
kind of more advanced reduction occurring.

I did very seriously consider hashing the raw parse tree, but I have
several practical reasons for not doing so. Whichever way you look at it,
hashing there is going to result in more code, and uglier code. There is
no uniform parent node that I can tag with a query_id, so more
modifications to the core system are needed to carry the queryId value
around more places and have it persist for longer. And the fact that I'd
actually be hashing different structs at different times (that tree is
accessed through a Node pointer) would necessitate lots of redundant code
operating on each of the very similar structs in an analogous way. The
fact is that waiting until after parse analysis has plenty of things to
recommend it, and yes, the fact that we already have working code with
extensive regression tests is one of them.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
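A rough, self-contained sketch of the convenience being described above -
after parse analysis there is one uniform parent on which to hang a
fingerprint - might look like the following. All names are invented for
the illustration; this is not the patch's code, and the real hook and
Query structures differ:

/*
 * Sketch only: with a uniform post-analysis parent node, the fingerprint
 * can be computed once and stashed on it for later hooks to find.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

typedef struct ToyQuery
{
    int         commandType;   /* SELECT, INSERT, ... as an integer code */
    uint32_t    queryId;       /* fingerprint, filled in below */
    const char *jumble;        /* stand-in for the walked-tree byte stream */
} ToyQuery;

/* Trivial FNV-1a hash over the jumble bytes; any decent hash would do. */
static uint32_t
hash_jumble(const unsigned char *data, size_t len)
{
    uint32_t    h = 2166136261u;
    size_t      i;

    for (i = 0; i < len; i++)
    {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Stand-in for a post-parse-analysis hook: tag the one parent node. */
static void
toy_post_parse_analyze(ToyQuery *query)
{
    query->queryId = hash_jumble((const unsigned char *) query->jumble,
                                 strlen(query->jumble));
}

int
main(void)
{
    ToyQuery    q = {1, 0, "SELECT|foo|a|=|?"};

    toy_post_parse_analyze(&q);
    printf("queryId = %u\n", (unsigned) q.queryId);
    return 0;
}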
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Robert Haas writes:
> I think I agree with Tom's position upthread: blaming the coercion seems
> to me to make more sense. But if that's what we're trying to do, then
> why does parse_coerce() say this?
>
> /*
>  * Set up to point at the constant's text if the input routine throws
>  * an error.
>  */
>
> /me is confused.

There are two cases that are fundamentally different in the eyes of the
system:

'literal string'::typename defines a constant of the named type. The
string is fed to the type's input routine de novo; that is, it never
really had any other type. (Under the hood, it had type UNKNOWN for a
short time, but that's an implementation detail.) In this situation it
seems appropriate to point at the text string if the input routine doesn't
like it, because it is the input string and nothing else that is wrong.

On the other hand, when you cast something that already had a known type
to some other type, any failure seems reasonable to blame on the cast
operator. So in these terms there's a very real difference between what
'42'::bigint means and what 42::bigint means --- the latter implies
forming an int4 constant and then converting it to int8.

I think that what Peter is on about in
http://archives.postgresql.org/pgsql-hackers/2012-02/msg01152.php
is the question of what location to use for the *result* of
'literal string'::typename, assuming that the type's input function
doesn't complain. Generally we consider that we should use the leftmost
token's location for the location of any expression composed of more than
one input token. This is of course the same place for
'literal string'::typename, but not for the alternate syntaxes
typename 'literal string' and cast('literal string' as typename). I'm not
terribly impressed by the proposal to put in an arbitrary exception to
that general rule for the convenience of this patch. Especially not when
the only reason it's needed is that Peter is doing the fingerprinting at
what is IMO the wrong place anyway. If he were working on the raw grammar
output it wouldn't matter what parse_coerce chooses to do afterwards.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Fri, Feb 24, 2012 at 9:43 AM, Peter Geoghegan wrote:
> Tom's point example does not seem to be problematic to me - the cast
> *should* blame the 42 const token, as the cast doesn't work as a
> result of its representation, which is in point of fact why the core
> system blames the Const node and not the coercion one.

I think I agree with Tom's position upthread: blaming the coercion seems
to me to make more sense. But if that's what we're trying to do, then why
does parse_coerce() say this?

/*
 * Set up to point at the constant's text if the input routine throws
 * an error.
 */

/me is confused.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 21 February 2012 01:48, Tom Lane wrote:
> you're proposing to move the error pointer to the "42", and that does
> not seem like an improvement, especially not if it only happens when the
> cast subject is a simple constant rather than an expression.

2008's commit a2794623d292f7bbfe3134d1407281055acce584 [1] added the
following code to parse_coerce.c [2]:

    /* Use the leftmost of the constant's and coercion's locations */
    if (location < 0)
        newcon->location = con->location;
    else if (con->location >= 0 && con->location < location)
        newcon->location = con->location;
    else
        newcon->location = location;

With that commit, Tom made a special case of both Const and Param nodes,
and had them take the leftmost of the original Const location and the
coercion location. Clearly, he judged that the current exact set of
behaviours with regard to caret position was optimal. It is my contention
that:

A. They may not actually be optimal, at least not according to my taste.
At the very least, it is a hack to misrepresent the location of Const
nodes just so the core system can blame things on Const nodes and have the
user see the coercion being at fault. I appreciate that it wouldn't have
seemed to matter at the time, but the fact remains.

B. The question of where the caret goes in the relevant cases - the
location of the coercion, or the location of the constant - is
inconsequential to the vast majority of Postgres users, if not all, even
if the existing behaviour is technically superior according to the
prevailing aesthetic. On the other hand, it matters a lot to me that I be
able to trust the Const location under all circumstances - I'd really like
not to have to engineer a way around this behaviour, because the only way
to do that is with tricks involving the low-level scanner API, which would
be quite brittle.

The fact that "select integer '5'" is canonicalised to "select ?" isn't
very pretty. That's not the only issue though, as even to get that more
limited behaviour, lots more code is required, code that is more difficult
to verify as correct. "Canonicalise one token at each Const location" is a
simple and robust approach, if only the core system could be tweaked to
make that assumption hold in all circumstances, rather than just the vast
majority.

Tom's point example does not seem to be problematic to me - the cast
*should* blame the 42 const token, as the cast doesn't work as a result of
its representation, which is in point of fact why the core system blames
the Const node and not the coercion one. For that reason, the constant vs
expression thing strikes me as a false equivalence.

All of that said, I must reiterate that the difference in behaviour
strikes me as very unimportant, or it would, if it were not so important
to what I'm trying to do with pg_stat_statements. Can this be
accommodated? It might be a matter of changing the core system to blame
the coercion node rather than the Const node, if you're determined to
preserve the existing behaviour.
[1] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2794623d292f7bbfe3134d1407281055acce584
[2] http://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/backend/parser/parse_coerce.c;h=cd9b7b0cfbed03ec74f2cf295e4a7113627d7f72;hp=1244498ffb291b67d35917a6fdddb54b0d8d759d;hb=a2794623d292f7bbfe3134d1407281055acce584;hpb=6734182c169a1ecb74dd8495004e896ee4519adb

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
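To illustrate why a trustworthy Const location is all that is needed, here
is a small, self-contained sketch of "canonicalise one token at each Const
location". The helper and its inputs are invented for the illustration (in
particular, token lengths are simply passed in, whereas the real module
has to rediscover them from the query text); it is not the patch's code:

/*
 * Sketch only: rewrite the query text with a '?' in place of each constant
 * token, given each constant's location and length.  Spans must be sorted
 * by location; "out" is assumed large enough.
 */
#include <stdio.h>
#include <string.h>

typedef struct ConstSpan
{
    int         location;      /* byte offset of the constant token */
    int         length;        /* length of that token in bytes */
} ConstSpan;

static void
normalise_query(const char *query, const ConstSpan *spans, int nspans, char *out)
{
    int         qpos = 0;      /* read position in query */
    int         opos = 0;      /* write position in out */
    int         i;

    for (i = 0; i < nspans; i++)
    {
        int         copylen = spans[i].location - qpos;

        memcpy(out + opos, query + qpos, copylen);  /* text before the constant */
        opos += copylen;
        out[opos++] = '?';                          /* one placeholder per constant */
        qpos = spans[i].location + spans[i].length; /* skip the constant token */
    }
    strcpy(out + opos, query + qpos);               /* trailing text, if any */
}

int
main(void)
{
    const char *query = "SELECT * FROM foo WHERE a = 42 AND b = 'x'";
    ConstSpan   spans[] = {{28, 2}, {39, 3}};       /* "42" and "'x'" */
    char        out[128];

    normalise_query(query, spans, 2, out);
    printf("%s\n", out);    /* prints: SELECT * FROM foo WHERE a = ? AND b = ? */
    return 0;
}

If a recorded location pointed at the coercion rather than at the
constant's own token, the step that rediscovers each token's length would
grab the wrong text, which is the practical issue at stake in this
subthread.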
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 21 February 2012 01:48, Tom Lane wrote:
> you're proposing to move the error pointer to the "42", and that does
> not seem like an improvement, especially not if it only happens when the
> cast subject is a simple constant rather than an expression.

I'm not actually proposing that, though. What I'm proposing, quite simply,
is that the Const location actually be correct in all circumstances. Now,
I can understand why the coercion node for this query would have its
location starting from the "CAST" part in your second example, or would
happen to be the same as the constant's in your first, and I'm not
questioning that. I'm questioning why the Const node's location needs to
*always* be the same as that of the coercion node when pg_stat_statements
walks the tree, since I'd have imagined that Postgres has no business
blaming the error that you've shown on the Const node.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes:
> Another look around shows that the CoerceToDomain struct takes its
> location from the new Const location in turn, so my dirty little hack
> will break the location of the CoerceToDomain struct, giving an
> arguably incorrect caret position in some error messages. It would
> suit me if MyCoerceToDomain->arg (or the "arg" of a similar node
> related to coercion, like CoerceViaIO) pointed to a Const node with,
> potentially, and certainly in the case of my original CoerceToDomain
> test case, a distinct location to the coercion node itself.

Sorry, I'm not following. What about that isn't true already?

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes:
> Here is the single, hacky change I've made just for now to the core
> parser to quickly see if it all works as expected:
>
> *** transformTypeCast(ParseState *pstate, Ty
> *** 2108,2113 ****
> --- 2108,2116 ----
>       if (location < 0)
>           location = tc->typeName->location;
> +     if (IsA(expr, Const))
> +         location = ((Const*)expr)->location;
> +
>       result = coerce_to_target_type(pstate, expr, inputType,
>                                      targetType,
>                                      targetTypmod,
>                                      COERCION_EXPLICIT,

This does not look terribly sane to me. AFAICS, the main effect of this
would be that if you have an error in coercing a literal to some specified
type, the error message would point at the literal and not at the cast
operator. That is, in examples like these:

regression=# select 42::point;
ERROR:  cannot cast type integer to point
LINE 1: select 42::point;
                 ^
regression=# select cast (42 as point);
ERROR:  cannot cast type integer to point
LINE 1: select cast (42 as point);
               ^

you're proposing to move the error pointer to the "42", and that does not
seem like an improvement, especially not if it only happens when the cast
subject is a simple constant rather than an expression.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers