Re: [BUG] pg_stat_statements and extended query protocol
On Wed, Apr 05, 2023 at 05:39:35PM -0700, Andres Freund wrote: > Seems like a complicated enough facility to benefit from a test or two? Peter > Eisentraut added support for the extended query protocol to psql, so it > shouldn't be too hard... PQsendQueryGuts() does not split yet the bind/describe phase and the execute phases, so we'd need a couple more libpq APIs to do that, with more tracking of the state we're currently on when looping across multiple execute fetches. My guess is that it is possible to follow a model similar to JDBC here. I don't think that's necessarily complicated, but it is not as straight-forward as it looks. \bind was much more straight-forward than that, as it can feed on a single call of PQsendQueryParams() after saving a set of parameters. An \exec would not completely do that. Attaching one of the scripts I've played with, in a very rusty java with no classes or such, for future reference. Just update CLASSPATH to point to a copy of the JDBC driver, run it with a java command, and then look at rows, query in pg_stat_statements. -- Michael import java.sql.*; //import System.out.format; //import org.postgresql.Driver; public class Test { static final String DB_URL = "jdbc:postgresql://localhost/mydb"; public static void main(String[] args) { Connection conn = null; // Open a connection try { conn = DriverManager.getConnection(DB_URL); conn.setAutoCommit(false); Integer increment = 0; // SELECT query PreparedStatement statement = conn.prepareStatement("select * from pg_class"); statement.setFetchSize(100); ResultSet rs = statement.executeQuery(); while (rs.next()) { increment++; } conn.commit(); statement.close(); System.out.format("SELECT increment " + increment + "\n"); //INSERT RETURNING //CREATE TABLE aa (a int); increment = 0; PreparedStatement insertStmt = conn.prepareStatement("insert into aa select oid from pg_class returning a"); insertStmt.setFetchSize(100); ResultSet insertRs = insertStmt.executeQuery(); while (insertRs.next()) { increment++; } conn.commit(); insertStmt.close(); System.out.format("INSERT RETURNING increment " + increment + "\n"); //INSERT RETURNING //CREATE TABLE bb (a int); increment = 0; PreparedStatement withStmt = conn.prepareStatement("with insert_stmt AS (insert into bb select oid from pg_class returning a) SELECT r1.a, r2.a from insert_stmt as r1, aa as r2;"); withStmt.setFetchSize(100); ResultSet withRs = withStmt.executeQuery(); while (withRs.next()) { increment++; } conn.commit(); insertStmt.close(); System.out.format("INSERT WITH increment " + increment + "\n"); increment = 0; PreparedStatement withStmt2 = conn.prepareStatement("with select_stmt AS (select oid from pg_class) SELECT r1.oid, r2.a from select_stmt as r1, aa as r2;"); withStmt2.setFetchSize(10); ResultSet withRs2 = withStmt2.executeQuery(); while (withRs2.next()) { increment++; } conn.commit(); insertStmt.close(); System.out.format("SELECT WITH increment " + increment + "\n"); } catch (SQLException e) { e.printStackTrace(); try { conn.rollback(); } catch (SQLException ee) { ee.printStackTrace(); } } } } signature.asc Description: PGP signature
Re: [BUG] pg_stat_statements and extended query protocol
On Wed, Apr 05, 2023 at 10:16:19PM +, Imseih (AWS), Sami wrote: > Here is v6. That was my mistake not to zero out the es_total_processed. > I had it in the first version. The update of es_total_processed in standard_ExecutorRun() felt a bit lonely, so I have added an extra comment, ran an indentation, and applied the result. Thanks Sami for the patch, and everyone else for the feedback! -- Michael signature.asc Description: PGP signature
Re: [BUG] pg_stat_statements and extended query protocol
Hi, On 2023-04-06 07:09:37 +0900, Michael Paquier wrote: > I'll look at that again today, potentially apply the fix on HEAD. Seems like a complicated enough facility to benefit from a test or two? Peter Eisentraut added support for the extended query protocol to psql, so it shouldn't be too hard... commit 5b66de3433e Author: Peter Eisentraut Date: 2022-11-15 13:50:27 +0100 psql: Add command to use extended query protocol Greetings, Andres Freund
Re: [BUG] pg_stat_statements and extended query protocol
> Makes sense to me. I'll look at that again today, potentially apply > the fix on HEAD. Here is v6. That was my mistake not to zero out the es_total_processed. I had it in the first version. -- Regards, Sami Imseih Amazon Web Services (AWS) v6-0001-Fix-row-tracking-in-pg_stat_statements.patch Description: v6-0001-Fix-row-tracking-in-pg_stat_statements.patch
Re: [BUG] pg_stat_statements and extended query protocol
On Wed, Apr 05, 2023 at 05:39:13PM -0400, Tom Lane wrote: > v5 seems OK to me except I think CreateExecutorState() should explicitly > zero the new es_total_processed field, alongside zeroing es_processed. > (I realize that the makeNode would have done it already, but our > coding conventions generally run towards not relying on that. This is > mainly for greppability, so you can find where a field is initialized.) Makes sense to me. I'll look at that again today, potentially apply the fix on HEAD. -- Michael signature.asc Description: PGP signature
Re: [BUG] pg_stat_statements and extended query protocol
Michael Paquier writes: > On Wed, Apr 05, 2023 at 04:07:21AM +, Imseih (AWS), Sami wrote: >> Attached is v5 addressing the comments. > Thanks, this should be enough to persist the number of tuples tracked > across multiple ExecutorRun() calls. This looks pretty good to me. v5 seems OK to me except I think CreateExecutorState() should explicitly zero the new es_total_processed field, alongside zeroing es_processed. (I realize that the makeNode would have done it already, but our coding conventions generally run towards not relying on that. This is mainly for greppability, so you can find where a field is initialized.) regards, tom lane
Re: [BUG] pg_stat_statements and extended query protocol
On Wed, Apr 05, 2023 at 04:07:21AM +, Imseih (AWS), Sami wrote: >> - es_processed: number of tuples processed during one ExecutorRun() >> call. >> - es_total_processed: total number of tuples aggregated across all >> ExecutorRun() calls. > > I thought hard about this point and for some reason I did not want to > mention ExecutorRun in the comment. But, I agree with what you suggest. > It's more clear as to the intention of the fields. > > Attached is v5 addressing the comments. Thanks, this should be enough to persist the number of tuples tracked across multiple ExecutorRun() calls. This looks pretty good to me. We should do something about providing more control over that to libpq in the long run, IMO, and have more test coverage, but let's see about that in 17~. -- Michael signature.asc Description: PGP signature
Re: [BUG] pg_stat_statements and extended query protocol
> Doing nothing for calls now is fine by me, though I > agree that this could be improved at some point, as seeing only 1 > rather than N for each fetch depending on the size is a bit confusing. I think we will need to clearly define what "calls" is. Perhaps as mentioned above, we may need separate counters for "calls" vs "fetches". This is definitely a separate thread. > Doesn't this comment at the top of ExecutorRun() need an update? It > seems to me that this comment should mention both es_total_processed Yes, updated in v5. > There is no need for this part in ExecutorFinish(), actually, as long > as we always increment es_total_processed at the end ExecutorRun() for > all the operation types? Ah, correct. I changed that and tested again. > - es_processed: number of tuples processed during one ExecutorRun() > call. > - es_total_processed: total number of tuples aggregated across all > ExecutorRun() calls. I thought hard about this point and for some reason I did not want to mention ExecutorRun in the comment. But, I agree with what you suggest. It's more clear as to the intention of the fields. Attached is v5 addressing the comments. Regards, Sami Imseih Amazon Web Services (AWS) v5-0001-Fix-row-tracking-in-pg_stat_statements.patch Description: v5-0001-Fix-row-tracking-in-pg_stat_statements.patch
Re: [BUG] pg_stat_statements and extended query protocol
On Tue, Apr 04, 2023 at 09:48:17PM +, Imseih (AWS), Sami wrote: > The "calls" tracking is removed from Estate. Unlike v1 however, > I added a check for the operation type. Inside ExecutorRun, > es_total_processed is incremented when the operation is > a SELECT. This check is done for es_processed as well inside > executorRun -> ExecutePlan. I see. This seems in line with ExecutePlan() where es_processed is incremented only for SELECT queries. > For non select operations, es_total_processed is set to > es_processed in executorfinish. This is because the modify > plan nodes set es_processed outside of execMain.c My java is very rusty but (after some time reminding myself how to do a CLASSPATH) I can see the row counts being right for things like SELECT, INSERT RETURNING, WITH queries for both of them and autocommit, and the proposed fix influences SELECT only depending on the fetch size. Doing nothing for calls now is fine by me, though I agree that this could be improved at some point, as seeing only 1 rather than N for each fetch depending on the size is a bit confusing. * There is no return value, but output tuples (if any) are sent to * the destination receiver specified in the QueryDesc; and the number * of tuples processed at the top level can be found in * estate->es_processed. Doesn't this comment at the top of ExecutorRun() need an update? It seems to me that this comment should mention both es_total_processed and es_processed, telling that es_processed is the number of rows processed in a single call, while es_total_processed is the sum of all tuples processed in the ExecutorRun() calls. @@ -441,6 +451,13 @@ standard_ExecutorFinish(QueryDesc *queryDesc) if (queryDesc->totaltime) InstrStopNode(queryDesc->totaltime, 0); + /* +* For non-SELECT operations, es_total_processed will always be +* equal to es_processed. +*/ + if (operation != CMD_SELECT) + queryDesc->estate->es_total_processed = queryDesc->estate->es_processed; There is no need for this part in ExecutorFinish(), actually, as long as we always increment es_total_processed at the end ExecutorRun() for all the operation types? If the portal does not have a store, we would do one ExecutorRun() call per execute fetch. If the portal has a store, we'd do only one ExecutorRun(). Both cases call once ExecutorFinish(), but the finish would happen during the first execute when filling a portal's tuple store, and during the last execute fetch if the portal has no store. This is Tom's point in [1], from what I can see. That seems simpler to me, as well. - uint64 es_processed; /* # of tuples processed */ + uint64 es_processed; /* # of tuples processed for a single +* execution of a portal */ + uint64 es_total_processed; /* # of tuples processed for all +* executions of a portal */ Hmm. This does not reflect completely the reality for non-SELECT statements, no? For SELECT statements, that's correct, because es_processed is reset in standard_ExecutorFinish() each time the backend does an execute fetch thanks to PortalRunSelect(). For non-SELECT statements, the extended query protocol uses the tuples stored in the portal after one execution, meaning that we run through the executor once with both es_processed and es_total_processed set to their final numbers from the start, before any fetches. I would suggest something like that to document both fields: - es_processed: number of tuples processed during one ExecutorRun() call. - es_total_processed: total number of tuples aggregated across all ExecutorRun() calls. At the end, I'm OK with the proposal after a closer look, but I think that we should do a much better job at describing es_processed and es_total_processed in execnodes.h, particularly in the case of a portal holding a store where es_processed may not reflect the number of rows for a single portal execution, and it seems to me that the proposal of incrementing es_total_processed at the end of ExecutorRun() for all commands is simpler, based on what I have tested. [1]: https://www.postgresql.org/message-id/1311773.1680577...@sss.pgh.pa.us -- Michael signature.asc Description: PGP signature
Re: [BUG] pg_stat_statements and extended query protocol
> I was looking back at this thread, and the suggestion to use one field > in EState sounds fine to me. Sami, would you like to send a new > version of the patch (simplified version based on v1)? Here is v4. The "calls" tracking is removed from Estate. Unlike v1 however, I added a check for the operation type. Inside ExecutorRun, es_total_processed is incremented when the operation is a SELECT. This check is done for es_processed as well inside executorRun -> ExecutePlan. For non select operations, es_total_processed is set to es_processed in executorfinish. This is because the modify plan nodes set es_processed outside of execMain.c Regards, Sami Imseih Amazon Web Services (AWS) v4-0001-Fix-row-tracking-in-pg_stat_statements.patch Description: v4-0001-Fix-row-tracking-in-pg_stat_statements.patch
Re: [BUG] pg_stat_statements and extended query protocol
On Tue, Apr 04, 2023 at 03:29:07AM +, Imseih (AWS), Sami wrote: >> We clearly do need to fix the >> reported rowcount for cases where ExecutorRun is invoked more than >> once per ExecutorEnd call; but I think that's sufficient. > > Sure, the original proposed fix, but with tracking the es_total_processed > only in Estate should be enough for now. I was looking back at this thread, and the suggestion to use one field in EState sounds fine to me. Sami, would you like to send a new version of the patch (simplified version based on v1)? -- Michael signature.asc Description: PGP signature
Re: [BUG] pg_stat_statements and extended query protocol
> Maybe, but is there any field demand for that? I don't think there is. > We clearly do need to fix the > reported rowcount for cases where ExecutorRun is invoked more than > once per ExecutorEnd call; but I think that's sufficient. Sure, the original proposed fix, but with tracking the es_total_processed only in Estate should be enough for now. Regards, Sami Imseih Amazon Web Services (AWS)
Re: [BUG] pg_stat_statements and extended query protocol
"Imseih (AWS), Sami" writes: > I wonder if the right answer here is to track fetches as > a separate counter in pg_stat_statements, in which fetch > refers to the number of times a portal is executed? Maybe, but is there any field demand for that? IMV, the existing behavior is that we count one "call" per overall query execution (that is, per ExecutorEnd invocation). The argument that that's a bug and we should change it seems unsupportable to me, and even the argument that we should also count ExecutorRun calls seems quite lacking in evidence. We clearly do need to fix the reported rowcount for cases where ExecutorRun is invoked more than once per ExecutorEnd call; but I think that's sufficient. regards, tom lane
Re: [BUG] pg_stat_statements and extended query protocol
> Why should that be the definition? Partial execution of a portal > might be something that is happening at the driver level, behind the > user's back. You can't make rational calculations of, say, plan > time versus execution time if that's how "calls" is measured. Correct, and there are also drivers that implement fetch size using cursor statements, i.e. DECLARE CURSOR, FETCH CURSOR, and each FETCH gets counted as 1 call. I wonder if the right answer here is to track fetches as a separate counter in pg_stat_statements, in which fetch refers to the number of times a portal is executed? Regards, Sami Imseih Amazon Web Services (AWS)
Re: [BUG] pg_stat_statements and extended query protocol
"Imseih (AWS), Sami" writes: >> Also, I'm doubtful that counting calls this way is a great idea, >> which would mean you only need one new counter field not two. The >> fact that you're having trouble defining what it means certainly >> suggests that the implementation is out front of the design. > ISTM you are not in agreement that a call count should be incremented > after every executorRun, but should only be incremented after > the portal is closed, at executorEnd. Is that correct? Right. That makes the "call count" equal to the number of times the query is invoked. > FWIW, The rationale for incrementing calls in executorRun is that calls > refers > to the number of times a client executes a portal, whether partially or to > completion. Why should that be the definition? Partial execution of a portal might be something that is happening at the driver level, behind the user's back. You can't make rational calculations of, say, plan time versus execution time if that's how "calls" is measured. regards, tom lane
Re: [BUG] pg_stat_statements and extended query protocol
> * Yeah, it'd be nice to have an in-core test, but it's folly to insist > on one that works via libpq and psql. That requires a whole new set > of features that you're apparently designing on-the-fly with no other > use cases in mind. I don't think that will accomplish much except to > ensure that this bug fix doesn't make it into v16. I agree, Solving the lack of in-core testing for this area belong in a different discussion. > * I don't understand why it was thought good to add two new counters > to struct Instrumentation. In EXPLAIN ANALYZE cases those will be > wasted space *per plan node*, not per Query. Indeed, looking at ExplainNode, Instrumentation struct is allocated per node and the new fields will be wasted space. Thanks for highlighting this. > * It also seems quite bizarre to add counters to a core data structure > and then leave it to pg_stat_statements to maintain them. That is fair point > I'm inclined to think that adding the counters to struct EState is > fine. That's 304 bytes already on 64-bit platforms, another 8 or 16 > won't matter. With the point you raise about Insrumentation per node, Estate is the better place for the new counters. > Also, I'm doubtful that counting calls this way is a great idea, > which would mean you only need one new counter field not two. The > fact that you're having trouble defining what it means certainly > suggests that the implementation is out front of the design. ISTM you are not in agreement that a call count should be incremented after every executorRun, but should only be incremented after the portal is closed, at executorEnd. Is that correct? FWIW, The rationale for incrementing calls in executorRun is that calls refers to the number of times a client executes a portal, whether partially or to completion. Clients can also fetch rows from portals at various rates, so to determine the "rows per call" accurately from pg_stat_statements, we should track a calls as the number of times executorRun was called on a portal. > In short, what I think I'd suggest is adding an es_total_processed > field to EState and having standard_ExecutorRun do "es_total_processed > += es_processed" near the end, then just change pg_stat_statements to > use es_total_processed not es_processed. The original proposal in 0001-correct-pg_stat_statements-tracking-of-portals.patch, was to add a "calls" and "es_total_processed" field to EState. Regards, Sami Imseih Amazon Web Services (AWS)
Re: [BUG] pg_stat_statements and extended query protocol
"Imseih (AWS), Sami" writes: >> So... The idea here is to set a custom fetch size so as the number of >> calls can be deterministic in the tests, still more than 1 for the >> tests we'd have. And your point is that libpq enforces always 0 when >> sending the EXECUTE message causing it to always return all the rows >> for any caller of PQsendQueryGuts(). > That is correct. Hi, I took a quick look through this thread, and I have a couple of thoughts: * Yeah, it'd be nice to have an in-core test, but it's folly to insist on one that works via libpq and psql. That requires a whole new set of features that you're apparently designing on-the-fly with no other use cases in mind. I don't think that will accomplish much except to ensure that this bug fix doesn't make it into v16. * I don't understand why it was thought good to add two new counters to struct Instrumentation. In EXPLAIN ANALYZE cases those will be wasted space *per plan node*, not per Query. * It also seems quite bizarre to add counters to a core data structure and then leave it to pg_stat_statements to maintain them. (BTW, I didn't much care for putting that maintenance into pgss_ExecutorRun without updating its header comment.) I'm inclined to think that adding the counters to struct EState is fine. That's 304 bytes already on 64-bit platforms, another 8 or 16 won't matter. Also, I'm doubtful that counting calls this way is a great idea, which would mean you only need one new counter field not two. The fact that you're having trouble defining what it means certainly suggests that the implementation is out front of the design. In short, what I think I'd suggest is adding an es_total_processed field to EState and having standard_ExecutorRun do "es_total_processed += es_processed" near the end, then just change pg_stat_statements to use es_total_processed not es_processed. regards, tom lane
Re: [BUG] pg_stat_statements and extended query protocol
> So... The idea here is to set a custom fetch size so as the number of > calls can be deterministic in the tests, still more than 1 for the > tests we'd have. And your point is that libpq enforces always 0 when > sending the EXECUTE message causing it to always return all the rows > for any caller of PQsendQueryGuts(). That is correct. > The extended protocol allows that, so you would like a libpq API to > have more control of what we send with EXECUTE: > https://www.postgresql.org/docs/current/protocol-overview.html#PROTOCOL-QUERY-CONCEPTS > The extended query protocol would require multiple 'E' messages, but > we would not need multiple describe or bind messages, meaning that > this cannot just be an extra flavor PQsendQueryParams(). Am I gettig > that right? Correct, there will need to be separate APIs for Parse/Bind, Execute and Close of a Portal. > The correct API design seems tricky, to say the least. > Perhaps requiring this much extra work in libpq for the purpose of > having some tests in this thread is not a brilliant idea.. Or perhaps > we could just do it and have something a-la-JDBC with two routines? > That would be one libpq routine for describe/bind and one for execute > where the limit can be given by the caller in the latter case, similar > to sendDescribeStatement() and sendExecute() in > QueryExecutorImpl.java. I am not too clear on your point here. ISTM you are suggesting adding new libpq APis similar to JDBC, which is what I am also suggesting. Did I understand correctly? -- Sami Imseih Amazon Web Services (AWS)
Re: [BUG] pg_stat_statements and extended query protocol
On Thu, Mar 23, 2023 at 01:54:05PM +, Imseih (AWS), Sami wrote: > Yes, that is possible but we will need to add a libpq API > that allows the caller to pass in a "fetch size". > PQsendQueryParams does not take in a "fetch size", > so it returns all rows, through PQsendQueryParams > > https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-exec.c#L1882 > > Adding such an API that takes in a "fetch size" will be beneficial > not just for this test, but I can see it enabling another psql meta command, > similar to \bind but that takes in a "fetch size". So... The idea here is to set a custom fetch size so as the number of calls can be deterministic in the tests, still more than 1 for the tests we'd have. And your point is that libpq enforces always 0 when sending the EXECUTE message causing it to always return all the rows for any caller of PQsendQueryGuts(). The extended protocol allows that, so you would like a libpq API to have more control of what we send with EXECUTE: https://www.postgresql.org/docs/current/protocol-overview.html#PROTOCOL-QUERY-CONCEPTS The extended query protocol would require multiple 'E' messages, but we would not need multiple describe or bind messages, meaning that this cannot just be an extra flavor PQsendQueryParams(). Am I gettig that right? The correct API design seems tricky, to say the least. Perhaps requiring this much extra work in libpq for the purpose of having some tests in this thread is not a brilliant idea.. Or perhaps we could just do it and have something a-la-JDBC with two routines? That would be one libpq routine for describe/bind and one for execute where the limit can be given by the caller in the latter case, similar to sendDescribeStatement() and sendExecute() in QueryExecutorImpl.java. -- Michael signature.asc Description: PGP signature
Re: [BUG] pg_stat_statements and extended query protocol
> I wonder that this patch changes the meaning of "calls" in the > pg_stat_statement > view a bit; previously it was "Number of times the statement was executed" as > described in the documentation, but currently this means "Number of times the > portal was executed". I'm worried that this makes users confused. For example, > a user may think the average numbers of rows returned by a statement is given > by > rows/calls, but it is not always correct because some statements could be > executed > with multiple portal runs. I don't think it changes the meaning of "calls" in pg_stat_statements, since every time the app fetches X amount of rows from a portal, it's still done in a separate execution, and thus a separate call. I agree, the meaning of "calls" should be clarified in docs. Regards, Sami Imseih Amazon Web Services (AWS)
Re: [BUG] pg_stat_statements and extended query protocol
On Thu, 23 Mar 2023 09:33:16 +0100 "Drouvot, Bertrand" wrote: > Hi, > > On 3/22/23 10:35 PM, Imseih (AWS), Sami wrote: > >> What about using an uint64 for calls? That seems more appropriate to me > >> (even if > >> queryDesc->totaltime->calls will be passed (which is int64), but that's > >> already > >> also the case for the "rows" argument and > >> queryDesc->totaltime->rows_processed) > > > > That's fair > > > > > >> I'm not sure it's worth mentioning that the new counters are "currently" > >> used with the ExecutorRun. > > > > Sure, I suppose these fields could be used outside of ExecutorRun. Good > > point. > > > > > >> Also, I wonder if "rows" (and not rows_processed) would not be a better > >> naming. > > > > Agree. > > > > I went with rows_processed initially, since it was accumulating > > es_processed, > > but as the previous point, this instrumentation could be used outside of > > ExecutorRun. > > > > v3 addresses the comments. I wonder that this patch changes the meaning of "calls" in the pg_stat_statement view a bit; previously it was "Number of times the statement was executed" as described in the documentation, but currently this means "Number of times the portal was executed". I'm worried that this makes users confused. For example, a user may think the average numbers of rows returned by a statement is given by rows/calls, but it is not always correct because some statements could be executed with multiple portal runs. Although it might not big issue to users, I think it is better to add an explanation to the doc for clarification. Regards, Yugo Nagata > > > > Thanks! LGTM and also do confirm that, with the patch, the JDBC test does > show the correct results. > > That said, not having a test (for the reasons you explained up-thread) > associated with the patch worry me a bit. > > But, I'm tempted to say that adding new tests could be addressed separately > though (as this patch looks pretty straightforward). > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com > > -- Yugo NAGATA
Re: [BUG] pg_stat_statements and extended query protocol
> How does JDBC test that? Does it have a dependency on > pg_stat_statements? No, at the start of the thread, a sample jdbc script was attached. But I agree, we need to add test coverage. See below. >> But, I'm tempted to say that adding new tests could be addressed >> separately though (as this patch looks pretty straightforward). > Even small patches can have gotchas. I think that this should have > tests in-core rather than just depend on JDBC and hope for the best. > Even if \bind does not allow that, we could use an approach similar to > libpq_pipeline, for example, depending on pg_stat_statements for the > validation with a test module in src/test/modules/? Yes, that is possible but we will need to add a libpq API that allows the caller to pass in a "fetch size". PQsendQueryParams does not take in a "fetch size", so it returns all rows, through PQsendQueryParams https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-exec.c#L1882 Adding such an API that takes in a "fetch size" will be beneficial not just for this test, but I can see it enabling another psql meta command, similar to \bind but that takes in a "fetch size". Regards -- Sami Imseih Amazon Web Services (AWS)
Re: [BUG] pg_stat_statements and extended query protocol
On Thu, Mar 23, 2023 at 09:33:16AM +0100, Drouvot, Bertrand wrote: > Thanks! LGTM and also do confirm that, with the patch, the JDBC test > does show the correct results. How does JDBC test that? Does it have a dependency on pg_stat_statements? > > That said, not having a test (for the reasons you explained > up-thread) associated with the patch worry me a bit. Same impression here. > But, I'm tempted to say that adding new tests could be addressed > separately though (as this patch looks pretty straightforward). Even small patches can have gotchas. I think that this should have tests in-core rather than just depend on JDBC and hope for the best. Even if \bind does not allow that, we could use an approach similar to libpq_pipeline, for example, depending on pg_stat_statements for the validation with a test module in src/test/modules/? -- Michael signature.asc Description: PGP signature
Re: [BUG] pg_stat_statements and extended query protocol
Hi, On 3/22/23 10:35 PM, Imseih (AWS), Sami wrote: What about using an uint64 for calls? That seems more appropriate to me (even if queryDesc->totaltime->calls will be passed (which is int64), but that's already also the case for the "rows" argument and queryDesc->totaltime->rows_processed) That's fair I'm not sure it's worth mentioning that the new counters are "currently" used with the ExecutorRun. Sure, I suppose these fields could be used outside of ExecutorRun. Good point. Also, I wonder if "rows" (and not rows_processed) would not be a better naming. Agree. I went with rows_processed initially, since it was accumulating es_processed, but as the previous point, this instrumentation could be used outside of ExecutorRun. v3 addresses the comments. Thanks! LGTM and also do confirm that, with the patch, the JDBC test does show the correct results. That said, not having a test (for the reasons you explained up-thread) associated with the patch worry me a bit. But, I'm tempted to say that adding new tests could be addressed separately though (as this patch looks pretty straightforward). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [BUG] pg_stat_statements and extended query protocol
> What about using an uint64 for calls? That seems more appropriate to me (even > if > queryDesc->totaltime->calls will be passed (which is int64), but that's > already > also the case for the "rows" argument and > queryDesc->totaltime->rows_processed) That's fair > I'm not sure it's worth mentioning that the new counters are "currently" used > with the ExecutorRun. Sure, I suppose these fields could be used outside of ExecutorRun. Good point. > Also, I wonder if "rows" (and not rows_processed) would not be a better > naming. Agree. I went with rows_processed initially, since it was accumulating es_processed, but as the previous point, this instrumentation could be used outside of ExecutorRun. v3 addresses the comments. Regards, -- Sami Imseih Amazon Web Services (AWS) v3-0001-Correct-accumulation-of-counters-for-extended-query-.patch Description: v3-0001-Correct-accumulation-of-counters-for-extended-query-.patch
Re: [BUG] pg_stat_statements and extended query protocol
Hi, On 3/21/23 2:16 PM, Imseih (AWS), Sami wrote: This indeed feels a bit more natural seen from here, after looking at the code paths using an Instrumentation in the executor and explain, for example. At least, this stresses me much less than adding 16 bytes to EState for something restricted to the extended protocol when it comes to monitoring capabilities. Attached is the patch that uses Instrumentation. Thanks, I think this new approach makes sense. - const BufferUsage *bufusage, + int64 calls, const BufferUsage *bufusage, What about using an uint64 for calls? That seems more appropriate to me (even if queryDesc->totaltime->calls will be passed (which is int64), but that's already also the case for the "rows" argument and queryDesc->totaltime->rows_processed) @@ -88,6 +88,8 @@ typedef struct Instrumentation double nfiltered2; /* # of tuples removed by "other" quals */ BufferUsage bufusage; /* total buffer usage */ WalUsagewalusage; /* total WAL usage */ + int64 calls; /* # of total calls to ExecutorRun */ + int64 rows_processed; /* # of total rows processed in ExecutorRun */ I'm not sure it's worth mentioning that the new counters are "currently" used with the ExecutorRun. What about just "total calls" and "total rows processed" (or "total rows", see below)? Also, I wonder if "rows" (and not rows_processed) would not be a better naming. Those last comments regarding the Instrumentation are done because ISTM that at the end their usage could vary depending of the use case of the Instrumentation. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [BUG] pg_stat_statements and extended query protocol
> This indeed feels a bit more natural seen from here, after looking at > the code paths using an Instrumentation in the executor and explain, > for example. At least, this stresses me much less than adding 16 > bytes to EState for something restricted to the extended protocol when > it comes to monitoring capabilities. Attached is the patch that uses Instrumentation. I did not add any new tests, and we do not have anyway now of setting a row count when going through the Execute message. I think this may be need to be addressed separately since there Seems to be. Gap in extended query protocol testing. For this fix however, The JDBC test does show correct results. Regards, Sami Imseih Amazon Web Services (AWS) 0001-Correct-accumulation-of-counters-for-extended-query-.patch Description: 0001-Correct-accumulation-of-counters-for-extended-query-.patch
Re: [BUG] pg_stat_statements and extended query protocol
On Mon, Mar 20, 2023 at 09:41:12PM +, Imseih (AWS), Sami wrote: > I was thinking about this and it seems to me we can avoid > adding new fields to Estate. I think a better place to track > rows and calls is in the Instrumentation struct. > > If this is more palatable, I can prepare the patch. This indeed feels a bit more natural seen from here, after looking at the code paths using an Instrumentation in the executor and explain, for example. At least, this stresses me much less than adding 16 bytes to EState for something restricted to the extended protocol when it comes to monitoring capabilities. -- Michael signature.asc Description: PGP signature
Re: [BUG] pg_stat_statements and extended query protocol
Sorry about the delay in response about this. I was thinking about this and it seems to me we can avoid adding new fields to Estate. I think a better place to track rows and calls is in the Instrumentation struct. --- a/src/include/executor/instrument.h +++ b/src/include/executor/instrument.h @@ -88,6 +88,8 @@ typedef struct Instrumentation double nfiltered2; /* # of tuples removed by "other" quals */ BufferUsage bufusage; /* total buffer usage */ WalUsagewalusage; /* total WAL usage */ + int64 calls; + int64 rows_processed; } Instrumentation; If this is more palatable, I can prepare the patch. Thanks for your feedback. Regards. Sami Imseih Amazon Web Services (AWS)
Re: [BUG] pg_stat_statements and extended query protocol
It appears you must "make clean; make install" to correctly compile after applying the patch. In a git repository, I've learnt to rely on this simple formula, even if it means extra cycles when running ./configure: git clean -d -x -f Thank you all for pointing out that it needs make clean first. After make clean followed by recompile with the patch then both make check from regression test and pg_stat_statements extension report all test passed. So the current existing test cases can't really detect any change from this patch, then it would be better to add some test cases to cover this. Best regards, David
Re: [BUG] pg_stat_statements and extended query protocol
Hi, On 3/2/23 8:27 AM, Michael Paquier wrote: On Wed, Jan 25, 2023 at 11:22:04PM +, Imseih (AWS), Sami wrote: Doing some work with extended query protocol, I encountered the same issue that was discussed in [1]. It appears when a client is using extended query protocol and sends an Execute message to a portal with max_rows, and a portal is executed multiple times, pg_stat_statements does not correctly track rows and calls. Well, it is one of these areas where it seems to me we have never been able to put a definition on what should be the correct behavior when it comes to pg_stat_statements. Could it be possible to add some regression tests using the recently-added \bind command and see how this affects things? I would suggest splitting these into their own SQL file, following an effort I have been doing recently for the regression tests of pg_stat_statements. It would be good to know the effects of this change for pg_stat_statements.track = (top|all), as well. @@ -657,7 +657,9 @@ typedef struct EState List *es_tupleTable; /* List of TupleTableSlots */ - uint64 es_processed; /* # of tuples processed */ + uint64 es_processed; /* # of tuples processed at the top level only */ + uint64 es_calls; /* # of calls */ + uint64 es_total_processed; /* total # of tuples processed */ So the root of the logic is here. Anything that makes the executor structures larger freaks me out, FWIW, and that's quite an addition. -- Michael I wonder if we can't "just" make use of the "count" parameter passed to the ExecutorRun_hook. Something like? - Increment a "es_total_processed" counter in pgss based on the count received in pgss_ExecutorRun() - In pgss_ExecutorEnd(): substract the last count we received in pgss_ExecutorRun() and add queryDesc->estate->es_processed? (we'd need to be able to distinguish when we should apply this rule or not). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [BUG] pg_stat_statements and extended query protocol
On Sat, Mar 11, 2023 at 11:55:22PM +, Imseih (AWS), Sami wrote: > It appears you must "make clean; make install" to correctly compile after > applying the patch. In a git repository, I've learnt to rely on this simple formula, even if it means extra cycles when running ./configure: git clean -d -x -f -- Michael signature.asc Description: PGP signature
Re: [BUG] pg_stat_statements and extended query protocol
> If I remove this patch and recompile again, then "initdb -D $PGDATA" works. It appears you must "make clean; make install" to correctly compile after applying the patch. Regards, Sami Imseih Amazon Web Services (AWS)
Re: [BUG] pg_stat_statements and extended query protocol
Yes, I agree that proper test coverage is needed here. Will think about how to accomplish this. Tried to apply this patch to current master branch and the build was ok, however it crashed during initdb with a message like below. "performing post-bootstrap initialization ... Segmentation fault (core dumped)" If I remove this patch and recompile again, then "initdb -D $PGDATA" works. Thanks, David
Re: [BUG] pg_stat_statements and extended query protocol
> Well, it is one of these areas where it seems to me we have never been > able to put a definition on what should be the correct behavior when > it comes to pg_stat_statements. What needs to be defined here is how pgss should account for # of rows processed when A) a select goes through extended query (EP) protocol, and B) it requires multiple executes to complete a portal. The patch being suggested will treat every 'E' ( execute message ) to the same portal as a new call ( pgss will increment the calls ) and the number of rows processed will be accumulated for every 'E' message. Currently, only the rows fetched in the last 'E' call to the portal is tracked by pgss. This is incorrect. > Could it be possible to add some > regression tests using the recently-added \bind command and see how > this affects things? \bind alone will not be enough as we also need a way to fetch from a portal in batches. The code that needs to be exercised as part of the test is exec_execute_message with max_rows != 0. \bind will call exec_execute_message with max_rows = 0 to fetch all the rows. > I would suggest splitting these into their own > SQL file, following an effort I have been doing recently for the > regression tests of pg_stat_statements. It would be good to know the > effects of this change for pg_stat_statements.track = (top|all), as > well. Yes, I agree that proper test coverage is needed here. Will think about how to accomplish this. > - uint64 es_processed; /* # of tuples processed */ > + uint64 es_processed; /* # of tuples processed at the top level only */ > + uint64 es_calls; /* # of calls */ > + uint64 es_total_processed; /* total # of tuples processed */ > So the root of the logic is here. Anything that makes the executor > structures larger freaks me out, FWIW, and that's quite an addition. I am not sure how to get around the changes to Estate and fixing this Issue. We could potentially only need the es_total_processed field and continue to track calls in pgss. es_total_processed in EState however is still needed. Regards, -- Sami Imseih Amazon Web Services (AWS)
Re: [BUG] pg_stat_statements and extended query protocol
On Wed, Jan 25, 2023 at 11:22:04PM +, Imseih (AWS), Sami wrote: > Doing some work with extended query protocol, I encountered the same > issue that was discussed in [1]. It appears when a client is using > extended query protocol and sends an Execute message to a portal with > max_rows, and a portal is executed multiple times, > pg_stat_statements does not correctly track rows and calls. Well, it is one of these areas where it seems to me we have never been able to put a definition on what should be the correct behavior when it comes to pg_stat_statements. Could it be possible to add some regression tests using the recently-added \bind command and see how this affects things? I would suggest splitting these into their own SQL file, following an effort I have been doing recently for the regression tests of pg_stat_statements. It would be good to know the effects of this change for pg_stat_statements.track = (top|all), as well. @@ -657,7 +657,9 @@ typedef struct EState List *es_tupleTable; /* List of TupleTableSlots */ - uint64 es_processed; /* # of tuples processed */ + uint64 es_processed; /* # of tuples processed at the top level only */ + uint64 es_calls; /* # of calls */ + uint64 es_total_processed; /* total # of tuples processed */ So the root of the logic is here. Anything that makes the executor structures larger freaks me out, FWIW, and that's quite an addition. -- Michael signature.asc Description: PGP signature