Re: libpq debug log
On Mon, Jun 07, 2021 at 11:37:58AM -0400, Robert Haas wrote: > On Sat, Jun 5, 2021 at 11:03 AM Alvaro Herrera > wrote: > > > This added a PQtraceSetFlags() function. We have a dozen PQset*() > > > functions, > > > but this and PQresultSetInstanceData() are the only PQSomethingSet() > > > functions. Is it okay if I rename it to PQsetTraceFlags()? I think that > > > would be more idiomatic for libpq. > > > > Hmm, there is an obvious parallel between PQtrace() and > > PQtraceSetFlags() which is lost with your proposed rename. I'm not > > wedded to the name though, so I'm just -0.1 on the rename. If you feel > > strongly about it, I won't oppose it. > > I'm +1 for the rename. I think it's a lot more idiomatic. At +1 vs. -0.1, I would have withdrawn the proposal. Seeing +2 vs. -0.1, I have pushed it.
Re: libpq debug log
On Sat, Jun 5, 2021 at 11:03 AM Alvaro Herrera wrote: > > This added a PQtraceSetFlags() function. We have a dozen PQset*() > > functions, > > but this and PQresultSetInstanceData() are the only PQSomethingSet() > > functions. Is it okay if I rename it to PQsetTraceFlags()? I think that > > would be more idiomatic for libpq. > > Hmm, there is an obvious parallel between PQtrace() and > PQtraceSetFlags() which is lost with your proposed rename. I'm not > wedded to the name though, so I'm just -0.1 on the rename. If you feel > strongly about it, I won't oppose it. I'm +1 for the rename. I think it's a lot more idiomatic. -- Robert Haas EDB: http://www.enterprisedb.com
Re: libpq debug log
On 2021-Jun-04, Noah Misch wrote: > On Fri, Apr 09, 2021 at 05:16:11PM -0400, Alvaro Herrera wrote: > > Pushed now. > > This added a PQtraceSetFlags() function. We have a dozen PQset*() functions, > but this and PQresultSetInstanceData() are the only PQSomethingSet() > functions. Is it okay if I rename it to PQsetTraceFlags()? I think that > would be more idiomatic for libpq. Hmm, there is an obvious parallel between PQtrace() and PQtraceSetFlags() which is lost with your proposed rename. I'm not wedded to the name though, so I'm just -0.1 on the rename. If you feel strongly about it, I won't oppose it. -- Álvaro Herrera Valdivia, Chile "La espina, desde que nace, ya pincha" (Proverbio africano)
Re: libpq debug log
On Fri, Apr 09, 2021 at 05:16:11PM -0400, Alvaro Herrera wrote: > Pushed now. This added a PQtraceSetFlags() function. We have a dozen PQset*() functions, but this and PQresultSetInstanceData() are the only PQSomethingSet() functions. Is it okay if I rename it to PQsetTraceFlags()? I think that would be more idiomatic for libpq.
Re: libpq debug log
On 2021-Apr-02, Tom Lane wrote: > Alvaro Herrera writes: > > On 2021-Apr-02, Tom Lane wrote: > >> +1, but the comment could be more specific. Maybe like "In regress mode, > >> suppress the length of ErrorResponse and NoticeResponse messages --- the F > >> (file name) field, in particular, can vary in length depending on compiler > >> used." > > > With your commit this is no longer true, so how about the attached? > > Right. Your text looks fine. Pushed now. -- Álvaro Herrera39°49'30"S 73°17'W "I'm always right, but sometimes I'm more right than other times." (Linus Torvalds)
Re: libpq debug log
Alvaro Herrera writes: > On 2021-Apr-02, Tom Lane wrote: >> +1, but the comment could be more specific. Maybe like "In regress mode, >> suppress the length of ErrorResponse and NoticeResponse messages --- the F >> (file name) field, in particular, can vary in length depending on compiler >> used." > With your commit this is no longer true, so how about the attached? Right. Your text looks fine. (I see drongo is now green, so it looks like its compiler does indeed emit backslash-separated full paths.) regards, tom lane
Re: libpq debug log
On 2021-Apr-02, Tom Lane wrote: > Alvaro Herrera writes: > > As in the attached patch. > > +1, but the comment could be more specific. Maybe like "In regress mode, > suppress the length of ErrorResponse and NoticeResponse messages --- the F > (file name) field, in particular, can vary in length depending on compiler > used." With your commit this is no longer true, so how about the attached? -- Álvaro Herrera Valdivia, Chile >From 80af8f687ea3665a045d9005e0ec1fc53e1a393a Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 2 Apr 2021 10:51:42 -0300 Subject: [PATCH v2] Suppress length of Notice/Error in PQtrace regress mode A (relatively minor) annoyance of ErrorResponse/NoticeResponse messages as printed by PQtrace() is that their length might vary when we move error messages from one file to another, one function to another, or even when their location line number changes number of digits. To avoid having to adjust expected files for some tests, make the regress mode of PQtrace() suppress the length word of those messages. Discussion: https://postgr.es/m/20210402023010.GA13563@alvherre.pgsql Reviewed-by: Tom Lane --- src/interfaces/libpq/fe-trace.c | 11 ++- .../libpq_pipeline/traces/pipeline_abort.trace| 8 .../modules/libpq_pipeline/traces/transaction.trace | 6 +++--- 3 files changed, 17 insertions(+), 8 deletions(-) diff --git a/src/interfaces/libpq/fe-trace.c b/src/interfaces/libpq/fe-trace.c index 9a4595f5c8..ca24869e91 100644 --- a/src/interfaces/libpq/fe-trace.c +++ b/src/interfaces/libpq/fe-trace.c @@ -540,7 +540,16 @@ pqTraceOutputMessage(PGconn *conn, const char *message, bool toServer) length = (int) pg_ntoh32(length); logCursor += 4; - fprintf(conn->Pfdebug, "%s\t%d\t", prefix, length); + /* + * In regress mode, suppress the length of ErrorResponse and + * NoticeResponse. The F (file name), L (line number) and R (routine + * name) fields can change as server code is modified, and if their + * lengths differ from the originals, that would break tests. + */ + if (regress && !toServer && (id == 'E' || id == 'N')) + fprintf(conn->Pfdebug, "%s\tNN\t", prefix); + else + fprintf(conn->Pfdebug, "%s\t%d\t", prefix, length); switch (id) { diff --git a/src/test/modules/libpq_pipeline/traces/pipeline_abort.trace b/src/test/modules/libpq_pipeline/traces/pipeline_abort.trace index b061435785..254e485997 100644 --- a/src/test/modules/libpq_pipeline/traces/pipeline_abort.trace +++ b/src/test/modules/libpq_pipeline/traces/pipeline_abort.trace @@ -1,5 +1,5 @@ F 42 Query "DROP TABLE IF EXISTS pq_pipeline_demo" -B 123 NoticeResponse S "NOTICE" V "NOTICE" C "0" M "table "pq_pipeline_demo" does not exist, skipping" F "" L "" R "" \x00 +B NN NoticeResponse S "NOTICE" V "NOTICE" C "0" M "table "pq_pipeline_demo" does not exist, skipping" F "" L "" R "" \x00 B 15 CommandComplete "DROP TABLE" B 5 ReadyForQuery I F 99 Query "CREATE UNLOGGED TABLE pq_pipeline_demo(id serial primary key, itemno integer,int8filler int8);" @@ -27,7 +27,7 @@ B 4 ParseComplete B 4 BindComplete B 4 NoData B 15 CommandComplete "INSERT 0 1" -B 217 ErrorResponse S "ERROR" V "ERROR" C "42883" M "function no_such_function(integer) does not exist" H "No function matches the given name and argument types. You might need to add explicit type casts." P "8" F "" L "" R "" \x00 +B NN ErrorResponse S "ERROR" V "ERROR" C "42883" M "function no_such_function(integer) does not exist" H "No function matches the given name and argument types. You might need to add explicit type casts." P "8" F "" L "" R "" \x00 B 5 ReadyForQuery I B 4 ParseComplete B 4 BindComplete @@ -39,7 +39,7 @@ F 12 Bind "" "" 0 0 0 F 6 Describe P "" F 9 Execute "" 0 F 4 Sync -B 123 ErrorResponse S "ERROR" V "ERROR" C "42601" M "cannot insert multiple commands into a prepared statement" F "" L "" R "" \x00 +B NN ErrorResponse S "ERROR" V "ERROR" C "42601" M "cannot insert multiple commands into a prepared statement" F "" L "" R "" \x00 B 5 ReadyForQuery I F 54 Parse "" "SELECT 1.0/g FROM generate_series(3, -1, -1) g" 0 F 12 Bind "" "" 0 0 0 @@ -52,7 +52,7 @@ B 33 RowDescription 1 "?column?" 0 65535 -1 0 B 32 DataRow 1 22 '0.' B 32 DataRow 1 22 '0.5000' B 32 DataRow 1 22 '1.' -B 70 ErrorResponse S "ERROR" V "ERROR" C "22012" M "division by zero" F "" L "" R "" \x00 +B NN ErrorResponse S "ERROR" V "ERROR" C "22012" M "division by zero" F "" L "" R "" \x00 B 5 ReadyForQuery I F 40 Query "SELECT itemno FROM pq_pipeline_demo" B 31 RowDescription 1 "itemno" 2 4 -1 0 diff --git a/src/test/modules/libpq_pipeline/traces/transaction.trace b/src/test/modules/libpq_pipeline/traces/transaction.trace index 0267a534fa..1dcc2373c0 100644 --- a/src/test/modules/libpq_pipeline/traces/tra
Re: libpq debug log
On 2021-Apr-02, Tom Lane wrote: > On third thought, maybe we should push your patch too. Although I think > 53aafdb9f is going to fix the platform-specific aspect of this, we are > still going to risk some implementation dependence of the libpq_pipeline > results: > > * Every so often, the number of digits in the reported line numbers > will change (999 -> 1001 or the like), due to changes in unrelated > code. Bah -- of course. > * Occasionally we refactor things so that the "same" error is reported > from a different file. True. (This was the reason for masking F and R, but I didn't realize that it'd have an effect in the message length). > It's hard to judge whether that will happen often enough to be an > annoying maintenance problem, but there's definitely a hazard. > Not sure if we should proactively lobotomize the test, or wait to > see if we get annoyed. I suspect we're going to see enough bf failures if we don't suppress it, because the problem is going to only show up with TAP testing enabled and the src/test/modules tests, which perhaps not all committers run. > In any case I'd like to wait till after drongo's next run, so > we can confirm whether or not the backslashes-in-__FILE__ hypothesis > is correct. If it is, then 53aafdb9f is a good fix on its own > merits, independently of libpq_pipeline. Wilco. -- Álvaro Herrera Valdivia, Chile Essentially, you're proposing Kevlar shoes as a solution for the problem that you want to walk around carrying a loaded gun aimed at your foot. (Tom Lane)
Re: libpq debug log
I wrote: >> I bet what is happening on drongo is that the compiler has generated a >> __FILE__ value that contains backslashes not slashes, and this code >> doesn't know how to shorten those. So maybe instead of lobotomizing >> this test, we should fix that. > Did that, but we'll have to wait a few hours to see if it makes > drongo happy. On third thought, maybe we should push your patch too. Although I think 53aafdb9f is going to fix the platform-specific aspect of this, we are still going to risk some implementation dependence of the libpq_pipeline results: * Every so often, the number of digits in the reported line numbers will change (999 -> 1001 or the like), due to changes in unrelated code. * Occasionally we refactor things so that the "same" error is reported from a different file. It's hard to judge whether that will happen often enough to be an annoying maintenance problem, but there's definitely a hazard. Not sure if we should proactively lobotomize the test, or wait to see if we get annoyed. In any case I'd like to wait till after drongo's next run, so we can confirm whether or not the backslashes-in-__FILE__ hypothesis is correct. If it is, then 53aafdb9f is a good fix on its own merits, independently of libpq_pipeline. regards, tom lane
Re: libpq debug log
On 2021-Apr-02, Tom Lane wrote: > I wrote: > > I bet what is happening on drongo is that the compiler has generated a > > __FILE__ value that contains backslashes not slashes, and this code > > doesn't know how to shorten those. So maybe instead of lobotomizing > > this test, we should fix that. > > Did that, but we'll have to wait a few hours to see if it makes > drongo happy. Thanks, I really hope it does! -- Álvaro Herrera Valdivia, Chile
Re: libpq debug log
I wrote: > I bet what is happening on drongo is that the compiler has generated a > __FILE__ value that contains backslashes not slashes, and this code > doesn't know how to shorten those. So maybe instead of lobotomizing > this test, we should fix that. Did that, but we'll have to wait a few hours to see if it makes drongo happy. regards, tom lane
Re: libpq debug log
Alvaro Herrera writes: > As in the attached patch. +1, but the comment could be more specific. Maybe like "In regress mode, suppress the length of ErrorResponse and NoticeResponse messages --- the F (file name) field, in particular, can vary in length depending on compiler used." Actually though ... I recall now that elog.c tries to standardize the F output by stripping the path part: /* keep only base name, useful especially for vpath builds */ slash = strrchr(filename, '/'); if (slash) filename = slash + 1; I bet what is happening on drongo is that the compiler has generated a __FILE__ value that contains backslashes not slashes, and this code doesn't know how to shorten those. So maybe instead of lobotomizing this test, we should fix that. slash = strrchr(filename, '/'); + if (!slash) + slash = strrchr(filename, '\\'); if (slash) filename = slash + 1; regards, tom lane
Re: libpq debug log
On 2021-Apr-02, Alvaro Herrera wrote: > On 2021-Apr-02, Kyotaro Horiguchi wrote: > > > At Fri, 02 Apr 2021 14:40:09 +0900 (JST), Kyotaro Horiguchi > > wrote in > > > > So, the cheapest measure for regression test would be just making the > > > > So, the cheapest measure for regression test would be just *masking* the > > > > > length field, while regress is true... > > Yeah. As in the attached patch. -- Álvaro Herrera Valdivia, Chile "Investigación es lo que hago cuando no sé lo que estoy haciendo" (Wernher von Braun) >From 387a15ec80cfcdd2279b109215b9f658bc972748 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 2 Apr 2021 10:51:42 -0300 Subject: [PATCH] Suppress length of Notice/Error in PQtrace regress mode --- src/interfaces/libpq/fe-trace.c | 9 - .../modules/libpq_pipeline/traces/pipeline_abort.trace | 8 src/test/modules/libpq_pipeline/traces/transaction.trace | 6 +++--- 3 files changed, 15 insertions(+), 8 deletions(-) diff --git a/src/interfaces/libpq/fe-trace.c b/src/interfaces/libpq/fe-trace.c index 9a4595f5c8..298713c602 100644 --- a/src/interfaces/libpq/fe-trace.c +++ b/src/interfaces/libpq/fe-trace.c @@ -540,7 +540,14 @@ pqTraceOutputMessage(PGconn *conn, const char *message, bool toServer) length = (int) pg_ntoh32(length); logCursor += 4; - fprintf(conn->Pfdebug, "%s\t%d\t", prefix, length); + /* + * In regress mode, suppress the length of ErrorResponse and + * NoticeResponse -- they vary depending on compiler. + */ + if (regress && !toServer && (id == 'E' || id == 'N')) + fprintf(conn->Pfdebug, "%s\tNN\t", prefix); + else + fprintf(conn->Pfdebug, "%s\t%d\t", prefix, length); switch (id) { diff --git a/src/test/modules/libpq_pipeline/traces/pipeline_abort.trace b/src/test/modules/libpq_pipeline/traces/pipeline_abort.trace index b061435785..254e485997 100644 --- a/src/test/modules/libpq_pipeline/traces/pipeline_abort.trace +++ b/src/test/modules/libpq_pipeline/traces/pipeline_abort.trace @@ -1,5 +1,5 @@ F 42 Query "DROP TABLE IF EXISTS pq_pipeline_demo" -B 123 NoticeResponse S "NOTICE" V "NOTICE" C "0" M "table "pq_pipeline_demo" does not exist, skipping" F "" L "" R "" \x00 +B NN NoticeResponse S "NOTICE" V "NOTICE" C "0" M "table "pq_pipeline_demo" does not exist, skipping" F "" L "" R "" \x00 B 15 CommandComplete "DROP TABLE" B 5 ReadyForQuery I F 99 Query "CREATE UNLOGGED TABLE pq_pipeline_demo(id serial primary key, itemno integer,int8filler int8);" @@ -27,7 +27,7 @@ B 4 ParseComplete B 4 BindComplete B 4 NoData B 15 CommandComplete "INSERT 0 1" -B 217 ErrorResponse S "ERROR" V "ERROR" C "42883" M "function no_such_function(integer) does not exist" H "No function matches the given name and argument types. You might need to add explicit type casts." P "8" F "" L "" R "" \x00 +B NN ErrorResponse S "ERROR" V "ERROR" C "42883" M "function no_such_function(integer) does not exist" H "No function matches the given name and argument types. You might need to add explicit type casts." P "8" F "" L "" R "" \x00 B 5 ReadyForQuery I B 4 ParseComplete B 4 BindComplete @@ -39,7 +39,7 @@ F 12 Bind "" "" 0 0 0 F 6 Describe P "" F 9 Execute "" 0 F 4 Sync -B 123 ErrorResponse S "ERROR" V "ERROR" C "42601" M "cannot insert multiple commands into a prepared statement" F "" L "" R "" \x00 +B NN ErrorResponse S "ERROR" V "ERROR" C "42601" M "cannot insert multiple commands into a prepared statement" F "" L "" R "" \x00 B 5 ReadyForQuery I F 54 Parse "" "SELECT 1.0/g FROM generate_series(3, -1, -1) g" 0 F 12 Bind "" "" 0 0 0 @@ -52,7 +52,7 @@ B 33 RowDescription 1 "?column?" 0 65535 -1 0 B 32 DataRow 1 22 '0.' B 32 DataRow 1 22 '0.5000' B 32 DataRow 1 22 '1.' -B 70 ErrorResponse S "ERROR" V "ERROR" C "22012" M "division by zero" F "" L "" R "" \x00 +B NN ErrorResponse S "ERROR" V "ERROR" C "22012" M "division by zero" F "" L "" R "" \x00 B 5 ReadyForQuery I F 40 Query "SELECT itemno FROM pq_pipeline_demo" B 31 RowDescription 1 "itemno" 2 4 -1 0 diff --git a/src/test/modules/libpq_pipeline/traces/transaction.trace b/src/test/modules/libpq_pipeline/traces/transaction.trace index 0267a534fa..1dcc2373c0 100644 --- a/src/test/modules/libpq_pipeline/traces/transaction.trace +++ b/src/test/modules/libpq_pipeline/traces/transaction.trace @@ -1,5 +1,5 @@ F 79 Query "DROP TABLE IF EXISTS pq_pipeline_tst;CREATE TABLE pq_pipeline_tst (id int)" -B 122 NoticeResponse S "NOTICE" V "NOTICE" C "0" M "table "pq_pipeline_tst" does not exist, skipping" F "" L "" R "" \x00 +B NN NoticeResponse S "NOTICE" V "NOTICE" C "0" M "table "pq_pipeline_tst" does not exist, skipping" F "" L "" R "" \x00 B 15 CommandComplete "DROP TABLE" B 17 CommandComplete "CREATE TABLE" B 5 ReadyForQue
Re: libpq debug log
On 2021-Apr-02, Kyotaro Horiguchi wrote: > At Fri, 02 Apr 2021 14:40:09 +0900 (JST), Kyotaro Horiguchi > wrote in > > So, the cheapest measure for regression test would be just making the > > So, the cheapest measure for regression test would be just *masking* the > > > length field, while regress is true... Yeah. > tired? Same here. -- Álvaro Herrera39°49'30"S 73°17'W "Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end." (2nd Commandment for C programmers)
Re: libpq debug log
At Fri, 02 Apr 2021 01:45:06 -0400, Tom Lane wrote in > Kyotaro Horiguchi writes: > > At Fri, 2 Apr 2021 02:56:44 +, "iwata@fujitsu.com" > > wrote in > >> Do ErrorResponse and NoticeResponse vary from test to test ...? > >> If so, it seemed difficult to make the trace logs available for regression > >> test. > >> I will also investigate the cause of this issue. > > > The redacted fields, F, L and R contained source file, souce line and > > source function respectively. It is reasonable guess that the > > difference comes from them but I'm not sure how they make a difference > > of 50 bytes in length... > > Some compilers include the full path of the source file in F ... Ah. I suspected that but dismissed the possibility looking that on my environment, gcc8. That completely makes sense. Thank you! regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: libpq debug log
At Fri, 02 Apr 2021 14:40:09 +0900 (JST), Kyotaro Horiguchi wrote in > At Fri, 2 Apr 2021 02:56:44 +, "iwata@fujitsu.com" > wrote in > > Hi Alvaro san > > > > Thank you for your fix of trace log code. > > > > > From: 'alvhe...@alvh.no-ip.org' > > > Sent: Friday, April 2, 2021 11:30 AM > > ... > > > It still didn't fix it! Drongo is now reporting a difference in the > > > expected trace -- > > > and the differences all seem to be message lengths. > > > Now that is pretty mysterious, because the messages themselves are printed > > > identically. Perl's output is pretty unhelpful, but I wrote them to a > > > file and diffed > > > manually; it's attached. > > > > Do ErrorResponse and NoticeResponse vary from test to test ...? > > If so, it seemed difficult to make the trace logs available for regression > > test. > > I will also investigate the cause of this issue. > > The redacted fields, F, L and R contained source file, souce line and > source function respectively. It is reasonable guess that the > difference comes from them but I'm not sure how they make a difference > of 50 bytes in length... > > Anyway if the length is wrong, we should get an error after emitting > the log line. > > > if (logCursor - 1 != length) > > fprintf(conn->Pfdebug, > > "mismatched message length: consumed %d, > > expected %d\n", > > logCursor - 1, length); > > So, the cheapest measure for regression test would be just making the So, the cheapest measure for regression test would be just *masking* the > length field, while regress is true... tired? -- Kyotaro Horiguchi NTT Open Source Software Center
Re: libpq debug log
Kyotaro Horiguchi writes: > At Fri, 2 Apr 2021 02:56:44 +, "iwata@fujitsu.com" > wrote in >> Do ErrorResponse and NoticeResponse vary from test to test ...? >> If so, it seemed difficult to make the trace logs available for regression >> test. >> I will also investigate the cause of this issue. > The redacted fields, F, L and R contained source file, souce line and > source function respectively. It is reasonable guess that the > difference comes from them but I'm not sure how they make a difference > of 50 bytes in length... Some compilers include the full path of the source file in F ... regards, tom lane
Re: libpq debug log
At Fri, 2 Apr 2021 02:56:44 +, "iwata@fujitsu.com" wrote in > Hi Alvaro san > > Thank you for your fix of trace log code. > > > From: 'alvhe...@alvh.no-ip.org' > > Sent: Friday, April 2, 2021 11:30 AM > ... > > It still didn't fix it! Drongo is now reporting a difference in the > > expected trace -- > > and the differences all seem to be message lengths. > > Now that is pretty mysterious, because the messages themselves are printed > > identically. Perl's output is pretty unhelpful, but I wrote them to a file > > and diffed > > manually; it's attached. > > Do ErrorResponse and NoticeResponse vary from test to test ...? > If so, it seemed difficult to make the trace logs available for regression > test. > I will also investigate the cause of this issue. The redacted fields, F, L and R contained source file, souce line and source function respectively. It is reasonable guess that the difference comes from them but I'm not sure how they make a difference of 50 bytes in length... Anyway if the length is wrong, we should get an error after emitting the log line. > if (logCursor - 1 != length) > fprintf(conn->Pfdebug, > "mismatched message length: consumed %d, > expected %d\n", > logCursor - 1, length); So, the cheapest measure for regression test would be just making the length field, while regress is true... regards. -- Kyotaro Horiguchi NTT Open Source Software Center
RE: libpq debug log
Hi Alvaro san Thank you for your fix of trace log code. > From: 'alvhe...@alvh.no-ip.org' > Sent: Friday, April 2, 2021 11:30 AM ... > It still didn't fix it! Drongo is now reporting a difference in the expected > trace -- > and the differences all seem to be message lengths. > Now that is pretty mysterious, because the messages themselves are printed > identically. Perl's output is pretty unhelpful, but I wrote them to a file > and diffed > manually; it's attached. Do ErrorResponse and NoticeResponse vary from test to test ...? If so, it seemed difficult to make the trace logs available for regression test. I will also investigate the cause of this issue. Regards, Aya Iwata
Re: libpq debug log
On 2021-Apr-01, 'alvhe...@alvh.no-ip.org' wrote: > Ooh, wow ... now that is a silly bug! Thanks, I'll push the fix in a > minute. It still didn't fix it! Drongo is now reporting a difference in the expected trace -- and the differences all seem to be message lengths. Now that is pretty mysterious, because the messages themselves are printed identically. Perl's output is pretty unhelpful, but I wrote them to a file and diffed manually; it's attached. So for example when we expect ! # B 123 ErrorResponseS "ERROR" V "ERROR" C "42601" M "cannot insert multiple commands into a prepared statement" F "" L "" R "" \\x00 we actually get ! # B 173 ErrorResponseS "ERROR" V "ERROR" C "42601" M "cannot insert multiple commands into a prepared statement" F "" L "" R "" \\x00 [slaps forehead] I suppose the difference must be that the message length includes the redacted string fields. So the file in the F file is 50 chars longer, *kaboom*. (I'm actually very surprised that this didn't blow up earlier.) I'm not sure what to do about this. Maybe the message length for ErrorResponse/NoticeResponse ought to be redacted too. -- Álvaro Herrera39°49'30"S 73°17'W
Re: libpq debug log
On 2021-Apr-01, Tom Lane wrote: > So drongo is still failing, and after a bit of looking around at > other code I finally got hit with the clue hammer. Per port.h: > > * On Windows, setvbuf() does not support _IOLBF mode, and interprets that > * as _IOFBF. To add insult to injury, setvbuf(file, NULL, _IOFBF, 0) > * crashes outright if "parameter validation" is enabled. Therefore, in > * places where we'd like to select line-buffered mode, we fall back to > * unbuffered mode instead on Windows. Always use PG_IOLBF not _IOLBF > * directly in order to implement this behavior. > > You want to do the honors? And do something about that shift bug > while at it. Ooh, wow ... now that is a silly bug! Thanks, I'll push the fix in a minute. Andrew also noted that the use of command_ok() in the test file eats all the output and is what is preventing us from seeing it in the first. I'll try to get rid of that together with the rest. -- Álvaro Herrera Valdivia, Chile
Re: libpq debug log
On 2021-Apr-01, Tom Lane wrote: > "'alvhe...@alvh.no-ip.org'" writes: > > On 2021-Apr-01, Tom Lane wrote: > >> BTW, what in the world is this supposed to accomplish? > >> -(long long) rows_to_send); > >> +(1L << 62) + (long long) rows_to_send); > > > It makes the text representation wider, which means some buffer fills up > > faster and the program switches from sending to receiving. > > Hm. If the actual field value isn't relevant, why bother including > rows_to_send in it? A constant string would be simpler and much > less confusing as to its purpose. Hah, yeah, we could as well do that I guess :-) -- Álvaro Herrera Valdivia, Chile "I think my standards have lowered enough that now I think 'good design' is when the page doesn't irritate the living f*ck out of me." (JWZ)
Re: libpq debug log
So drongo is still failing, and after a bit of looking around at other code I finally got hit with the clue hammer. Per port.h: * On Windows, setvbuf() does not support _IOLBF mode, and interprets that * as _IOFBF. To add insult to injury, setvbuf(file, NULL, _IOFBF, 0) * crashes outright if "parameter validation" is enabled. Therefore, in * places where we'd like to select line-buffered mode, we fall back to * unbuffered mode instead on Windows. Always use PG_IOLBF not _IOLBF * directly in order to implement this behavior. You want to do the honors? And do something about that shift bug while at it. regards, tom lane
Re: libpq debug log
"'alvhe...@alvh.no-ip.org'" writes: > On 2021-Apr-01, Tom Lane wrote: >> BTW, what in the world is this supposed to accomplish? >> -(long long) rows_to_send); >> +(1L << 62) + (long long) rows_to_send); > It makes the text representation wider, which means some buffer fills up > faster and the program switches from sending to receiving. Hm. If the actual field value isn't relevant, why bother including rows_to_send in it? A constant string would be simpler and much less confusing as to its purpose. regards, tom lane
Re: libpq debug log
On 2021-Apr-01, Tom Lane wrote: > BTW, what in the world is this supposed to accomplish? > > -(long long) rows_to_send); > +(1L << 62) + (long long) rows_to_send); > > Various buildfarm members are complaining that the shift distance > is more than the width of "long", which I'd just fix with s/L/LL/ > except that the addition of the constant seems just wrong to > begin with. It makes the text representation wider, which means some buffer fills up faster and the program switches from sending to receiving. -- Álvaro Herrera Valdivia, Chile "Ni aún el genio muy grande llegaría muy lejos si tuviera que sacarlo todo de su propio interior" (Goethe)
Re: libpq debug log
BTW, what in the world is this supposed to accomplish? -(long long) rows_to_send); +(1L << 62) + (long long) rows_to_send); Various buildfarm members are complaining that the shift distance is more than the width of "long", which I'd just fix with s/L/LL/ except that the addition of the constant seems just wrong to begin with. regards, tom lane
Re: libpq debug log
"'alvhe...@alvh.no-ip.org'" writes: > Eh, so I forgot to strdup(optarg[optind]). Apparently that works fine > in glibc but other getopt implementations are likely not so friendly. Hmm ... interesting theory, but I don't think I buy it, since the program isn't doing anything that should damage argv[]. I guess we'll soon find out. regards, tom lane
Re: libpq debug log
On 2021-Mar-31, Tom Lane wrote: > I wrote: > > That is weird - only test 4 (of 8) runs at all, the rest seem to > > fail to connect. What's different about pipelined_insert? > > Oh ... there's a pretty obvious theory. pipelined_insert is > the only one that is not asked to write a trace file. > So for some reason, opening the trace file fails. > (I wonder why we don't see an error message for this though.) Eh, so I forgot to strdup(optarg[optind]). Apparently that works fine in glibc but other getopt implementations are likely not so friendly. -- Álvaro Herrera39°49'30"S 73°17'W "I am amazed at [the pgsql-sql] mailing list for the wonderful support, and lack of hesitasion in answering a lost soul's question, I just wished the rest of the mailing list could be like this." (Fotis) (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)
Re: libpq debug log
On 2021-Mar-31, Tom Lane wrote: > While this may have little to do with drongo's issue, > I'm going to take exception to this bit that I see that > the patch added to PQtrace(): > > /* Make the trace stream line-buffered */ > setvbuf(debug_port, NULL, _IOLBF, 0); Mea culpa. I added this early on because it made PQtrace's use more comfortable, but I agree that it's a mistake. Removed. I put it in libpq_pipeline.c instead. > ... and as for an actual explanation for drongo's issue, > I'm sort of wondering why libpq_pipeline.c is opening the > trace file in "w+" mode rather than vanilla "w" mode. > That seems unnecessary, and maybe it doesn't work so well > on Windows. well, at least the MSVC documentation claims that it works, but who knows. I removed that too. https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-160 -- Álvaro Herrera Valdivia, Chile
Re: libpq debug log
I wrote: > What I suspect is some Windows dependency in the way that > 001_libpq_pipeline.pl is setting up the trace output files. While this may have little to do with drongo's issue, I'm going to take exception to this bit that I see that the patch added to PQtrace(): /* Make the trace stream line-buffered */ setvbuf(debug_port, NULL, _IOLBF, 0); I do not like that on two grounds: 1. The trace file handle belongs to the calling application, not to libpq. I do not think libpq has any business editorializing on the file's settings. 2. POSIX says that setvbuf() must be done before any other operation on the file. Since we have no way to know whether any output has already been written to the trace file, the above is a spec violation. ... and as for an actual explanation for drongo's issue, I'm sort of wondering why libpq_pipeline.c is opening the trace file in "w+" mode rather than vanilla "w" mode. That seems unnecessary, and maybe it doesn't work so well on Windows. regards, tom lane
Re: libpq debug log
"'alvhe...@alvh.no-ip.org'" writes: > On 2021-Mar-31, Tom Lane wrote: >> So for some reason, opening the trace file fails. >> (I wonder why we don't see an error message for this though.) > .. oh, I think we forgot to set conn->Pfdebug = NULL when creating the > connection. So when we do PQtrace(), the first thing it does is > PQuntrace(), and then that tries to do fflush(conn->Pfdebug) ---> crash. > So this should fix it. Nope, see the MemSet a few lines further up. This change seems like good style, but it won't fix anything --- else we'd have been having issues with the pre-existing trace logic. What I suspect is some Windows dependency in the way that 001_libpq_pipeline.pl is setting up the trace output files. cc'ing Andrew to see if he has any ideas. regards, tom lane
Re: libpq debug log
On 2021-Mar-31, 'alvhe...@alvh.no-ip.org' wrote: > .. oh, I think we forgot to set conn->Pfdebug = NULL when creating the > connection. So when we do PQtrace(), the first thing it does is > PQuntrace(), and then that tries to do fflush(conn->Pfdebug) ---> crash. > So this should fix it. I tried to use MALLOC_PERTURB_ to prove this theory, but I failed at it. I'll just push this blind and see what happens. -- Álvaro Herrera39°49'30"S 73°17'W "Someone said that it is at least an order of magnitude more work to do production software than a prototype. I think he is wrong by at least an order of magnitude." (Brian Kernighan)
Re: libpq debug log
On 2021-Mar-31, Tom Lane wrote: > I wrote: > > That is weird - only test 4 (of 8) runs at all, the rest seem to > > fail to connect. What's different about pipelined_insert? > > Oh ... there's a pretty obvious theory. pipelined_insert is > the only one that is not asked to write a trace file. > So for some reason, opening the trace file fails. > (I wonder why we don't see an error message for this though.) .. oh, I think we forgot to set conn->Pfdebug = NULL when creating the connection. So when we do PQtrace(), the first thing it does is PQuntrace(), and then that tries to do fflush(conn->Pfdebug) ---> crash. So this should fix it. -- Álvaro Herrera Valdivia, Chile really, I see PHP as like a strange amalgamation of C, Perl, Shell inflex: you know that "amalgam" means "mixture with mercury", more or less, right? i.e., "deadly poison" diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index a90bdb8ab6..56a8266bc3 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -3952,6 +3952,7 @@ makeEmptyPGconn(void) conn->verbosity = PQERRORS_DEFAULT; conn->show_context = PQSHOW_CONTEXT_ERRORS; conn->sock = PGINVALID_SOCKET; + conn->Pfdebug = NULL; /* * We try to send at least 8K at a time, which is the usual size of pipe
Re: libpq debug log
I wrote: > That is weird - only test 4 (of 8) runs at all, the rest seem to > fail to connect. What's different about pipelined_insert? Oh ... there's a pretty obvious theory. pipelined_insert is the only one that is not asked to write a trace file. So for some reason, opening the trace file fails. (I wonder why we don't see an error message for this though.) regards, tom lane
Re: libpq debug log
"'alvhe...@alvh.no-ip.org'" writes: > This is not the *only* issue though; at least animal drongo shows a > completely different failure, where the last few tests don't even get to > run, and the server log just has this: That is weird - only test 4 (of 8) runs at all, the rest seem to fail to connect. What's different about pipelined_insert? regards, tom lane
Re: libpq debug log
On 2021-Mar-31, Tom Lane wrote: > I think this is a timing problem that's triggered (on some machines) > by force_parallel_mode = regress. Looking at spurfowl's latest > failure of this type, the postmaster log shows > > 2021-03-31 14:34:54.982 EDT [18233:15] 001_libpq_pipeline.pl LOG: execute > : SELECT 1.0/g FROM generate_series(3, -1, -1) g > 2021-03-31 14:34:54.992 EDT [18234:1] ERROR: division by zero > 2021-03-31 14:34:54.992 EDT [18234:2] STATEMENT: SELECT 1.0/g FROM > generate_series(3, -1, -1) g > 2021-03-31 14:34:54.993 EDT [18233:16] 001_libpq_pipeline.pl ERROR: division > by zero > 2021-03-31 14:34:54.993 EDT [18233:17] 001_libpq_pipeline.pl STATEMENT: > SELECT 1.0/g FROM generate_series(3, -1, -1) g > 2021-03-31 14:34:54.995 EDT [18216:4] LOG: background worker "parallel > worker" (PID 18234) exited with exit code 1 > 2021-03-31 14:34:54.995 EDT [18233:18] 001_libpq_pipeline.pl LOG: could not > send data to client: Broken pipe > 2021-03-31 14:34:54.995 EDT [18233:19] 001_libpq_pipeline.pl FATAL: > connection to client lost > > We can see that the division by zero occurred in a parallel worker. > My theory is that in parallel mode, it's uncertain whether the > error will be reported before or after the "preceding" successful > output rows. So you need to disable parallelism to make this > test case stable. Ooh, that makes sense, thanks. Added a SET in the program itself to set it to off. This is not the *only* issue though; at least animal drongo shows a completely different failure, where the last few tests don't even get to run, and the server log just has this: 2021-03-31 20:20:14.460 UTC [178364:4] 001_libpq_pipeline.pl LOG: disconnection: session time: 0:00:00.469 user=pgrunner database=postgres host=127.0.0.1 port=52638 2021-03-31 20:20:14.681 UTC [178696:1] [unknown] LOG: connection received: host=127.0.0.1 port=52639 2021-03-31 20:20:14.687 UTC [178696:2] [unknown] LOG: connection authorized: user=pgrunner database=postgres application_name=001_libpq_pipeline.pl 2021-03-31 20:20:15.157 UTC [178696:3] 001_libpq_pipeline.pl LOG: could not receive data from client: An existing connection was forcibly closed by the remote host. I suppose the program is crashing for some reason. -- Álvaro Herrera Valdivia, Chile "Los dioses no protegen a los insensatos. Éstos reciben protección de otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)
Re: libpq debug log
"'alvhe...@alvh.no-ip.org'" writes: > So crake failed. The failure is that it doesn't print the DataRow > messages. That's quite odd. We see this in the trace log: I think this is a timing problem that's triggered (on some machines) by force_parallel_mode = regress. Looking at spurfowl's latest failure of this type, the postmaster log shows 2021-03-31 14:34:54.982 EDT [18233:15] 001_libpq_pipeline.pl LOG: execute : SELECT 1.0/g FROM generate_series(3, -1, -1) g 2021-03-31 14:34:54.992 EDT [18234:1] ERROR: division by zero 2021-03-31 14:34:54.992 EDT [18234:2] STATEMENT: SELECT 1.0/g FROM generate_series(3, -1, -1) g 2021-03-31 14:34:54.993 EDT [18233:16] 001_libpq_pipeline.pl ERROR: division by zero 2021-03-31 14:34:54.993 EDT [18233:17] 001_libpq_pipeline.pl STATEMENT: SELECT 1.0/g FROM generate_series(3, -1, -1) g 2021-03-31 14:34:54.995 EDT [18216:4] LOG: background worker "parallel worker" (PID 18234) exited with exit code 1 2021-03-31 14:34:54.995 EDT [18233:18] 001_libpq_pipeline.pl LOG: could not send data to client: Broken pipe 2021-03-31 14:34:54.995 EDT [18233:19] 001_libpq_pipeline.pl FATAL: connection to client lost We can see that the division by zero occurred in a parallel worker. My theory is that in parallel mode, it's uncertain whether the error will be reported before or after the "preceding" successful output rows. So you need to disable parallelism to make this test case stable. regards, tom lane
RE: libpq debug log
From: 'alvhe...@alvh.no-ip.org' > > got expected ERROR: cannot insert multiple commands into a prepared > statement > > got expected division-by-zero > > Eh? this is not at all expected, of course, but clearly not PQtrace's > fault. I'll look tomorrow. Iwata-san and I were starting to investigate the issue. I guessed the first message was not expected. Please let us know if there's something we can help. (I was amazed that you finally committed this great feature, libpq pipeline, while you are caring about many patches.) Regards Takayuki Tsunakawa
Re: libpq debug log
On 2021-Mar-30, 'alvhe...@alvh.no-ip.org' wrote: > got expected ERROR: cannot insert multiple commands into a prepared statement > got expected division-by-zero Eh? this is not at all expected, of course, but clearly not PQtrace's fault. I'll look tomorrow. -- Álvaro Herrera39°49'30"S 73°17'W "El sentido de las cosas no viene de las cosas, sino de las inteligencias que las aplican a sus problemas diarios en busca del progreso." (Ernesto Hernández-Novich)
RE: libpq debug log
From: 'alvhe...@alvh.no-ip.org' > Okay, pushed this patch and the new testing for it based on > libpq_pipeline. We'll see how the buildfarm likes it. Thank you very much! I appreciate you taking your valuable time while I imagine you must be pretty busy with taking care of other (possibly more important) patches. TBH, when Tom-san suggested drastic change, I was afraid we may not be able to complete this in PG 14. But in the end, I'm very happy that the patch has become much leaner and cleaner. And congratulations on your first commit, Iwata-san! I hope you can have time and energy to try adding a connection parameter to enable tracing, which eliminates application modification. > I didn't like the idea of silently skipping the redacted fields, so I > changed the code to print or instead. I also made the > redacting occur in the last mile (pqTraceOutputInt32 / String) rather > that in their callers: it seemed quite odd to advance the cursor in the > "else" branch. > > I refactored the duplicate code that appeared for Notice and Error. > In that function, we redact not only the 'L' field (what Iwata-san was > doing) but also 'F' (file) and 'R' (routine) because those things can > move around for reasons that are not interesting to testing this code. > > In the libpq_pipeline commit I added 'pipeline_abort' and 'transaction' > to the cases that generate traces, which adds coverage for > NoticeResponse and ErrorResponse. These make sense to me. Thank you for repeatedly polishing and making the patch better much. Regards Takayuki Tsunakawa
Re: libpq debug log
So crake failed. The failure is that it doesn't print the DataRow messages. That's quite odd. We see this in the trace log: Mar 30 20:05:15 # F 54 Parse"" "SELECT 1.0/g FROM generate_series(3, -1, -1) g" 0 Mar 30 20:05:15 # F 12 Bind "" "" 0 0 0 Mar 30 20:05:15 # F 6 Describe P "" Mar 30 20:05:15 # F 9 Execute "" 0 Mar 30 20:05:15 # F 4 Sync Mar 30 20:05:15 # B 4 ParseComplete Mar 30 20:05:15 # B 4 BindComplete Mar 30 20:05:15 # B 33 RowDescription 1 "?column?" 0 65535 -1 0 Mar 30 20:05:15 # B 70 ErrorResponseS "ERROR" V "ERROR" C "22012" M "division by zero" F "" L "" R "" \\x00 Mar 30 20:05:15 # B 5 ReadyForQueryI and the expected is this: Mar 30 20:05:15 # F 54 Parse"" "SELECT 1.0/g FROM generate_series(3, -1, -1) g" 0 Mar 30 20:05:15 # F 12 Bind "" "" 0 0 0 Mar 30 20:05:15 # F 6 Describe P "" Mar 30 20:05:15 # F 9 Execute "" 0 Mar 30 20:05:15 # F 4 Sync Mar 30 20:05:15 # B 4 ParseComplete Mar 30 20:05:15 # B 4 BindComplete Mar 30 20:05:15 # B 33 RowDescription 1 "?column?" 0 65535 -1 0 Mar 30 20:05:15 # B 32 DataRow 1 22 '0.' Mar 30 20:05:15 # B 32 DataRow 1 22 '0.5000' Mar 30 20:05:15 # B 32 DataRow 1 22 '1.' Mar 30 20:05:15 # B 70 ErrorResponseS "ERROR" V "ERROR" C "22012" M "division by zero" F "" L "" R "" \\x00 Mar 30 20:05:15 # B 5 ReadyForQueryI ... I wonder if this is a libpq pipeline problem rather than a PQtrace problem. In that test case we're using the libpq singlerow mode, so we should see three rows before the error is sent, but for some reason crake is not doing that. aborted pipeline... NOTICE: table "pq_pipeline_demo" does not exist, skipping ok got expected ERROR: cannot insert multiple commands into a prepared statement got expected division-by-zero ok 5 - libpq_pipeline pipeline_abort not ok 6 - pipeline_abort trace match Other hosts seem to get it right: # Running: libpq_pipeline -t /Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/libpq_pipeline/tmp_check/log/pipeline_abort.trace pipeline_abort port=49797 host=/var/folders/md/8mp8j5m5169ccgy11plb4w_8gp/T/iA9YP9_IHa dbname='postgres' 1 aborted pipeline... NOTICE: table "pq_pipeline_demo" does not exist, skipping ok got expected ERROR: cannot insert multiple commands into a prepared statement got row: 0. got row: 0.5000 got row: 1. got expected division-by-zero ok 5 - libpq_pipeline pipeline_abort ok 6 - pipeline_abort trace match -- Álvaro Herrera Valdivia, Chile "This is what I like so much about PostgreSQL. Most of the surprises are of the "oh wow! That's cool" Not the "oh shit!" kind. :)" Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php
Re: libpq debug log
Okay, pushed this patch and the new testing for it based on libpq_pipeline. We'll see how the buildfarm likes it. I made some further changes to the last version; user-visibly, I split the trace flags in two, keeping the timestamp suppression separate from the redacting feature for regression testing. I didn't like the idea of silently skipping the redacted fields, so I changed the code to print or instead. I also made the redacting occur in the last mile (pqTraceOutputInt32 / String) rather that in their callers: it seemed quite odd to advance the cursor in the "else" branch. I refactored the duplicate code that appeared for Notice and Error. In that function, we redact not only the 'L' field (what Iwata-san was doing) but also 'F' (file) and 'R' (routine) because those things can move around for reasons that are not interesting to testing this code. In the libpq_pipeline commit I added 'pipeline_abort' and 'transaction' to the cases that generate traces, which adds coverage for NoticeResponse and ErrorResponse. -- Álvaro Herrera39°49'30"S 73°17'W
RE: libpq debug log
Hi Alvaro san, Tsunakawa san Thank you for creating the v30 patch. > From: Tsunakawa, Takayuki/綱川 貴之 > Sent: Monday, March 29, 2021 9:45 AM ... > Iwata-san, > Please review Alvaro-san's code, and I think you can integrate all patches > into > one except for 0002 and 0007. Those two patches may be separate or > merged into one as a test patch. I reviewed v30 patches. I think it was good except that the documentation about the direction of the message was not changing. I also tried the v30 patch using regression test and it worked fine. I merged v30 patches and update the patch to v31. This new version patch includes the fix of libpq.smgl fix and the addition of regression test mode. In libpq.smgl, the symbol indicating the message direction has been corrected from "<" ">" to "B" "F" in the documentation. > From: alvhe...@alvh.no-ip.org > Sent: Sunday, March 28, 2021 4:28 AM ... > Maybe the easiest way is to have a new flag PQTRACE_REGRESS_MODE. To prepare for regression test, I read Message Formats documentation. https://www.postgresql.org/docs/current/protocol-message-formats.html Following protocol messages have values that depend on the code of master at that time; - BackendKeyData(B) ... includes backend PID and backend private key - ErrorResponse(B) ... includes error message line number - FunctionCall(F) ... includes function OID - NoticeResponse(B) ... includes notice message line number - NotificationResponse (B) ... includes backend PID - ParameterDescription ... includes parameter OID - Parse(F) ... includes parameter data type OID - RowDescription(B) ... includes OIDs I checked status of conn->pqTraceFlags to decide whether output version-dependent messages or not in above protocol message output functions. In ErrorResponse and NoticeResponse, I skip string type message logging only when field type code is "L". In my understanding, other field code message type does not depend on version. So I didn't skip other code type's string messages. And I also changed description of pqTraceSetFlags() by changing PQTRACE_SUPPRESS_TIMESTAMPS flag to the PQTRACE_REGRESS_MODE flag. Output of regress mode is following; B 124 ErrorResponseS "ERROR" V "ERROR" C "22023" M "unrecognized parameter "some_nonexistent_parameter"" F "reloptions.c" L R "parseRelOptionsInternal" \x00 B 14 ParameterDescription 2 Output of non-regress mode is following; 2021-03-30 12:55:31.327913 B 124 ErrorResponseS "ERROR" V "ERROR" C "22023" M "unrecognized parameter "some_nonexistent_parameter"" F "reloptions.c" L "1447" R "parseRelOptionsInternal" \x00 2021-03-30 12:56:12.691617 B 14 ParameterDescription 2 25 701 Regards, Aya Iwata v31-libpq-trace-log.patch Description: v31-libpq-trace-log.patch
RE: libpq debug log
From: 'alvhe...@alvh.no-ip.org' > > > (Hey, what the heck is that "Z" at the end of the message? I thought > > > the error ended right at the \x00 ...) > > > > We'll investigate these issues. > > For what it's worth, I did fix this problem in patch 0005 that I > attached. The problem was that one "continue" should have been "break", > and also a "while ( .. )" needed to be made an infinite loop. It was > easy to catch these problems once I added (in 0006) the check that the > bytes consumed equal message length, as I had suggested a couple of > weeks ago :-) I also changed the code for Notice, but I didn't actually > verify that one. You already fix the issue, didn't you? Thank you. Your suggestion of checking the length has surely proved to be correct. Regards Takayuki Tsunakawa
RE: libpq debug log
From: Kyotaro Horiguchi > It's better to be short as far as it is clear enough. Actually '<' to > 'F' and '>' to 'B' is clear enough to me. So I don't need a longer > notation. Agreed, because the message format description in the PG manual uses F and B. Regards Takayuki Tsunakawa
Re: libpq debug log
At Mon, 29 Mar 2021 00:02:58 -0300, "'alvhe...@alvh.no-ip.org'" wrote in > On 2021-Mar-29, tsunakawa.ta...@fujitsu.com wrote: > > > > (Hey, what the heck is that "Z" at the end of the message? I thought > > > the error ended right at the \x00 ...) > > > > We'll investigate these issues. > > For what it's worth, I did fix this problem in patch 0005 that I > attached. The problem was that one "continue" should have been "break", > and also a "while ( .. )" needed to be made an infinite loop. It was > easy to catch these problems once I added (in 0006) the check that the > bytes consumed equal message length, as I had suggested a couple of > weeks ago :-) I also changed the code for Notice, but I didn't actually > verify that one. > > > > 2. The < and > characters are not good for visual inspection. I > > > replaced them with F and B and I think it's much clearer. Compare: > > > I think the second one is much easier on the eye. > > > > Yes, agreed. I too thought of something like "C->S" and "S->C" > > because client and server should be more familiar for users than > > frontend and backend. > > Hmm, yeah, that's a reasonable option too. What do others think? It's better to be short as far as it is clear enough. Actually '<' to 'F' and '>' to 'B' is clear enough to me. So I don't need a longer notation. O(ut) and (I)n also makes sense to me. Rather, "C->S", and "S->C" are a little difficult to understand at a glance regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: libpq debug log
On 2021-Mar-29, tsunakawa.ta...@fujitsu.com wrote: > > (Hey, what the heck is that "Z" at the end of the message? I thought > > the error ended right at the \x00 ...) > > We'll investigate these issues. For what it's worth, I did fix this problem in patch 0005 that I attached. The problem was that one "continue" should have been "break", and also a "while ( .. )" needed to be made an infinite loop. It was easy to catch these problems once I added (in 0006) the check that the bytes consumed equal message length, as I had suggested a couple of weeks ago :-) I also changed the code for Notice, but I didn't actually verify that one. > > 2. The < and > characters are not good for visual inspection. I > > replaced them with F and B and I think it's much clearer. Compare: > > I think the second one is much easier on the eye. > > Yes, agreed. I too thought of something like "C->S" and "S->C" > because client and server should be more familiar for users than > frontend and backend. Hmm, yeah, that's a reasonable option too. What do others think? -- Álvaro Herrera Valdivia, Chile
RE: libpq debug log
From: alvhe...@alvh.no-ip.org > > Proposed changes on top of v29. > > This last one uses libpq_pipeline -t and verifies the output against an > expected > trace file. Applies on top of all the previous patches. I attach the whole > lot, > so that the CF bot has a chance to run it. Thank you for polishing the patch. Iwata-san, Please review Alvaro-san's code, and I think you can integrate all patches into one except for 0002 and 0007. Those two patches may be separate or merged into one as a test patch. > I did notice another problem for comparison of expected trace files, which is > that RowDescription includes table OIDs for some columns. I think we would > need to have a flag to suppress that too, somehow, although the answer to what > should we do is not as clear as for the other two cases. I'm afraid this may render the test comparison almost impossible. Tests that access system catalogs and large objects probably output OIDs. Regards Takayuki Tsunakawa
RE: libpq debug log
From: alvhe...@alvh.no-ip.org > I added an option to the new libpq_pipeline program that it activates > libpq trace. It works nicely and I think we can add that to the > regression tests. That's good news. Thank you. > 1. The trace output for the error message won't be very nice, because it > prints line numbers. So if I want to match the output to an "expected" > file, it would break every time somebody edits the source file where the > error originates and the ereport() line is moved. For example: > (Hey, what the heck is that "Z" at the end of the message? I thought > the error ended right at the \x00 ...) We'll investigate these issues. > 2. The < and > characters are not good for visual inspection. I > replaced them with F and B and I think it's much clearer. Compare: > I think the second one is much easier on the eye. Yes, agreed. I too thought of something like "C->S" and "S->C" because client and server should be more familiar for users than frontend and backend. Regards Takayuki Tsunakawa
Re: libpq debug log
On 2021-Mar-27, alvhe...@alvh.no-ip.org wrote: > This last one uses libpq_pipeline -t and verifies the output against an > expected trace file. Applies on top of all the previous patches. I > attach the whole lot, so that the CF bot has a chance to run it. All tests pass, but CFbot does not run src/test/modules, so it's not saying much. -- Álvaro Herrera39°49'30"S 73°17'W
Re: libpq debug log
On 2021-Mar-26, alvhe...@alvh.no-ip.org wrote: > Proposed changes on top of v29. This last one uses libpq_pipeline -t and verifies the output against an expected trace file. Applies on top of all the previous patches. I attach the whole lot, so that the CF bot has a chance to run it. I did notice another problem for comparison of expected trace files, which is that RowDescription includes table OIDs for some columns. I think we would need to have a flag to suppress that too, somehow, although the answer to what should we do is not as clear as for the other two cases. I dodged the issue by just using -t for the pipeline cases that don't have OIDs in their output. This leaves us with no coverage for the ErrorResponse message, which I think is a shortcoming. If we fixed the OID problem and the ErrorResponse problem, we could add an expected trace for pipeline_abort. I think we should do that. Maybe the easiest way is to have a new flag PQTRACE_REGRESS_MODE. -- Álvaro Herrera39°49'30"S 73°17'W Essentially, you're proposing Kevlar shoes as a solution for the problem that you want to walk around carrying a loaded gun aimed at your foot. (Tom Lane) >From f239da90f069b19e1c2fa6b69bc7c6fd4be826b9 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 26 Mar 2021 11:10:52 -0300 Subject: [PATCH v30 1/7] libpq trace v29 --- doc/src/sgml/libpq.sgml | 39 +- src/interfaces/libpq/Makefile | 1 + src/interfaces/libpq/exports.txt| 1 + src/interfaces/libpq/fe-connect.c | 21 - src/interfaces/libpq/fe-exec.c | 4 - src/interfaces/libpq/fe-misc.c | 66 +-- src/interfaces/libpq/fe-protocol3.c | 8 + src/interfaces/libpq/libpq-fe.h | 2 + src/interfaces/libpq/libpq-int.h| 7 + src/interfaces/libpq/libpq-trace.c | 717 10 files changed, 789 insertions(+), 77 deletions(-) create mode 100644 src/interfaces/libpq/libpq-trace.c diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index be674fbaa9..838e394d54 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -6459,12 +6459,27 @@ PGContextVisibility PQsetErrorContextVisibility(PGconn *conn, PGContextVisibilit - Enables tracing of the client/server communication to a debugging file stream. + Enables tracing of the client/server communication to a debugging file + stream. void PQtrace(PGconn *conn, FILE *stream); + + Each line consists of: an optional timestamp, a direction indicator + (> for messages from client to server + or < for messages from server to client), + message length, message type, and message contents. + Non-message contents fields (timestamp, direction, length and message type) + are separated by a tab. Message contents are separated by a space. + Protocol strings are enclosed in double quotes, while strings used as data + values are enclosed in single quotes. Non-printable chars are printed as + hexadecimal escapes. + Further message-type-specific detail can be found in + . + + On Windows, if the libpq library and an application are @@ -6479,6 +6494,28 @@ void PQtrace(PGconn *conn, FILE *stream); + +PQtraceSetFlagsPQtraceSetFlags + + + + Controls the tracing behavior of client/server communication. + +void PQtraceSetFlags(PGconn *conn, int flags); + + + + + flags contains flag bits describing the operating mode + of tracing. + If flags contains PQTRACE_SUPPRESS_TIMESTAMPS, + then the timestamp is not included when printing each message. + This function must be called after calling PQtrace. + + + + + PQuntracePQuntrace diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile index 2aca882a2b..0424523492 100644 --- a/src/interfaces/libpq/Makefile +++ b/src/interfaces/libpq/Makefile @@ -41,6 +41,7 @@ OBJS = \ fe-secure.o \ legacy-pqsignal.o \ libpq-events.o \ + libpq-trace.o \ pqexpbuffer.o \ fe-auth.o diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt index 5c48c14191..a00701f2c5 100644 --- a/src/interfaces/libpq/exports.txt +++ b/src/interfaces/libpq/exports.txt @@ -183,3 +183,4 @@ PQenterPipelineMode 180 PQexitPipelineMode181 PQpipelineSync182 PQpipelineStatus 183 +PQtraceSetFlags 184 diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 53b354abb2..a90bdb8ab6 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -6859,27 +6859,6 @@ PQsetErrorContextVisibility(PGconn *conn, PGContextVisibility show_context) return old; } -void -PQtrace(PGconn *conn, FILE *debug_port) -{ - if (conn == NULL) - return; - PQuntrace(conn); - conn->Pfdebug = debug_port; -} - -void -PQuntrace(PGconn *conn) -{ - if (conn == N
Re: libpq debug log
Proposed changes on top of v29. -- Álvaro Herrera Valdivia, Chile >From b32ae3805bb28553c0a1cf308c6ed27f58576f3c Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 26 Mar 2021 19:12:12 -0300 Subject: [PATCH 1/5] libpq_pipeline: add -t support for PQtrace --- .../modules/libpq_pipeline/libpq_pipeline.c | 84 +-- 1 file changed, 59 insertions(+), 25 deletions(-) diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c index 846ee9f5ab..34d085035b 100644 --- a/src/test/modules/libpq_pipeline/libpq_pipeline.c +++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c @@ -23,6 +23,7 @@ #include "catalog/pg_type_d.h" #include "common/fe_memutils.h" #include "libpq-fe.h" +#include "pg_getopt.h" #include "portability/instr_time.h" @@ -30,6 +31,9 @@ static void exit_nicely(PGconn *conn); const char *const progname = "libpq_pipeline"; +/* Options and defaults */ +char *tracefile = NULL; /* path to PQtrace() file */ + #define DEBUG #ifdef DEBUG @@ -1209,8 +1213,10 @@ usage(const char *progname) { fprintf(stderr, "%s tests libpq's pipeline mode.\n\n", progname); fprintf(stderr, "Usage:\n"); - fprintf(stderr, " %s tests", progname); - fprintf(stderr, " %s testname [conninfo [number_of_rows]]\n", progname); + fprintf(stderr, " %s [OPTION] tests\n", progname); + fprintf(stderr, " %s [OPTION] TESTNAME [CONNINFO [NUMBER_OF_ROWS]\n", progname); + fprintf(stderr, "\nOptions:\n"); + fprintf(stderr, " -t TRACEFILE generate a libpq trace to TRACEFILE\n"); } static void @@ -1231,37 +1237,54 @@ main(int argc, char **argv) { const char *conninfo = ""; PGconn *conn; + FILE *trace; + char *testname; int numrows = 1; PGresult *res; + int c; - if (strcmp(argv[1], "tests") == 0) + while ((c = getopt(argc, argv, "t:")) != -1) { - print_test_list(); - exit(0); + switch (c) + { + case 't': /* trace file */ +tracefile = pg_strdup(optarg); +break; + } } - /* - * The testname parameter is mandatory; it can be followed by a conninfo - * string and number of rows. - */ - if (argc < 2 || argc > 4) + if (optind < argc) + { + testname = argv[optind]; + optind++; + } + else { usage(argv[0]); exit(1); } - if (argc >= 3) - conninfo = pg_strdup(argv[2]); + if (strcmp(testname, "tests") == 0) + { + print_test_list(); + exit(0); + } - if (argc >= 4) + if (optind < argc) + { + conninfo = argv[optind]; + optind++; + } + if (optind < argc) { errno = 0; - numrows = strtol(argv[3], NULL, 10); + numrows = strtol(argv[optind], NULL, 10); if (errno != 0 || numrows <= 0) { - fprintf(stderr, "couldn't parse \"%s\" as a positive integer\n", argv[3]); + fprintf(stderr, "couldn't parse \"%s\" as a positive integer\n", argv[optind]); exit(1); } + optind++; } /* Make a connection to the database */ @@ -1272,30 +1295,41 @@ main(int argc, char **argv) PQerrorMessage(conn)); exit_nicely(conn); } + + /* Set the trace file, if requested */ + if (tracefile != NULL) + { + trace = fopen(tracefile, "w+"); + + if (trace == NULL) + pg_fatal("could not open file \"%s\": %m", tracefile); + PQtrace(conn, trace); + PQtraceSetFlags(conn, PQTRACE_SUPPRESS_TIMESTAMPS); + } + res = PQexec(conn, "SET lc_messages TO \"C\""); if (PQresultStatus(res) != PGRES_COMMAND_OK) pg_fatal("failed to set lc_messages: %s", PQerrorMessage(conn)); - if (strcmp(argv[1], "disallowed_in_pipeline") == 0) + if (strcmp(testname, "disallowed_in_pipeline") == 0) test_disallowed_in_pipeline(conn); - else if (strcmp(argv[1], "multi_pipelines") == 0) + else if (strcmp(testname, "multi_pipelines") == 0) test_multi_pipelines(conn); - else if (strcmp(argv[1], "pipeline_abort") == 0) + else if (strcmp(testname, "pipeline_abort") == 0) test_pipeline_abort(conn); - else if (strcmp(argv[1], "pipelined_insert") == 0) + else if (strcmp(testname, "pipelined_insert") == 0) test_pipelined_insert(conn, numrows); - else if (strcmp(argv[1], "prepared") == 0) + else if (strcmp(testname, "prepared") == 0) test_prepared(conn); - else if (strcmp(argv[1], "simple_pipeline") == 0) + else if (strcmp(testname, "simple_pipeline") == 0) test_simple_pipeline(conn); - else if (strcmp(argv[1], "singlerow") == 0) + else if (strcmp(testname, "singlerow") == 0) test_singlerowmode(conn); - else if (strcmp(argv[1], "transaction") == 0) + else if (strcmp(testname, "transaction") == 0) test_transaction(conn); else { - fprintf(stderr, "\"%s\" is not a recognized test name\n", argv[1]); - usage(argv[0]); + fprintf(stderr, "\"%s\" is not a recognized test name\n", testname); exit(1); } -- 2.20.1 >From 4b62c3159fe3aa8445317a5d65b7e81d91c7fba6 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 26 Mar 2021 19:13:04 -0300 Subject: [PATCH 2/5] put FILE * arg always first --- src/interfaces/libpq/libpq-trace.c | 234 ++--- 1 fil
Re: libpq debug log
Hello I added an option to the new libpq_pipeline program that it activates libpq trace. It works nicely and I think we can add that to the regression tests. However I have two observations. 1. The trace output for the error message won't be very nice, because it prints line numbers. So if I want to match the output to an "expected" file, it would break every time somebody edits the source file where the error originates and the ereport() line is moved. For example: < 70 ErrorResponseS "ERROR" V "ERROR" C "22012" M "division by zero" F "numeric.c" L "8366" R "div_var" \x00 "Z" The line number 8366 in this line would be problematic. I think if we want regression testing for this, we should have another trace flag to suppress output of a few of these fields. I would have it print only S, C and M. (Hey, what the heck is that "Z" at the end of the message? I thought the error ended right at the \x00 ...) 2. The < and > characters are not good for visual inspection. I replaced them with F and B and I think it's much clearer. Compare: > 27 Query"SET lc_messages TO "C"" < 8 CommandComplete "SET" < 5 ReadyForQueryI > 21 Parse"" "SELECT $1" 1 23 > 19 Bind "" "" 0 1 1 '1' 1 0 > 6 Describe P "" > 9 Execute "" 0 > 4 Sync < 4 ParseComplete < 4 BindComplete < 33 RowDescription 1 "?column?" 0 0 23 4 -1 0 < 11 DataRow 1 1 '1' < 13 CommandComplete "SELECT 1" < 5 ReadyForQueryI > 4 Terminate with F 27 Query"SET lc_messages TO "C"" B 8 CommandComplete "SET" B 5 ReadyForQueryI F 21 Parse"" "SELECT $1" 1 23 F 19 Bind "" "" 0 1 1 '1' 1 0 F 6 Describe P "" F 9 Execute "" 0 F 4 Sync B 4 ParseComplete B 4 BindComplete B 33 RowDescription 1 "?column?" 0 0 23 4 -1 0 B 11 DataRow 1 1 '1' B 13 CommandComplete "SELECT 1" B 5 ReadyForQueryI F 4 Terminate I think the second one is much easier on the eye. (This one is the output from "libpq_pipeline simple_pipeline"). -- Álvaro Herrera Valdivia, Chile "Saca el libro que tu religión considere como el indicado para encontrar la oración que traiga paz a tu alma. Luego rebootea el computador y ve si funciona" (Carlos Duclós)
RE: libpq debug log
Alvaro-san, Iwata-san, cc: Tom-san, Horiguchi-san, Thank you, it finally looks complete to me! Alvaro-san, We appreciate your help and patience, sometimes rewriting large part of the patch. Could you do the final check (and possibly tweak) for commit? Regards Takayuki Tsunakawa
RE: libpq debug log
Hi Tsunakawa san, Thank you for your review. I update patch to v29. Output is following. It is fine. ``` 2021-03-19 07:21:09.917302 > 4 Terminate 2021-03-19 07:21:09.961494 > 155 Query"CREATE TABLESPACE regress_tblspacewith LOCATION '/home/iwata/pgsql/postgres/src/test/regress/testtablespace' WITH (some_nonexistent_parameter = true);" 2021-03-19 07:21:09.962657 < 124 ErrorResponseS "ERROR" V "ERROR" C "22023" M "unrecognized parameter "some_nonexistent_parameter"" F "reloptions.c" L "1456" R "parseRelOptionsInternal" \x00 "Z" 2021-03-19 07:21:09.962702 < 5 ReadyForQueryI 2021-03-19 07:21:09.962767 > 144 Query"CREATE TABLESPACE regress_tblspacewith LOCATION '/home/iwata/pgsql/postgres/src/test/regress/testtablespace' WITH (random_page_cost = 3.0);" 2021-03-19 07:21:09.967056 < 22 CommandComplete "CREATE TABLESPACE" 2021-03-19 07:21:09.967092 < 5 ReadyForQueryI 2021-03-19 07:21:09.967148 > 81 Query"SELECT spcoptions FROM pg_tablespace WHERE spcname = 'regress_tblspacewith';" 2021-03-19 07:21:09.970338 < 35 RowDescription 1 "spcoptions" 1213 5 1009 65535 -1 0 2021-03-19 07:21:09.970402 < 32 DataRow 1 22 '{random_page_cost=3.0}' 2021-03-19 07:21:09.970420 < 13 CommandComplete "SELECT 1" 2021-03-19 07:21:09.970431 < 5 ReadyForQueryI 2021-03-19 07:21:09.970558 > 42 Query"DROP TABLESPACE regress_tblspacewith;" 2021-03-19 07:21:09.971500 < 20 CommandComplete "DROP TABLESPACE" 2021-03-19 07:21:09.971561 < 5 ReadyForQueryI ``` > -Original Message- > From: Tsunakawa, Takayuki/綱川 貴之 > Sent: Friday, March 19, 2021 11:37 AM > Thanks to removing the static arrays, the code got much smaller. I'm happy > with this result. Thank you so much. Your advice was very helpful in refactoring the patch. > (1) > + (> for messages from client to server, > + and < for messages from server to client), > > I think this was meant to say "> or <". So, maybe you should remove "," at > the end of the first line, and replace "and" with "or". I fixed. > > > (2) > + case 8 :/* GSSENCRequest or SSLRequest */ > + /* These messsage does not reach here. */ > > messsage does -> messages do I fixed. > (3) > + fprintf(f, "\tAuthentication"); > > Why don't you move this \t in each message type to the end of: > > + fprintf(conn->Pfdebug, "%s\t%s\t%d", timestr, prefix, length); I fixed. > > Plus, don't we say in the manual that fields (timestamp, direction, length, > message type, and message content) are separated by a tab? Sure. I added following documentation; + Non-message contents fields (timestamp, direction, length and message type) + are separated by a tab. Message contents are separated by a space. > Also, the code doesn't seem to separate the message type and content with a > tab. I fixed to output a tab at the end of message types. > (4) > Where did the processing for unknown message go in > pqTraceOutputMessage()? Add default label? > (5) > + case 16:/* CancelRequest */ > + fprintf(conn->Pfdebug, > "%s\t>\t%d\tCancelRequest", timestr, length); > > I think this should follow pqTraceOutputMessage(). That is, each case label > only print the message type name in case other message types are added in > the future. Sure. I fixed pqTraceOutputNoTypeByteMessage() following to pqTraceOutputMessage() style. Regards, Aya Iwata v29-libpq-trace-log.patch Description: v29-libpq-trace-log.patch
RE: libpq debug log
Hello Iwata-san, Thanks to removing the static arrays, the code got much smaller. I'm happy with this result. (1) + (> for messages from client to server, + and < for messages from server to client), I think this was meant to say "> or <". So, maybe you should remove "," at the end of the first line, and replace "and" with "or". (2) + case 8 :/* GSSENCRequest or SSLRequest */ + /* These messsage does not reach here. */ messsage does -> messages do (3) + fprintf(f, "\tAuthentication"); Why don't you move this \t in each message type to the end of: + fprintf(conn->Pfdebug, "%s\t%s\t%d", timestr, prefix, length); Plus, don't we say in the manual that fields (timestamp, direction, length, message type, and message content) are separated by a tab? Also, the code doesn't seem to separate the message type and content with a tab. (4) Where did the processing for unknown message go in pqTraceOutputMessage()? Add default label? (5) + case 16:/* CancelRequest */ + fprintf(conn->Pfdebug, "%s\t>\t%d\tCancelRequest", timestr, length); I think this should follow pqTraceOutputMessage(). That is, each case label only print the message type name in case other message types are added in the future. Regards Takayuki Tsunakawa
RE: libpq debug log
Hi Horiguchi san and Tsunakawa san, Thank you for you review. I update patch to v28. In this patch, I removed array. And I fixed some code according to Horiguchi san and Tsunakawa san review comment. > From: Tsunakawa, Takayuki/綱川 貴之 > Sent: Thursday, March 18, 2021 12:38 PM > I sort of think so to remove the big arrays. But to minimize duplicate code, > I > think the code structure will look like: > > fprintf(timestamp, length); > switch (type) > { > case '?': > pqTraceOutput?(...); > break; > case '?': > /* No message content */ > fprintf("message_type_name"); > break; > } > > void > pqTraceOutput?(...) > { > fprintf("message_type_name"); > print message content; > } The code follows above format. And I changed .sgml documentation; - Changed order of message length and type - Removed byte-16 and byte-32 explanation because I removed # output in previous patch. Output style is following; ``` 2021-03-18 08:59:36.141660 < 5 ReadyForQuery I 2021-03-18 08:59:36.141723 > 4 Terminate 2021-03-18 08:59:36.173263 > 155 Query "CREATE TABLESPACE regress_tblspacewith LOCATION '/home/iwata/pgsql/postgres/src/test/regress/testtablespace' WITH (some_nonexistent_parameter = true);" 2021-03-18 08:59:36.174439 < 124 ErrorResponse S "ERROR" V "ERROR" C "22023" M "unrecognized parameter "some_nonexistent_parameter"" F "reloptions.c" L "1456" R "parseRelOptionsInternal" \x00 "Z" 2021-03-18 08:59:36.174483 < 5 ReadyForQuery I 2021-03-18 08:59:36.174545 > 144 Query "CREATE TABLESPACE regress_tblspacewith LOCATION '/home/iwata/pgsql/postgres/src/test/regress/testtablespace' WITH (random_page_cost = 3.0);" 2021-03-18 08:59:36.176155 < 22 CommandComplete "CREATE TABLESPACE" 2021-03-18 08:59:36.176190 < 5 ReadyForQuery I 2021-03-18 08:59:36.176243 > 81 Query "SELECT spcoptions FROM pg_tablespace WHERE spcname = 'regress_tblspacewith';" 2021-03-18 08:59:36.179286 < 35 RowDescription 1 "spcoptions" 1213 5 1009 65535 -1 0 2021-03-18 08:59:36.179326 < 32 DataRow 1 22 '{random_page_cost=3.0}' 2021-03-18 08:59:36.179339 < 13 CommandComplete "SELECT 1" 2021-03-18 08:59:36.179349 < 5 ReadyForQuery I 2021-03-18 08:59:36.179504 > 42 Query "DROP TABLESPACE regress_tblspacewith;" 2021-03-18 08:59:36.180400 < 20 CommandComplete "DROP TABLESPACE" 2021-03-18 08:59:36.180432 < 5 ReadyForQuery I ``` > -Original Message- > From: Kyotaro Horiguchi > Sent: Thursday, March 18, 2021 5:30 PM > > At Thu, 18 Mar 2021 07:34:36 +, "tsunakawa.ta...@fujitsu.com" > wrote in > > From: Iwata, Aya/岩田 彩 > > > > Yes, precisely, 2 bytes for the double quotes needs to be > > > > subtracted as > > > > follows: > > > > > > > > len = fprintf(...); > > > > *cursor += (len - 2); > > > > > > Thank you for your advice. I changed pqTraceOutputString set cursor > > > to fprintf return -2. > > > And I removed cursor movement from that function. > > > > Ouch, not 2 but 3, to include a single whitespace at the beginning. > > > > The rest looks good. I hope we're almost at the finish line. > > Maybe. String is end by '\0'. So (len -2) is OK. However it seems like mistake because fprintf output string and 3 characters. So I added explanation here and changed (len -2) to (len -3 +1). I think it is OK because I found similar style in ecpg code. > > + pqTraceOutputR(const char *message, FILE *f) { > > + int cursor = 0; > > + > > + pqTraceOutputInt32(message, &cursor, f); > > > > I don't understand the reason for spliting message and &cursor here. > > > > + pqTraceOutputR(const char *message, FILE *f) { > > + char *p = message; > > + > > + pqTraceOutputInt32(&p, f); > > > > works well. > > Yes, that would also work. But I like the separate cursor + fixed starting > point here, probably because it's sometimes confusing to see the pointer > value changed inside functions. (And a pointer itself is an allergy for some > people, not to mention a pointer to ointer...) Also, libpq uses cursors for > network I/O buffers. So, I think the patch can be as it is now. I think pass message and cursor is better. Because it is easy to understand and The moving cursor style is used when libpq execute protocol message. So I didn't make this change. > +/* RowDescription */ > +static void > +pqTraceOutputT(const char *message, int end, FILE *f) { > + int cursor = 0; > + int nfields; > + int i; > + > + nfields = pqTraceOutputInt16(message, &cursor, f); > + > + for (i = 0; i < nf
RE: libpq debug log
From: Kyotaro Horiguchi > + pqTraceOutputR(const char *message, FILE *f) > + { > + int cursor = 0; > + > + pqTraceOutputInt32(message, &cursor, f); > > I don't understand the reason for spliting message and &cursor here. > > + pqTraceOutputR(const char *message, FILE *f) > + { > + char *p = message; > + > + pqTraceOutputInt32(&p, f); > > works well. Yes, that would also work. But I like the separate cursor + fixed starting point here, probably because it's sometimes confusing to see the pointer value changed inside functions. (And a pointer itself is an allergy for some people, not to mention a pointer to ointer...) Also, libpq uses cursors for network I/O buffers. So, I think the patch can be as it is now. > +static void > +pqTraceOutputT(const char *message, int end, FILE *f) > +{ > + int cursor = 0; > + int nfields; > + int i; > + > + nfields = pqTraceOutputInt16(message, &cursor, f); > + > + for (i = 0; i < nfields; i++) > + { > + pqTraceOutputString(message, &cursor, end, f); > + pqTraceOutputInt32(message, &cursor, f); > + pqTraceOutputInt16(message, &cursor, f); > + pqTraceOutputInt32(message, &cursor, f); > + pqTraceOutputInt16(message, &cursor, f); > + pqTraceOutputInt32(message, &cursor, f); > + pqTraceOutputInt16(message, &cursor, f); > + } > +} > > I didn't looked closer, but lookong the usage of the variable "end", > something's wrong in the function. Ah, end doesn't serve any purpose. I guess the compiler emitted a warning "end is not used". I think end can be removed. Regards Takayuki Tsunakawa}
Re: libpq debug log
At Thu, 18 Mar 2021 07:34:36 +, "tsunakawa.ta...@fujitsu.com" wrote in > From: Iwata, Aya/岩田 彩 > > > Yes, precisely, 2 bytes for the double quotes needs to be subtracted > > > as > > > follows: > > > > > > len = fprintf(...); > > > *cursor += (len - 2); > > > > Thank you for your advice. I changed pqTraceOutputString set cursor to > > fprintf > > return -2. > > And I removed cursor movement from that function. > > Ouch, not 2 but 3, to include a single whitespace at the beginning. > > The rest looks good. I hope we're almost at the finish line. Maybe. At Wed, 17 Mar 2021 13:36:32 -0300, Alvaro Herrera wrote in > In pqTraceOutputString(), you can use the return value from fprintf to > move the cursor -- no need to count chars. > > I still think that the message-type specific functions should print the > message type, rather than having the string arrays. In other words, pqTraceOutputMessage recognizes message id and calls the function corresponding one-on-one to the id. So the functions knows what is the message type of myself and there's no reason for pqTraceOutputMessage to print the message type on its behalf. + pqTraceOutputR(const char *message, FILE *f) + { + int cursor = 0; + + pqTraceOutputInt32(message, &cursor, f); I don't understand the reason for spliting message and &cursor here. + pqTraceOutputR(const char *message, FILE *f) + { + char *p = message; + + pqTraceOutputInt32(&p, f); works well. +/* RowDescription */ +static void +pqTraceOutputT(const char *message, int end, FILE *f) +{ + int cursor = 0; + int nfields; + int i; + + nfields = pqTraceOutputInt16(message, &cursor, f); + + for (i = 0; i < nfields; i++) + { + pqTraceOutputString(message, &cursor, end, f); + pqTraceOutputInt32(message, &cursor, f); + pqTraceOutputInt16(message, &cursor, f); + pqTraceOutputInt32(message, &cursor, f); + pqTraceOutputInt16(message, &cursor, f); + pqTraceOutputInt32(message, &cursor, f); + pqTraceOutputInt16(message, &cursor, f); + } +} I didn't looked closer, but lookong the usage of the variable "end", something's wrong in the function. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
RE: libpq debug log
From: Iwata, Aya/岩田 彩 > > Yes, precisely, 2 bytes for the double quotes needs to be subtracted > > as > > follows: > > > > len = fprintf(...); > > *cursor += (len - 2); > > Thank you for your advice. I changed pqTraceOutputString set cursor to fprintf > return -2. > And I removed cursor movement from that function. Ouch, not 2 but 3, to include a single whitespace at the beginning. The rest looks good. I hope we're almost at the finish line. Regards Takayuki Tsunakawa}
RE: libpq debug log
Hi Alvaro san and Tsunakawa san, Thank you for your review. I updated patch to v27. `make check` output is following. I think it is OK. ``` 2021-03-18 07:02:55.090598 < ReadyForQuery 5 I 2021-03-18 07:02:55.090672 > Terminate 4 2021-03-18 07:02:55.120492 > Query 155 "CREATE TABLESPACE regress_tblspacewith LOCATION '/home/iwata/pgsql/postgres/src/test/regress/testtablespace' WITH (some_nonexistent_parameter = true);" 2021-03-18 07:02:55.121624 < ErrorResponse 124 S "ERROR" V "ERROR" C "22023" M "unrecognized parameter "some_nonexistent_parameter"" F "reloptions.c" L "1447" R "parseRelOptionsInternal" \x00 "Z" 2021-03-18 07:02:55.121664 < ReadyForQuery 5 I 2021-03-18 07:02:55.121728 > Query 144 "CREATE TABLESPACE regress_tblspacewith LOCATION '/home/iwata/pgsql/postgres/src/test/regress/testtablespace' WITH (random_page_cost = 3.0);" 2021-03-18 07:02:55.123335 < CommandComplete 22 "CREATE TABLESPACE" 2021-03-18 07:02:55.123400 < ReadyForQuery 5 I 2021-03-18 07:02:55.123460 > Query 81 "SELECT spcoptions FROM pg_tablespace WHERE spcname = 'regress_tblspacewith';" 2021-03-18 07:02:55.126556 < RowDescription 35 1 "spcoptions" 1213 5 1009 65535 -1 0 2021-03-18 07:02:55.126594 < DataRow 32 1 22 '{random_page_cost=3.0}' 2021-03-18 07:02:55.126607 < CommandComplete 13 "SELECT 1" 2021-03-18 07:02:55.126617 < ReadyForQuery 5 I ``` > Iwata-san, > Why don't you submit v27 patch with the current arrays kept, and then v28 > with the arrays removed soon after? And I will try to remove byte1 type table for v28 patch. > From: Tsunakawa, Takayuki/綱川 貴之 > Sent: Thursday, March 18, 2021 12:38 PM > From: Alvaro Herrera > > In pqTraceOutputString(), you can use the return value from fprintf to > > move the cursor -- no need to count chars. > > Yes, precisely, 2 bytes for the double quotes needs to be subtracted as > follows: > > len = fprintf(...); > *cursor += (len - 2); Thank you for your advice. I changed pqTraceOutputString set cursor to fprintf return -2. And I removed cursor movement from that function. > From: Tsunakawa, Takayuki/綱川 貴之 > Sent: Thursday, March 18, 2021 11:52 AM ... > I'll look at the comments from Alvaro-san and Horiguchi-san. Here are my > review comments: Thank you for your review. I am sorry previous patch contain some mistake. > (23) > + /* Trace message only when there is first 1 byte */ > + if (conn->Pfdebug && conn->outCount < conn->outMsgStart) > + { > + if (conn->outCount < conn->outMsgStart) > + pqTraceOutputMessage(conn, conn->outBuffer + > conn->outCount, true); > + else > + pqTraceOutputNoTypeByteMessage(conn, > + > conn->outBuffer + conn->outMsgStart); > + } > > The inner else path doesn't seem to be reached because both the outer and > inner if contain the same condition. I think you want to remove the second > condition from the outer if. Yes, I remove second condition. > (24) pqTraceOutputNoTypeByteMessage > + case 16:/* CancelRequest */ > + fprintf(conn->Pfdebug, > "%s\t>\tCancelRequest\t%d", timestr, length); > + pqTraceOutputInt32(message, &LogCursor, > conn->Pfdebug); > + pqTraceOutputInt32(message, &LogCursor, > conn->Pfdebug); > + break; > > Another int32 data needs to be output as follows: > > -- > Int32(80877102) > The cancel request code. The value is chosen to contain 1234 in the most > significant 16 bits, and 5678 in the least significant 16 bits. (To avoid > confusion, this code must not be the same as any protocol version number.) > > Int32 > The process ID of the target backend. > > Int32 > The secret key for the target backend. > -- Thank you. I read document again and I add one pqTraceOutputInt32(). > (25) > + case 8 :/* GSSENRequest or SSLRequest */ > > GSSENRequest -> GSSENCRequest Thank you. I fixed. > (26) > +static void > +pqTraceOutputByte1(const char *data, int *cursor, FILE *pfdebug) { > + const char *v = data + *cursor; > + /* > > +static void > +pqTraceOutputf(const char *message, int end, FILE *f) { > + int cursor = 0; > + pqTraceOutputString(message, &cursor, end, f); } > > Put an empty line to separate the declaration part and execution part. Thank you. I fixed this. I add space anywhere in pqTraceOutput? function. > (27) > + const char *message_type = "UnkownMessage"; > + > + id = message[LogCursor++]; > + > + memcpy(&length, message + LogCursor , 4); > + length = (int) pg_ntoh32(length); > + LogCursor += 4; > + LogEn
RE: libpq debug log
From: Alvaro Herrera > In pqTraceOutputString(), you can use the return value from fprintf to > move the cursor -- no need to count chars. Yes, precisely, 2 bytes for the double quotes needs to be subtracted as follows: len = fprintf(...); *cursor += (len - 2); > I still think that the message-type specific functions should print the > message type, rather than having the string arrays. I sort of think so to remove the big arrays. But to minimize duplicate code, I think the code structure will look like: fprintf(timestamp, length); switch (type) { case '?': pqTraceOutput?(...); break; case '?': /* No message content */ fprintf("message_type_name"); break; } void pqTraceOutput?(...) { fprintf("message_type_name"); print message content; } The order of length and message type is reversed. The .sgml should also be changed accordingly. What do you think? Iwata-san, Why don't you submit v27 patch with the current arrays kept, and then v28 with the arrays removed soon after? From: Kyotaro Horiguchi > It would help when the value is "255", which is confusing between -1 > (or 255) in byte or 255 in 2-byte word. Or when the value looks like > broken. I'd suggest "b"yte for 1 byte, "s"hort for 2 bytes, "l"ong for > 4 bytes ('l' is confusing with '1', but anyway it is not needed).. I don't have a strong opinion on this. (I kind of think I want to see unprefixed numbers; readers will look at the protocol reference anyway.) I'd like to leave this up to Iwata-san and Alvaro-san. Regards Takayuki Tsunakawa}
RE: libpq debug log
I'll look at the comments from Alvaro-san and Horiguchi-san. Here are my review comments: (23) + /* Trace message only when there is first 1 byte */ + if (conn->Pfdebug && conn->outCount < conn->outMsgStart) + { + if (conn->outCount < conn->outMsgStart) + pqTraceOutputMessage(conn, conn->outBuffer + conn->outCount, true); + else + pqTraceOutputNoTypeByteMessage(conn, + conn->outBuffer + conn->outMsgStart); + } The inner else path doesn't seem to be reached because both the outer and inner if contain the same condition. I think you want to remove the second condition from the outer if. (24) pqTraceOutputNoTypeByteMessage + case 16:/* CancelRequest */ + fprintf(conn->Pfdebug, "%s\t>\tCancelRequest\t%d", timestr, length); + pqTraceOutputInt32(message, &LogCursor, conn->Pfdebug); + pqTraceOutputInt32(message, &LogCursor, conn->Pfdebug); + break; Another int32 data needs to be output as follows: -- Int32(80877102) The cancel request code. The value is chosen to contain 1234 in the most significant 16 bits, and 5678 in the least significant 16 bits. (To avoid confusion, this code must not be the same as any protocol version number.) Int32 The process ID of the target backend. Int32 The secret key for the target backend. -- (25) + case 8 :/* GSSENRequest or SSLRequest */ GSSENRequest -> GSSENCRequest (26) +static void +pqTraceOutputByte1(const char *data, int *cursor, FILE *pfdebug) +{ + const char *v = data + *cursor; + /* +static void +pqTraceOutputf(const char *message, int end, FILE *f) +{ + int cursor = 0; + pqTraceOutputString(message, &cursor, end, f); +} Put an empty line to separate the declaration part and execution part. (27) + const char *message_type = "UnkownMessage"; + + id = message[LogCursor++]; + + memcpy(&length, message + LogCursor , 4); + length = (int) pg_ntoh32(length); + LogCursor += 4; + LogEnd = length - 4; + /* Get a protocol type from first byte identifier */ + if (toServer && + id < lengthof(protocol_message_type_f) && + protocol_message_type_f[(unsigned char)id] != NULL) + message_type = protocol_message_type_f[(unsigned char)id]; + else if (!toServer && + id < lengthof(protocol_message_type_b) && + protocol_message_type_b[(unsigned char)id] != NULL) + message_type = protocol_message_type_b[(unsigned char)id]; + else + { + fprintf(conn->Pfdebug, "Unknown message: %02x\n", id); + return; + } + The initial value "UnkownMessage" is not used. So, you can initialize message_type with NULL and do like: +if (...) + ... + else if (...) + ... + + if (message_type == NULL) + { + fprintf(conn->Pfdebug, "Unknown message: %02x\n", id); + return; + Plus, I think this should be done before looking at the message length. (28) pqTraceOutputBinary() is only used in pqTraceOutputNchar(). Do they have to be separated? Regards Takayuki Tsunakawa
Re: libpq debug log
Hello In pqTraceOutputString(), you can use the return value from fprintf to move the cursor -- no need to count chars. I still think that the message-type specific functions should print the message type, rather than having the string arrays. -- Álvaro Herrera Valdivia, Chile "La gente vulgar sólo piensa en pasar el tiempo; el que tiene talento, en aprovecharlo"
RE: libpq debug log
Hi Tsunakawa san, Alvaro san, Thank you very much for your review. It helped me a lot to make the patch better. I update patch to v26. This patch has been updated according to Tsunakawa san and Alvaro san review comments. The output is following; ``` 2021-03-17 14:43:16.411238 > Terminate 4 2021-03-17 14:43:16.441775 > Query 155 "CREATE TABLESPACE regress_tblspacewith LOCATION '/home/iwata/pgsql/postgres/src/test/regress/testtablespace' WITH (some_nonexistent_parameter = true);" 2021-03-17 14:43:16.442935 < ErrorResponse 124 S "ERROR" V "ERROR" C "22023" M "unrecognized parameter "some_nonexistent_parameter"" F "reloptions.c" L "1447" R "parseRelOptionsInternal" \x00 "Z" 2021-03-17 14:43:16.442977 < ReadyForQuery 5 I 2021-03-17 14:43:16.443042 > Query 144 "CREATE TABLESPACE regress_tblspacewith LOCATION '/home/iwata/pgsql/postgres/src/test/regress/testtablespace' WITH (random_page_cost = 3.0);" 2021-03-17 14:43:16.444774 < CommandComplete 22 "CREATE TABLESPACE" 2021-03-17 14:43:16.444807 < ReadyForQuery 5 I 2021-03-17 14:43:16.444878 > Query 81 "SELECT spcoptions FROM pg_tablespace WHERE spcname = 'regress_tblspacewith';" 2021-03-17 14:43:16.448117 < RowDescription 35 1 "spcoptions" 1213 5 1009 65535 -1 0 2021-03-17 14:43:16.448157 < DataRow 32 1 22 '{random_page_cost=3.0}' 2021-03-17 14:43:16.448171 < CommandComplete 13 "SELECT 1" ``` > -Original Message- > From: Tsunakawa, Takayuki/綱川 貴之 > Sent: Tuesday, March 16, 2021 12:03 PM > (1) > - Enables tracing of the client/server communication to a debugging > file stream. > + Enables tracing of the client/server communication to a debugging file > + stream. > + Only available when protocol version 3 or higher is used. > > The last line is unnecessary now that v2 is not supported. I removed last comment from documentation file. > (2) > @@ -113,6 +114,7 @@ install: all installdirs install-lib > $(INSTALL_DATA) $(srcdir)/libpq-fe.h '$(DESTDIR)$(includedir)' > $(INSTALL_DATA) $(srcdir)/libpq-events.h '$(DESTDIR)$(includedir)' > $(INSTALL_DATA) $(srcdir)/libpq-int.h > '$(DESTDIR)$(includedir_internal)' > + $(INSTALL_DATA) $(srcdir)/libpq-trace.h > '$(DESTDIR)$(includedir_internal)' > $(INSTALL_DATA) $(srcdir)/pqexpbuffer.h > '$(DESTDIR)$(includedir_internal)' > $(INSTALL_DATA) $(srcdir)/pg_service.conf.sample > '$(DESTDIR)$(datadir)/pg_service.conf.sample' > > Why is this necessary? I think it is not necessary. I removed it. > (3) libpq-trace.h > +#ifdef __cplusplus > +extern "C" > +{ > > This is unnecessary because pqTraceOutputMessage() is for libpq's internal > use. This extern is for the user's C++ application to call public libpq > functions. > > +#include "libpq-fe.h" > +#include "libpq-int.h" > > I think these can be removed by declaring the struct and function as follows: > > struct pg_conn; > extern void pqTraceOutputMessage(struct pg_conn *conn, const char > *message, bool toServer); > > But... after doing the above, only this declaration is left in libpq-trace.h. > Why > don't you put your original declaration using PGconn in libpq-int.h and remove > libpq-trace.h? I remove this file. I was wondering whether to delete this file during the development of v25 patch. I leave it because it keep scalability. If tracing process become update and it have a lot of extern function, leave this header file is meaningful. However I think it does not happen. So I remove it. > (4) > + /* Trace message only when there is first 1 byte */ > + if (conn->Pfdebug && (conn->outCount < conn->outMsgStart)) > + pqTraceOutputMessage(conn, conn->outBuffer + > conn->outCount, true); > > () surrounding the condition after && can be removed. I removed this (). And This if () statement has been modified according to the review comment (14). > (5) > +static const char *pqGetProtocolMsgType(unsigned char c, > + > bool toServer); > > This is unnecessary because the function definition precedes its only one call > site. Yes, I removed this definition. > (6) > + * We accumulate frontend message pieces in an array as the libpq code > + writes > + * them, and log the complete message when pqTraceOutputFeMsg is called. > + * For backend, we print the pieces as soon as we receive them from the > server. > > The first paragraph is no longer true. I think this entire comment is > unnecessary. I removed this explanation. > (7) > +static const char * > +pqGetProtocolMsgType(unsigned char c, bool toServer) { > + if (toServer == true && > + c < lengthof(protocol_message_type_f) && > + protocol_message_type_f[c] != NULL) > + return protocol_message_type_f[c]; > + > + if (toServer == false && > + c < lengthof(protoco
Re: libpq debug log
At Wed, 17 Mar 2021 02:09:32 +, "tsunakawa.ta...@fujitsu.com" wrote in > Alvaro-san, > > > Thank you for taking your time to take a look at an incomplete patch. I > thought I would ask you for final check for commit after Iwata-san has > reflected my review comments. > > I discussed with Iwata-sn your below comments. Let me convey her opinions. > (She is now focusing on fixing the patch.) We'd appreciate if you could > tweak her completed patch. > > > From: Alvaro Herrera > > * There's no cross-check that the protocol message length word > > corresponds to what we actually print. I think one easy way to > > cross-check that would be to return the "count" from the type-specific > > routines, and verify that it matches "length" in pqTraceOutputMessage. > > (If the separate routines are removed and the code put back in one > > giant switch, then the "count" is already available when the switch > > ends.) > > We don't think the length check is necessary, because 1) for FE->BE, correct > messages are always assembled, and 2) for BE->FE, the parsing and > decomposition of messages have already checked the messages. Maybe it is not for the sanity of message bytes but to check if the logging-stuff is implemented in sync with the messages structure. If the given message length differs from the length the logging facility read after scanning a message bytes, it's sign of something wrong in *the logging feature*. > > * If we have separate functions for each message type, then we can have > > that function supply the message name by itself, rather than have the > > separate protocol_message_type_f / _b arrays that print it. > > I felt those two arrays are clumsy and thought of suggesting to remove them. > But I didn't because the functions or case labels for each message type have > to duplicate the printing of header fields: timestamp, message type, and > length. Maybe we can change the order of output to "timestamp length > message_type content", but I kind of prefer the current order. What do you > think? (I can understand removing the clumsy static arrays should be better > than the order of output fields.) +1 for removing the arrays. > > * If we make each type-specific function print the message name, then we > > need to make the same function print the message length, because it > > comes after. So the function would have to receive the length (so > > that it can be printed). But I think we should still have the > > cross-check I mentioned in my first point occur in the main > > pqTraceOutputMessage, not the type-specific function, for fear that we > > will forget the cross-check in one of the functions and never realize > > that we did. > > As mentioned above, we think the current structure is better for smaller and > readable code. > > > > * I would make the pqTraceOutputInt16() function and siblings advance > > the cursor themselves, actually. I think something like this: > > static int > > pqTraceOutputInt16(const char *data, int *cursor, FILE *pfdebug) > > { > > uint16tmp; > > int result; > > > > memcpy(&tmp, data + *cursor, 2); > > *cursor += 2; > > result = (int) pg_ntoh16(tmp); > > fprintf(pfdebug, " #%d", result); > > > > return result; > > } > > (So the caller has to pass the original "data" pointer, not > > "data+cursor"). This means the caller no longer has to do "cursor+=N" > > at each place. Also, you get rid of the moveStrCursor() which does > > not look good. > > That makes sense, because in fact the patch mistakenly added 4 when it should > add 2. Also, the code would become smaller. FWIW, that's what I suggested upthread:p So +3. me> *I*'d want to make the output functions move the reading pointer by me> themseves. pqTradeOutputMsg can be as simplified as the follows doing > > * I'm not fond of the idea of prefixing "#" for 16bit ints and no > > prefix for 32bit ints. Seems obscure and the output looks weird. > > I would use a one-letter prefix for each type, "w" for 32-bit ints > > (stands for "word") and "h" for 16-bit ints (stands for "half-word"). > > Message length goes unadorned. Then you'd have lines like this > > > > 2021-03-15 08:10:44.324296 < RowDescription 35 h1 "spcoptions" > > w1213 h5 w1009 h65535 w-1 h0 > > 2021-03-15 08:10:44.324335 < DataRow 32 h1 w22 > > '{random_page_cost=3.0}' > > Yes, actually I felt something similar. Taking a second thought, I think we > don't have to have any prefix because it doesn't help users. So we're > removing the prefix. We don't have any opinion on adding some prefix. It would help when the value is "255", which is confusing between -1 (or 255) in byte or 255 in 2-byte word. Or when the value looks like broken. I'd suggest "b"yte for 1 byte, "s"hort for 2 bytes, "l"ong for 4 bytes ('l' is confusing with '1', but anyway it is not needed)..
RE: libpq debug log
Alvaro-san, Thank you for taking your time to take a look at an incomplete patch. I thought I would ask you for final check for commit after Iwata-san has reflected my review comments. I discussed with Iwata-sn your below comments. Let me convey her opinions. (She is now focusing on fixing the patch.) We'd appreciate if you could tweak her completed patch. From: Alvaro Herrera > * There's no cross-check that the protocol message length word > corresponds to what we actually print. I think one easy way to > cross-check that would be to return the "count" from the type-specific > routines, and verify that it matches "length" in pqTraceOutputMessage. > (If the separate routines are removed and the code put back in one > giant switch, then the "count" is already available when the switch > ends.) We don't think the length check is necessary, because 1) for FE->BE, correct messages are always assembled, and 2) for BE->FE, the parsing and decomposition of messages have already checked the messages. > * If we have separate functions for each message type, then we can have > that function supply the message name by itself, rather than have the > separate protocol_message_type_f / _b arrays that print it. I felt those two arrays are clumsy and thought of suggesting to remove them. But I didn't because the functions or case labels for each message type have to duplicate the printing of header fields: timestamp, message type, and length. Maybe we can change the order of output to "timestamp length message_type content", but I kind of prefer the current order. What do you think? (I can understand removing the clumsy static arrays should be better than the order of output fields.) > * If we make each type-specific function print the message name, then we > need to make the same function print the message length, because it > comes after. So the function would have to receive the length (so > that it can be printed). But I think we should still have the > cross-check I mentioned in my first point occur in the main > pqTraceOutputMessage, not the type-specific function, for fear that we > will forget the cross-check in one of the functions and never realize > that we did. As mentioned above, we think the current structure is better for smaller and readable code. > * I would make the pqTraceOutputInt16() function and siblings advance > the cursor themselves, actually. I think something like this: > static int > pqTraceOutputInt16(const char *data, int *cursor, FILE *pfdebug) > { > uint16tmp; > int result; > > memcpy(&tmp, data + *cursor, 2); > *cursor += 2; > result = (int) pg_ntoh16(tmp); > fprintf(pfdebug, " #%d", result); > > return result; > } > (So the caller has to pass the original "data" pointer, not > "data+cursor"). This means the caller no longer has to do "cursor+=N" > at each place. Also, you get rid of the moveStrCursor() which does > not look good. That makes sense, because in fact the patch mistakenly added 4 when it should add 2. Also, the code would become smaller. > * I'm not fond of the idea of prefixing "#" for 16bit ints and no > prefix for 32bit ints. Seems obscure and the output looks weird. > I would use a one-letter prefix for each type, "w" for 32-bit ints > (stands for "word") and "h" for 16-bit ints (stands for "half-word"). > Message length goes unadorned. Then you'd have lines like this > > 2021-03-15 08:10:44.324296 < RowDescription 35 h1 "spcoptions" > w1213 h5 w1009 h65535 w-1 h0 > 2021-03-15 08:10:44.324335 < DataRow 32 h1 w22 > '{random_page_cost=3.0}' Yes, actually I felt something similar. Taking a second thought, I think we don't have to have any prefix because it doesn't help users. So we're removing the prefix. We don't have any opinion on adding some prefix. > * I don't like that pqTraceOutputS/H print nothing when !toServer. I > think they should complain. Yes, the caller should not call the function if there's no message content. That way, the function doesn't need the toServer argument. Regards Takayuki Tsunakawa
Re: libpq debug log
On 2021-Mar-15, iwata@fujitsu.com wrote: > I create protocol message reading function for each protocol message type. > (Ex. pqTraceOutputB() read Bind message) > This makes the nesting shallower and makes the code easier to read. I'm not sure I agree with this structural change. Yes, it is less indentation, but these functions are all quite short and simple anyway. But I don't disagree strongly with it either. Four issues: * There's no cross-check that the protocol message length word corresponds to what we actually print. I think one easy way to cross-check that would be to return the "count" from the type-specific routines, and verify that it matches "length" in pqTraceOutputMessage. (If the separate routines are removed and the code put back in one giant switch, then the "count" is already available when the switch ends.) * If we have separate functions for each message type, then we can have that function supply the message name by itself, rather than have the separate protocol_message_type_f / _b arrays that print it. * If we make each type-specific function print the message name, then we need to make the same function print the message length, because it comes after. So the function would have to receive the length (so that it can be printed). But I think we should still have the cross-check I mentioned in my first point occur in the main pqTraceOutputMessage, not the type-specific function, for fear that we will forget the cross-check in one of the functions and never realize that we did. * I would make the pqTraceOutputInt16() function and siblings advance the cursor themselves, actually. I think something like this: static int pqTraceOutputInt16(const char *data, int *cursor, FILE *pfdebug) { uint16tmp; int result; memcpy(&tmp, data + *cursor, 2); *cursor += 2; result = (int) pg_ntoh16(tmp); fprintf(pfdebug, " #%d", result); return result; } (So the caller has to pass the original "data" pointer, not "data+cursor"). This means the caller no longer has to do "cursor+=N" at each place. Also, you get rid of the moveStrCursor() which does not look good. Bikeshedding item: * I'm not fond of the idea of prefixing "#" for 16bit ints and no prefix for 32bit ints. Seems obscure and the output looks weird. I would use a one-letter prefix for each type, "w" for 32-bit ints (stands for "word") and "h" for 16-bit ints (stands for "half-word"). Message length goes unadorned. Then you'd have lines like this 2021-03-15 08:10:44.324296 < RowDescription 35 h1 "spcoptions" w1213 h5 w1009 h65535 w-1 h0 2021-03-15 08:10:44.324335 < DataRow 32 h1 w22 '{random_page_cost=3.0}' * I don't like that pqTraceOutputS/H print nothing when !toServer. I think they should complain. -- Álvaro Herrera39°49'30"S 73°17'W
RE: libpq debug log
I've finished the review. Here are the last set of comments. (16) (15) is also true for the processing of 'H' message. (17) pqTraceOutputD + for (i = 0; i < nfields; i++) + { + len = pqTraceOutputInt32(message + cursor, f); + cursor += 4; + if (len == -1) + break; + pqTraceOutputNchar(message + cursor, f, len); + cursor += len; + } Not break but continue, because non-NULL columns may follow a NULL column. (18) + case 'E': /* Execute(F) or Error Return(B) */ + pqTraceOutputE(message + LogCursor, LogEnd, conn->Pfdebug, toServer); + break; Error Return -> ErrorResponse (19) pqTraceOutputF Check the protocol again. Two for loops should be required, one for format codes and another for arguments. (20) + if ( len != -1) Remove extra space before len. (21) I guess you intentionally omitted the following messages because they are only used during connection establishment. I'm fine with it. I wrote this just in case you missed them. GSSENCRequest (F) Int32(8) GSSResponse (F) Byte1('p') PasswordMessage (F) Byte1('p') SASLInitialResponse (F) Byte1('p') SASLResponse (F) Byte1('p') (22) +/* NotificationResponse */ +static void +pqTraceOutputA(const char *message, int end, FILE *f) +{ + int cursor = 0; + + pqTraceOutputInt16(message + cursor, f); + cursor += 2; + pqTraceOutputString(message + cursor, f); Not Int16 but Int32. Regards Takayuki Tsunakawa
RE: libpq debug log
I've not finished reviewing yet, but there seems to be many mistakes. I'm sending second set of review comments now so you can fix them in parallel. (8) + charid = '\0'; This initialization is not required because id will always be assigned a value shortly. (9) +static int +pqTraceOutputInt32(const char *data, FILE *pfdebug) +{ + int result; + + memcpy(&result, data, 4); + result = (int) pg_ntoh32(result); + fputc(' ', pfdebug); + fprintf(pfdebug, "%d", result); + + return result; +} fputc() and fprintf() can be merged into one fprintf() call for efficiency. TBH, it feels strange that an "output" function returns a value. But I understood this as a positive compromise for reducing code; without this, the caller has to do memcpy() and endianness conversion. (10) +/* BackendKeyData */ +static void +pqTraceOutputK(const char *message, FILE *f) +{ + int cursor = 0; + + pqTraceOutputInt32(message + cursor, f); + cursor += 4; + pqTraceOutputInt32(message + cursor, f); +} I don't think you need to always use a cursor variable in order to align with more complex messages. That is, you can just write: + pqTraceOutputInt32(message, f); + pqTraceOutputInt32(message + 4, f); (11) + default: + fprintf(conn->Pfdebug, "Invalid Protocol:%c\n", id); + break; + (This is related to (7)) You can remove this default label if you exit the function before the switch statement when the message type is unknown. Make sure to output \n in that case. (12) pqTraceOutputB + for (i = 0; i < nparams; i++) + { + nbytes = pqTraceOutputInt32(message + cursor, f); + cursor += 4; + if (nbytes == -1) + break; + pqTraceOutputNchar(message + cursor, f, nbytes); + cursor += nbytes; + } Not break but continue, because non-NULL parameters may follow a NULL parameter. (13) pqTraceOutputB + pqTraceOutputInt16(message + cursor, f); + cursor += 4; + pqTraceOutputInt16(message + cursor, f); +} This part doesn't seem to match the following description. -- After the last parameter, the following fields appear: Int16 The number of result-column format codes that follow (denoted R below). This can be zero to indicate that there are no result columns or that the result columns should all use the default format (text); or one, in which case the specified format code is applied to all result columns (if any); or it can equal the actual number of result columns of the query. Int16[R] The result-column format codes. Each must presently be zero (text) or one (binary). -- (14) The processing for CancelRequest message is missing? (15) +/* CopyInResponse */ +static void +pqTraceOutputG(const char *message, int end, FILE *f) +{ + int cursor = 0; + + pqTraceOutputByte1(message + cursor, f); + cursor++; + + while (end > cursor) + { + pqTraceOutputInt16(message + cursor, f); + cursor += 2; + } +} + According to the following description, for loop should be used. -- Int16 The number of columns in the data to be copied (denoted N below). Int16[N] The format codes to be used for each column. Each must presently be zero (text) or one (binary). All must be zero if the overall copy format is textual. -- Regards Takayuki Tsunakawa
RE: libpq debug log
I'm looking at the last file libpq-trace.c. I'll continue the review after lunch. Below are some comments so far. (1) - Enables tracing of the client/server communication to a debugging file stream. + Enables tracing of the client/server communication to a debugging file + stream. + Only available when protocol version 3 or higher is used. The last line is unnecessary now that v2 is not supported. (2) @@ -113,6 +114,7 @@ install: all installdirs install-lib $(INSTALL_DATA) $(srcdir)/libpq-fe.h '$(DESTDIR)$(includedir)' $(INSTALL_DATA) $(srcdir)/libpq-events.h '$(DESTDIR)$(includedir)' $(INSTALL_DATA) $(srcdir)/libpq-int.h '$(DESTDIR)$(includedir_internal)' + $(INSTALL_DATA) $(srcdir)/libpq-trace.h '$(DESTDIR)$(includedir_internal)' $(INSTALL_DATA) $(srcdir)/pqexpbuffer.h '$(DESTDIR)$(includedir_internal)' $(INSTALL_DATA) $(srcdir)/pg_service.conf.sample '$(DESTDIR)$(datadir)/pg_service.conf.sample' Why is this necessary? (3) libpq-trace.h +#ifdef __cplusplus +extern "C" +{ This is unnecessary because pqTraceOutputMessage() is for libpq's internal use. This extern is for the user's C++ application to call public libpq functions. +#include "libpq-fe.h" +#include "libpq-int.h" I think these can be removed by declaring the struct and function as follows: struct pg_conn; extern void pqTraceOutputMessage(struct pg_conn *conn, const char *message, bool toServer); But... after doing the above, only this declaration is left in libpq-trace.h. Why don't you put your original declaration using PGconn in libpq-int.h and remove libpq-trace.h? (4) + /* Trace message only when there is first 1 byte */ + if (conn->Pfdebug && (conn->outCount < conn->outMsgStart)) + pqTraceOutputMessage(conn, conn->outBuffer + conn->outCount, true); () surrounding the condition after && can be removed. (5) +static const char *pqGetProtocolMsgType(unsigned char c, + bool toServer); This is unnecessary because the function definition precedes its only one call site. (6) + * We accumulate frontend message pieces in an array as the libpq code writes + * them, and log the complete message when pqTraceOutputFeMsg is called. + * For backend, we print the pieces as soon as we receive them from the server. The first paragraph is no longer true. I think this entire comment is unnecessary. (7) +static const char * +pqGetProtocolMsgType(unsigned char c, bool toServer) +{ + if (toServer == true && + c < lengthof(protocol_message_type_f) && + protocol_message_type_f[c] != NULL) + return protocol_message_type_f[c]; + + if (toServer == false && + c < lengthof(protocol_message_type_b) && + protocol_message_type_b[c] != NULL) + return protocol_message_type_b[c]; + + return "UnknownMessage:"; +} "== true/false" can be removed. libpq doesn't appear to use such style. Why don't we embed this processing in pqTraceOutputMessage() because this function is short and called only once? The added benefit would be that you can print an invalid message type byte like this: fprintf(..., "Unknown message: %02x\n", ...); Regards Takayuki Tsunakawa
RE: libpq debug log
Alvaro san, Tom san Horiguchi san, Tsunakawa san and Kirk san, Thank you very much for review and advice. > > Works for me. > > Pushed that. I think we're now waiting on Iwata-san to finish a new version > of > the tracing patch. Thank you very much Alvaro san and Tom san. Your patch and code change makes my work more smoothly. And I am sorry for attaching not-good-work patch. fixsegv.patch help my implementation. I update this patch to v25. I fix function arguments and something which are pointed out by reviewers. I create protocol message reading function for each protocol message type. (Ex. pqTraceOutputB() read Bind message) This makes the nesting shallower and makes the code easier to read. Each data type output functions(Ex. pqTraceOutputString() ) are called from each protocol message type function. Cursor operation is performed in each protocol message type function. Like Kirk san, I share `make check` results. As far as I can see, tracing works well. I compare this result with current master branch trace log. It does not miss/add any protocol message. And Thank you Kirk san for your share of make check result and previous patch! ``` 2021-03-15 08:10:44.288999 > Terminate 4 2021-03-15 08:10:44.317959 > Query 155 "CREATE TABLESPACE regress_tblspacewith LOCATION '/home/iwata/pgsql/postgres/src/test/regress/testtablespace' WITH (some_nonexistent_parameter = true);" 2021-03-15 08:10:44.319121 < ErrorResponse 124 S "ERROR" V "ERROR" C "22023" M "unrecognized parameter "some_nonexistent_parameter"" F "reloptions.c" L "1447" R "parseRelOptionsInternal" \x00 2021-03-15 08:10:44.319161 < ReadyForQuery 5 I 2021-03-15 08:10:44.319221 > Query 144 "CREATE TABLESPACE regress_tblspacewith LOCATION '/home/iwata/pgsql/postgres/src/test/regress/testtablespace' WITH (random_page_cost = 3.0);" 2021-03-15 08:10:44.320889 < CommandComplete 22 "CREATE TABLESPACE" 2021-03-15 08:10:44.320922 < ReadyForQuery 5 I 2021-03-15 08:10:44.320979 > Query 81 "SELECT spcoptions FROM pg_tablespace WHERE spcname = 'regress_tblspacewith';" 2021-03-15 08:10:44.324296 < RowDescription 35 #1 "spcoptions" 1213 #5 1009 #65535 -1 #0 2021-03-15 08:10:44.324335 < DataRow 32 #1 22 '{random_page_cost=3.0}' 2021-03-15 08:10:44.324347 < CommandComplete 13 "SELECT 1" 2021-03-15 08:10:44.324388 < ReadyForQuery 5 I 2021-03-15 08:10:44.324516 > Query 42 "DROP TABLESPACE regress_tblspacewith;" 2021-03-15 08:10:44.325423 < CommandComplete 20 "DROP TABLESPACE" 2021-03-15 08:10:44.325453 < ReadyForQuery 5 I ``` > From: Tsunakawa, Takayuki/綱川 貴之 > Sent: Friday, March 5, 2021 3:17 PM ... > > void pqLogMessage(PGconn *conn, const char *message, bool toServer); Thank you Tsunakawa san. In v25 patch, I fixed qTraceOutputMessag() function to this style. I am sorry changing arguments in previous patch. That was my misunderstanding. > From: Tom Lane > Sent: Thursday, March 11, 2021 6:33 AM ... > There are still some cowboy advancements of inStart in the functions > concerned with COPY processing, but it doesn't look like there's much to be > done about that. Those spots will need their own trace calls. I called pqTraceOutputMessage() from pqParseInput3(), COPY process and pqFunctionCall3(). pqFunctionCall3() is called by PQfn(). I found that there is protocol message id switch(). So I call the tracing function there. > From: Kyotaro Horiguchi > Sent: Thursday, March 11, 2021 10:20 AM Thank you Horiguchi san. Your keen review suggestions and detailed advice. It is very helpful for updating my patch. > The individual per-type-output functions are designed to fit to the > corresponding pqGet*() functions. On the other hand, now that we read the > message bytes knowing the exact format in advance as you did in > pqTraceOutputMsg(). So the complexity exist in the functions can be > eliminated by separating them into more type specific output functions. For > example, pqTraceOutputInt() is far simplified by spliting it into > pqTraceOutputInt16() and -32(). Yes, I refer to pqGet functions. To make data type output function more simpler, I separate output int function to pqTraceOutputInt16 and -32. > I think we can get rid of copying in the output functions for String and > Bytes in > different ways. I remove memcpy call from String and Bytes output function. These functions just pass *message to fprintf(). > + /* Protocol 2.0 does not support tracing. */ > + if (PG_PROTOCOL_MAJOR(conn->pversion) < 3) > + return; > > We can write that message out into tracing file. > > +void > +PQtraceSetFlags(PGconn *conn, int flags) { > + if (conn == NULL) > + return; > + /* Protocol 2.0 is not supported. */ > + if (PG_PROTOCOL_MAJOR(conn->pv
Re: libpq debug log
Alvaro Herrera writes: > On 2021-Mar-10, Tom Lane wrote: >> After studying this further, I think we should apply the attached >> patch to remove that responsibility from pqParseInput3's subroutines. >> This will allow a single trace call near the bottom of pqParseInput3 >> to handle all cases that that function processes. > Works for me. Pushed that. I think we're now waiting on Iwata-san to finish a new version of the tracing patch. regards, tom lane
Re: libpq debug log
At Thu, 11 Mar 2021 04:12:57 +, "tsunakawa.ta...@fujitsu.com" wrote in > From: Kyotaro Horiguchi > > Right. So something like this? > > > > unsigned char p; > > > > p = buf + *cursor; > > result = (uint32) (*p << 24) + (*(p + 1)) << 16 + ...); > > Yes, that would work (if p is a pointer), but I think memcpy() is enough like > pqGetInt() does. On second thought, memcpy onto a local variable doesn't harm. I agreed to you. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
RE: libpq debug log
From: Kyotaro Horiguchi > Right. So something like this? > > unsigned char p; > > p = buf + *cursor; > result = (uint32) (*p << 24) + (*(p + 1)) << 16 + ...); Yes, that would work (if p is a pointer), but I think memcpy() is enough like pqGetInt() does. Regards Takayuki Tsunakawa
Re: libpq debug log
At Thu, 11 Mar 2021 03:01:16 +, "tsunakawa.ta...@fujitsu.com" wrote in > From: Kyotaro Horiguchi > > The output functions copy message bytes into local variable but the > > same effect can be obtained by just casing via typed pointer type. > > > > uint32 tmp4; > > .. > > memcpy(&tmp4, buf + *cursor, 4); > > result = (int) pg_ntoh32(tmp4); > > > > can be written as > > > > result = pg_ntoh32(* (uint32 *) (buf + *cursor)); > > I'm afraid we need to memcpy() because of memory alignment. Right. So something like this? unsigned char p; p = buf + *cursor; result = (uint32) (*p << 24) + (*(p + 1)) << 16 + ...); > > I think we can get rid of copying in the output functions for String > > and Bytes in different ways. > > I haven't looked at this code, but you sound right. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: libpq debug log
At Thu, 11 Mar 2021 01:51:55 +, "tsunakawa.ta...@fujitsu.com" wrote in > Alvaro-san, Tom-san, Horiguchi-san, > > From: Kyotaro Horiguchi > > +1 for the thanks for the quick work. I have some random comments > > after a quick look on it. > > Thank you very much for giving many comments. And We're sorry to > have caused you trouble. I told Iwata-san yesterday to modify the > patch to use the logging function interface that Tom-san, Alvaro-san > and I agreed on, that I'd review after the new revision has been Yeah, I agreed at the time. Sorry for not responding it but had no room in my mind to do that:p > posted, and let others know she is modifying the patch again so that > they won't start reviewing. But I should have done the request on > hackers. > > We're catching up with your feedback and post the updated patch. > Then I'll review it. > > We appreciate your help so much. My pleasure. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: libpq debug log
At Wed, 10 Mar 2021 20:38:20 -0500, Tom Lane wrote in > Kyotaro Horiguchi writes: > > Maybe I'm missing something, but the above doesn't seem working. I > > didn't see such message when making a SSL connection. > > As far as that goes, I don't see any point in trying to trace > connection-related messages, because there is no way to enable > tracing on a connection before it's created. Yeah, agreed. In the previous version tracing functions are called during protocol negotiation but that no longer happenes. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
RE: libpq debug log
From: Kyotaro Horiguchi > The output functions copy message bytes into local variable but the > same effect can be obtained by just casing via typed pointer type. > > uint32 tmp4; > .. > memcpy(&tmp4, buf + *cursor, 4); > result = (int) pg_ntoh32(tmp4); > > can be written as > > result = pg_ntoh32(* (uint32 *) (buf + *cursor)); I'm afraid we need to memcpy() because of memory alignment. > I think we can get rid of copying in the output functions for String > and Bytes in different ways. I haven't looked at this code, but you sound right. Regards Takayuki Tsunakawa
RE: libpq debug log
Alvaro-san, Tom-san, Horiguchi-san, From: Kyotaro Horiguchi > +1 for the thanks for the quick work. I have some random comments > after a quick look on it. Thank you very much for giving many comments. And We're sorry to have caused you trouble. I told Iwata-san yesterday to modify the patch to use the logging function interface that Tom-san, Alvaro-san and I agreed on, that I'd review after the new revision has been posted, and let others know she is modifying the patch again so that they won't start reviewing. But I should have done the request on hackers. We're catching up with your feedback and post the updated patch. Then I'll review it. We appreciate your help so much. Regards Takayuki Tsunakawa
Re: libpq debug log
Kyotaro Horiguchi writes: > Maybe I'm missing something, but the above doesn't seem working. I > didn't see such message when making a SSL connection. As far as that goes, I don't see any point in trying to trace connection-related messages, because there is no way to enable tracing on a connection before it's created. regards, tom lane
Re: libpq debug log
At Wed, 10 Mar 2021 10:31:27 -0300, "'alvhe...@alvh.no-ip.org'" wrote in > On 2021-Mar-10, iwata@fujitsu.com wrote: > > > Hi all, > > > > Following all reviewer's advice, I have created a new patch. > > > > In this patch, I add only two tracing entry points; I call > > pqTraceOutputMsg(PGconn conn, int msgCursor, PGCommSource commsource) in > > pqParseInput3 () and pqPutMsgEnd () to output log. > > The argument contains message first byte offset called msgCursor because it > > is simpler to specify the buffer pointer in the caller. > > > > In pqTraceOutputMsg(), the content common to all protocol messages (first > > timestamp, < or >, first 1 byte, and length) are output first and after > > that each protocol message contents is output. I referred to pqParseInput3 > > () , fe-exec.c and documentation page for that part. > > > > This fix almost eliminates if(conn->Pfdebug) that was built into the code > > for other features. > > This certainly looks much better! Thanks for getting it done so > quickly. > > I didn't review the code super closely. I do see a compiler warning: +1 for the thanks for the quick work. I have some random comments after a quick look on it. The individual per-type-output functions are designed to fit to the corresponding pqGet*() functions. On the other hand, now that we read the message bytes knowing the exact format in advance as you did in pqTraceOutputMsg(). So the complexity exist in the functions can be eliminated by separating them into more type specific output functions. For example, pqTraceOutputInt() is far simplified by spliting it into pqTraceOutputInt16() and -32(). The output functions copy message bytes into local variable but the same effect can be obtained by just casing via typed pointer type. uint32 tmp4; .. memcpy(&tmp4, buf + *cursor, 4); result = (int) pg_ntoh32(tmp4); can be written as result = pg_ntoh32(* (uint32 *) (buf + *cursor)); I think we can get rid of copying in the output functions for String and Bytes in different ways. Now that we can manage our own reading pointer within pqTraceOutputMsg(), the per-type-output functions can work only on the reading pointer instead of buffer pointer and cursor, and length. *I*'d want to make the output functions move the reading pointer by themseves. pqTradeOutputMsg can be as simplified as the follows doing this. End-of-message pointer may be useful to check read-overrunning on the message buffer. switch (id) { case 'R': pqTraceOutputInt32(&p, endptr, conn->Pfdebug); fputc('\n', conn->Pfdebug); break; case 'K': pqTraceOutputInt32(&p, endptr, conn->Pfdebug)); pqTraceOutputInt32(&p, endptr, conn->Pfdebug)); ... + char *prefix = commsource == MSGDIR_FROM_BACKEND ? "<" : ">"; + int LogEnd = commsource == MSGDIR_FROM_BACKEND ? conn->inCursor : conn->outMsgEnd; + char*logBuffer = commsource == MSGDIR_FROM_BACKEND ? conn->inBuffer .. + if (commsource == MSGDIR_FROM_BACKEND) + id = logBuffer[LogCursor++]; Repeated usage of the ternaly operator on the same condition makes code hard-to-read. It is simpler to consolidate them into one if-else statement. + result = (int) pg_ntoh32(result32); + if (result == NEGOTIATE_SSL_CODE) Maybe I'm missing something, but the above doesn't seem working. I didn't see such message when making a SSL connection. + /* Protocol 2.0 does not support tracing. */ + if (PG_PROTOCOL_MAJOR(conn->pversion) < 3) + return; We can write that message out into tracing file. +void +PQtraceSetFlags(PGconn *conn, int flags) +{ + if (conn == NULL) + return; + /* Protocol 2.0 is not supported. */ + if (PG_PROTOCOL_MAJOR(conn->pversion) < 3) + return; + /* If PQtrace() failed, do nothing. */ + if (conn->Pfdebug == NULL) + return; + conn->traceFlags = flags; Pfdebug is always NULL for V2 protocol there, so it can be an assertion? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: libpq debug log
Alvaro Herrera writes: > On 2021-Mar-10, Tom Lane wrote: >> After studying this further, I think we should apply the attached >> patch to remove that responsibility from pqParseInput3's subroutines. >> This will allow a single trace call near the bottom of pqParseInput3 >> to handle all cases that that function processes. > Works for me. I dug into the git history a little and concluded that the reason for doing that in the subroutines is that 92785dac2 made it so for this reason: + /* +* Advance inStart to show that the "D" message has been processed. We +* must do this before calling the row processor, in case it longjmps. +*/ + conn->inStart = conn->inCursor; We later gave up on allowing user-defined row processor callbacks, but we failed to undo this messiness. Looking at what 92785dac2 did confirms something else I'd been eyeing, which is why these subroutines have their own checks for having consumed the right amount of data instead of letting the master check in pqParseInput3 take care of it. They didn't use to do that, but that check was hoisted up to before the row processor call, so we wouldn't expose data from a corrupt message to user code. So I think we can undo that too. 92785dac2 only directly hacked getRowDescriptions and getAnotherTuple, but it looks like similar error handling was stuck into getParamDescriptions later. regards, tom lane
Re: libpq debug log
On 2021-Mar-10, Tom Lane wrote: > After studying this further, I think we should apply the attached > patch to remove that responsibility from pqParseInput3's subroutines. > This will allow a single trace call near the bottom of pqParseInput3 > to handle all cases that that function processes. Works for me. > BTW, while looking at this I concluded that getParamDescriptions > is actually buggy: if it's given a malformed message that seems > to need more data than the message length specifies, it just > returns EOF, resulting in an infinite loop. This function apparently > got missed while converting the logic from v2 (where that was the > right thing) to v3 (where it ain't). So that bit needs to be > back-patched. Ah, that makes sense. > I'm tempted to back-patch the whole thing though. +0.5 on that. I think we may be happy that all branches are alike (though it doesn't look like this will cause any subtle bugs -- breakage will be fairly obvious). -- Álvaro Herrera39°49'30"S 73°17'W "Most hackers will be perfectly comfortable conceptualizing users as entropy sources, so let's move on." (Nathaniel Smith)
Re: libpq debug log
I wrote: > Or we could rethink the logic. It's not quite clear to me, after > all this time, why getRowDescriptions() et al are allowed to > move inStart themselves rather than letting the main loop in > pqParseInput3 do it. It might well be an artifact of having not > rewritten the v2 logic too much. After studying this further, I think we should apply the attached patch to remove that responsibility from pqParseInput3's subroutines. This will allow a single trace call near the bottom of pqParseInput3 to handle all cases that that function processes. There are still some cowboy advancements of inStart in the functions concerned with COPY processing, but it doesn't look like there's much to be done about that. Those spots will need their own trace calls. BTW, while looking at this I concluded that getParamDescriptions is actually buggy: if it's given a malformed message that seems to need more data than the message length specifies, it just returns EOF, resulting in an infinite loop. This function apparently got missed while converting the logic from v2 (where that was the right thing) to v3 (where it ain't). So that bit needs to be back-patched. I'm tempted to back-patch the whole thing though. regards, tom lane diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c index 2ca8c057b9..233456fd90 100644 --- a/src/interfaces/libpq/fe-protocol3.c +++ b/src/interfaces/libpq/fe-protocol3.c @@ -290,8 +290,6 @@ pqParseInput3(PGconn *conn) /* First 'T' in a query sequence */ if (getRowDescriptions(conn, msgLength)) return; - /* getRowDescriptions() moves inStart itself */ - continue; } else { @@ -337,8 +335,7 @@ pqParseInput3(PGconn *conn) case 't': /* Parameter Description */ if (getParamDescriptions(conn, msgLength)) return; - /* getParamDescriptions() moves inStart itself */ - continue; + break; case 'D': /* Data Row */ if (conn->result != NULL && conn->result->resultStatus == PGRES_TUPLES_OK) @@ -346,8 +343,6 @@ pqParseInput3(PGconn *conn) /* Read another tuple of a normal query response */ if (getAnotherTuple(conn, msgLength)) return; - /* getAnotherTuple() moves inStart itself */ - continue; } else if (conn->result != NULL && conn->result->resultStatus == PGRES_FATAL_ERROR) @@ -462,7 +457,6 @@ handleSyncLoss(PGconn *conn, char id, int msgLength) * command for a prepared statement) containing the attribute data. * Returns: 0 if processed message successfully, EOF to suspend parsing * (the latter case is not actually used currently). - * In the former case, conn->inStart has been advanced past the message. */ static int getRowDescriptions(PGconn *conn, int msgLength) @@ -577,9 +571,6 @@ getRowDescriptions(PGconn *conn, int msgLength) /* Success! */ conn->result = result; - /* Advance inStart to show that the "T" message has been processed. */ - conn->inStart = conn->inCursor; - /* * If we're doing a Describe, we're done, and ready to pass the result * back to the client. @@ -603,9 +594,6 @@ advance_and_error: if (result && result != conn->result) PQclear(result); - /* Discard the failed message by pretending we read it */ - conn->inStart += 5 + msgLength; - /* * Replace partially constructed result with an error result. First * discard the old result to try to win back some memory. @@ -624,6 +612,12 @@ advance_and_error: appendPQExpBuffer(&conn->errorMessage, "%s\n", errmsg); pqSaveErrorResult(conn); + /* + * Show the message as fully consumed, else pqParseInput3 will overwrite + * our error with a complaint about that. + */ + conn->inCursor = conn->inStart + 5 + msgLength; + /* * Return zero to allow input parsing to continue. Subsequent "D" * messages will be ignored until we get to end of data, since an error @@ -635,12 +629,8 @@ advance_and_error: /* * parseInput subroutine to read a 't' (ParameterDescription) message. * We'll build a new PGresult structure containing the parameter data. - * Returns: 0 if completed message, EOF if not enough data yet. - * In the former case, conn->inStart has been advanced past the message. - * - * Note that if we run out of data, we have to release the partially - * constructed PGresult, and rebuild it again next time. Fortunately, - * that shouldn't happen often, since 't' messages usually fit in a packet. + * Returns: 0 if processed message successfully, EOF to suspend parsing + * (the latter case is not actually used currently). */ static int getParamDescriptions(PGconn *conn, int msgLength) @@ -690,23 +680,16 @@ getParamDescriptions(PGconn *conn, int msgLength) /* Success! */ conn->result = result; - /* Advance inStart to show that the "t" message has been processed. */ - conn->inStart = conn->inCursor; - return 0; not_enough_data: - PQclear(result); - return EOF; + errmsg =
Re: libpq debug log
On 2021-Mar-10, Tom Lane wrote: > Or we could rethink the logic. It's not quite clear to me, after > all this time, why getRowDescriptions() et al are allowed to > move inStart themselves rather than letting the main loop in > pqParseInput3 do it. It might well be an artifact of having not > rewritten the v2 logic too much. I would certainly prefer that the logic stays put for the time being, while I finalize the pipelining stuff. -- Álvaro Herrera Valdivia, Chile
Re: libpq debug log
On 2021-Mar-10, Tom Lane wrote: > "'alvhe...@alvh.no-ip.org'" writes: > > After staring at it a couple of times, I think that the places in > > pqParseInput3() where there's a comment "... moves inStart itself" and > > then "continue;" should have a call to pqTraceOutputMsg(), since AFAIU > > those are the places where the message in question would not reach the > > "Successfully consumed this message" block that prints each message. > > Yeah, the whole business of just when a message has been "consumed" > is a stumbling block for libpq tracing. It was a real mess with the > existing code and v2 protocol, because we could actually try to parse > a message more than once, with the first few tries deciding that the > message wasn't all here yet (but nonetheless emitting partial trace > output). Hmm, that makes sense, but the issue I'm reporting is not the same, unless I misunderstand you. > Now that v2 is dead, the logic to abort because of the message not > being all arrived yet is basically useless: only the little bit of > code concerned with the message length word really needs to cope with > that. It's tempting to go through and get rid of all the now-unreachable > "return"s and such, but it seemed like it would be a lot of code churn for > not really that much gain. That sounds like an interesting exercise, and I bet it'd bring a lot of code readability improvements. > I didn't look at the new version of the patch yet, so I'm not > sure whether the issues it still has are related to this. The issues I noticed are related to separate messages rather than one message split in pieces -- for example several DataRow messages are processed internally in a loop, rather than each individually. The number of places that need to be adjusted for things to AFAICT work correctly are few enough; ISTM that the attached patch is sufficient. (The business with the "logged" boolean is necessary so that we print to the trace file any of those messages even if they are deviating from the protocol.) -- Álvaro Herrera Valdivia, Chile "La experiencia nos dice que el hombre peló millones de veces las patatas, pero era forzoso admitir la posibilidad de que en un caso entre millones, las patatas pelarían al hombre" (Ijon Tichy)
Re: libpq debug log
"'alvhe...@alvh.no-ip.org'" writes: > After staring at it a couple of times, I think that the places in > pqParseInput3() where there's a comment "... moves inStart itself" and > then "continue;" should have a call to pqTraceOutputMsg(), since AFAIU > those are the places where the message in question would not reach the > "Successfully consumed this message" block that prints each message. After digging around a little, I remember that there are a *bunch* of places in fe-protocol3.c that advance inStart. The cleanest way to shove this stuff in without rethinking that logic would be to call pqTraceOutputMsg immediately before each such advance (using the old inStart value as the message start address). A possible advantage of doing it like that is that we'd be aware by that point of whether we think the message is good or bad (or whether it was good but we failed to process it, perhaps because of OOM). Seems like that could be a useful thing to include in the log. Or we could rethink the logic. It's not quite clear to me, after all this time, why getRowDescriptions() et al are allowed to move inStart themselves rather than letting the main loop in pqParseInput3 do it. It might well be an artifact of having not rewritten the v2 logic too much. regards, tom lane
Re: libpq debug log
"'alvhe...@alvh.no-ip.org'" writes: > After staring at it a couple of times, I think that the places in > pqParseInput3() where there's a comment "... moves inStart itself" and > then "continue;" should have a call to pqTraceOutputMsg(), since AFAIU > those are the places where the message in question would not reach the > "Successfully consumed this message" block that prints each message. Yeah, the whole business of just when a message has been "consumed" is a stumbling block for libpq tracing. It was a real mess with the existing code and v2 protocol, because we could actually try to parse a message more than once, with the first few tries deciding that the message wasn't all here yet (but nonetheless emitting partial trace output). Now that v2 is dead, the logic to abort because of the message not being all arrived yet is basically useless: only the little bit of code concerned with the message length word really needs to cope with that. It's tempting to go through and get rid of all the now-unreachable "return"s and such, but it seemed like it would be a lot of code churn for not really that much gain. I didn't look at the new version of the patch yet, so I'm not sure whether the issues it still has are related to this. regards, tom lane
Re: libpq debug log
On 2021-Mar-10, 'alvhe...@alvh.no-ip.org' wrote: > I also found that DataRow messages are not being printed. This seems to > fix that in the normal case and singlerow, at least in pipeline mode. > Didn't verify the non-pipeline mode. Hm, and RowDescription ('T') messages are also not being printed ... so I think there's some larger problem here, and perhaps it needs to be fixed using a different approach. After staring at it a couple of times, I think that the places in pqParseInput3() where there's a comment "... moves inStart itself" and then "continue;" should have a call to pqTraceOutputMsg(), since AFAIU those are the places where the message in question would not reach the "Successfully consumed this message" block that prints each message. -- Álvaro Herrera39°49'30"S 73°17'W Maybe there's lots of data loss but the records of data loss are also lost. (Lincoln Yeoh)
Re: libpq debug log
I also found that DataRow messages are not being printed. This seems to fix that in the normal case and singlerow, at least in pipeline mode. Didn't verify the non-pipeline mode. -- Álvaro Herrera39°49'30"S 73°17'W "Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre) diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c index 68f0f6a081..e8db5edb71 100644 --- a/src/interfaces/libpq/fe-protocol3.c +++ b/src/interfaces/libpq/fe-protocol3.c @@ -869,6 +869,9 @@ getAnotherTuple(PGconn *conn, int msgLength) goto advance_and_error; } + if (conn->Pfdebug) + pqTraceOutputMsg(conn, conn->inStart, MSGDIR_FROM_BACKEND); + /* Advance inStart to show that the "D" message has been processed. */ conn->inStart = conn->inCursor;
Re: libpq debug log
On 2021-Mar-10, iwata@fujitsu.com wrote: > Hi all, > > Following all reviewer's advice, I have created a new patch. > > In this patch, I add only two tracing entry points; I call > pqTraceOutputMsg(PGconn conn, int msgCursor, PGCommSource commsource) in > pqParseInput3 () and pqPutMsgEnd () to output log. > The argument contains message first byte offset called msgCursor because it > is simpler to specify the buffer pointer in the caller. > > In pqTraceOutputMsg(), the content common to all protocol messages (first > timestamp, < or >, first 1 byte, and length) are output first and after that > each protocol message contents is output. I referred to pqParseInput3 () , > fe-exec.c and documentation page for that part. > > This fix almost eliminates if(conn->Pfdebug) that was built into the code for > other features. This certainly looks much better! Thanks for getting it done so quickly. I didn't review the code super closely. I do see a compiler warning: /pgsql/source/pipeline/src/interfaces/libpq/libpq-trace.c: In function 'pqTraceOutputNchar': /pgsql/source/pipeline/src/interfaces/libpq/libpq-trace.c:373:2: warning: argument 1 null where non-null expected [-Wnonnull] memcpy(v, buf + *cursor, len); ^ and then when I try to run the "libpq_pipeline" program from the other thread, I get a crash with this backtrace: Program received signal SIGSEGV, Segmentation fault. __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:288 288 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No existe el fichero o el directorio. (gdb) bt #0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:288 #1 0x77f9b50b in pqTraceOutputNchar (buf=buf@entry=0x55564660 "P", LogEnd=LogEnd@entry=42, cursor=cursor@entry=0x7fffdeb4, pfdebug=0x55569320, len=1) at /pgsql/source/pipeline/src/interfaces/libpq/libpq-trace.c:373 #2 0x77f9bc25 in pqTraceOutputMsg (conn=conn@entry=0x55560260, msgCursor=, commsource=commsource@entry=MSGDIR_FROM_FRONTEND) at /pgsql/source/pipeline/src/interfaces/libpq/libpq-trace.c:476 #3 0x77f96280 in pqPutMsgEnd (conn=conn@entry=0x55560260) at /pgsql/source/pipeline/src/interfaces/libpq/fe-misc.c:533 ... The attached patch fixes it, though I'm not sure that that's the final form we want it to have (since we'd alloc and free repeatedly, making it slow -- perhaps better to use a static, or ...? ). -- Álvaro Herrera Valdivia, Chile Essentially, you're proposing Kevlar shoes as a solution for the problem that you want to walk around carrying a loaded gun aimed at your foot. (Tom Lane) diff --git a/src/interfaces/libpq/libpq-trace.c b/src/interfaces/libpq/libpq-trace.c index ef294fa556..5d3b40a1d0 100644 --- a/src/interfaces/libpq/libpq-trace.c +++ b/src/interfaces/libpq/libpq-trace.c @@ -368,7 +368,15 @@ pqTraceOutputBinary(char *v, int length, FILE *pfdebug) static void pqTraceOutputNchar(char *buf, int LogEnd, int *cursor, FILE *pfdebug, int len) { - char *v = '\0'; + char *v; + + v = malloc(len); + if (v == NULL) + { + fprintf(pfdebug, "'..%d chars..'", len); + *cursor += len; + return; + } memcpy(v, buf + *cursor, len); *cursor += len; @@ -377,6 +385,8 @@ pqTraceOutputNchar(char *buf, int LogEnd, int *cursor, FILE *pfdebug, int len) pqTraceOutputBinary(v, len, pfdebug); fprintf(pfdebug, "\'"); + free(v); + if (*cursor < LogEnd) fprintf(pfdebug, " "); else
RE: libpq debug log
Hi all, Following all reviewer's advice, I have created a new patch. In this patch, I add only two tracing entry points; I call pqTraceOutputMsg(PGconn conn, int msgCursor, PGCommSource commsource) in pqParseInput3 () and pqPutMsgEnd () to output log. The argument contains message first byte offset called msgCursor because it is simpler to specify the buffer pointer in the caller. In pqTraceOutputMsg(), the content common to all protocol messages (first timestamp, < or >, first 1 byte, and length) are output first and after that each protocol message contents is output. I referred to pqParseInput3 () , fe-exec.c and documentation page for that part. This fix almost eliminates if(conn->Pfdebug) that was built into the code for other features. Regards, Aya Iwata > -Original Message- > From: alvhe...@alvh.no-ip.org > Sent: Friday, March 5, 2021 10:41 PM > To: Tsunakawa, Takayuki/綱川 貴之 > Cc: 'Tom Lane' ; Iwata, Aya/岩田 彩 > ; Jamison, Kirk/ジャミソン カーク > ; 'Kyotaro Horiguchi' ; > pgsql-hackers@lists.postgresql.org > Subject: Re: libpq debug log > > On 2021-Mar-05, tsunakawa.ta...@fujitsu.com wrote: > > > From: Tom Lane > > > But I think passing the message start address explicitly might be > > > better than having it understand the buffering behavior in enough > > > detail to know where to find the message. Part of the point here > > > (IMO) is to decouple the tracing logic from the core libpq logic, in > > > hopes of not having common-mode bugs. > > > > Ouch, you're perfectly right. Then let's make the signature: > > > > void pqLogMessage(PGconn *conn, const char *message, bool > > toServer); > > Yeah, looks good! I agree that going this route will result in more > trustworthy > trace output. > > -- > Álvaro Herrera39°49'30"S 73°17'W v24-libpq-trace-log.patch Description: v24-libpq-trace-log.patch
Re: libpq debug log
On 2021-Mar-05, tsunakawa.ta...@fujitsu.com wrote: > From: Tom Lane > > But I think passing the message start address explicitly might be > > better than having it understand the buffering behavior in enough > > detail to know where to find the message. Part of the point here > > (IMO) is to decouple the tracing logic from the core libpq logic, in > > hopes of not having common-mode bugs. > > Ouch, you're perfectly right. Then let's make the signature: > > void pqLogMessage(PGconn *conn, const char *message, bool toServer); Yeah, looks good! I agree that going this route will result in more trustworthy trace output. -- Álvaro Herrera39°49'30"S 73°17'W
RE: libpq debug log
Hi Alvaro-san and Horiguchi-san, CC) Iwata-san, Tsunakawa-san Attached is the V23 of the libpq trace patch. (1) From: Álvaro Herrera > It appears that something is still wrong. I applied lipq pipeline v27 from > [1] > and ran src/test/modules/test_libpq/pipeline singlerow, after patching it to > do PQtrace() after PQconn(). Below is the output I get from that. The > noteworthy point is that "ParseComplete" messages appear multiple times > for some reason ... but that's quite odd, because if I look at the network > traffic > with Wireshark I certainly do not see the ParseComplete message being sent > three times. I applied Alvaro's libpq pipeline patch (v33) [1] on top of this libpq trace patch, then traced the src/test/modules/libpq_pipeline/libpq_pipeline.c test_singlerowmode. It's still the same trace output even after applying the fix recommendations of Horiguchi-san. > Parse 38 "" "SELECT generate_series(42, $1)" #0 > Bind20 "" "" #0 #1 2 '44' #1 #0 > Describe6 P "" > Execute 9 "" 0 > Parse 38 "" "SELECT generate_series(42, $1)" #0 > Bind20 "" "" #0 #1 2 '45' #1 #0 > Describe6 P "" > Execute 9 "" 0 > Parse 38 "" "SELECT generate_series(42, $1)" #0 > Bind20 "" "" #0 #1 2 '46' #1 #0 > Describe6 P "" > Execute 9 "" 0 > Sync4 < ParseComplete 4 < BindComplete4 < RowDescription 40 #1 "generate_series" 0 #0 23 #4 -1 #0 < DataRow 12 #1 2 '42' < DataRow 12 #1 2 '43' < DataRow 12 #1 2 '44' < CommandComplete 13 "SELECT 3" < ParseComplete 4 < ParseComplete 4 < ParseComplete 4 < BindComplete4 < RowDescription 40 #1 "generate_series" 0 #0 23 #4 -1 #0 < DataRow 12 #1 2 '42' < DataRow 12 #1 2 '43' < DataRow 12 #1 2 '44' < DataRow 12 #1 2 '45' < CommandComplete 13 "SELECT 4" < ParseComplete 4 < ParseComplete 4 < ParseComplete 4 < BindComplete4 < RowDescription 40 #1 "generate_series" 0 #0 23 #4 -1 #0 < DataRow 12 #1 2 '42' < DataRow 12 #1 2 '43' < DataRow 12 #1 2 '44' < DataRow 12 #1 2 '45' < DataRow 12 #1 2 '46' < CommandComplete 13 "SELECT 5" < ReadyForQuery 5 I ParseComplete still appears 3 times, but I wonder if that is expected only for pipeline's singlerowmode test. From my observation, when I traced the whole regression test output by putting PQtrace() in fe-connect.c: connectDBComplete(), ParseComplete appears only once or twice, etc. for other commands. In other words, ISTM, it's a case-to-case basis: Some examples from the whole regression trace output is below: a. > Describe6 P "" > Execute 9 "" 0 > Sync4 < ParseComplete 4 < BindComplete4 b. < ErrorResponse 217 S "ERROR" V "ERROR" C "42883" M "function no_such_function(integer) does not exist" H "No function matches the given name and argument types. You might need to add explicit type casts." P "8" F "parse_func.c" L "629" R "ParseFuncOrColumn" \x00 < ReadyForQuery 5 I < ParseComplete 4 < ParseComplete 4 c. > Bind31 "" "my_insert" #0 #1 4 '9652' #1 #0 > Describe6 P "" > Execute 9 "" 0 < ParseComplete 4 < ParseComplete 4 < ParseComplete 4 < BindComplete4 < NoData 4 d. < CommandComplete 15 "INSERT 0 2" < ReadyForQuery 5 T > Parse 89 "i" "insert into test (name, amount, letter) select name, amount+$1, letter from test" #0 > Sync4 < ParseComplete 4 < ReadyForQuery 5 T As you can see from the examples above, ParseComplete appears in multiples/once depending on the test. As for libpq_pipeline's test_singlerowmode, it appears 3x for the CommandComplete SELECT. I am wondering now whether or not it's a bug. Maybe it's not. Thoughts? How I traced the whole regression test: fe-connect.c: connectDBComplete() @@ -2114,6 +2137,7 @@ connectDBComplete(PGconn *conn) int timeout = 0; int last_whichhost = -2;/* certainly different from whichhost */ struct addrinfo *last_addr_cur = NULL; + FILE*trace_file; if (conn == NULL || conn->status == CONNECTION_BAD) return 0; @@ -2171,6 +2195,9 @@ connectDBComplete(PGconn *conn) switch (flag) { case PGRES_POLLING_OK: + trace_file = fopen("/home/postgres/tracelog/regression_trace
RE: libpq debug log
From: Tom Lane > But I think passing the message start address explicitly might be better than > having it understand the buffering behavior in enough detail to know where to > find the message. Part of the point here > (IMO) is to decouple the tracing logic from the core libpq logic, in hopes of > not > having common-mode bugs. Ouch, you're perfectly right. Then let's make the signature: void pqLogMessage(PGconn *conn, const char *message, bool toServer); Thank you! Regards Takayuki Tsunakawa
Re: libpq debug log
"tsunakawa.ta...@fujitsu.com" writes: > Oops, the logging facility needs the destination (conn->pfdebug). So, it > will be: > void pqLogMessage(PGconn *conn, bool toServer); Right, and it'd need the debug flags too, so +1 for just passing the PGconn instead of handing those over piecemeal. But I think passing the message start address explicitly might be better than having it understand the buffering behavior in enough detail to know where to find the message. Part of the point here (IMO) is to decouple the tracing logic from the core libpq logic, in hopes of not having common-mode bugs. regards, tom lane
RE: libpq debug log
From: tsunakawa.ta...@fujitsu.com > I understood that the former is pqParseInput3() and the latter is > pqPutMsgEnd(). They call the logging function wne conn->pfdebug is not > NULL. Its signature is like this (that will be defined in libpq-trace.c): > > void pqLogMessage(const char *message, bool toServer); Oops, the logging facility needs the destination (conn->pfdebug). So, it will be: void pqLogMessage(PGconn *conn, bool toServer); The function should know where to extract the message from. Regards Takayuki Tsunakawa
RE: libpq debug log
Tom-san, Alvaro-san, From: Tom Lane > I took a quick look through the v22 patch, and TBH I don't like much of > anything > at all about the proposed architecture. It's retained most of the flavor of > the > way it was done before, which was a hangover from the old process-on-the-fly > scheme. Yes, the patch just follows the current style of interspersing. (But some places are removed so it's a bit cleaner.) Anyway, I think your suggestion should be better, resulting in much cleaner separation of message handling and logging. > I think the right way to do this, now that we've killed off v2 protocol > support, is to > rely on the fact that libpq fully buffers both incoming and outgoing messages. > We should nuke every last bit of the existing code that intersperses tracing > logic > with normal message decoding and construction, and instead have two tracing > entry points: FWIW, I was surprised and glad when I saw the commit message to remove the v2 protocol. > (1) Log a received message. This is called as soon as we know (from the > length word) that we have the whole message available, and it's just passed a > pointer into the input buffer. It should examine the input message for itself > and print the details. > > (2) Log a message-to-be-sent. Again, call it as soon as we've constructed a > complete message in the output buffer, and let it examine and print that > message for itself. I understood that the former is pqParseInput3() and the latter is pqPutMsgEnd(). They call the logging function wne conn->pfdebug is not NULL. Its signature is like this (that will be defined in libpq-trace.c): void pqLogMessage(const char *message, bool toServer); The message argument points to the message type byte. toServer is true for messages sent to the server. The signature could alternatively be one of the following, but this is a matter of taste: void pqLogMessage(char message_type, const char *message, bool toServer); void pqLogMessage(char message_type, int length, const char *message, bool toServer); This function simply outputs the timestamp, message direction, message type, length, and message contents. The output processing of message contents branches on a message type with a switch-case statement. Perhaps the output processing of each message should be separated into its own function like pqLogMessageD() for 'D' message so that the indentation won't get deep in the main logging function. > Both of these pieces of logic could be written directly from the protocol > specification, without any interaction with the main libpq code paths, which > would be a Good Thing IMO. The current intertwined approach is somewhere > between useless and counterproductive if you're in need of tracing down a > libpq bug. (In other words, I'm suggesting that the apparent need for > duplicate logic would be good not bad, and indeed that it'd be best to write > the > tracing logic without consulting the existing libpq code at all.) Yes, the only thing I'm concerned about is to have two places that have to be aware of the message format -- message encoding/decoding and logging. But this cleaner separation would pay it. Alvaro-san, what do you think? Anyway, we'll soon post the updated patch that fixes the repeated ParseComplete issue based on the current patch. After that, we'll send a patch based on Tom-san's idea. Regards Takayuki Tsunakawa
Re: libpq debug log
"tsunakawa.ta...@fujitsu.com" writes: > From: Kyotaro Horiguchi >> Using (inCursor - inStart) as logCursor doesn't work correctly if tracing >> state >> desyncs. Once desync happens inStart can be moved at the timing that the >> tracing code doesn't expect. This requires (as I mentioned upthread) >> pqReadData to actively reset logCursor, though. >> >> logCursor should move when bytes are fed to the tracing functoins even when >> theyare the last bytes of a message. > Hmm, sharp and nice insight. I'd like Iwata-san and Kirk to consider this, > including Horiguchi-san's previous mails. I took a quick look through the v22 patch, and TBH I don't like much of anything at all about the proposed architecture. It's retained most of the flavor of the way it was done before, which was a hangover from the old process-on-the-fly scheme. I think the right way to do this, now that we've killed off v2 protocol support, is to rely on the fact that libpq fully buffers both incoming and outgoing messages. We should nuke every last bit of the existing code that intersperses tracing logic with normal message decoding and construction, and instead have two tracing entry points: (1) Log a received message. This is called as soon as we know (from the length word) that we have the whole message available, and it's just passed a pointer into the input buffer. It should examine the input message for itself and print the details. (2) Log a message-to-be-sent. Again, call it as soon as we've constructed a complete message in the output buffer, and let it examine and print that message for itself. Both of these pieces of logic could be written directly from the protocol specification, without any interaction with the main libpq code paths, which would be a Good Thing IMO. The current intertwined approach is somewhere between useless and counterproductive if you're in need of tracing down a libpq bug. (In other words, I'm suggesting that the apparent need for duplicate logic would be good not bad, and indeed that it'd be best to write the tracing logic without consulting the existing libpq code at all.) This would, I think, also eliminate the need for extra storage to hold info about bits of the message, which IMO is a pretty significant downside of the proposed design. The printing logic would just print directly to Pfdebug; it wouldn't have to accumulate anything, and so it wouldn't have the out-of-memory failure modes that this patch has. You could also get rid of messy stuff like pqTraceForciblyCloseBeMsg. Lastly, this'd reduce the overhead the tracing functionality imposes on normal usage to about nil. Admittedly, all those "if (Pfdebug)" tests probably don't cost that much execution-wise, but they cost something. Making only one such test per sent or received message has to be better. regards, tom lane