On Tue, 21 Nov 2023 at 09:52, Adam Hendel <a...@tembo.io> wrote: > > Hello, > > On Mon, Nov 13, 2023 at 6:01 PM Andres Freund <and...@anarazel.de> wrote: >> >> Hi, >> >> On 2023-11-13 11:55:07 -0600, Adam Hendel wrote: >> > Currently, pgbench will log individual transactions to a logfile when the >> > `--log` parameter flag is provided. The logfile, however, does not include >> > column header. It has become a fairly standard expectation of users to have >> > column headers present in flat files. Without the header in the pgbench log >> > files, new users must navigate to the docs and piece together the column >> > headers themselves. Most industry leading frameworks have tooling built in >> > to read column headers though, for example python/pandas read_csv(). >> >> The disadvantage of doing that is that a single pgbench run with --log will >> generate many files when using -j, to avoid contention. The easiest way to >> deal with that after the run is to cat all the log files to a larger one. If >> you do that with headers present, you obviously have a few bogus rows (the >> heades from the various files). >> >> I guess you could avoid the "worst" of that by documenting that the combined >> log file should be built by >> cat {$log_prefix}.${pid} {$log_prefix}.${pid}.* >> and only outputting the header in the file generated by the main thread. >> >> >> > We can improve the experience for users by adding column headers to pgbench >> > logfiles with an optional command line flag, `--log-header`. This will keep >> > the API backwards compatible by making users opt-in to the column headers. >> > It follows the existing pattern of having conditional flags in pgbench’s >> > API; the `--log` option would have both –log-prefix and –log-header if this >> > work is accepted. >> >> > The implementation considers the column headers only when the >> > `--log-header` flag is present. The values for the columns are taken >> > directly from the “Per-Transaction Logging” section in >> > https://www.postgresql.org/docs/current/pgbench.html and takes into account >> > the conditional columns `schedule_lag` and `retries`. >> > >> > >> > Below is an example of what that logfile will look like: >> > >> > >> > pgbench postgres://postgres:postgres@localhost:5432/postgres --log >> > --log-header >> > >> > client_id transaction_no time script_no time_epoch time_us >> >> Independent of your patch, but we imo ought to combine time_epoch time_us in >> the log output. There's no point in forcing consumers to combine those >> fields, >> and it makes logging more expensive... And if we touch this, we should just >> switch to outputting nanoseconds instead of microseconds. >> >> It also is quite odd that we have "time" and "time_epoch", "time_us", where >> time is the elapsed time of an individual "transaction" and time_epoch + >> time_us together are a wall-clock timestamp. Without differentiating between >> those, the column headers seem not very useful, because one needs to look in >> the documentation to understand the fields. >> >> >> I don't think there's all that strong a need for backward compatibility in >> pgbench. We could just change the columns as I suggest above and then always >> emit the header in the "main" log file. > > > I updated the patch to always log the header and only off the main thread. As > for the time headers, I will work on renaming/combining those in a separate > commit as:
One of the test has failed in CFBOT at[1] with: [09:15:00.526](0.000s) not ok 422 - transaction count for /tmp/cirrus-ci-build/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_log_3.25193 (11) [09:15:00.526](0.000s) # Failed test 'transaction count for /tmp/cirrus-ci-build/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_log_3.25193 (11)' # at /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl line 1257. [09:15:00.526](0.000s) not ok 423 - transaction format for 001_pgbench_log_3 [09:15:00.526](0.000s) # Failed test 'transaction format for 001_pgbench_log_3' # at /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl line 1257. # Log entry not matching: client_id transaction_no time script_no time_epoch time_us # Running: pgbench --no-vacuum -f /tmp/cirrus-ci-build/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_incomplete_transaction_block More details for the same is available at [2] [1] - https://cirrus-ci.com/task/5139049757802496 [2] - https://api.cirrus-ci.com/v1/artifact/task/5139049757802496/testrun/build/testrun/pgbench/001_pgbench_with_server/log/regress_log_001_pgbench_with_server Regards, Vignesh