Re: pgbench - rework variable management
On Tue, Sep 3, 2019 at 4:57 PM Fabien COELHO wrote:
> I noticed, but I do not have any windows host so I cannot test locally.
>
> The issue is how to do a mutex on Windows, which does not have pthread so
> it has to be emulated. I'll try again by sending a blind update to the
> patch and see how it goes.

If you have the patience and a GitHub account, you can push code onto a public GitHub branch (with the patch mentioned at https://wiki.postgresql.org/wiki/Continuous_Integration also applied), go to appveyor.com and tell it to watch your GitHub account, and then it'll build and test every time you push a new tweak. It takes a few minutes to get the answer each time you try something, but I have managed to get things working on Windows that way.

--
Thomas Munro
https://enterprisedb.com
Re: block-level incremental backup
On Fri, Aug 16, 2019 at 3:54 PM Jeevan Chalke wrote:
>

0003:

+/*
+ * When to send the whole file, % blocks modified (90%)
+ */
+#define WHOLE_FILE_THRESHOLD 0.9

How was this threshold selected? Is it based on some test?

- magic number, currently 0 (4 bytes)

I think in the patch we are using (#define INCREMENTAL_BACKUP_MAGIC 0x494E4352) as the magic number, not 0.

+ Assert(statbuf->st_size <= (RELSEG_SIZE * BLCKSZ));
+
+ buf = (char *) malloc(statbuf->st_size);
+ if (buf == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory")));
+
+ if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0)
+ {
+ Bitmapset *mod_blocks = NULL;
+ int nmodblocks = 0;
+
+ if (cnt % BLCKSZ != 0)
+ {

It would be good to add some comments for the if block and also for the assert. Actually, it's not very clear from the code.

0004:

+#include
+#include
+#include

Header file include order (sys/stat.h should be before time.h).

+ printf(_("%s combines full backup with incremental backup.\n\n"), progname);

/backup/backups

+ * scan_file
+ *
+ * Checks whether given file is partial file or not. If partial, then combines
+ * it into a full backup file, else copies as is to the output directory.
+ */

/If partial, then combines/ If partial, then combine

+static void
+combine_partial_files(const char *fn, char **IncrDirs, int nIncrDir,
+ const char *subdirpath, const char *outfn)

+ /*
+ * Open all files from all incremental backup directories and create a file
+ * map.
+ */
+ basefilefound = false;
+ for (i = (nIncrDir - 1), fmindex = 0; i >= 0; i--, fmindex++)
+ {
+ fm = &filemaps[fmindex];
+ .
+ }
+
+
+ /* Process all opened files. */
+ lastblkno = 0;
+ modifiedblockfound = false;
+ for (i = 0; i < fmindex; i++)
+ {
+ char *buf;
+ int hsize;
+ int k;
+ int blkstartoffset;
..
+ }
+
+ for (i = 0; i <= lastblkno; i++)
+ {
+ char blkdata[BLCKSZ];
+ FILE *infp;
+ int offset;
...
+ }
}

Can we break this function down into 2-3 functions? At least creating the file map could go directly into a separate function.

I have read the 0003 and 0004 patches, and the above are a few cosmetic comments.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Patch to add hook to copydir()
On 2019-09-02 22:16, Swen Kooij wrote: > Is there anything that I am missing? My early experiments have been > very promising but my experience with Postgres internals is limited. Any > help or feedback would be much appreciated. You might want to review several previous threads that were contemplating doing some reflink stuff with btrfs during database creation. Not exactly what you are doing, but related. Also, wouldn't it work if your extension just defined its own copydir() in the .so? That should then be used over the built-in one -- maybe. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: fix "Success" error messages
On 2019-08-27 08:27, Michael Paquier wrote: > Thanks for the new patch, and you are right that pg_checksums has been > slacking here. There is the same issue with pg_verify_checksums in > 11. Not sure that's worth a back-patch though. Those parts could > find their way to v12 easily. Committed to master and PG12 with your suggested changes. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Proposal: roll pg_stat_statements into core
On Mon, Sep 02, 2019 at 12:07:17PM -0400, Tom Lane wrote:
> Euler Taveira writes:
>> At least if pg_stat_statements
>> was in core you could load it by default and have a GUC to turn it
>> on/off without restarting the server (that was Magnus proposal and
>> Andres agreed).
>
> That assertion is 100% bogus. To turn it on or off on-the-fly,
> you'd need some way to acquire or release its shared memory
> on-the-fly.
>
> It's probably now possible to do something like that, using the
> DSM mechanisms, but the code for it hasn't been written (AFAIK).
> And it wouldn't have much to do with whether the module was
> in core or stayed where it is.

If we were to actually merge the module into core and switch to DSM instead of the current fixed amount of shared memory defined at start time, then that would be a two-step process: first push the functions into core with a GUC_POSTMASTER as currently done, and secondly attempt to switch the GUC to be reloadable.

FWIW, I am not sure that we should have the module in core.

--
Michael

signature.asc
Description: PGP signature
Re: Contribution to Perldoc for TestLib module in Postgres
On Mon, Sep 02, 2019 at 01:48:14PM -0400, Alvaro Herrera wrote:
> Agreed ... that's pretty much the same thing I tried to say upthread. I
> wrote my own synopsis, partly using your suggestions. I reworded the
> description for the routines (I don't think there's any I didn't touch),
> added a mention of $windows_os, added a =head1 to split out the ad-hoc
> routines from the Test::More wrappers.
>
> And pushed.
>
> Please give it another look. It might need more polish.

Thanks for committing. I have read through the commit and I am not noticing any issue worth sorting out. One improvement might be to give a short description of the arguments of some routines, like check_mode_recursive, but I think that we could live without that as well.

--
Michael

signature.asc
Description: PGP signature
Re: SIGQUIT on archiver child processes maybe not such a hot idea?
At Tue, 3 Sep 2019 02:42:19 +, "Tsunakawa, Takayuki" wrote in <0A3221C70F24FB45833433255569204D1FD1609C@G01JPEXMBYT05>
> From: Kyotaro Horiguchi [mailto:horikyota@gmail.com]
> > Since we are allowing OPs to use arbitrary command as
> > archive_command, providing a replacement with non-standard signal
> > handling for a specific command doesn't seem a general solution
> > to me. Couldn't we have pg_system(a tentative name), which
> > intercepts SIGQUIT then sends SIGINT to children? Might be need
> > to resend SIGQUIT after some interval, though..
>
> The same idea that you referred to as pg_system occurred to me,
> too. But I wondered if the archiver process can get the pid of
> its child (shell? archive_command?), while keeping the
> capabilities of system() (= the shell). Even if we fork() and
> then system(), doesn't the OS send SIGQUIT to any descendents

I imagined that we don't use system(3) at all. It would be almost a simplified version of system(3), but it doesn't use it.

> of the archiver when postmaster sends SIGQUIT to the child
> process group?

Mmmm.. That's bad. Commit 3ad0728c81 intends to send SIGQUIT to the programs under system() for the archiver process. I found the discussion for that patch:

https://www.postgresql.org/message-id/flat/24849.1163826231%40sss.pgh.pa.us#c987fc18a4b9113c6276ae8b5d85abba

> Reimplementing system() seems pretty ugly, but maybe we have no choice.
> It strikes me that system() has a race condition as defined anyway,
> because if a signal arrives between blocking the handler and issuing the
> fork(), it'll disappear into the ether; and the same at the end of the
> routine.
..
>> Alternatively perhaps Postgres really ought to be using USR1/USR2 or other
>> signals that library routines won't think they have any business rearranging.
>
> The existing signal assignments were all picked for what seem to me
> to be good reasons; I'm disinclined to change them. In any case, the
> important point here is that we'd really like an archive or recovery
> script, or for that matter any command executed via system() from a
> backend, to abort when the parent backend is SIGINT'd or SIGQUIT'd.

But now we know that sending it to grandchildren is wrong, in the sense that it leads to leftover unwanted core files. That behavior was already known at the time. So now I see that we would need to revert that to some extent if we want to stop the core-dumping behavior... I'm inclined to think that we should just document the behavior.

> > # Is there any means to view the whole of a thread from archive?
> > # I'm a kind of reluctant to wander among messages like a rat in
> > # a maze:p
>
> You can see the whole thread by clicking the "Whole Thread" link.

What??? Oops!!! I found it among "Whole Message", "Download mbox" and "Resend email". Sorry for my slippery eyes and thanks!

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: psql - improve test coverage from 41% to 88%
Attached is a rebase after TestLib.pm got a documentation in 6fcc40b1. The attached patch improves psql code coverage by adding a specific TAP test. The 1709 tests take 4 seconds CPU (6.3 elapsed time) on my laptop. The infrastructure is updated to require perl module "Expect", allowing to test interactive features such as tab completion and prompt changes. Coverage in "src/bin/psql" jumps from 40.0% of lines and 41.9% of functions to 78.4% of lines and 98.1% of functions with "check-world". -- Fabien.diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index b7d36b65dd..aabc27ab6f 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -506,6 +506,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start'; $node->command_checks_all( [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt" ], 1, + '', [qr{^$}], [qr/^WARNING.*checksum verification failed/s], 'pg_basebackup reports checksum mismatch'); @@ -526,6 +527,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start'; $node->command_checks_all( [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2" ], 1, + '', [qr{^$}], [qr/^WARNING.*further.*failures.*will.not.be.reported/s], 'pg_basebackup does not report more than 5 checksum mismatches'); @@ -542,6 +544,7 @@ system_or_bail 'pg_ctl', '-D', $pgdata, 'start'; $node->command_checks_all( [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3" ], 1, + '', [qr{^$}], [qr/^WARNING.*7 total checksum verification failures/s], 'pg_basebackup correctly report the total number of checksum mismatches'); diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl index 59228b916c..cf4811d382 100644 --- a/src/bin/pg_checksums/t/002_actions.pl +++ b/src/bin/pg_checksums/t/002_actions.pl @@ -62,6 +62,7 @@ sub check_relation_corruption '--filenode', $relfilenode_corrupted ], 1, + '', [qr/Bad checksums:.*1/], [qr/checksum verification failed/], "fails with corrupted data for single relfilenode on tablespace $tablespace" @@ -71,6 +72,7 @@ sub check_relation_corruption $node->command_checks_all( [ 'pg_checksums', '--check', '-D', $pgdata ], 1, + '', [qr/Bad checksums:.*1/], [qr/checksum verification failed/], "fails with corrupted data on tablespace $tablespace"); @@ -203,6 +205,7 @@ sub fail_corrupt $node->command_checks_all( [ 'pg_checksums', '--check', '-D', $pgdata ], 1, + '', [qr/^$/], [qr/could not read block 0 in file.*$file\":/], "fails for corrupted data in $file"); diff --git a/src/bin/pg_controldata/t/001_pg_controldata.pl b/src/bin/pg_controldata/t/001_pg_controldata.pl index 3b63ad230f..ebe9b80a52 100644 --- a/src/bin/pg_controldata/t/001_pg_controldata.pl +++ b/src/bin/pg_controldata/t/001_pg_controldata.pl @@ -33,6 +33,7 @@ close $fh; command_checks_all( [ 'pg_controldata', $node->data_dir ], 0, + '', [ qr/WARNING: Calculated CRC checksum does not match value stored in file/, qr/WARNING: invalid WAL segment size/ diff --git a/src/bin/pg_resetwal/t/002_corrupted.pl b/src/bin/pg_resetwal/t/002_corrupted.pl index f9940d7fc5..1990669d26 100644 --- a/src/bin/pg_resetwal/t/002_corrupted.pl +++ b/src/bin/pg_resetwal/t/002_corrupted.pl @@ -30,6 +30,7 @@ close $fh; command_checks_all( [ 'pg_resetwal', '-n', $node->data_dir ], 0, + '', [qr/pg_control version number/], [ qr/pg_resetwal: warning: pg_control exists but is broken or wrong version; ignoring it/ @@ -46,6 +47,7 @@ close $fh; command_checks_all( [ 'pg_resetwal', '-n', $node->data_dir ], 0, + '', [qr/pg_control version number/], 
[ qr/\Qpg_resetwal: warning: pg_control specifies invalid WAL segment size (0 bytes); proceed with caution\E/ diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl index b82d3f65c4..01010914fe 100644 --- a/src/bin/pgbench/t/001_pgbench_with_server.pl +++ b/src/bin/pgbench/t/001_pgbench_with_server.pl @@ -50,7 +50,7 @@ sub pgbench push @cmd, @args; - $node->command_checks_all(\@cmd, $stat, $out, $err, $name); + $node->command_checks_all(\@cmd, $stat, '', $out, $err, $name); # cleanup? #unlink @filenames or die "cannot unlink files (@filenames): $!"; diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl index f7fa18418b..b58f3548c3 100644 --- a/src/bin/pgbench/t/002_pgbench_no_server.pl +++ b/src/bin/pgbench/t/002_pgbench_no_server.pl @@ -25,7 +25,7 @@ sub pgbench my ($opts, $stat, $out, $err, $name) = @_; print STDERR "opts=$opts, stat=$stat, out=$out, err=$err, name=$name"; command_checks_all([ 'pgbench', split(/\s+/, $opts) ], - $stat, $out, $err, $name); + $stat, '', $out, $err, $name); return; } @@ -52,7 +52,7 @@ sub pgbench_scripts push @cmd, '-f', $filename; } } - command_checks_all(\@cmd, $stat, $out, $err, $name); + command_checks_all(\@cmd, $stat
Re: [HACKERS] CLUSTER command progress monitor
On Tue, Sep 03, 2019 at 01:59:00PM +0900, Masahiko Sawada wrote:
> After more thought, even if we don't start a new command progress when
> there is another one already started the progress update functions
> could be called and these functions don't specify the command type.
> Therefore, the progress information might be changed with wrong value
> by different command. Probably we can change the caller of progress
> updating function so that it doesn't call all of them if the command
> could not start a new progress report, but it might be a big change.

That's one issue.

> As an alternative idea, we can make pgstat_progress_end_command() have
> one argument that is command the caller wants to end. That is, we
> don't end the command progress when the specified command type doesn't
> match to the command type of current running command progress. We
> unconditionally clear the progress information at CommitTransaction()
> and AbortTransaction() but we can do that by passing
> PROGRESS_COMMAND_INVALID to pgstat_progress_end_command().

Possibly. I don't dislike the idea of piling up the progress information for cascading calls and I would use that with a depth counter and a fixed-size array.

> BTW the following condition in pgstat_progress_end_command() seems to
> be wrong. We should return from the function when either
> pgstat_track_activities is disabled or the current progress command is
> invalid.
>
>    if (!pgstat_track_activities
>        && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
>        return;
>
> I've attached the patch fixes the issue I newly found.

Indeed, good catch. This is wrong since b6fb647, which introduced the progress reports. I'll fix that one and back-patch if there are no objections.

With my RMT hat on for v12, I don't think that it is really the moment to discuss how we want to change this API post beta3, and we have room for that in future development cycles. There are quite a few questions which need answers I am unsure of:

- Do we really want to support nested calls of progress reports for multiple commands?
- Perhaps for some commands it makes sense to have an overlap of the fields used, but we need a clear definition of what can be done or not. I am not really comfortable with the idea of having in reindex_relation() a progress report related only to CLUSTER, which is also a REINDEX code path. The semantics shared between both commands need to be thought out a bit more. For example PROGRESS_CLUSTER_INDEX_REBUILD_COUNT could cause the system catalog to report PROGRESS_CREATEIDX_PHASE_WAIT_3 because of an incorrect command type, which would be just wrong for a CLUSTER command.
- Which command should be reported to the user, only the upper-level one?
- Perhaps we can live only with the approach of not registering a new command if one already exists, and actually be more flexible with the phase fields used, in short using unique numbers for each phase?

The most conservative bet from a release point of view, and actually my bet because that's safe, would be to basically revert 6f97457 (CLUSTER), 03f9e5c (REINDEX) and ab0dfc96 (CREATE INDEX, which has overlaps with REINDEX in the build and validation paths). What I am really scared of is that we have just barely scratched the surface of the issues caused by the inter-dependencies between all the code paths of those commands, and that we have much more waiting behind this open item.

--
Michael

signature.asc
Description: PGP signature
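For what it's worth, a minimal sketch of that "depth counter and a fixed-size array" idea could look roughly like the following (the struct names and the depth cap are invented for illustration and are not the attached patch; ProgressCommandType, Oid, int64 and PGSTAT_NUM_PROGRESS_PARAM are the existing definitions pulled in via pgstat.h):

    #include "postgres.h"
    #include "pgstat.h"

    /* Invented cap; "4 or 8 would be enough" was suggested upthread. */
    #define PROGRESS_STACK_DEPTH 4

    /* One nested progress report, mirroring the st_progress_* fields that
     * PgBackendStatus carries today. */
    typedef struct PgBackendProgressEntry
    {
        ProgressCommandType st_progress_command;
        Oid         st_progress_command_target;
        int64       st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
    } PgBackendProgressEntry;

    /* PgBackendStatus would hold a small stack of these instead of a single
     * set of fields; depth == 0 means no command is in progress. */
    typedef struct PgBackendProgressStack
    {
        int         depth;
        PgBackendProgressEntry entries[PROGRESS_STACK_DEPTH];
    } PgBackendProgressStack;

pgstat_progress_start_command() would then push an entry (or give up once the cap is reached), and pgstat_progress_end_command() would pop the top one.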
Re: row filtering for logical replication
On 2019-09-03 05:32, Euler Taveira wrote: Em ter, 3 de set de 2019 às 00:16, Alexey Zagarin escreveu: There are complaints in the log (both pub and sub) like: ERROR: trying to store a heap tuple into wrong type of slot I have no idea what causes that. Yeah, I've seen that too. It was fixed by Alexey Kondratov, in line 955 of 0005-Row-filtering-for-logical-replication.patch it should be &TTSOpsHeapTuple instead of &TTSOpsVirtual. Ops... exact. That was an oversight while poking with different types of slots. OK, I'll consider Alexey Kondratov's set of patches as the current state-of-the-art then. (They still apply.) I found a problem where I'm not sure it's a bug: The attached bash script does a test by setting up pgbench tables on both master and replica, and then sets up logical replication for a slice of pgbench_accounts. Then it does a short pgbench run, and loops until the results become identical(ok) (or breaks out after a certain time (NOK=not ok)). It turns out this did not work until I added a wait state after the CREATE SUBSCRIPTION. It always fails without the wait state, and always works with the wait state. Do you agree this is a bug? thanks (also to both Alexeys :)) Erik Rijkers PS by the way, this script won't run as-is on other machines; it has stuff particular to my local setup. #!/bin/bash # postgres binary compiled with # # pgpatches/0130/logrep_rowfilter/20190902/v2-0001-Remove-unused-atttypmod-column-from-initial-table.patch # pgpatches/0130/logrep_rowfilter/20190902/v2-0002-Store-number-of-tuples-in-WalRcvExecResult.patch # pgpatches/0130/logrep_rowfilter/20190902/v2-0003-Refactor-function-create_estate_for_relation.patch # pgpatches/0130/logrep_rowfilter/20190902/v2-0004-Rename-a-WHERE-node.patch # pgpatches/0130/logrep_rowfilter/20190902/v2-0005-Row-filtering-for-logical-replication.patch # pgpatches/0130/logrep_rowfilter/20190902/v2-0006-Print-publication-WHERE-condition-in-psql.patch # pgpatches/0130/logrep_rowfilter/20190902/v2-0007-Publication-where-condition-support-for-pg_dump.patch # pgpatches/0130/logrep_rowfilter/20190902/v2-0008-Debug-for-row-filtering.patch # pgpatches/0130/logrep_rowfilter/20190902/v2-0009-Add-simple-BDR-test-for-row-filtering.patch unset PGDATABASE PGPORT PGSERVICE export PGDATABASE=postgres root_dir=/tmp/cascade/logrep_rowfilter mkdir -p $root_dir BIN=$HOME/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin export PATH=$BIN:$PATH initdb=$BIN/initdb postgres=$BIN/postgres pg_ctl=$BIN/pg_ctl baseport=6525 port1=$(( $baseport + 0 )) port2=$(( $baseport + 1 )) appname=rowfilter num_instances=2 scale=1 where="where (aid between 4 and 5-1)" # scale=10 where="where (aid between 40 and 40+5-1)" clients=64 duration=20 wait=10 BASTA_COUNT=40 # 7200 # wait seconds in total if [[ -d $root_dir/instance1 ]]; then rm -rf $root_dir/instance1; fi if [[ -d $root_dir/instance2 ]]; then rm -rf $root_dir/instance2; fi if [[ -d $root_dir/instance1 ]]; then exit ; fi if [[ -d $root_dir/instance2 ]]; then exit ; fi devel_file=/tmp/bugs echo filterbug>$devel_file for n in `seq 1 $num_instances` do instance=instance$n server_dir=$root_dir/$instance data_dir=$server_dir/data port=$(( $baseport + $n -1 )) logfile=$server_dir/logfile.$port echo "-- $initdb --pgdata=$data_dir --encoding=UTF8 --pwfile=$devel_file " $initdb --pgdata=$data_dir --encoding=UTF8 --pwfile=$devel_file &> /dev/null ( $postgres -D $data_dir -p $port \ --wal_level=logical --logging_collector=on \ --client_min_messages=warning \ --log_directory=$server_dir --log_filename=logfile.${port} \ 
--log_replication_commands=on & ) &> /dev/null done echo "sleep 3s" sleep 3 echo " drop table if exists pgbench_accounts; drop table if exists pgbench_branches; drop table if exists pgbench_tellers; drop table if exists pgbench_history;" | psql -qXp $port1 \ && echo " drop table if exists pgbench_accounts; drop table if exists pgbench_branches; drop table if exists pgbench_tellers; drop table if exists pgbench_history;" | psql -qXp $port2 \ && pgbench -p $port1 -qis $scale \ && echo "alter table pgbench_history add column hid serial primary key;" \ | psql -q1Xp $port1 && pg_dump -F c -p $port1 \ --exclude-table-data=pgbench_history \ --exclude-table-data=pgbench_accounts \ --exclude-table-data=pgbench_branches \ --exclude-table-data=pgbench_tellers \ -t pgbench_history -t pgbench_accounts \ -t pgbench_branches -t pgbench_tellers \ | pg_restore -1 -p $port2 -d postgres pub1=pub_${port1}_to_${port2} sub1=sub_${port2}_fr_${port1} echo -ne " cr
Re: SIGQUIT on archiver child processes maybe not such a hot idea?
On Tue, Sep 3, 2019 at 2:43 PM Tsunakawa, Takayuki wrote:
> From: Kyotaro Horiguchi [mailto:horikyota@gmail.com]
> > Since we are allowing OPs to use arbitrary command as
> > archive_command, providing a replacement with non-standard signal
> > handling for a specific command doesn't seem a general solution
> > to me. Couldn't we have pg_system(a tentative name), which
> > intercepts SIGQUIT then sends SIGINT to children? Might be need
> > to resend SIGQUIT after some interval, though..
>
> The same idea that you referred to as pg_system occurred to me, too. But I
> wondered if the archiver process can get the pid of its child (shell?
> archive_command?), while keeping the capabilities of system() (= the shell).
> Even if we fork() and then system(), doesn't the OS send SIGQUIT to any
> descendents of the archiver when postmaster sends SIGQUIT to the child
> process group?

So, to recap what's happening here, we have a tree of processes like this:

postmaster -> archiver -> sh -> cp [user-supplied archiving command]

The archiver is a process group leader, because it called setsid(). The postmaster's signal_child() does kill(pid, ...) and also kill(-pid, ...), so the kernel sends SIGQUIT to the archiver (twice), sh and cp. As for what they do with the signal, it depends on timing:

1. The archiver normally exits immediately in pgarch_exit(), but while system() is running, SIGQUIT and SIGINT are ignored (see POSIX).
2. sh normally uses SIGINT to break out of loops etc, but while it's waiting for a subprocess, it also ignores SIGQUIT and SIGINT (see POSIX).
3. cp inherits the default disposition and (unless it handles it specially) dumps core.

I think the general idea here is that interactive shells and similar things want to ignore signals from users typing ^C (SIGINT) or ^\ (SIGQUIT) so they can affect just the thing that's actually running at this moment, not the tree of processes waiting.

Yeah, I guess we could have our own pg_system() function that does roughly what system() does, namely fork(), then execl() in the child and waitpid() in the parent, but the child could begin a new process group with setsid() before running execl() (so that it no longer gets SIGQUIT when the postmaster signals the archiver), and the parent could record pg_system_child_pid when forking, and install a QUIT handler that does kill(-pg_system_child_pid, SIGTERM), as well as setting a flag that will cause its main loop to exit (but not before it has run waitpid()). With some carefully placed blocks and unblocks and ignores, to avoid races.

That all sounds like a lot of work though, and it might be easier to just make an exception and use SIGTERM to shut down the archiver, as I think Tom was suggesting.

Unfortunately we have the same problem elsewhere, where we use popen(). I just wrote a C program that does just "sleep(60)", ran it with COPY FROM PROGRAM, then sent SIGQUIT to the postmaster, and got a dumped core.

--
Thomas Munro
https://enterprisedb.com
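To make the shape of that idea a bit more concrete, here is a rough, illustrative sketch of such a pg_system() (the name pg_system_child_pid comes from the description above; everything else is assumed, and the sketch deliberately leaves out the careful signal blocking/unblocking and ignoring needed to avoid races):

    #include <errno.h>
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static volatile sig_atomic_t quit_requested = 0;
    static volatile pid_t pg_system_child_pid = 0;

    /* SIGQUIT handler for the archiver: forward a gentler signal to the
     * child's process group and remember that the main loop should exit. */
    static void
    archiver_quit_handler(int signo)
    {
        if (pg_system_child_pid != 0)
            kill(-pg_system_child_pid, SIGTERM);
        quit_requested = 1;     /* checked after waitpid() has returned */
    }

    /* Roughly system(3), except that the command runs in its own process
     * group, so it is not hit directly when the postmaster signals the
     * archiver's process group. */
    static int
    pg_system(const char *command)
    {
        pid_t       pid;
        int         status;

        pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
        {
            /* Child: start a new session/process group, then run the shell. */
            setsid();
            execl("/bin/sh", "sh", "-c", command, (char *) NULL);
            _exit(127);         /* exec failed */
        }

        /* Parent: remember the child so the SIGQUIT handler can signal it. */
        pg_system_child_pid = pid;
        while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
            continue;
        pg_system_child_pid = 0;

        return status;
    }

The archiver would install archiver_quit_handler for SIGQUIT (e.g. with pqsignal()) before calling pg_system(), and check quit_requested in its main loop once the call returns.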
RE: Speed up transaction completion faster after many relations are accessed in a transaction
From: Alvaro Herrera [mailto:alvhe...@2ndquadrant.com]
> Hmm ... is this patch rejected, or is somebody still trying to get it to
> committable state? David, you're listed as committer.

I don't think it's rejected. It would be a pity (mottainai) to refuse this, because it provides a significant speedup despite being a simple modification.

Again, I think the v2 patch is OK. Tom's comment was as follows:

[Tom's comment against v2]
FWIW, I tried this patch against current HEAD (959d00e9d). Using the test case described by Amit at I do measure an undeniable speedup, close to 35%.

However ... I submit that that's a mighty extreme test case. (I had to increase max_locks_per_transaction to get it to run at all.) We should not be using that sort of edge case to drive performance optimization choices.

If I reduce the number of partitions in Amit's example from 8192 to something more real-world, like 128, I do still measure a performance gain, but it's ~ 1.5% which is below what I'd consider a reproducible win. I'm accustomed to seeing changes up to 2% in narrow benchmarks like this one, even when "nothing changes" except unrelated code.

Trying a standard pgbench test case (pgbench -M prepared -S with one client and an -s 10 database), it seems that the patch is about 0.5% slower than HEAD. Again, that's below the noise threshold, but it's not promising for the net effects of this patch on workloads that aren't specifically about large and prunable partition sets.

I'm also fairly concerned about the effects of the patch on sizeof(LOCALLOCK) --- on a 64-bit machine it goes from 72 to 88 bytes, a 22% increase. That's a lot if you're considering cases with many locks.

On the whole I don't think there's an adequate case for committing this patch.

I'd also point out that this is hardly the only place where we've seen hash_seq_search on nearly-empty hash tables become a bottleneck. So I'm not thrilled about attacking that with one-table-at-time patches. I'd rather see us do something to let hash_seq_search win across the board.

* Extreme test case: Not extreme. Two of our customers, who are considering using PostgreSQL, are using thousands of partitions now. We hit this issue -- a point query gets nearly 20% slower after a generic plan is automatically created. That's the reason for this proposal.

* 0.5% slowdown with pgbench: I think it's below the noise, as Tom said.

* sizeof(LOCALLOCK): As Andres replied to Tom in the immediately following mail, LOCALLOCK was bigger in PG 11.

* Use case is narrow: No. The bloated LockMethodLocalHash affects the performance of the items below as well as transaction commit/abort:

- AtPrepare_Locks() and PostPrepare_Locks(): the hash table is scanned twice in PREPARE!
- LockReleaseSession: advisory lock
- LockReleaseCurrentOwner: ??
- LockReassignCurrentOwner: ??

Regards
Takayuki Tsunakawa
Re: row filtering for logical replication
> There are complaints in the log (both pub and sub) like: > ERROR: trying to store a heap tuple into wrong type of slot > > I have no idea what causes that. Yeah, I've seen that too. It was fixed by Alexey Kondratov, in line 955 of 0005-Row-filtering-for-logical-replication.patch it should be &TTSOpsHeapTuple instead of &TTSOpsVirtual.
Re: [HACKERS] CLUSTER command progress monitor
On Mon, Sep 2, 2019 at 6:32 PM Masahiko Sawada wrote: > > On Mon, Sep 2, 2019 at 4:59 PM Michael Paquier wrote: > > > > On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote: > > > > I tried 1. and it shown index_rebuild_count, but it also shown > > > > "initializing" phase again and other columns were empty. So, it is > > > > not suitable to fix the problem. :( > > > > I'm going to try 2. and 3., but, unfortunately, I can't get enough > > > > time to do that after PGConf.Asia 2019. > > > > If we selected 3., it affects following these progress reporting > > > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay, > > > > I suppose. Any comments welcome! :) > > > > > > I looked at this open item. I prefer #3 but I think it would be enough > > > to have a small stack using a fixed length array to track nested > > > progress information and the current executing command (maybe 4 or 8 > > > would be enough for maximum nested level for now?). That way, we don't > > > need any change the interfaces. AFAICS there is no case where we call > > > only either pgstat_progress_start_command or > > > pgstat_progress_end_command without each other (although > > > pgstat_progress_update_param is called without start). So I think that > > > having a small stack for tracking multiple reports would work. > > > > > > Attached the draft patch that fixes this issue. Please review it. > > > > Do we actually want to show to the user information about CREATE INDEX > > which is different than CLUSTER? It could be confusing for the user > > to see a progress report from a command different than the one > > actually launched. > > I personally think it would be helpful for users. We provide the > progress reporting for each commands but it might not be detailed > enough. But we can provide more details of progress information of > each commands by combining them. Only users who want to confirm the > details need to see different progress reports. > > > There could be a method 4 here: do not start a new > > command progress when there is another one already started, and do not > > try to end it in the code path where it could not be started as it did > > not stack. So while I see the advantages of stacking the progress > > records as you do when doing cascading calls of the progress > > reporting, I am not sure that: > > 1) We should bloat more PgBackendStatus for that. > > 2) We want to add more complication in this area close to the > > release of 12. > > I agreed, especially to 2. We can live with #4 method in PG12 and the > patch I proposed could be discussed as a new feature. After more thought, even if we don't start a new command progress when there is another one already started the progress update functions could be called and these functions don't specify the command type. Therefore, the progress information might be changed with wrong value by different command. Probably we can change the caller of progress updating function so that it doesn't call all of them if the command could not start a new progress report, but it might be a big change. As an alternative idea, we can make pgstat_progress_end_command() have one argument that is command the caller wants to end. That is, we don't end the command progress when the specified command type doesn't match to the command type of current running command progress. 
We unconditionally clear the progress information at CommitTransaction() and AbortTransaction(), but we could do that by passing PROGRESS_COMMAND_INVALID to pgstat_progress_end_command().

BTW the following condition in pgstat_progress_end_command() seems to be wrong. We should return from the function when either pgstat_track_activities is disabled or the current progress command is invalid:

    if (!pgstat_track_activities
        && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
        return;

I've attached a patch that fixes this newly found issue.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

fix_progress_end_command.patch
Description: Binary data
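In other words, the corrected guard presumably wants || rather than && (a sketch of the intended check, not the attached patch itself):

    /* Return early if progress tracking is disabled, or if no command is
     * currently in progress for this backend. */
    if (!pgstat_track_activities ||
        beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
        return;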
Re: pgbench - rework variable management
>> Some windows-specific hacks are not tested.
>
> Somehow this macro hackery has upset the Windows socket headers:
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.55019

I noticed, but I do not have any windows host so I cannot test locally.

The issue is how to do a mutex on Windows, which does not have pthread so it has to be emulated. I'll try again by sending a blind update to the patch and see how it goes.

--
Fabien.
Re: row filtering for logical replication
On Tue, Sep 3, 2019 at 00:16, Alexey Zagarin wrote:
>
> There are complaints in the log (both pub and sub) like:
> ERROR: trying to store a heap tuple into wrong type of slot
>
> I have no idea what causes that.
>
> Yeah, I've seen that too. It was fixed by Alexey Kondratov, in line 955 of
> 0005-Row-filtering-for-logical-replication.patch it should be
> &TTSOpsHeapTuple instead of &TTSOpsVirtual.

Oops... exactly. That was an oversight while poking at different types of slots.

--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training
RE: SIGQUIT on archiver child processes maybe not such a hot idea?
From: Kyotaro Horiguchi [mailto:horikyota@gmail.com] > Since we are allowing OPs to use arbitrary command as > archive_command, providing a replacement with non-standard signal > handling for a specific command doesn't seem a general solution > to me. Couldn't we have pg_system(a tentative name), which > intercepts SIGQUIT then sends SIGINT to children? Might be need > to resend SIGQUIT after some interval, though.. The same idea that you referred to as pg_system occurred to me, too. But I wondered if the archiver process can get the pid of its child (shell? archive_command?), while keeping the capabilities of system() (= the shell). Even if we fork() and then system(), doesn't the OS send SIGQUIT to any descendents of the archiver when postmaster sends SIGQUIT to the child process group? > # Is there any means to view the whole of a thread from archive? > # I'm a kind of reluctant to wander among messages like a rat in > # a maze:p You can see the whole thread by clicking the "Whole Thread" link. Regards Takayuki Tsunakawa
Re: Converting NOT IN to anti-joins during planning
On Tue, 3 Sep 2019 at 09:21, Alvaro Herrera wrote: > David, will we hear from you on this patch during this month? > It sounds from Antonin's review that it needs a few changes. Thanks for checking. I'm currently quite committed with things away from the community and it's unlikely I'll get to this in September. I'll kick it out to the next 'fest now. Antonin, Thank you for the review. I will respond to it when I get time again. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout
On Mon, Sep 02, 2019 at 05:38:56PM +0900, Michael Paquier wrote: > Thinking wider, don't we have the same problem with wal_sender_timeout > in the case where a sync request takes longer than the time it would > take the backend to terminate the connection? I have been able to work more on that, and that can indeed happen with wal_sender_timeout. While reviewing the code, I have noticed that there is little point to enable do_sync when fetching WAL segments. This actually led to too many fsyncs done for the plain format as each WAL segment is fsync'd first by walmethods.c, then fsync'd again by fsync_pgdata() in pg_wal/. Attached is an updated patch, which needs to go down to v10. -- Michael From ec75a75514d9601d1b1c3b929ac1eb870a208aa2 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 3 Sep 2019 10:48:45 +0900 Subject: [PATCH v2] Delay fsync of pg_basebackup tar format until the end Since the addition of fsync requests in bc34223 to make base backup data consistent on disk once pg_basebackup finishes, each tablespace tar file is individually flushed once completed, with an additional flush of the parent directory when the base backup finishes. While holding a connection to the server, a fsync request taking a long time may cause a failure of the base backup. A recent example is tcp_user_timeout, but wal_sender_timeout can cause similar problems. While reviewing the code, there was a second issue causing too many fsync requests to be done when completing WAL records for the tar or plain format. As recursive fsyncs are done at the end for both formats for everything written, it is fine to disable fsync when fetching or streaming WAL files. Reported-by: Ryohei Takahashi Author: Michael Paquier Reviewed-by: Ryohei Takahashi Discussion: https://postgr.es/m/osbpr01mb4550dae2f8c9502894a45aab82...@osbpr01mb4550.jpnprd01.prod.outlook.com Backpatch-through: 10 --- src/bin/pg_basebackup/pg_basebackup.c | 24 ++-- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 9207109ba3..da475319b6 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -486,15 +486,18 @@ LogStreamerMain(logstreamer_param *param) #endif stream.standby_message_timeout = standby_message_timeout; stream.synchronous = false; - stream.do_sync = do_sync; + /* fsync happens at the end of pg_basebackup for all data */ + stream.do_sync = false; stream.mark_done = true; stream.partial_suffix = NULL; stream.replication_slot = replication_slot; if (format == 'p') - stream.walmethod = CreateWalDirectoryMethod(param->xlog, 0, do_sync); + stream.walmethod = CreateWalDirectoryMethod(param->xlog, 0, + stream.do_sync); else - stream.walmethod = CreateWalTarMethod(param->xlog, compresslevel, do_sync); + stream.walmethod = CreateWalTarMethod(param->xlog, compresslevel, + stream.do_sync); if (!ReceiveXlogStream(param->bgconn, &stream)) @@ -1346,9 +1349,10 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum) if (copybuf != NULL) PQfreemem(copybuf); - /* sync the resulting tar file, errors are not considered fatal */ - if (do_sync && strcmp(basedir, "-") != 0) - (void) fsync_fname(filename, false); + /* + * Do not sync the resulting tar file yet, all files are synced once at + * the end. + */ } @@ -2138,9 +2142,9 @@ BaseBackup(void) /* * Make data persistent on disk once backup is completed. 
For tar format - * once syncing the parent directory is fine, each tar file created per - * tablespace has been already synced. In plain format, all the data of - * the base directory is synced, taking into account all the tablespaces. + * sync the parent directory and all its contents as each tar file was not + * synced after being completed. In plain format, all the data of the + * base directory is synced, taking into account all the tablespaces. * Errors are not considered fatal. */ if (do_sync) @@ -2150,7 +2154,7 @@ BaseBackup(void) if (format == 't') { if (strcmp(basedir, "-") != 0) -(void) fsync_fname(basedir, true); +(void) fsync_dir_recurse(basedir); } else { -- 2.23.0 signature.asc Description: PGP signature
Re: pg_waldump and PREPARE
On Tue, Sep 3, 2019 at 3:04 AM Alvaro Herrera wrote: > > On 2019-Aug-01, Michael Paquier wrote: > > > I may be missing something of course, but in this case we argued about > > adding a couple of more fields. In consequence, the patch should be > > waiting on its author, no? > > Fujii, > > Are you in a position to submit an updated version of this patch? Sorry for the long delay... Yes, I will update the patch if necessary. Regards, -- Fujii Masao
Re: Avoid full GIN index scan when possible
On 2019-Aug-07, Tom Lane wrote:

> I think this would be committable as it stands, except that replacing
> an ALL scan with an EVERYTHING scan could be a performance regression
> if the index contains many null items. We need to do something about
> that before committing.

Nikita, any word on getting this change done?

> Unfortunately I'm not sold on either 0002 or 0003 as they stand;
> they seem overly complicated, I'm not convinced they're correct,
> and you haven't really provided examples showing that all this
> extra complexity is worthwhile.

I suppose we should call ourselves satisfied if we get 0001 done during this cycle (or at least this commitfest). Further refinement can be had in the future, as needed -- even within pg13, if Nikita or anybody else wants to tackle Tom's suggested approaches (or something completely new, or just contest Tom's points) quickly enough. But I don't think we need that in order to call this CF entry committed.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_waldump and PREPARE
Sorry for the long delay... On Thu, Jul 4, 2019 at 5:25 PM Julien Rouhaud wrote: > > On Thu, Jul 4, 2019 at 9:45 AM Michael Paquier wrote: > > > > On Wed, Jul 03, 2019 at 08:23:44PM +0200, Julien Rouhaud wrote: > > > So the patch compiles and works as intended. I don't have much to say > > > about it, it all looks good to me, since the concerns about xactdesc.c > > > aren't worth the trouble. > > > > > > I'm not sure that I understand Michael's objection though, as > > > xl_xact_prepare is not a new definition and AFAICS it couldn't contain > > > the records anyway. So I'll let him say if he has further objections > > > or if it's ready for committer! > > > > This patch provides parsing information only for the header of the 2PC > > record. Wouldn't it be interesting to get more information from the > > various TwoPhaseRecordOnDisk's callbacks? We could also print much > > more information in xact_desc_prepare(). Like the subxacts, the XID, > > the invalidation messages and the delete-on-abort/commit rels. > > Most of those are already described in the COMMIT PREPARE message, > wouldn't that be redundant? abortrels aren't displayed anywhere > though, so +1 for adding them. xact_desc_abort() for ROLLBACK PREPARED describes abortrels. No? > I also see that the dbid isn't displayed in any of the 2PC message, > that'd be useful to have it directly instead of looking for it in > other messages for the same transaction. dbid is not reported even in COMMIT message. So I don't like adding dbid into only the PREPARE message. Regards, -- Fujii Masao
Re: Hypothetical indexes using BRIN broken since pg10
Julien Rouhaud writes:
> On Fri, Jul 26, 2019 at 1:34 PM Heikki Linnakangas wrote:
>> The patch assumes the default pages_per_range setting, but looking at
>> the code at https://github.com/HypoPG/hypopg, the extension actually
>> takes pages_per_range into account when it estimates the index size. I
>> guess there's no easy way to pass the pages_per_range setting down to
>> brincostestimate(). I'm not sure what we should do about that, but seems
>> that just using BRIN_DEFAULT_PAGES_PER_RANGE here is not very accurate.

> Yes, hypopg can use a custom pages_per_range as it intercepts it when
> the hypothetical index is created. I didn't find any way to get that
> information in brincostestimate(), especially for something that could
> backpatched. I don't like it, but I don't see how to do better than
> just using BRIN_DEFAULT_PAGES_PER_RANGE :(

I can tell you what I think ought to happen, but making it happen might be more work than this patch should take on. The right answer IMO is basically for the brinGetStats call to go away from brincostestimate and instead happen during plancat.c's building of the IndexOptInfo. In the case of a hypothetical index, it'd fall to the get_relation_info_hook to fill in suitable fake data. Sounds simple, but:

1. We really don't want even more AM-specific knowledge in plancat.c. So I think the right way to do this would be something along the line of adding a "void *amdata" field to IndexOptInfo, and adding an AM callback to be called during get_relation_info that's allowed to fill that in with some AM-specific data (which the AM's costestimate routine would know about). The existing btree-specific hacks in get_relation_info should migrate into btree's version of this callback, and IndexOptInfo.tree_height should probably go away in favor of keeping that in btree's version of the amdata struct.

2. This approach puts a premium on the get_relation_info callback being cheap, because there's no certainty that the data it fills into IndexOptInfo.amdata will ever get used. For btree, the _bt_getrootheight call is cheap enough to not be a problem, because it just looks at the metapage data that btree keeps cached in the index's relcache entry. The same cannot be said for brinGetStats as it stands: it goes off to read the index metapage. There are at least two ways to fix that:

2a. Teach brin to keep the metapage cached like btree does. This seems like it could be a performance win across the board, but you'd need to work out invalidation behavior, and it'd be a bit invasive.

2b. Define IndexOptInfo.amdata as being filled lazily, that is brincostestimate will invoke brinGetStats and fill in the data if the pointer is currently NULL. Then a hypothetical-index plugin could override that by pre-filling the field with the desired fake data.

I don't have a problem with allowing brincostestimate to fill in defaults based on BRIN_DEFAULT_PAGES_PER_RANGE if it sees that amdata is null and the index is hypothetical. But there should be a way for the get_relation_info_hook to do better.

BTW, the current patch doesn't apply according to the cfbot, but I think it just needs a trivial rebase over 9c703c169 (ie, assume the index is already locked). I didn't bother to do that since what I recommend above would require a lot more change in that area.

regards, tom lane
Re: pgbench - rework variable management
On Wed, Aug 14, 2019 at 3:54 AM Fabien COELHO wrote:
> Some windows-specific hacks are not tested.

Somehow this macro hackery has upset the Windows socket headers:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.55019

--
Thomas Munro
https://enterprisedb.com
Re: Unix-domain socket support on Windows
On Wed, Aug 14, 2019 at 6:27 AM Peter Eisentraut wrote: > This patch set needs testers with various Windows versions to test > different configurations, combinations, and versions. It's failing to build on cfbot's AppVeyor setup[1]. That's currently using Windows SDK 7.1, so too old for the new AF_UNIX sockets, but presumably something is wrong because it shouldn't fail to compile and link. src/interfaces/libpq/fe-connect.c(2682): warning C4101: 'pwdbuf' : unreferenced local variable [C:\projects\postgresql\libpq.vcxproj] src/interfaces/libpq/fe-connect.c(2687): warning C4101: 'passerr' : unreferenced local variable [C:\projects\postgresql\libpq.vcxproj] fe-connect.obj : error LNK2019: unresolved external symbol getpeereid referenced in function PQconnectPoll [C:\projects\postgresql\libpq.vcxproj] [1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.55034?fullLog=true -- Thomas Munro https://enterprisedb.com
Re: [HACKERS] logical decoding of two-phase transactions
On 9/2/19 6:12 PM, Alvaro Herrera wrote: I don't understand why this patch record has been kept aliv for so long, since no new version has been sent in ages. If this patch is really waiting on the author, let's see the author do something. If no voice is heard very soon, I'll close this patch as RwF. +1. I should have marked this RWF in March but I ignored it because it was tagged v13 before the CF started. -- -David da...@pgmasters.net
Re: pg_upgrade version checking questions
> On 2 Sep 2019, at 19:59, Alvaro Herrera wrote: > > On 2019-Jul-30, Daniel Gustafsson wrote: > >> I’ll take a stab at tidying all of this up to require less duplication, we’ll >> see where that ends up. > > Hello Daniel, are you submitting a new version soon? I am working on an updated version which unfortunately got a bit delayed, but will be submitted shortly (targeting this week). cheers ./daniel
Re: [PATCH] kNN for btree
On 2019-Sep-03, Alexander Korotkov wrote:

> I think patches 0001-0008 are very clear and extends our index-AM
> infrastructure in query straightforward way. I'm going to propose
> them for commit after some further polishing.

Hmm. Why is 0001 needed? I see that 0005 introduces a call to that function, but if attnum == 0 then it doesn't call it. Maybe it was necessary in an older version of the patch?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] kNN for btree
On Mon, Sep 2, 2019 at 9:11 PM Alvaro Herrera wrote: > > On 2019-Jul-12, Nikita Glukhov wrote: > > > Attached 13th version of the patches. > > Hello Nikita, > > Can you please rebase this again? Nikita is on vacation now. Rebased patchset is attached. I think patches 0001-0008 are very clear and extends our index-AM infrastructure in query straightforward way. I'm going to propose them for commit after some further polishing. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0004-Extract-structure-BTScanState-v14.patch.gz Description: GNU Zip compressed data 0003-Enable-ORDER-BY-operator-scans-on-ordered-indexe-v14.patch.gz Description: GNU Zip compressed data 0005-Add-kNN-support-to-btree-v14.patch.gz Description: GNU Zip compressed data 0002-Introduce-ammorderbyopfirstcol-v14.patch.gz Description: GNU Zip compressed data 0001-Fix-get_index_column_opclass-v14.patch.gz Description: GNU Zip compressed data 0006-Add-btree-distance-operators-v14.patch.gz Description: GNU Zip compressed data 0007-Remove-distance-operators-from-btree_gist-v14.patch.gz Description: GNU Zip compressed data 0009-Introduce-ammatchorderby-function-v14.patch.gz Description: GNU Zip compressed data 0008-Add-regression-tests-for-kNN-btree-v14.patch.gz Description: GNU Zip compressed data 0010-Add-btree-support-of-kNN-on-non-first-column-v14.patch.gz Description: GNU Zip compressed data 0011-Allow-ammatchorderby-to-return-pathkey-sublists-v14.patch.gz Description: GNU Zip compressed data 0012-Add-support-of-array-ops-to-btree-kNN-v14.patch.gz Description: GNU Zip compressed data
Re: row filtering for logical replication
On 2019-09-02 01:43, Euler Taveira wrote: Em dom, 1 de set de 2019 às 06:09, Erik Rijkers escreveu: The first 4 of these apply without error, but I can't get 0005 to apply. This is what I use: Erik, I generate a new patch set with patience diff algorithm. It seems it applies cleanly. It did apply cleanly, thanks. But I can't get it to correctly do the partial replication in the attached pgbench-script (similar versions of which script I also used for earlier versions of the patch, last year). There are complaints in the log (both pub and sub) like: ERROR: trying to store a heap tuple into wrong type of slot I have no idea what causes that. I attach a zip: $ unzip -l logrep_rowfilter.zip Archive: logrep_rowfilter.zip Length DateTimeName - -- - 17942 2019-09-03 00:47 logfile.6525 10412 2019-09-03 00:47 logfile.6526 6913 2019-09-03 00:47 logrep_rowfilter_2_nodes.sh 3371 2019-09-03 00:47 output.txt - --- 38638 4 files That bash script runs 2 instances (as compiled on my local setup so it will not run as-is) and tries for one minute to get a slice of the pgbench_accounts table replicated. One minute is short but I wanted short logfiles; I have tried the same up to 20 minutes without the replication completing. I'll try even longer but in the meantime I hope you can figure out why these errors occur. thanks, Erik Rijkers <>
Re: XPRS
On Tue, Sep 3, 2019 at 5:20 AM Tomas Vondra wrote: > FWIW it's not clear to me why the cost would need to be recomputed after > constructing the parallel version of the plan? My understanding is that > the idea is to do cost-based planning for the serial plan, and then just > "mechanically" construct a parallel plan. Although, maybe there could be > multiple parallel alternatives ... Presumably you still need to choose between the serial and parallel plans by comparing costs. You lose some by adding exchange operators, but you win some by dividing cardinality estimates. > >This is sort of fudging together of ideas from conversations with > >Kevin Grittner (who talked about admission control a few years back), > >Peter Geoghegan (who mentioned opportunistically asking for more), and > >things I've heard of on SQL Server ("memory grants"). I think it > >would provide some relief from the problems we see today: it's hard to > >set work_mem so that you never get OOM but you can still use a decent > >amount of your precious memory, especially with mixed parallel and > >non-parallel query workloads thanks to our current > >work_mem-multiplying design. > > I think this is probably the simplest and most realistic first step. > > Whenever I was thinking about memory acquisition, I've assumed we'd > monitor how much memory the plan is expected to use while we're > constructing it. My main problem was what to do when we reach the > per-query limit - whether to (a) simply reject the plan, (b) go back and > see if we can replan with lower work_mem (but how much and for which > nodes?), or (c) just continue. Yeah, it's all quite tricky and circular. But I'm pretty sure that we need caps at execution time, anyway, so I think it's OK to start at that end of the problem and then later try to improve the way the planner. > The proposed plan deals with this by not limiting the per-query (or rather > per-session) budget directly, and instead requesting requesting additional > budget. Which is nice. > > I suspect we should also keep an additional plan that is expected to meet > the session_work_mem limit, aside from the regular cheapest plan, and use > it if it's not much worse. Imagine you have a plan with cost 1000 that > needs (global_work_mem/2 + 1kB) memory, essentially serializing executions > of this query. And then there's an alternative plan with cost 1100 that > can run with session_work_mem. It seems better to just accept the second > plan, because it won't need to wait. Hmm. I wonder if it's worth it. You could also just replan as you said, but I'm wondering if just rejecting the query would be OK. > Another challenge with work_mem is that anyone can modify it arbitrarily, > i.e. a user can do > > SET work_mem = '1TB'; > > and use as much memory as they wist, or even crash the system. I wonder if > we could define the new GUCs (session_work_mem and global_work_mem) in a > way to prevent this. We probably don't want to make them PGC_POSTMASTER > (it seems useful to allow overriding them in ALTER USER/DATABASE), but I > don't think we have a good way to do that at the moment. Any ideas in this > direction? 
How about something giving the superuser the following GUCs:

global_work_mem = 16GB
session_min_work_mem = 0.5%   -- the amount of quota sessions keep, for fast small queries
session_max_work_mem = 20%    -- the maximum quota any one session is allowed
session_extra_work_mem = 5%   -- opportunistic execution-time boost

Users are free to plan queries with work_mem = 1TB, and if you do that and it estimates that it wants 512GB, it will be rejected if you try to execute it because it exceeds session_max_work_mem, with a hint telling you to turn down work_mem. Otherwise it either runs or joins the queue if it can't get the quota it needs immediately.

Eventually we could try to figure out how to set work_mem to automatic (I don't want to propose a concrete rule, but maybe something based on session_max_work_mem / njoins, with various fudge factors, and some accounting for parallel workers; it's probably good to low-ball it and rely on session_extra_work_mem).

Yeah, I think you'd want to be able to set session_XXX on databases and roles so that you can say your regular users can't eat more than 10% of memory each, but a big reporting thing is allowed more.

--
Thomas Munro
https://enterprisedb.com
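Purely to illustrate how the execution-time grant check could behave, here is a small sketch in C (the pool structure, names, and the reject/wait split are assumptions based on the description above; real code would keep this in shared memory under a lock):

    #include <stddef.h>

    typedef enum GrantResult
    {
        GRANT_OK,               /* quota reserved, go ahead and execute */
        GRANT_REJECT,           /* plan wants more than session_max_work_mem */
        GRANT_WAIT              /* global pool exhausted, join the wait queue */
    } GrantResult;

    typedef struct MemoryGrantPool
    {
        size_t      total;      /* global_work_mem */
        size_t      granted;    /* memory currently handed out to sessions */
    } MemoryGrantPool;

    /* Try to reserve "want" bytes of execution memory for one query. */
    static GrantResult
    grant_memory(MemoryGrantPool *pool, size_t want, size_t session_max)
    {
        if (want > session_max)
            return GRANT_REJECT;    /* hint: turn down work_mem */
        if (pool->granted + want > pool->total)
            return GRANT_WAIT;      /* wait for someone to release quota */
        pool->granted += want;
        return GRANT_OK;
    }

    /* Give the quota back once the query's memory-hungry nodes are done. */
    static void
    release_memory(MemoryGrantPool *pool, size_t amount)
    {
        pool->granted -= amount;
    }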
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
In the interest of moving things forward, how far are we from making 0001 committable? If I understand correctly, the rest of this patchset depends on https://commitfest.postgresql.org/24/944/ which seems to be moving at a glacial pace (or, actually, slower, because glaciers do move, which cannot be said of that other patch.)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WAL logging problem in 9.4.3?
On Mon, Sep 02, 2019 at 05:15:00PM -0400, Alvaro Herrera wrote: > I have updated this patch's status to "needs review", since v20 has not > received any comments yet. > > Noah, you're listed as committer for this patch. Are you still on the > hook for getting it done during the v13 timeframe? Yes, assuming "getting it done" = "getting the CF entry to state other than Needs Review".
Re: Converting NOT IN to anti-joins during planning
David, will we hear from you on this patch during this month? It sounds from Antonin's review that it needs a few changes. Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] logical decoding of two-phase transactions
I don't understand why this patch record has been kept alive for so long, since no new version has been sent in ages. If this patch is really waiting on the author, let's see the author do something. If no voice is heard very soon, I'll close this patch as RwF. If others want to see this feature in PostgreSQL, they are welcome to contribute. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WAL logging problem in 9.4.3?
I have updated this patch's status to "needs review", since v20 has not received any comments yet. Noah, you're listed as committer for this patch. Are you still on the hook for getting it done during the v13 timeframe? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity
Andres Freund writes: > I agree that it ought to be more efficent - but also about as equally > safe? I.e. if the previous code wasn't safe, the new code wouldn't be > safe either? As in, we're "just" avoiding the assert, but not increasing > safety? Well, the point is that the old code risks performing a relcache load without holding any lock on the relation. In practice, given that we do hold a lock on the parent table, it's probably safe ... but it's at best bad practice. It's not too hard to imagine future optimizations that would allow this to result in a corrupt relcache entry. I don't believe that there's any equivalent risk in the modified code. regards, tom lane
Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity
Hi, On 2019-09-01 16:50:00 -0400, Tom Lane wrote: > As far as 4) goes, I think the code in ExtractReplicaIdentity is pretty > duff anyway, because it doesn't bother to check for the defined failure > return for RelationIdGetRelation. But if we're concerned about the > cost of recalculating this stuff per-row, couldn't we cache it a little > better? It should be safe to assume the set of index columns isn't > changing intra-query. I agree that it ought to be more efficent - but also about as equally safe? I.e. if the previous code wasn't safe, the new code wouldn't be safe either? As in, we're "just" avoiding the assert, but not increasing safety? > ... in fact, isn't all the infrastructure for that present already? > Why is this code looking directly at the index at all, rather than > using the relcache's rd_idattr bitmap? No idea, that's too long ago :( > I'm a bit suspicious of the exclusion for idattrs being empty, but > if I remove that, some of the contrib/test_decoding test results > change. Anybody want to comment on that? If that's actually an > expected situation, why is there an elog(DEBUG) in that path? I think Amit's explanation here is probably accurate. Greetings, Andres Freund
Re: Patch to add hook to copydir()
Thank you both for the feedback. To give some background on what I am trying to do. Mostly experimental work right now. I've built an extension that takes advantage of the copy-on-write properties of ZFS and BTRFS. A tool transforms a data directory into a set of datasets/sub volumes. When a new database is created and the directories are copied, the extension uses the filesystem's ability to create a snapshot/clone of the source dataset/sub volume. As createdb() iterates over the template database's tablespaces and copies the directories to the new one, the copydir() gets invoked (from dbcommands.c). My extension hooks copydir() and instead snapshots the source dir and clones it into the target dir (respecting the recurse flag). If the source dir is not a ZFS data set or BTRFS sub volume, the standard implementation gets invoked and everything is business as usual. As described in the first paragraph, I've built a small tool that transforms all base/* directories in the data directory into ZFS datasets or BTRFS sub volumes. New tablespaces that get created later would have to be ZFS datasets or BTRFS sub volumes as well in order for the extension to work. As explained above, if they are not, the extension will just copy the files. I also hook ProcessUtility in order to clean up the ZFS datasets and BTRFS subvolumes _after_ a database gets deleted. Postgres just handles the deletion of the files and the extension cleans up any dangling ZFS/BTRFS resources. Is there anything that I am missing? My early experiments have been very promising but my experience with Postgres internals is limited. Any help or feedback would be much appreciated. On Mon, Sep 2, 2019 at 11:06 PM Alvaro Herrera wrote: > > On 2019-Sep-02, Peter Eisentraut wrote: > > > On 2019-09-02 20:54, Swen Kooij wrote: > > > I've been working on an extension that tightly integrates > > > postgres with underlying filesystem . I need to customize > > > how postgres copies directories for new databases. > > > > Could you share some more details, so we can assess whether that is a > > sensible way to go about it, and what other hooks might be needed? > > It seems either terribly high-level, or terribly low-level, depending on > how you look at it. I wonder to what extent it conflicts with the table > AM work, and with the idea of allowing FDWs to have real relfilenodes. > I wonder if this is just one missing piece that's needed to complete > something of a layer for which other pieces are already satisfied by > other hooks. > > As is and pending further explanation, it seems a bad idea to me. If we > want pluggability here, maybe we need some new abstraction layer. > > -- > Álvaro Herrerahttps://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
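For illustration, a minimal sketch of how such an extension could install the copydir_hook proposed in this thread; clone_directory_as_snapshot() is a placeholder for the ZFS/BTRFS-specific logic, not a real function:

    #include "postgres.h"
    #include "fmgr.h"
    #include "storage/copydir.h"

    PG_MODULE_MAGIC;

    static copydir_hook_type prev_copydir_hook = NULL;

    /* Placeholder: try to create the target as a snapshot/clone of the source;
     * return false if the source is not a ZFS dataset or BTRFS subvolume. */
    static bool
    clone_directory_as_snapshot(char *fromdir, char *todir, bool recurse)
    {
        return false;   /* filesystem-specific logic would go here */
    }

    static void
    snapshot_copydir(char *fromdir, char *todir, bool recurse)
    {
        if (clone_directory_as_snapshot(fromdir, todir, recurse))
            return;

        /* Fall back to the previous hook, or the built-in implementation. */
        if (prev_copydir_hook)
            prev_copydir_hook(fromdir, todir, recurse);
        else
            standard_copydir(fromdir, todir, recurse);
    }

    void
    _PG_init(void)
    {
        prev_copydir_hook = copydir_hook;
        copydir_hook = snapshot_copydir;
    }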
Re: Patch to add hook to copydir()
On 2019-Sep-02, Peter Eisentraut wrote: > On 2019-09-02 20:54, Swen Kooij wrote: > > I've been working on an extension that tightly integrates > > postgres with underlying filesystem . I need to customize > > how postgres copies directories for new databases. > > Could you share some more details, so we can assess whether that is a > sensible way to go about it, and what other hooks might be needed? It seems either terribly high-level, or terribly low-level, depending on how you look at it. I wonder to what extent it conflicts with the table AM work, and with the idea of allowing FDWs to have real relfilenodes. I wonder if this is just one missing piece that's needed to complete something of a layer for which other pieces are already satisfied by other hooks. As is and pending further explanation, it seems a bad idea to me. If we want pluggability here, maybe we need some new abstraction layer. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PG 12 draft release notes
On 5/12/19 11:42 PM, Bruce Momjian wrote: > On Sun, May 12, 2019 at 10:49:07AM -0400, Jonathan Katz wrote: >> Hi Bruce, >> >> On 5/11/19 4:33 PM, Bruce Momjian wrote: >>> I have posted a draft copy of the PG 12 release notes here: >>> >>> http://momjian.us/pgsql_docs/release-12.html >>> >>> They are committed to git. It includes links to the main docs, where >>> appropriate. Our official developer docs will rebuild in a few hours. >> >> Thank you for working on this, I know it's a gargantuan task. >> >> I have a small modification for a section entitled "Source Code" which >> is a repeat of the previous section. Based on the bullet points in that >> part, I thought "Documentation" might be a more appropriate name; please >> see attached. > > Yes, I saw that myself and just updated it. Thanks. Great! There is a placeholder for PG_COLORS: This is enabled with by setting environment variable PG_COLORS. EXAMPLE? I've attached the following that provides an example of how to use this environmental variable. Thanks! Jonathan diff --git a/doc/src/sgml/release-12.sgml b/doc/src/sgml/release-12.sgml index 8b668ade0f..e05aacaf3c 100644 --- a/doc/src/sgml/release-12.sgml +++ b/doc/src/sgml/release-12.sgml @@ -2479,7 +2479,9 @@ Author: Peter Eisentraut This is enabled with by setting environment variable - PG_COLORS. EXAMPLE? + PG_COLORS using ANSI escape codes for colors. For example, + the default setting is + PG_COLORS="error=01;31:warning=01;35:locus=01" signature.asc Description: OpenPGP digital signature
Re: Proposal for Signal Detection Refactoring
On 2019-Mar-06, Chris Travers wrote: > Here's a new patch. No rush on it. I am moving it to next commitfest > anyway because as code documentation I think this is a low priority late in > the release cycle. Chris, are you submitting an updated patch? There's some unaddressed feedback from Tom, which looks like it wouldn't take much time to address. There are also some comments from Craig, but they're not directly related as he's suggesting to fix code that does not follow what you're documenting; so while it'd be a good idea to fix it, it doesn't get in the way of your patch. I won't stop you if you want to post patches for that, though. Either way, please keep things moving. Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Patch to add hook to copydir()
I just realized I completely borked the patch file. My apologies. Attached a (hopefully) correct patch file. --- Swen Kooij On Mon, Sep 2, 2019 at 9:54 PM Swen Kooij wrote: > > Hello all, > > I've been working on an extension that tightly integrates > postgres with underlying filesystem . I need to customize > how postgres copies directories for new databases. > > I first looked at the ProcessUtility_hook. This would require > me to copy or rewrite most of the createdb() function. This is > less than ideal of course. Someone on the IRC channel > suggested I could add a hook for copydir(). > > I implemented the hook similar to how the > ProcessUtility_hook is implemented. I couldn't find any tests > for any of the existing hooks. I've been looking at the regression > tests, but I am not entirely sure how to proceed on that front. > > I tested my patch extensively against master and > the REL_12_STABLE branch. All tests pass and the patch has > been working great with my extension. > > I attached a first draft of the patch against master. > > --- > Swen Kooij From 7033aed34e3b35859c470a738a5efbe48f101418 Mon Sep 17 00:00:00 2001 From: Swen Kooij Date: Sun, 1 Sep 2019 22:42:04 +0300 Subject: [PATCH] Add hook for copydir() Plugins/extensions that want to customize how PostgreSQL interact with the filesystem can hook in copydir() instead of having to hook in on a higher level and potentially copy large amounts of code. One example use case is wanting to customize how directories for new databases are created. Without this hook, one would have to use ProcessUtility_hook. This would require copying the vast majority of createdb(). --- src/backend/storage/file/copydir.c | 19 +++ src/include/storage/copydir.h | 5 + 2 files changed, 24 insertions(+) diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c index ca1c9cd765..e44bebf947 100644 --- a/src/backend/storage/file/copydir.c +++ b/src/backend/storage/file/copydir.c @@ -27,6 +27,9 @@ #include "miscadmin.h" #include "pgstat.h" +/* Hook for plugins to get control in copydir() */ +copydir_hook_type copydir_hook = NULL; + /* * copydir: copy a directory * @@ -36,6 +39,22 @@ void copydir(char *fromdir, char *todir, bool recurse) { + if (copydir_hook) + (*copydir_hook) (fromdir, todir, recurse); + else + standard_copydir(fromdir, todir, recurse); +} + +/* + * copydir: copy a directory + * + * If recurse is false, subdirectories are ignored. Anything that's not + * a directory or a regular file is ignored. + */ +void +standard_copydir(char *fromdir, char *todir, bool recurse) +{ + DIR *xldir; struct dirent *xlde; char fromfile[MAXPGPATH * 2]; diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h index 525cc6203e..8b73184177 100644 --- a/src/include/storage/copydir.h +++ b/src/include/storage/copydir.h @@ -13,7 +13,12 @@ #ifndef COPYDIR_H #define COPYDIR_H +/* Hook for plugins to get control in copydir() */ +typedef void (*copydir_hook_type) (char *fromdir, char *todir, bool recurse); +extern PGDLLIMPORT copydir_hook_type copydir_hook; + extern void copydir(char *fromdir, char *todir, bool recurse); +extern void standard_copydir(char *fromdir, char *todir, bool recurse); extern void copy_file(char *fromfile, char *tofile); #endif /* COPYDIR_H */ -- 2.20.1
Re: Patch to add hook to copydir()
On 2019-09-02 20:54, Swen Kooij wrote: > I've been working on an extension that tightly integrates > postgres with underlying filesystem . I need to customize > how postgres copies directories for new databases. Could you share some more details, so we can assess whether that is a sensible way to go about it, and what other hooks might be needed? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Patch to add hook to copydir()
Hello all, I've been working on an extension that tightly integrates postgres with underlying filesystem. I need to customize how postgres copies directories for new databases. I first looked at the ProcessUtility_hook. This would require me to copy or rewrite most of the createdb() function. This is less than ideal of course. Someone on the IRC channel suggested I could add a hook for copydir(). I implemented the hook similar to how the ProcessUtility_hook is implemented. I couldn't find any tests for any of the existing hooks. I've been looking at the regression tests, but I am not entirely sure how to proceed on that front. I tested my patch extensively against master and the REL_12_STABLE branch. All tests pass and the patch has been working great with my extension. I attached a first draft of the patch against master. --- Swen Kooij diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c index ca1c9cd765..e44bebf947 100644 --- a/src/backend/storage/file/copydir.c +++ b/src/backend/storage/file/copydir.c @@ -27,6 +27,9 @@ #include "miscadmin.h" #include "pgstat.h" +/* Hook for plugins to get control in copydir() */ +copydir_hook_type copydir_hook = NULL; + /* * copydir: copy a directory * @@ -36,6 +39,22 @@ void copydir(char *fromdir, char *todir, bool recurse) { + if (copydir_hook) + (*copydir_hook) (fromdir, todir, recurse); + else + standard_copydir(fromdir, todir, recurse); +} + +/* + * copydir: copy a directory + * + * If recurse is false, subdirectories are ignored. Anything that's not + * a directory or a regular file is ignored. + */ +void +standard_copydir(char *fromdir, char *todir, bool recurse) +{ + DIR *xldir; struct dirent *xlde; char fromfile[MAXPGPATH * 2]; diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h index 525cc6203e..8b73184177 100644 --- a/src/include/storage/copydir.h +++ b/src/include/storage/copydir.h @@ -13,7 +13,12 @@ #ifndef COPYDIR_H #define COPYDIR_H +/* Hook for plugins to get control in copydir() */ +typedef void (*copydir_hook_type) (char *fromdir, char *todir, bool recurse); +extern PGDLLIMPORT copydir_hook_type copydir_hook; + extern void copydir(char *fromdir, char *todir, bool recurse); +extern void standard_copydir(char *fromdir, char *todir, bool recurse); extern void copy_file(char *fromfile, char *tofile); #endif /* COPYDIR_H */
Re: Patch to document base64 encoding
Hi Alvaro, On Mon, 2 Sep 2019 13:56:28 -0400 Alvaro Herrera wrote: > Are you submitting an updated version soon? I don't expect to be able to make a new patch for at least another week. Regards, Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Re: Patch to document base64 encoding
On 2019-Aug-02, Karl O. Pinc wrote: > On Fri, 02 Aug 2019 10:44:43 -0400 > Tom Lane wrote: > > > I don't really have a problem with > > repeating the entries for other functions that exist in both > > text and bytea variants, either. > > Ok. Thanks. I'll repeat entries then. Hello Karl, Are you submitting an updated version soon? Tom, you're still listed as committer for this patch. Just a heads up. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_upgrade version checking questions
On 2019-Jul-30, Daniel Gustafsson wrote: > I’ll take a stab at tidying all of this up to require less duplication, we’ll > see where that ends up. Hello Daniel, are you submitting a new version soon? Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity
Amit Langote writes: > On Mon, Sep 2, 2019 at 6:31 AM Tom Lane wrote: >> Here's a proposed patch along those lines. It fixes Hadi's original >> crash case and passes check-world. > Agree that this patch would be a better solution for Hadi's report, > although I also agree that the situation with index locking for DELETE > isn't perfect. Thanks for checking! >> I'm a bit suspicious of the exclusion for idattrs being empty, but >> if I remove that, some of the contrib/test_decoding test results >> change. Anybody want to comment on that? If that's actually an >> expected situation, why is there an elog(DEBUG) in that path? > ISTM that the exclusion case may occur with the table's replica > identity being REPLICA_IDENTITY_DEFAULT and there being no primary > index defined, in which case nothing needs to get logged. Looking more closely, the case is unreachable in the heap_update path because key_changed will necessarily be false if the idattrs set is empty. But it is reachable in heap_delete because that just passes key_changed = constant true, whether or not there's any defined replica identity. In view of that, I think we should just remove the elog(DEBUG) ... and maybe add a comment explaining this. regards, tom lane
Re: Speed up transaction completion faster after many relations are accessed in a transaction
On 2019-Aug-14, David Rowley wrote: > For now, I'm out of ideas. If anyone else feels like suggesting > something of picking this up, feel free. Hmm ... is this patch rejected, or is somebody still trying to get it to committable state? David, you're listed as committer. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add test case for sslinfo
Hao Wu, Are you submitting an updated version of this patch soon? Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PG 12 draft release notes
On 9/2/19 1:37 PM, Andrey Borodin wrote: > > >> 2 сент. 2019 г., в 22:02, Jonathan S. Katz написал(а): >> >> >> Attached is a patch proposing items for the major items section. This is >> working off of the ongoing draft of the press release[1]. Feedback >> welcome. With respect to the linking, I tried I to give a bunch of >> jumping off points for users to explore the features, but visually tried >> to ensure the readability was consistent. > > + > + Reduction of WAL overhead of > + GiST, GIN, > and > + SP-GiST indexes and added support > + for covering indexes via the > + INCLUDE clause for > + SP-GiST indexes > + > > Maybe I'm missing something, but covering indexes are supported in GiST, not > in SP-GiST. Nope, you're absolutely correct: that was a typo as a result of copying/pasting formatting. Attached is a revision that correctly specifies covering indexes for GiST. Thanks! Jonathan diff --git a/doc/src/sgml/release-12.sgml b/doc/src/sgml/release-12.sgml index 8b668ade0f..62ab9fb21f 100644 --- a/doc/src/sgml/release-12.sgml +++ b/doc/src/sgml/release-12.sgml @@ -21,7 +21,125 @@ - big item + + General performance improvements, including: + + + + Optimizations to space utilization and read/write performance for + B-tree indexes that are frequently modified + + + + + Partitioning enhancements that include query performance improvements + on tables with thousands of partitions, improved performance with + and , and the + ability to execute + ATTACH PARTITION without blocking queries + + + + + Common table expressions + (CTE) can now be automatically inlined + + + + + Reduction of WAL overhead of + GiST, GIN, and + SP-GiST indexes and added support + for covering indexes via the + INCLUDE clause for + GiST indexes + + + + + now supports most-common value + (MCV) statistics to support improved query plans for columns that are + nonuniformly distributed + + + + + + + + + Enhancements to administrative functionality, including: + + + + Rebuild an index without blocking writes to a table with the + + REINDEX CONCURRENTLY command + + + + + can enable/disable page checksums, + used for detecting data corruption, in an offline cluster + + + Progress reporting statistics for , + , , + VACUUM FULL, and + + + + + + + + + + Support for the SQL/JSON + path language + + + + + + Stored generated columns + + + + + + Nondeterministic ICU + collations that enable + case-insensitive and accent-insensitive ordering + + + + + + New authentication features, including: + + + + Client and server-side encryption for authentication over the + GSSAPI + + + + + Discovery of LDAP servers if PostgreSQL is + built with OpenLDAP + + + + + Support multifactor authentication with the + clientcert=verify-full option when combined + with an additional authentication method in + pg_hba.conf + + + + signature.asc Description: OpenPGP digital signature
Re: [PATCH] kNN for btree
On 2019-Jul-12, Nikita Glukhov wrote: > Attached 13th version of the patches. Hello Nikita, Can you please rebase this again? Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_waldump and PREPARE
On 2019-Aug-01, Michael Paquier wrote: > I may be missing something of course, but in this case we argued about > adding a couple of more fields. In consequence, the patch should be > waiting on its author, no? Fujii, Are you in a position to submit an updated version of this patch? Maybe Vignesh is in a position to help to complete this, since he has been eyeing this code lately. Vignesh? Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Contribution to Perldoc for TestLib module in Postgres
On 2019-Jul-30, Michael Paquier wrote: > I think that the SYNOPSIS could be shaped better. As of now it is a > simple succession of the same commands listed without any link to each > other, which is contrary for example to what we do in PostgresNode.pm, > where it shows up a set of small examples which are useful to > understand how to shape the tests and the possible interactions > between the routines of the module. My take would be to keep it > simple and minimal as TestLib.pm is the lowest level of our TAP test > infrastructure. Agreed ... that's pretty much the same thing I tried to say upthread. I wrote my own synopsis, partly using your suggestions. I reworded the description for the routines (I don't think there's any I didn't touch), added a mention of $windows_os, added a =head1 to split out the ad-hoc routines from the Test::More wrappers. And pushed. Please give it another look. It might need more polish. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PG 12 draft release notes
> 2 сент. 2019 г., в 22:02, Jonathan S. Katz написал(а): > > > Attached is a patch proposing items for the major items section. This is > working off of the ongoing draft of the press release[1]. Feedback > welcome. With respect to the linking, I tried I to give a bunch of > jumping off points for users to explore the features, but visually tried > to ensure the readability was consistent. + + Reduction of WAL overhead of + GiST, GIN, and + SP-GiST indexes and added support + for covering indexes via the + INCLUDE clause for + SP-GiST indexes + Maybe I'm missing something, but covering indexes are supported in GiST, not in SP-GiST. Best regards, Andrey Borodin.
Re: XPRS
On Mon, Sep 02, 2019 at 02:19:15PM +1200, Thomas Munro wrote: On Sat, Aug 24, 2019 at 3:19 AM Tomas Vondra wrote: > Although “The Wei Hong Optimizer” was designed in the context of > Postgres, it became the standard approach for many of the parallel > query optimizers in industry." I assume this quote is from 30 years ago. I wonder if the claim is still true, on current hardware (including e.g. distributed databases). The quote is from 2018, and appears in the article I linked (it's a chapter from the book Making Databases Work: The Wisdom of Michael Stonebraker), but I'm not sure which systems it's referring to. Hmm, that's unfortunate - it'd be quite interesting to know which databases it's referring to. I suspect no optimizer is ideal in this regard, i.e. each database has some "gaps" where some nodes don't have a straightforward parallel version. Speculation: Many other systems have what we call parallel-oblivious operators only, and then insert various exchange operators to make a parallel plan. That is, they don't necessarily have a direct equivalent of our "Parallel Sequential Scan", "Parallel Index Scan", "Parallel Foreign Scan": they just use their regular scans, possibly with the addition of some kind of "Parallel Scatter" node (that's my made up name, it's the opposite of Gather, called various things like "page supplier" or "block iterator") or "Parallel Repartition" inserted in the right places. Perhaps they create a serial plan first, and then try to parallelise it by inserting various parallel operators and then recomputing the costs? Rather than considering the separate paths in the first phase of the optimiser, as we do. The cases where Hong's best-parallel-plan hypothesis isn't true for us now might go away if we had Parallel Repartition, so that each 'stream' would be the complete set of tuples for some known partition. I don't know. It kinda reminds me planning with distributed databases, which also need exchange data between nodes in various cases - say, when joining two relations distributed in different ways. The redistribution is however pretty costly (network I/O, bandwidth etc.) to the extent that it's often much better to pick a very different join to reduce the amount of data to exchange, or eliminate the redistribution altogether. For parallelism the costs are much lower, of course, but I don't think we can just ignore those. FWIW it's not clear to me why the cost would need to be recomputed after constructing the parallel version of the plan? My understanding is that the idea is to do cost-based planning for the serial plan, and then just "mechanically" construct a parallel plan. Although, maybe there could be multiple parallel alternatives ... To be clear, I'm not suggesting we do that necessarily, just pointing out some interesting claims about ancient POSTGRES wisdom, in a highly speculative water cooler thread. Actually, this isn't the first time it's occurred to me that elements of our design were falling out of the order that we chose to implement things in. Another example is the "Shared Hash" that I had in an early version of my work on Parallel Hash Join, where just one process would run a parallel-safe but non-parallel-oblivious plan to build a shared hash table while other workers twiddled their thumbs; I dropped it because our cost model has no penalty for running N copies of the same plan rather than just one so there was no way to pick that plan, and that's because we don't have a cost model like Hong's that considers resource usage too. 
Another more speculative observation: maybe no-partition/shared Parallel Hash Join is only obvious if you already have the general concept of parallel-aware executor nodes. AFAIK Robert and Amit invented those to be able to coordinate parallel scans between processes, where thread-based systems might be able to share a single scan state somehow under a Scatter-like operator. If you had threads, you might not need that concept that allows arbitrary executor nodes to communicate with each other between workers, and then it might be more obvious and natural to use repartitioning for parallelising hash joins. FWIW I think we'll have to do something about resource acquisition, sooner or later. It was always quite annoying that we don't really consider memory consumption of the query as a whole during planning, and parallel query made it a bit more painful. Agreed. Here's an approach I have been wondering about to cap total executor memory usage, which is a much more down-to-Earth idea than any of the above space cadet planner problems. Let's start at the other end of the problem, by introducing admission control and memory quotas. That is, keep using work_mem with its current per-node-copy meaning at planning time, for now, and then: 1. Compute the peak amount of memory each plan thinks it will need. Initially that could be done by summing estimates from all nodes and considering workers. A lat
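As a rough illustration of step 1, the per-plan peak could be gathered with a simple recursive walk over the plan tree; plan_node_memory_estimate() is a hypothetical helper and the worker accounting is deliberately naive:

    #include "postgres.h"
    #include "nodes/plannodes.h"

    /* Hypothetical helper: would return work_mem (or a multiple of it) for
     * memory-hungry nodes such as Sort, Hash or HashAgg, and 0 otherwise. */
    static int64 plan_node_memory_estimate(Plan *plan);

    static int64
    estimate_plan_memory(Plan *plan)
    {
        int64       total;

        if (plan == NULL)
            return 0;

        total = plan_node_memory_estimate(plan);
        total += estimate_plan_memory(plan->lefttree);
        total += estimate_plan_memory(plan->righttree);

        /* Naive worker accounting: a Gather's subtree may run in every
         * worker plus the leader. */
        if (IsA(plan, Gather))
            total *= ((Gather *) plan)->num_workers + 1;

        return total;
    }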
Re: PG 12 draft release notes
On 5/11/19 4:33 PM, Bruce Momjian wrote: > I have posted a draft copy of the PG 12 release notes here: > > http://momjian.us/pgsql_docs/release-12.html > > They are committed to git. It includes links to the main docs, where > appropriate. Our official developer docs will rebuild in a few hours. > Thanks again for compiling and writing up the release notes. I know it is no small effort. Attached is a patch proposing items for the major items section. This is working off of the ongoing draft of the press release[1]. Feedback welcome. With respect to the linking, I tried I to give a bunch of jumping off points for users to explore the features, but visually tried to ensure the readability was consistent. I also attached a patch addressing the "MENTION ITS AFFECT ON ORDERING?" in which I choose to answer it "yes" and added a comment :) Thanks, Jonathan [1] https://www.postgresql.org/message-id/c56eeb88-4a8c-2c6c-b5f1-9d46872c247c%40postgresql.org diff --git a/doc/src/sgml/release-12.sgml b/doc/src/sgml/release-12.sgml index 8b668ade0f..e5929f887c 100644 --- a/doc/src/sgml/release-12.sgml +++ b/doc/src/sgml/release-12.sgml @@ -21,7 +21,125 @@ - big item + + General performance improvements, including: + + + + Optimizations to space utilization and read/write performance for + B-tree indexes that are frequently modified + + + + + Partitioning enhancements that include query performance improvements + on tables with thousands of partitions, improved performance with + and , and the + ability to execute + ATTACH PARTITION without blocking queries + + + + + Common table expressions + (CTE) can now be automatically inlined + + + + + Reduction of WAL overhead of + GiST, GIN, and + SP-GiST indexes and added support + for covering indexes via the + INCLUDE clause for + SP-GiST indexes + + + + + now supports most-common value + (MCV) statistics to support improved query plans for columns that are + nonuniformly distributed + + + + + + + + + Enhancements to administrative functionality, including: + + + + Rebuild an index without blocking writes to a table with the + + REINDEX CONCURRENTLY command + + + + + can enable/disable page checksums, + used for detecting data corruption, in an offline cluster + + + Progress reporting statistics for , + , , + VACUUM FULL, and + + + + + + + + + + Support for the SQL/JSON + path language + + + + + + Stored generated columns + + + + + + Nondeterministic ICU + collations that enable + case-insensitive and accent-insensitive ordering + + + + + + New authentication features, including: + + + + Client and server-side encryption for authentication over the + GSSAPI + + + + + Discovery of LDAP servers if PostgreSQL is + built with OpenLDAP + + + + + Support multifactor authentication with the + clientcert=verify-full option when combined + with an additional authentication method in + pg_hba.conf + + + + diff --git a/doc/src/sgml/release-12.sgml b/doc/src/sgml/release-12.sgml index 8b668ade0f..20ca5eb569 100644 --- a/doc/src/sgml/release-12.sgml +++ b/doc/src/sgml/release-12.sgml @@ -1941,9 +1941,9 @@ Author: Peter Eisentraut - This allows case and accent-agnostic equality comparisons using - "nondeterministic" collations. This is only supported for - ICU collations. MENTION ITS AFFECT ON ORDERING? + This allows case and accent-agnostic equality comparisons, such as during + an ordering operation, using "nondeterministic" collations. This is only + supported for ICU collations. signature.asc Description: OpenPGP digital signature
Re: safe to overload objectSubId for a type?
On 09/02/19 11:41, Tom Lane wrote: > Hm, apparently we already do handle that in some way, because > this works: > ... > Nonetheless, I'd be pretty hesitant to allow somebody to use > objsubid with some entirely different meaning for types. As long as it stays an internal detail of a caching scheme used by me and will be masked back to 0 before anything escapes to elsewhere, so it really only exploits the current fact that objSubId for types carries no information at all ... and no one seems to be chiming in with a pressing case for changing that ... I think I can get away with the simple scheme, admitting the possibility of having to revisit it down the road. Regards, -Chap
Re: [PATCH] Implement uuid_version()
On 2019-Jul-13, Jose Luis Tallon wrote: > Considering the later arguments on-list, I plan on submitting a more > elaborate patchset integrating the feedback received so far, and along the > following lines: > > - uuid type, v4 generation and uuid_version() in core > > - Provide a means to rename/supercede extensions keeping backwards > compatibility (i.e. uuid-ossp -> uuid, keep old code working) It is wholly unclear what this commitfest entry is all about; in the thread there's a mixture about a new uuid_version(), some new v4 stuff migrating from pgcrypto (which apparently was done), plus some kind of mechanism to allow upgrading extension names; all this stemming from feedback from the patch submitted in April. But there hasn't been a new patch in a long time, and there won't be a new patch version during the current commitfest. Therefore, I'm closing this entry as Returned with Feedback. The author(s) can create a new entry in a future commitfest once they have a new patch. I do suggest to keep such a patch restricted in scope. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Proposal: roll pg_stat_statements into core
Euler Taveira writes: > At least if pg_stat_statements > was in core you could load it by default and have a GUC to turn it > on/off without restarting the server (that was Magnus proposal and > Andres agreed). That assertion is 100% bogus. To turn it on or off on-the-fly, you'd need some way to acquire or release its shared memory on-the-fly. It's probably now possible to do something like that, using the DSM mechanisms, but the code for it hasn't been written (AFAIK). And it wouldn't have much to do with whether the module was in core or stayed where it is. Another concern that I have about moving pg_stat_statements into core is that doing so would effectively nail down Query.queryId as belonging to pg_stat_statements, whereas currently it's possible for other plugins to commandeer that if they wish. This isn't academic, because of the fact that not everybody is satisfied with the way pg_stat_statements defines queryId [1]. regards, tom lane [1] https://www.postgresql.org/message-id/flat/1553029215728-0.post%40n3.nabble.com
Re: safe to overload objectSubId for a type?
Chapman Flack writes: > On 09/02/19 00:29, Tom Lane wrote: >> If we ever do make ObjectAddress.objectSubId mean something for types, >> I'd expect --- based on the precedent of relation columns --- that it'd >> specify a column number for a column of a composite type. There are >> fairly obvious use-cases for that, such as allowing a DROP of a column >> type to not propagate to the whole composite type. > Could you give an illustration to make sure I'm following? Hm, apparently we already do handle that in some way, because this works: regression=# create type myenum as enum ('red', 'black'); CREATE TYPE regression=# create type mycomposite as (f1 int, f2 myenum, f3 text); CREATE TYPE regression=# drop type myenum; ERROR: cannot drop type myenum because other objects depend on it DETAIL: column f2 of composite type mycomposite depends on type myenum HINT: Use DROP ... CASCADE to drop the dependent objects too. regression=# drop type myenum cascade; NOTICE: drop cascades to column f2 of composite type mycomposite DROP TYPE regression=# \d mycomposite Composite type "public.mycomposite" Column | Type | Collation | Nullable | Default +-+---+--+- f1 | integer | | | f3 | text| | | I'm too lazy to go check, but I suspect that the representation of this situation in pg_depend makes use of a dependency for a column of the relation associated with the mycomposite type, so that the entry having nonzero objsubid has classid pg_class not classid pg_type. Nonetheless, I'd be pretty hesitant to allow somebody to use objsubid with some entirely different meaning for types. That seems certain to lead to confusion and bugs. regards, tom lane
Re: Problem while updating a foreign table pointing to a partitioned table on foreign server
On Wed, Aug 14, 2019 at 11:51 AM Etsuro Fujita wrote: > This is my TODO item for PG13, but I'll give priority to other things > in the next commitfest. If anyone wants to work on it, feel free; > else I'll move this to the November commitfest when it opens. Moved. Best regards, Etsuro Fujita
Re: add a MAC check for TRUNCATE
Hello Yuli, On Thu, Jul 25, 2019 at 3:52, Yuli Khodorkovskiy wrote: > Since all DAC checks should have corresponding MAC, this patch adds a > hook to allow extensions to implement a MAC check on TRUNCATE. I have > also implemented this access check in the sepgsql extension. > > One important thing to note is that refpolicy [1] and Redhat based > distributions do not have the SELinux permission for db_table {truncate} > implemented. > How is the db_table:{delete} permission different from truncate? From the standpoint of data access, TRUNCATE is equivalent to DELETE without WHERE, isn't it? Of course, there are some differences between them. TRUNCATE takes exclusive locks and eliminates the underlying data blocks, while DELETE removes rows in an MVCC manner. However, both of them eventually remove data from the target table. I would recommend reusing the "db_table:{delete}" permission for TRUNCATE. What do you think? Best regards, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei
Re: pg_dump --exclude-* options documentation
Fabien COELHO writes: >> Should we change the documentation of the old pg_dump options to also >> take a "pattern", or change the documentation of the new pg_dumpall >> option to read "database"? > My 0.02€: we should use the more general and precise, i.e. pattern. +1 ... it doesn't seem to me that "--exclude-schema=schema" conveys anything useful. regards, tom lane
Re: safe to overload objectSubId for a type?
On 09/02/19 00:29, Tom Lane wrote: > If this is totally independent of ObjectAddress, why are you even > asking? I assume that what you mean is you'd like these values to > bleed into ObjectAddresses or vice versa. Only that I'm working on a data structure of my own to cache my own representations for objects, which I can find by a simple hash lookup on a key of three ints, given an ObjectAddress. But I'd like to have distinct instances representing distinct typmods of a type, and if I had a method to look up a type instance given an ObjectAddress and typmod, and that method could simply use the typmod (or a trivially transformed version, -1 <-> 0) as the third int of the key, then my single hash structure and one lookup serves for everything. Otherwise, I essentially need a second whole layer of the same machinery to track typmod-variants of types (and forget them when unreferenced, invalidated, yada yada), or variable-length cache keys so type keys are four ints instead of three, or four-int keys for everything, just because types. Any of those alternatives would just be a SMOP if truly necessary, but uglier and more complex. > If we ever do make ObjectAddress.objectSubId mean something for types, > I'd expect --- based on the precedent of relation columns --- that it'd > specify a column number for a column of a composite type. There are > fairly obvious use-cases for that, such as allowing a DROP of a column > type to not propagate to the whole composite type. Could you give an illustration to make sure I'm following? I tried to create my own example with: CREATE TYPE foo AS (bar int, baz int); CREATE TABLE quux (a foo); INSERT INTO quux (a.bar, a.baz) VALUES (12, 34); SELECT * FROM quux; a - (12,34) ALTER TYPE foo DROP ATTRIBUTE baz; SELECT * FROM quux; a -- (12) but I guess that must not exercise the case you had in mind. I could say my main reason for asking was to judge the imminence of any change. If there's a vague conceivable possibility of objectSubId meaning something for types in the indefinite future, I might do my simple implementation now, and some uglier SMOP alternative later when/if needed. If somebody has such a change in mind for the near term, that'd be different. Regards, -Chap
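A minimal sketch of the three-int key described above; the struct name is made up, and the only trick is that a type's typmod rides in the otherwise-unused third slot, shifted by one so that the default typmod of -1 maps to 0:

    /* Hypothetical cache key, mirroring the three fields of ObjectAddress. */
    typedef struct CachedObjectKey
    {
        Oid     classId;    /* OID of the system catalog, as in ObjectAddress */
        Oid     objectId;   /* OID of the object itself */
        int32   subId;      /* column number for relations; (typmod + 1) for
                             * types, so the default typmod -1 maps to 0 */
    } CachedObjectKey;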
Re: Commit fest 2019-09
On 2019-Sep-01, Michael Paquier wrote: > I am not sure if somebody would like to volunteer, but I propose > myself as commit fest manager for the next session. Feel free to > reply to this thread if you feel that you could help out as manager > for this time. Hello, Thanks Michael for handling these tasks over the weekend. I had offered myself as CFM for this one when the cycle started, so I'm happy to do it. At this time, the commitfest starts with:

         status          │ count
─────────────────────────┼───────
 Needs review            │   165
 Waiting on Author       │    30
 Ready for Committer     │    11
 Returned with Feedback  │     1
 Moved to next CF        │     2
 Committed               │    14
 Rejected                │     1
 Withdrawn               │     4

There's an embarrassingly large number of patches in the "Bug Fixes" section. I encourage reviewers to look at those that "Need Review" so that they can set "Ready for Committer" and get a committer to finish them up:

        status         │ count
───────────────────────┼───────
 Committed             │     2
 Needs review          │    20
 Ready for Committer   │     2
 Waiting on Author     │     4
 Withdrawn             │     1

Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Proposal: roll pg_stat_statements into core
On Mon, Sep 2, 2019 at 05:11, Adrien Nayrat wrote: > > On 9/1/19 8:54 PM, Tom Lane wrote: > >> - The overhead for most use cases is low compared to the benefit. > > Please do not expect that we're going to accept such assertions > > unsupported by evidence. (As a very quick-n-dirty test, I tried > > "pgbench -S" and got somewhere around 4% TPS degradation with > > pg_stat_statements loaded. That doesn't seem negligible.) > > AFAIR Andres pointed overhead could be much more when you have more queries > than > pg_stat_statements.max [1]. Eviction can be costly. > pg_stat_statements is important in a lot of query analysis. If you compare pg_stat_statements and pgbadger, you can capture queries for the latter by changing log_min_duration_statement or log_statement without restarting the server; for the former, however, you can't do so without restarting the server. At least if pg_stat_statements were in core you could load it by default and have a GUC to turn it on/off without restarting the server (that was Magnus' proposal and Andres agreed). I support this idea. In v10 we changed some GUCs to make replication work out of the box; we should do the same for query analysis. Regards, -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: Index Skip Scan
> On Wed, Aug 28, 2019 at 9:32 PM Floris Van Nee > wrote: > > I'm afraid I did manage to find another incorrect query result though Yes, it's an example of what I was mentioning before: the current modified implementation of `_bt_readpage` wouldn't work well when going between pages. So far it seems that the only problem we can have is when the previous and next items are located on different pages. I've checked how this issue can be avoided, and I hope to post a new version relatively soon.
Re: SegFault on 9.6.14
On Fri, Aug 9, 2019 at 6:29 PM Robert Haas wrote: > > On Wed, Aug 7, 2019 at 5:45 AM vignesh C wrote: > > I have made a patch based on the above lines. > > I have tested the scenarios which Thomas had shared in the earlier > > mail and few more tests based on Thomas's tests. > > I'm not sure if we will be going ahead with this solution or not. > > Let me know your opinion on the same. > > If you feel this approach is ok, we can add few of this tests into pg tests. > > I think this patch is bizarre: > > - It introduces a new function called getParallelModeLevel(), which is > randomly capitalized different from the other functions that do > similar things, and then uses it to do the same thing that could have > been done with the existing function IsInParallelMode(). > - It contains an "if" statement whose only content is an Assert(). > Don't write if (a) Assert(b); write Assert(!a || b). > - It contains zero lines of comment changes, which is obviously not > enough for a patch that proposes to fix a very thorny issue. This > failure has two parts. First, it adds no new comments to explain the > bug being fixed or the theory of operation of the new code. Second, it > does not even bother updating existing comments that are falsified by > the patch, such as the function header comments for > ExecParallelCleanup and ExecShutdownGather. > - It changes what ExecParallelCleanup does while adjusting only one of > the two callers to match the behavior change. nodeGatherMerge.c > manages to be completed untouched by this patch. If you change what a > function does, you really need to grep for all the calls to that > function and adjust all callers to match the new set of expectations. > > It's a little hard to get past all of those issues and look at what > the patch actually does, but I'm going to try: the theory of operation > of the patch seems to be that we can skip destroying the parallel > context when performing ExecParallelCleanup and in fact when exiting > parallel mode, and then when we get to executor end time the context > will still be there and we can fish the instrumentation out of it. But > this seems problematic for several reasons. For one thing, as Amit > already observed, the code currently contains an assertion which > ensure that a ParallelContext can't outlive the time spent in parallel > mode, and it doesn't seem desirable to relax that assertion (this > patch removes it). > > But beyond that, the issue here is that the Limit node is shutting > down the Gather node too early, and the right fix must be to stop > doing that, not to change the definition of what it means to shut down > a node, as this patch does. So maybe a possible approach here - which > I think is more or less what Tom is proposing - is: > > 1. Remove the code from ExecLimit() that calls ExecShutdownNode(). > Attached patch does that. I have also added one test as a separate patch so that later if we introduce shutting down resources in Limit node, we don't break anything. As of now, I have kept it separate for easy verification, but if we decide to go with this approach and test appears fine, we can merge it along with the fix. > 2. Adjust ExecutePlan() so that it ensures that ExecuteShutdownNode() > gets called at the very end of execution, at least when execute_once > is set, before exiting parallel mode. > I am not sure if I completely understand this point. AFAICS, the ExecuteShutdownNode is called whenever we are done getting the tuples. 
One place where it is not there in that function is when we assume destination is closed, basically below code: ExecutePlan() { .. if (!dest->receiveSlot(slot, dest)) break; .. } Do you expect this case to be also dealt or you have something else in mind? The other possibility could be that we move the shutdown of the node at the end of the function when we exit parallel mode but doing that lead to some regression failure on my machine. I will investigate the same. > 3. Figure out, possibly at a later time or only in HEAD, how to make > the early call to ExecLimit() in ExecShutdownNode(), and then put it > back. Okay, Tom has suggested a design to address this, but that will be for HEAD only. To be clear, I am not planning to spend time on that at this moment, but OTOH, I want the bug reported in this thread to be closed, so for now, we need to proceed with some minimum fix as mentioned by you in above two points. If someone else can write a patch, I can help in the review of same. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com 0001-Forbid-Limit-node-to-shutdown-resources.patch Description: Binary data 0002-Test-rescans-when-parallel-node-is-beneath-Limit-nod.patch Description: Binary data
Re: [HACKERS] CLUSTER command progress monitor
On Mon, Sep 2, 2019 at 4:59 PM Michael Paquier wrote: > > On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote: > > > I tried 1. and it shown index_rebuild_count, but it also shown > > > "initializing" phase again and other columns were empty. So, it is > > > not suitable to fix the problem. :( > > > I'm going to try 2. and 3., but, unfortunately, I can't get enough > > > time to do that after PGConf.Asia 2019. > > > If we selected 3., it affects following these progress reporting > > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay, > > > I suppose. Any comments welcome! :) > > > > I looked at this open item. I prefer #3 but I think it would be enough > > to have a small stack using a fixed length array to track nested > > progress information and the current executing command (maybe 4 or 8 > > would be enough for maximum nested level for now?). That way, we don't > > need any change the interfaces. AFAICS there is no case where we call > > only either pgstat_progress_start_command or > > pgstat_progress_end_command without each other (although > > pgstat_progress_update_param is called without start). So I think that > > having a small stack for tracking multiple reports would work. > > > > Attached the draft patch that fixes this issue. Please review it. > > Do we actually want to show to the user information about CREATE INDEX > which is different than CLUSTER? It could be confusing for the user > to see a progress report from a command different than the one > actually launched. I personally think it would be helpful for users. We provide the progress reporting for each commands but it might not be detailed enough. But we can provide more details of progress information of each commands by combining them. Only users who want to confirm the details need to see different progress reports. > There could be a method 4 here: do not start a new > command progress when there is another one already started, and do not > try to end it in the code path where it could not be started as it did > not stack. So while I see the advantages of stacking the progress > records as you do when doing cascading calls of the progress > reporting, I am not sure that: > 1) We should bloat more PgBackendStatus for that. > 2) We want to add more complication in this area close to the > release of 12. I agreed, especially to 2. We can live with #4 method in PG12 and the patch I proposed could be discussed as a new feature. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: SIGQUIT on archiver child processes maybe not such a hot idea?
At Mon, 2 Sep 2019 15:51:34 +0900, Michael Paquier wrote in <20190902065134.ge1...@paquier.xyz> > On Mon, Sep 02, 2019 at 12:27:09AM +, Tsunakawa, Takayuki wrote: > > From: Tom Lane [mailto:t...@sss.pgh.pa.us] > >> After investigation, the mechanism that's causing that is that the > >> src/test/recovery/t/010_logical_decoding_timelines.pl test shuts > >> down its replica server with a mode-immediate stop, which causes > >> that postmaster to shut down all its children with SIGQUIT, and > >> in particular that signal propagates to a "cp" command that the > >> archiver process is executing. The "cp" is unsurprisingly running > >> with default SIGQUIT handling, which per the signal man page > >> includes dumping core. > > > > We've experienced this (core dump in the data directory by an > > archive command) years ago. Related to this, the example of using > > cp in the PostgreSQL manual is misleading, because cp doesn't > > reliably persist the WAL archive file. > > The previous talks about having pg_copy are still where they were a > couple of years ago as we did not agree on which semantics it should > have. If we could move forward with that and update the documentation > from its insanity that would be great and... The signal handling is > something else we could customize in a more favorable way with the > archiver. Anyway, switching from something else than SIGQUIT to stop > the archiver will not prevent any other tools from generating core > dumps with this other signal. Since we are allowing OPs to use arbitrary command as archive_command, providing a replacement with non-standard signal handling for a specific command doesn't seem a general solution to me. Couldn't we have pg_system(a tentative name), which intercepts SIGQUIT then sends SIGINT to children? Might be need to resend SIGQUIT after some interval, though.. > > We enable the core dump in production to help the investigation just in > > case. > > So do I in some of the stuff I work on. > > > some_command also catches SIGQUIT just exit. It copies and syncs the file. > > > > I proposed something in this line as below, but I couldn't respond to > > Peter's review comments due to other tasks. Does anyone think it's worth > > resuming this? > > > > https://www.postgresql.org/message-id/7E37040CF3804EA5B018D7A022822984@maumau > > And I was looking for this thread a couple of lines ago :) > Thanks. # Is there any means to view the whole of a thread from archive? # I'm a kind of reluctant to wander among messages like a rat in # a maze:p regards. -- Kyotaro Horiguchi NTT Open Source Software Center
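A bare-bones sketch of what such a pg_system() wrapper could look like; the name is the tentative one above, and error handling plus the suggested delayed re-send of SIGQUIT are left out:

    #include <errno.h>
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static pid_t archive_child_pid = 0;

    /* On SIGQUIT, ask the child to stop with SIGINT instead of letting it
     * fall back to default SIGQUIT handling, which may dump core. */
    static void
    forward_quit_as_int(int signo)
    {
        if (archive_child_pid > 0)
            kill(archive_child_pid, SIGINT);
    }

    int
    pg_system(const char *command)
    {
        int     status;

        archive_child_pid = fork();
        if (archive_child_pid < 0)
            return -1;
        if (archive_child_pid == 0)
        {
            /* Child: run the command with default signal handling. */
            signal(SIGQUIT, SIG_DFL);
            execl("/bin/sh", "sh", "-c", command, (char *) NULL);
            _exit(127);
        }

        signal(SIGQUIT, forward_quit_as_int);
        while (waitpid(archive_child_pid, &status, 0) < 0 && errno == EINTR)
            continue;
        signal(SIGQUIT, SIG_DFL);

        return status;
    }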
Re: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout
On Mon, Sep 02, 2019 at 08:06:22AM +, r.takahash...@fujitsu.com wrote: > I'm not sure whether this patch should be applied to postgres below 11 > since I'm not sure whether the OS parameters (ex. tcp_retries2) cause the > same error. Thinking wider, don't we have the same problem with wal_sender_timeout in the case where a sync request takes longer than the time it would take the backend to terminate the connection? -- Michael signature.asc Description: PGP signature
Re: Proposal: roll pg_stat_statements into core
On 9/1/19 8:54 PM, Tom Lane wrote: >> - The overhead for most use cases is low compared to the benefit. > Please do not expect that we're going to accept such assertions > unsupported by evidence. (As a very quick-n-dirty test, I tried > "pgbench -S" and got somewhere around 4% TPS degradation with > pg_stat_statements loaded. That doesn't seem negligible.) AFAIR Andres pointed out that the overhead could be much higher when you have more queries than pg_stat_statements.max [1]. Eviction can be costly. 1: https://twitter.com/AndresFreundTec/status/1105585237772263424 signature.asc Description: OpenPGP digital signature
RE: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout
Hi Michael-san,

> Attached is a patch to do that, which should go down to v12 where
> tcp_user_timeout has been introduced. Takahashi-san, what do you
> think?

Thank you for creating the patch. This patch is what I expected.

I'm not sure whether this patch should be applied to postgres below 11 since I'm not sure whether the OS parameters (ex. tcp_retries2) cause the same error.

Regards,
Ryohei Takahashi
Re: [HACKERS] CLUSTER command progress monitor
On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote:
> > I tried 1. and it showed index_rebuild_count, but it also showed the
> > "initializing" phase again and other columns were empty. So, it is
> > not suitable to fix the problem. :(
> > I'm going to try 2. and 3., but, unfortunately, I can't get enough
> > time to do that after PGConf.Asia 2019.
> > If we selected 3., it affects the following progress reporting
> > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> > I suppose. Any comments welcome! :)
>
> I looked at this open item. I prefer #3, but I think it would be enough
> to have a small stack using a fixed-length array to track nested
> progress information and the currently executing command (maybe 4 or 8
> would be enough for the maximum nesting level for now?). That way, we
> don't need to change the interfaces. AFAICS there is no case where we
> call only either pgstat_progress_start_command or
> pgstat_progress_end_command without the other (although
> pgstat_progress_update_param is called without start). So I think that
> having a small stack for tracking multiple reports would work.
>
> Attached is a draft patch that fixes this issue. Please review it.

Do we actually want to show the user information about CREATE INDEX that is different from CLUSTER? It could be confusing for the user to see a progress report from a command different from the one actually launched.

There could be a method 4 here: do not start new command progress reporting when another one is already started, and do not try to end it in the code path where it could not be started because it did not stack. So while I see the advantages of stacking the progress records as you do for cascading calls of the progress reporting, I am not sure that:
1) We should bloat PgBackendStatus further for that.
2) We want to add more complication in this area this close to the release of 12.

Another solution, as mentioned by Yamada-san, is just to restart the progress for the cluster command in reindex_relation(); however, there is an issue here, as reindex_relation() is also called by REINDEX TABLE. I actually find it very weird that we have a cluster-related field added for REINDEX, so it seems to me that all the interactions between the code paths of CLUSTER and REINDEX have not been completely thought through. This part was added by 6f97457 :(
--
Michael
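To make the fixed-length stack idea (method 3 above) concrete, here is a standalone sketch. All type, constant, and function names are invented stand-ins, not the identifiers used in the draft patch or in pgstat.c: progress state is pushed on start and popped on end, only the top entry is updated, and an outer command such as CLUSTER gets its own counters back once a nested command such as CREATE INDEX finishes.

```c
#include <stdio.h>

#define PROGRESS_STACK_DEPTH	4	/* assumed maximum nesting level */
#define NUM_PROGRESS_PARAM		20

typedef struct ProgressEntry
{
	int			command;					/* which command is reporting */
	long long	param[NUM_PROGRESS_PARAM];	/* its progress counters */
} ProgressEntry;

static ProgressEntry progress_stack[PROGRESS_STACK_DEPTH];
static int	progress_depth = 0;		/* 0 means no command in progress */

/* Push a new progress frame; levels beyond the array are simply ignored. */
static void
progress_start_command(int command)
{
	if (progress_depth < PROGRESS_STACK_DEPTH)
	{
		ProgressEntry *e = &progress_stack[progress_depth];
		int			i;

		e->command = command;
		for (i = 0; i < NUM_PROGRESS_PARAM; i++)
			e->param[i] = 0;
	}
	progress_depth++;
}

/* Update a counter in the top-most (currently executing) frame only. */
static void
progress_update_param(int index, long long val)
{
	if (progress_depth > 0 && progress_depth <= PROGRESS_STACK_DEPTH)
		progress_stack[progress_depth - 1].param[index] = val;
}

/* Pop the top frame; the caller's own frame becomes visible again. */
static void
progress_end_command(void)
{
	if (progress_depth > 0)
		progress_depth--;
}

int
main(void)
{
	progress_start_command(1);		/* outer command, think CLUSTER */
	progress_update_param(0, 42);

	progress_start_command(2);		/* nested command, think CREATE INDEX */
	progress_update_param(0, 7);	/* does not clobber the outer counter */
	progress_end_command();

	printf("outer param[0] = %lld\n", progress_stack[0].param[0]);	/* 42 */
	progress_end_command();
	return 0;
}
```

In this sketch the progress view would report only the top-most frame, which is one way to read the concern above: the user who launched CLUSTER would then see CREATE INDEX counters while the rebuild runs, unless the view filters by the command that was originally started.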
Re: block-level incremental backup
Hi Robert,

On Sat, Aug 31, 2019 at 8:29 AM Robert Haas wrote:
> On Thu, Aug 29, 2019 at 10:41 AM Jeevan Ladhe wrote:
> > Due to the inherent nature of pg_basebackup, the incremental backup also
> > allows taking backup in tar and compressed format. But, pg_combinebackup
> > does not understand how to restore this. I think we should either make
> > pg_combinebackup support restoration of tar incremental backup or restrict
> > taking the incremental backup in tar format until pg_combinebackup
> > supports the restoration by making option '--lsn' and '-Ft' exclusive.
> >
> > It is arguable that one can take the incremental backup in tar format, extract
> > that manually and then give the resultant directory as input to the
> > pg_combinebackup, but I think that kills the purpose of having
> > pg_combinebackup utility.
>
> I don't agree. You're right that you would have to untar (and
> uncompress) the backup to run pg_combinebackup, but you would also
> have to do that to restore a non-incremental backup, so it doesn't
> seem much different.

Thanks. Yes, I agree about the similarity between restoring a non-incremental and an incremental backup in this case.

> I don't think it's worth doing that at this point; I definitely don't think it
> needs to be part of the first patch.

Makes sense.

Regards,
Jeevan Ladhe
Re: Bug in GiST paring heap comparator
On 02/09/2019 07:54, Alexander Korotkov wrote:
> Andrey Borodin pointed out to me that the results of some KNN-GiST tests
> are obviously wrong and don't match the results of a sort node.
>
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
>         f1
> -------------------
>  (10,10)
>  (NaN,NaN)
>  (0,0)
>  (1e-300,-1e-300)
>  (-3,4)
>  (-10,0)
>  (-5,-12)
>  (5.1,34.5)
>  (1e+300,Infinity)
> (10 rows)
>
> So we've memorized incorrect results in the expected output.

Ouch!

> It appears to be related to the implementation of the comparison function
> in the pairing heap used as the priority queue for KNN. It used plain
> float comparison, which doesn't handle Inf and NaN values well. The
> attached patch replaces it with float8_cmp_internal().
>
> Also, note that with the patch the KNN results still don't fully match
> the results of the sort node.
>
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
>         f1
> -------------------
>  (0,0)
>  (1e-300,-1e-300)
>  (-3,4)
>  (-10,0)
>  (10,10)
>  (-5,-12)
>  (5.1,34.5)
>  (1e+300,Infinity)
>  (NaN,NaN)
> (10 rows)
>
> NULL and '(NaN,NaN)' are swapped. That happens because we assume the
> distance to NULL to be Inf, while float8_cmp_internal() considers NaN to
> be greater than Inf. Even if we assumed the distance to NULL to be NaN,
> it wouldn't guarantee that NULLs come last.
>
> It looks like we can handle this only by introducing an array of
> "distance is null" flags in GISTSearchItem. But is it worth it?

I don't think we have much choice, returning wrong results is not an option. At first I thought we could set the "recheck" flag if we see NULLs or NaNs, but that won't work because rechecking is not supported with Index-Only Scans.

Looking at the corresponding SP-GiST code, pairingheap_SpGistSearchItem_cmp() gets this right. I'd suggest copy-pasting the implementation from there, so that they would be as similar as possible.

(SP-GiST gets away with just one isNull flag, because it doesn't support multi-column indexes. In GiST, you need an array, as you said.)

- Heikki
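Not the actual GiST or SP-GiST code (the type and function names below are invented for illustration), but a minimal sketch of the ordering being discussed: with an explicit isnull flag carried alongside each distance, NULL distances can be forced after everything else, and NaN after all ordinary values, regardless of what value was stashed in as the "distance to NULL".

```c
#include <math.h>
#include <stdbool.h>

/* A search item's distance, with an explicit "is null" flag. */
typedef struct ItemDistance
{
	double		value;
	bool		isnull;
} ItemDistance;

/*
 * Return <0 if a should be returned before b, >0 if after, 0 if tied.
 * NULL distances sort after everything; NaN sorts after ordinary values.
 */
static int
knn_distance_cmp(ItemDistance a, ItemDistance b)
{
	if (a.isnull || b.isnull)
	{
		if (a.isnull && b.isnull)
			return 0;
		return a.isnull ? 1 : -1;
	}

	if (isnan(a.value) || isnan(b.value))
	{
		if (isnan(a.value) && isnan(b.value))
			return 0;
		return isnan(a.value) ? 1 : -1;
	}

	if (a.value < b.value)
		return -1;
	if (a.value > b.value)
		return 1;
	return 0;
}
```

In GiST this comparison would have to run over an array of such per-column distances, as noted above, whereas SP-GiST can get by with a single flag.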