RE: Wrong statistics for size of XLOG_SWITCH during pg_waldump.
> Thanks for review, and sorry for the late reply.
>
>> I reviewed the patch and found some problems.
>>
>>> + if(startSegNo != endSegNo)
>>> + else if(record->ReadRecPtr / XLOG_BLCKSZ !=
>>> + if(rmid == RM_XLOG_ID && info == XLOG_SWITCH)
>>> + if(ri == RM_XLOG_ID)
>>> + if(info == XLOG_SWITCH)
>>
>> You need to put a space after the "if".
>
> All fixed, and thanks for pointing out the issue.
>
>>> @@ -24,6 +24,7 @@
>>> #include "common/logging.h"
>>> #include "getopt_long.h"
>>> #include "rmgrdesc.h"
>>> +#include "catalog/pg_control.h"
>>
>> I think the include statements should be arranged in alphabetical order.
>
> Fixed. Thank you for your revision!
>
>>> + info = (rj << 4) & ~XLR_INFO_MASK;
>>> + if(info == XLOG_SWITCH)
>>> + XLogDumpStatsRow(psprintf("XLOG/SWITCH_JUNK"),
>>> + 0, total_count, stats->junk_size, total_rec_len,
>>> + 0, total_fpi_len, stats->junk_size, total_len);
>>
>> Can't it be described in the same way as "XLogDumpStatsRow(psprintf("%s/%s",
>> desc->rm_name, id)..."? Only this part looks strange.
>> Why are the "count" and "fpi_len" fields 0?
>
> The 'SWITCH_JUNK' is not a real record; it relies on the 'XLOG_SWITCH'
> record, so I think we can't count 'SWITCH_JUNK', and the "count" is 0.
> And it never contains an FPI, so the "fpi_len" is 0.
>
> But a 0 value may look strange, so in the current version I show it like below:
>
> Type                N      (%)  Record size      (%)  FPI size    (%)  Combined size      (%)
> ----                -      ---  -----------      ---  --------    ---  -------------      ---
> ...
> XLOG/SWITCH_JUNK    -      (-)     11006248  (72.26)         -    (-)       11006248  (65.78)
> Transaction/COMMIT 10   (0.03)          340   (0.00)         0 (0.00)            340   (0.00)

I just wanted to know why the "count" and "fpi_len" fields are 0. So, it
would be nice to have 0 values. Sorry for confusing you.

Regards,
Shinya Kato
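For readers following along: the "junk" being reported is the unused remainder of a WAL segment after an XLOG_SWITCH record, since the switch pads out the rest of the segment. A standalone sketch of that computation (not pg_waldump source; XLogRecPtr reduced to a plain byte offset):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch (not PostgreSQL source): after an XLOG_SWITCH record, the rest of
 * the current WAL segment is padding that pg_waldump's --stats can
 * attribute to the switch.  Given the byte position just past the switch
 * record and the segment size, the junk size is the distance to the next
 * segment boundary (zero if the record ends exactly on a boundary).
 */
typedef uint64_t XLogRecPtr;    /* byte position in the WAL stream */

static uint64_t
switch_junk_size(XLogRecPtr end_of_switch_rec, uint64_t wal_segment_size)
{
    uint64_t offset_in_segment = end_of_switch_rec % wal_segment_size;

    return offset_in_segment == 0 ? 0 : wal_segment_size - offset_in_segment;
}
```

With the default 16MB segment size, a switch record ending 1KB into a segment would account for roughly 16MB of junk, which is why SWITCH_JUNK can dominate the "Combined size" column as in the table above.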
Re: Single transaction in the tablesync worker?
Hi Amit.

PSA the v12 patch for the Tablesync Solution1.

Differences from v11:
+ Added PG docs to mention the tablesync slot
+ Refactored tablesync slot drop (done by DropSubscription/process_syncing_tables_for_sync)
+ Fixed PG docs mentioning wrong state code
+ Fixed wrong code comment describing TCOPYDONE state

Features:
* The tablesync slot is now permanent instead of temporary. The tablesync
  slot name is no longer tied to the Subscription slot name.
* The tablesync slot cleanup (drop) code is added for DropSubscription and
  for process_syncing_tables_for_sync functions.
* The tablesync worker now allows multiple transactions instead of a single
  transaction.
* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful
  copy_table in LogicalRepSyncTableStart.
* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then it will
  bypass the initial copy_table phase.
* Now tablesync sets up replication origin tracking in
  LogicalRepSyncTableStart (similar to what is done for the apply worker).
  The origin is advanced when first created.
* The tablesync replication origin tracking is cleaned up during
  DropSubscription and/or process_syncing_tables_for_apply.
* The DropSubscription cleanup code was enhanced (v7+) to take care of any
  crashed tablesync workers.
* Updates to PG docs.

TODO / Known Issues:
* Address review comments
* Patch applies with whitespace warning

---
Kind Regards,
Peter Smith.
Fujitsu Australia

v12-0001-Tablesync-Solution1.patch
Description: Binary data

v12-0002-Tablesync-extra-logging.patch
Description: Binary data
Re: CheckpointLock needed in CreateCheckPoint()?
On Thu, Jan 7, 2021 at 12:45 PM Bharath Rupireddy wrote:
>
> On Thu, Jan 7, 2021 at 11:32 AM Amul Sul wrote:
> > Snip from CreateCheckPoint():
> > --
> > /*
> >  * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
> >  * (This is just pro forma, since in the present system structure there is
> >  * only one process that is allowed to issue checkpoints at any given
> >  * time.)
> >  */
> > LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
> > --
> >
> > As per this comment, it seems to be that we don't really need this LW lock.
> > We could have something else instead if we are afraid of having multiple
> > checkpoints at any given time, which isn't possible, btw.
>
> Looks like CreateCheckPoint() can also be called in standalone backends
> (RequestCheckpoint->CreateCheckPoint) along with the checkpointer
> process. Having said that, I'm not quite sure whether it can get
> called at the same time from a backend (maybe with a "checkpoint;"
> command) and the checkpointer process.
>
> /*
>  * If in a standalone backend, just do it ourselves.
>  */
> if (!IsPostmasterEnvironment)
> {
>     /*
>      * There's no point in doing slow checkpoints in a standalone backend,
>      * because there's no other backends the checkpoint could disrupt.
>      */
>     CreateCheckPoint(flags | CHECKPOINT_IMMEDIATE);
>
> See the below comment for IsPostmasterEnvironment:
>
> /*
>  * IsPostmasterEnvironment is true in a postmaster process and any postmaster
>  * child process; it is false in a standalone process (bootstrap or
>  * standalone backend).

I am not sure I understood your point completely. Do you mean there could
be multiple bootstrap or standalone backends that could exist at a time and
can perform a checkpoint?

> > This CheckpointLock is acquired almost at the start of CreateCheckPoint()
> > and released at the end. The in-between code can take a while to
> > complete. All that time interrupts will be on hold, which doesn't seem to
> > be a great idea, but that wasn't problematic until now. I am having
> > trouble due to this interrupt hold for so long since I was trying to add
> > code changes where the checkpointer process is supposed to respond to the
> > system read-write state change request as soon as possible [1]. That
> > cannot be done if the interrupt is disabled, since the read-write state
> > change uses the global barrier mechanism to convey system state changes
> > to other running processes.
> >
> > So my question is, can we completely get rid of the CheckpointLock need
> > in CreateCheckPoint()?
> >
> > Thoughts/Comments/Suggestions?
>
> I don't think we can completely get rid of CheckpointLock in
> CreateCheckPoint given the fact that it can get called from multiple
> processes.

How?

> How about serving that ASRO barrier request alone for the checkpointer
> process in ProcessInterrupts even though the CheckpointLock is held by
> the checkpointer process? And also while the checkpointing is
> happening, maybe we should call CHECK_FOR_INTERRUPTS to see if the
> ASRO barrier has arrived. This may sound a bit hacky, but it's just a
> thought. I'm thinking that in ProcessInterrupts we can get to know the
> checkpointer process type via MyAuxProcType, and whether the lock is
> held or not using CheckpointLock, and then we can handle the ASRO
> global barrier request.

I am afraid that this will easily get rejected by the community. Note that
this is a very crucial code path of the database server.

There are other options too, but I don't want to propose them until there
is clarity on the CheckpointLock necessity.

Regards,
Amul
Re: Incorrect allocation handling for cryptohash functions with OpenSSL
On 07/01/2021 06:15, Michael Paquier wrote:
> On Wed, Jan 06, 2021 at 03:58:22PM +0200, Heikki Linnakangas wrote:
>> contrib/pgcrypto/internal-sha2.c and
>> src/backend/utils/adt/cryptohashfuncs.c: the call to
>> pg_cryptohash_create() is missing a check for a NULL result. With your
>> latest patch, that's OK because the subsequent pg_cryptohash_update()
>> calls will fail if passed a NULL context. But it seems sloppy.
>
> Is it? pg_cryptohash_create() will never return NULL for the backend.

Ah, you're right.

>> src/backend/libpq/auth-scram.c: SHA256 is used in the mock
>> authentication. If the pg_cryptohash functions fail, we throw a distinct
>> error "could not encode salt" that reveals that it was a mock
>> authentication. I don't think this is a big deal in practice; it would
>> be hard for an attacker to induce the SHA256 computation to fail, and
>> there are probably better ways to distinguish mock authentication from
>> real, like timing attacks. But still.
>
> This maps with the second error in the mock routine, so wouldn't it be
> better to change both and back-patch? The last place where this error
> message is used is pg_be_scram_build_secret() for the generation of
> what's stored in pg_authid. An idea would be to use "out of memory".
> That can be faced for any palloc() calls.

Hmm. Perhaps it would be best to change all the errors in mock
authentication to COMMERROR. It'd be nice to have an accurate error
message in the log, but there is no need to send it to the client.

>> src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we
>> still need separate fields for the different SHA variants?
>
> Using separate fields looked cleaner to me if it came to debugging, and
> the cleanup was rather minimal except if we use more switch/case to set
> up the various variables. What about something like the attached?

+1

- Heikki
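The point under discussion is that in frontend builds the create function can return NULL on allocation failure, so callers must check before use. A standalone analogue of that pattern (hypothetical names `hash_create`/`hash_update`, a toy FNV-1a state — not the real pg_cryptohash API):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy hash context standing in for pg_cryptohash_ctx. */
typedef struct hash_ctx
{
    uint32_t state;             /* running FNV-1a hash, purely illustrative */
} hash_ctx;

static hash_ctx *
hash_create(void)
{
    hash_ctx *ctx = malloc(sizeof(hash_ctx));

    if (ctx == NULL)
        return NULL;            /* frontend style: caller must check */
    ctx->state = 2166136261u;   /* FNV-1a offset basis */
    return ctx;
}

static int
hash_update(hash_ctx *ctx, const unsigned char *data, size_t len)
{
    if (ctx == NULL)
        return -1;              /* fail cleanly on a NULL context */
    for (size_t i = 0; i < len; i++)
        ctx->state = (ctx->state ^ data[i]) * 16777619u;
    return 0;
}

/* Caller pattern: check the create result up front instead of relying on
 * the later update call to fail, so an accurate "out of memory" style
 * error can be reported at the point of failure. */
static int
hash_of_string(const char *s, uint32_t *result)
{
    hash_ctx *ctx = hash_create();

    if (ctx == NULL)
        return -1;
    if (hash_update(ctx, (const unsigned char *) s, strlen(s)) < 0)
    {
        free(ctx);
        return -1;
    }
    *result = ctx->state;
    free(ctx);
    return 0;
}
```

In the backend the allocator throws instead of returning NULL, which is why the create-site check is redundant there — the sketch shows the frontend-side discipline only.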
Re: CheckpointLock needed in CreateCheckPoint()?
On Thu, Jan 7, 2021 at 12:45 PM Bharath Rupireddy wrote:
>
> On Thu, Jan 7, 2021 at 11:32 AM Amul Sul wrote:
> > Snip from CreateCheckPoint():
> > --
> > /*
> >  * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
> >  * (This is just pro forma, since in the present system structure there is
> >  * only one process that is allowed to issue checkpoints at any given
> >  * time.)
> >  */
> > LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
> > --
> >
> > As per this comment, it seems to be that we don't really need this LW lock.
> > We could have something else instead if we are afraid of having multiple
> > checkpoints at any given time, which isn't possible, btw.
>
> Looks like CreateCheckPoint() can also be called in standalone backends
> (RequestCheckpoint->CreateCheckPoint) along with the checkpointer
> process. Having said that, I'm not quite sure whether it can get
> called at the same time from a backend (maybe with a "checkpoint;"
> command) and the checkpointer process.

If it is a standalone backend then there will be no postmaster and no
other process, i.e. no checkpointer process either.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: A failure of standby to follow timeline switch
At Thu, 7 Jan 2021 11:55:33 +0900, Fujii Masao wrote in
> On 2021/01/05 17:26, Kyotaro Horiguchi wrote:
> > At Mon, 4 Jan 2021 19:00:21 +0900, Fujii Masao wrote in
> >> On 2021/01/04 12:06, Kyotaro Horiguchi wrote:
> >>> At Sat, 26 Dec 2020 02:15:06 +0900, Fujii Masao wrote in
> >>>> On 2020/12/25 12:03, Kyotaro Horiguchi wrote:
> >>> The attached is the fixed version.
> >>
> >> Thanks for updating the patches!
> >>
> >>> In the first patch, the test added to 001_stream_rep.pl involves two
> >>> copied functions related to server-log investigation from
> >>> 019_replslot_limit.pl.
> >>
> >> So you're planning to define them commonly in TestLib.pm or elsewhere?
> >
> > Yeah.. That's correct. Newly added as the first patch.
> > While making that change, I extended the interface of slurp_file to
> > allow reading from an arbitrary position.
>
> Is this extension really helpful for the current use case?
> At least I'd like to avoid back-patching this since it's an extension...

Yeah, I felt a bit of hesitation about it. It's less useful assuming that
log files won't get so large. Removed in this version.

>  	OsFHandleOpen(my $fh = IO::Handle->new(), $fHandle, 'r')
>  	  or croak "could not read \"$filename\": $^E\n";
> +	seek($fh, $from, 0)
> +	  or croak "could not seek \"$filename\" to $from: $^E\n";
>
> I'm not familiar with this area, but isn't SetFilePointer() more suitable
> than seek()?

SetFilePointer() works on a native handle, IO::Handle->new() here. seek()
works on $fh, a perl handle. If ReadFile were used later, SetFilePointer()
might be needed separately. Anyway, it is removed.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From 39f284c2109b63e444eb8d437fe51ae3268d305b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Tue, 5 Jan 2021 13:34:36 +0900
Subject: [PATCH v4 1/3] Move TAP log-searching feature to common modules

Some private functions in 019_replslot_limit.pl will be used in other
tests. Move them to common modules so that they are available to the
new tests.
---
 src/test/perl/PostgresNode.pm             | 36 ++
 src/test/recovery/t/019_replslot_limit.pl | 37 ---
 2 files changed, 42 insertions(+), 31 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..d78e9f93fb 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2223,6 +2223,42 @@ sub pg_recvlogical_upto
 
 =pod
 
+=item $node->current_log_position()
+
+Return the current position of server log.
+
+=cut
+
+sub current_log_position
+{
+	my $self = shift;
+
+	return (stat $self->logfile)[7];
+}
+
+=pod
+
+=item $node->find_in_log($pattern, $startpos)
+
+Returns whether the $pattern occurs after $startpos in the server log.
+
+=cut
+
+sub find_in_log
+{
+	my ($self, $pattern, $startpos) = @_;
+
+	$startpos = 0 unless defined $startpos;
+	my $log = TestLib::slurp_file($self->logfile);
+	return 0 if (length($log) <= $startpos);
+
+	$log = substr($log, $startpos);
+
+	return $log =~ m/$pattern/;
+}
+
+=pod
+
 =back
 
 =cut
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 20f4a1bbc3..b298d9992f 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -165,19 +165,17 @@ $node_primary->wait_for_catchup($node_standby, 'replay', $start_lsn);
 
 $node_standby->stop;
 
-ok( !find_in_log(
-		$node_standby,
-		"requested WAL segment [0-9A-F]+ has already been removed"),
+ok( !$node_standby->find_in_log(
+		"requested WAL segment [0-9A-F]+ has already been removed"),
 	'check that required WAL segments are still available');
 
 # Advance WAL again, the slot loses the oldest segment.
-my $logstart = get_log_size($node_primary);
+my $logstart = $node_primary->current_log_position();
 advance_wal($node_primary, 7);
 $node_primary->safe_psql('postgres', "CHECKPOINT;");
 
 # WARNING should be issued
-ok( find_in_log(
-		$node_primary,
+ok( $node_primary->find_in_log(
 		"invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds max_slot_wal_keep_size",
 		$logstart),
 	'check that the warning is logged');
@@ -190,14 +188,13 @@ is($result, "rep1|f|t|lost|",
 	'check that the slot became inactive and the state "lost" persists');
 
 # The standby no longer can connect to the primary
-$logstart = get_log_size($node_standby);
+$logstart = $node_standby->current_log_position();
 $node_standby->start;
 
 my $failed = 0;
 for (my $i = 0; $i < 1; $i++)
 {
-	if (find_in_log(
-			$node_standby,
+	if ($node_standby->find_in_log(
 			"requested WAL segment [0-9A-F]+ has already been removed",
 			$logstart))
 	{
@@ -264,25 +261,3 @@ sub advance_wal
 	}
 	return;
 }
-
-# return the size of logfile of $node in bytes
-sub get_log_size
-{
-	my ($node) = @_;
-
-	return (stat $node->logfile)[7];
-}
-
-# find $pat in logfile of $node after
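The idea behind the current_log_position()/find_in_log() pair — remember the log size, then search only bytes appended after that offset — translates directly to other languages. A standalone C sketch of the same pattern (plain substring match standing in for the Perl regex; not PostgreSQL test code):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the current size of the file in bytes, or -1 on error --
 * the equivalent of current_log_position(). */
static long
current_log_position(const char *path)
{
    FILE *f = fopen(path, "rb");
    long  size;

    if (f == NULL)
        return -1;
    fseek(f, 0, SEEK_END);
    size = ftell(f);
    fclose(f);
    return size;
}

/* Return 1 if needle occurs in the file at or after startpos --
 * the equivalent of find_in_log($pattern, $startpos). */
static int
find_in_log(const char *path, const char *needle, long startpos)
{
    FILE *f = fopen(path, "rb");
    long  size;
    int   found = 0;

    if (f == NULL)
        return 0;
    fseek(f, 0, SEEK_END);
    size = ftell(f);
    if (size > startpos)
    {
        long  len = size - startpos;
        char *buf = malloc(len + 1);

        if (buf != NULL)
        {
            fseek(f, startpos, SEEK_SET);
            if (fread(buf, 1, len, f) == (size_t) len)
            {
                buf[len] = '\0';
                found = strstr(buf, needle) != NULL;
            }
            free(buf);
        }
    }
    fclose(f);
    return found;
}
```

Recording the position before triggering the interesting activity is what makes the later search immune to matches left over from earlier test steps — the same reason the TAP test captures `$logstart` before advancing WAL.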
Re: CheckpointLock needed in CreateCheckPoint()?
On Thu, Jan 7, 2021 at 11:32 AM Amul Sul wrote:
> Snip from CreateCheckPoint():
> --
> /*
>  * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
>  * (This is just pro forma, since in the present system structure there is
>  * only one process that is allowed to issue checkpoints at any given
>  * time.)
>  */
> LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
> --
>
> As per this comment, it seems to be that we don't really need this LW lock.
> We could have something else instead if we are afraid of having multiple
> checkpoints at any given time, which isn't possible, btw.

Looks like CreateCheckPoint() can also be called in standalone backends
(RequestCheckpoint->CreateCheckPoint) along with the checkpointer
process. Having said that, I'm not quite sure whether it can get called
at the same time from a backend (maybe with a "checkpoint;" command) and
the checkpointer process.

/*
 * If in a standalone backend, just do it ourselves.
 */
if (!IsPostmasterEnvironment)
{
    /*
     * There's no point in doing slow checkpoints in a standalone backend,
     * because there's no other backends the checkpoint could disrupt.
     */
    CreateCheckPoint(flags | CHECKPOINT_IMMEDIATE);

See the below comment for IsPostmasterEnvironment:

/*
 * IsPostmasterEnvironment is true in a postmaster process and any postmaster
 * child process; it is false in a standalone process (bootstrap or
 * standalone backend).

> This CheckpointLock is acquired almost at the start of CreateCheckPoint()
> and released at the end. The in-between code can take a while to complete.
> All that time interrupts will be on hold, which doesn't seem to be a great
> idea, but that wasn't problematic until now. I am having trouble due to
> this interrupt hold for so long since I was trying to add code changes
> where the checkpointer process is supposed to respond to the system
> read-write state change request as soon as possible [1]. That cannot be
> done if the interrupt is disabled, since the read-write state change uses
> the global barrier mechanism to convey system state changes to other
> running processes.
>
> So my question is, can we completely get rid of the CheckpointLock need in
> CreateCheckPoint()?
>
> Thoughts/Comments/Suggestions?

I don't think we can completely get rid of CheckpointLock in
CreateCheckPoint given the fact that it can get called from multiple
processes.

How about serving that ASRO barrier request alone for the checkpointer
process in ProcessInterrupts even though the CheckpointLock is held by the
checkpointer process? And also while the checkpointing is happening, maybe
we should call CHECK_FOR_INTERRUPTS to see if the ASRO barrier has arrived.
This may sound a bit hacky, but it's just a thought. I'm thinking that in
ProcessInterrupts we can get to know the checkpointer process type via
MyAuxProcType, and whether the lock is held or not using CheckpointLock,
and then we can handle the ASRO global barrier request.

I may be missing something.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
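As background for the discussion: the invariant the pro-forma comment describes — at most one checkpoint running at a time — does not in itself require holding a lock (and therefore holding off interrupts) for the whole operation. A minimal standalone sketch of the same guarantee with a lock-free test-and-set guard (an illustration of the invariant, not a proposal for the actual PostgreSQL code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * "Only one checkpoint at a time", expressed as a test-and-set flag.
 * A second caller is refused immediately instead of queueing, which is
 * all the pro-forma CheckpointLock comment claims to need.  Nothing is
 * held across the long-running body, so interrupts could stay enabled.
 */
static atomic_flag checkpoint_running = ATOMIC_FLAG_INIT;

static bool
try_begin_checkpoint(void)
{
    /* test_and_set returns the previous value: false means we won */
    return !atomic_flag_test_and_set(&checkpoint_running);
}

static void
end_checkpoint(void)
{
    atomic_flag_clear(&checkpoint_running);
}
```

The trade-off, of course, is that callers who want a checkpoint must retry or wait rather than block on the lock, and PostgreSQL's real design questions (barrier handling in ProcessInterrupts, crash safety) are not captured by a flag this simple.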
Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion
On Thu, Jan 7, 2021 at 1:30 PM Fujii Masao wrote:
>
> On 2021/01/07 12:42, Masahiko Sawada wrote:
> > On Thu, Jan 7, 2021 at 10:59 AM Fujii Masao wrote:
> >>
> >> On 2021/01/07 10:01, Masahiko Sawada wrote:
> >>> On Wed, Jan 6, 2021 at 3:37 PM wrote:
> >>>>
> >>>>>> +#define Query_for_list_of_cursors \
> >>>>>> +" SELECT name FROM pg_cursors"\
> >>>>>
> >>>>> This query should be the following?
> >>>>>
> >>>>> " SELECT pg_catalog.quote_ident(name) "\
> >>>>> " FROM pg_catalog.pg_cursors "\
> >>>>> " WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'"
> >>>>>
> >>>>>> +/* CLOSE */
> >>>>>> + else if (Matches("CLOSE"))
> >>>>>> +     COMPLETE_WITH_QUERY(Query_for_list_of_cursors
> >>>>>> +                         " UNION ALL SELECT 'ALL'");
> >>>>>
> >>>>> "UNION ALL" should be "UNION"?
> >>>>
> >>>> Thank you for your advice, and I corrected them.
> >>
> >> +    COMPLETE_WITH_QUERY(Query_for_list_of_cursors
> >> +                        " UNION SELECT 'ABSOLUTE'"
> >> +                        " UNION SELECT 'BACKWARD'"
> >> +                        " UNION SELECT 'FORWARD'"
> >> +                        " UNION SELECT 'RELATIVE'"
> >> +                        " UNION SELECT 'ALL'"
> >> +                        " UNION SELECT 'NEXT'"
> >> +                        " UNION SELECT 'PRIOR'"
> >> +                        " UNION SELECT 'FIRST'"
> >> +                        " UNION SELECT 'LAST'"
> >> +                        " UNION SELECT 'FROM'"
> >> +                        " UNION SELECT 'IN'");
> >>
> >> This change makes psql unexpectedly output "FROM" and "IN" just after
> >> "FETCH".
> >
> > I think "FROM" and "IN" can be placed just after "FETCH". According to
> > the documentation, the direction can be empty. For instance, we can do
> > like:
>
> Thank you!
>
> > I've attached the patch improving the tab completion for DECLARE as an
> > example. What do you think?
> >
> > BTW according to the documentation, the options of the DECLARE statement
> > (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive.
> >
> > DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
> >     CURSOR [ { WITH | WITHOUT } HOLD ] FOR query
> >
> > But I realized that these options are actually order-insensitive. For
> > instance, we can declare a cursor like:
> >
> > =# declare abc scroll binary cursor for select * from pg_class;
> > DECLARE CURSOR
> >
> > Both the parser code and the documentation have been unchanged since
> > 2003. Is it a documentation bug?
>
> Thank you for your patch, and it is good.
> I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO
> SCROLL) are order-sensitive."
> I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in any
> order.", according to the documentation.

Thanks, you're right. I missed that sentence. I still don't think the
syntax of the DECLARE statement in the documentation matches its
implementation, but I agree that it's order-insensitive.

> I made a new patch, but the amount of code was large due to being
> order-insensitive.

Thank you for updating the patch!

Yeah, I'm also a bit afraid that conditions will exponentially increase
when a new option is added to the DECLARE statement in the future. Looking
at the parser code for the DECLARE statement, we can put the same options
multiple times (that's also why I don't think it matches). For instance,

postgres(1:44758)=# begin;
BEGIN
postgres(1:44758)=# declare test binary binary binary cursor for
select * from pg_class;
DECLARE CURSOR

So how about simplifying the above code as follows?

@@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, int end)
 	else if (Matches("DECLARE", MatchAny))
 		COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
 					  "CURSOR");
+	/*
+	 * Complete DECLARE with one of BINARY, INSENSITIVE, SCROLL,
+	 * NO SCROLL, and CURSOR. The tail doesn't match any keywords for
+	 * DECLARE, assume we want options.
+	 */
+	else if (HeadMatches("DECLARE", MatchAny, "*") &&
+		 TailMatches(MatchAnyExcept(
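The HeadMatches/TailMatches simplification being proposed — after "DECLARE <name>", keep offering the order-insensitive options as long as everything typed so far is itself an option — can be sketched as a toy previous-word matcher (hypothetical helper names, not psql's tab-complete.c):

```c
#include <assert.h>
#include <string.h>

static int
word_in(const char *word, const char *const *list, int n)
{
    for (int i = 0; i < n; i++)
        if (strcmp(word, list[i]) == 0)
            return 1;
    return 0;
}

/*
 * Return 1 if DECLARE options (BINARY, INSENSITIVE, SCROLL, NO SCROLL,
 * CURSOR) should be offered, given the previous words on the line.
 * Word 0 must be DECLARE, word 1 is the cursor name (MatchAny), and every
 * later word must itself be a known option keyword -- which makes the
 * completion order-insensitive without enumerating every permutation.
 */
static int
complete_declare_options(const char **prev_words, int nwords)
{
    static const char *const opts[] = {"BINARY", "INSENSITIVE", "SCROLL", "NO"};

    if (nwords < 2 || strcmp(prev_words[0], "DECLARE") != 0)
        return 0;
    for (int i = 2; i < nwords; i++)
        if (!word_in(prev_words[i], opts, 4))
            return 0;           /* e.g. CURSOR ends the option list */
    return 1;
}
```

Like the proposed TailMatches(MatchAnyExcept(...)) condition, this accepts repeated options ("binary binary binary"), which matches what the real grammar tolerates.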
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Thu, Jan 7, 2021 at 3:32 AM Peter Geoghegan wrote:
>
> On Wed, Jan 6, 2021 at 1:29 PM Michael Banck wrote:
> > That one seems to be 5min everywhere, and one can change it except on
> > Azure.
>
> Okay, thanks for clearing that up. Looks like all of the big 3 cloud
> providers use Postgres checksums in a straightforward way.

But they might have done something to reduce the impact of enabling
checksums, like using a different checksum (for data blocks) and/or
compression (for WAL) technique.

> I don't have much more to say on this thread. I am -1 on the current
> proposal to enable page-level checksums by default.

-1 from me too, with the current impact on performance and WAL it can have.
I was looking at some old thread related to this topic and came across the
benchmarking done by Tomas Vondra [1]. It clearly shows that enabling
checksums can have a huge impact on init time, WAL, and TPS.

Having said that, if we really want to enable checksums, can't we think of
improving performance when they are enabled? I could think of two things to
improve: (a) a better algorithm for WAL compression (currently we use
pglz), which would allow us to enable wal_compression at least when
data_checksums are enabled; (b) a better technique for checksums to reduce
the cost of PageSetChecksumCopy. I don't have good ideas to offer to
improve things in these two areas, but I think it is worth investigating if
we want to enable checksums.

[1] - https://www.postgresql.org/message-id/20190330192543.GH4719%40development

-- 
With Regards,
Amit Kapila.
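For readers unfamiliar with why PageSetChecksumCopy is costly: the checksum is stamped into a private copy of the page so the shared buffer, which other processes may still be reading, is never modified. A standalone sketch of that shape (plain FNV-1a stands in for PostgreSQL's actual custom FNV-derived, SIMD-oriented algorithm; the offset-8 checksum field mirrors the page header layout but is an assumption of this sketch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192

/* Illustrative page checksum: FNV-1a over the whole page, mixed with the
 * block number so identical pages at different blocks differ. */
static uint16_t
page_checksum(const unsigned char *page, uint32_t blkno)
{
    uint32_t h = 2166136261u ^ blkno;

    for (int i = 0; i < PAGE_SIZE; i++)
        h = (h ^ page[i]) * 16777619u;
    return (uint16_t) (h ^ (h >> 16));
}

/* Copy the page, zero the checksum field, hash, then stamp the result
 * into the copy only -- the source buffer is left untouched, which is
 * the whole reason for the extra memcpy cost under discussion. */
static void
page_set_checksum_copy(unsigned char *dst, const unsigned char *src,
                       uint32_t blkno)
{
    memcpy(dst, src, PAGE_SIZE);
    dst[8] = 0;                 /* checksum field zeroed before hashing */
    dst[9] = 0;
    uint16_t sum = page_checksum(dst, blkno);
    dst[8] = (unsigned char) (sum & 0xFF);
    dst[9] = (unsigned char) (sum >> 8);
}
```

Every write of a dirty buffer pays both the memcpy and the full-page hash, which is why the benchmarks referenced above show a measurable TPS and WAL impact when checksums are on.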
Re: [PATCH] Identify LWLocks in tracepoints
On Mon, 28 Dec 2020 at 20:09, Masahiko Sawada wrote:
> Hi Craig,
>
> On Sat, Dec 19, 2020 at 2:00 PM Craig Ringer wrote:
> >
> > Hi all
> >
> > The attached patch set follows on from the discussion in [1] "Add LWLock
> > blocker(s) information" by adding the actual LWLock* and the numeric
> > tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
> >
> > This does not provide complete information on blockers, because it's not
> > necessarily valid to compare any two LWLock* pointers between two process
> > address spaces. The locks could be in DSM segments, and those DSM
> > segments could be mapped at different addresses.
> >
> > I wasn't able to work out a sensible way to map a LWLock* to any sort of
> > (tranche-id, lock-index) because there's no requirement that locks in a
> > tranche be contiguous or known individually to the lmgr.
> >
> > Despite that, the patches improve the information available for LWLock
> > analysis significantly.
> >
> > Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
> > fired from LWLockWaitForVar, despite that function never actually
> > acquiring the lock.
> >
> > Patch 2 adds the tranche id and lock pointer for each trace hit. This
> > makes it possible to differentiate between individual locks within a
> > tranche, and (so long as they aren't tranches in a DSM segment) compare
> > locks between processes. That means you can do lock-order analysis etc.,
> > which was not previously especially feasible. Traces also don't have to
> > do userspace reads for the tranche name all the time, so the trace can
> > run with lower overhead.
> >
> > Patch 3 adds a single-path tracepoint for all lock acquires and
> > releases, so you only have to probe the lwlock__acquired and
> > lwlock__release events to see all acquires/releases, whether conditional
> > or otherwise. It also adds start markers that can be used for timing
> > wallclock duration of LWLock acquires/releases.
> >
> > Patch 4 adds some comments on LWLock tranches to try to address some
> > points I found confusing and hard to understand when investigating this
> > topic.
>
> You sent in your patch to pgsql-hackers on Dec 19, but you did not
> post it to the next CommitFest[1]. If this was intentional, then you
> need to take no action. However, if you want your patch to be
> reviewed as part of the upcoming CommitFest, then you need to add it
> yourself before 2021-01-01 AoE[2]. Thanks for your contributions.
>
> Regards,
>
> [1] https://commitfest.postgresql.org/31/
> [2] https://en.wikipedia.org/wiki/Anywhere_on_Earth

Thanks. CF entry created at https://commitfest.postgresql.org/32/2927/ .

I don't think it's urgent and I will have limited review time, so I didn't
try to wedge it into the current CF.
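The payload change in patch 2 can be illustrated with a standalone sketch (hypothetical names; the real probes are SystemTap/DTrace-style TRACE_POSTGRESQL_* markers, not C functions): once each event carries the tranche id and the LWLock pointer, a single probe point can distinguish individual locks and reconstruct acquire/release ordering.

```c
#include <assert.h>
#include <stdint.h>

enum trace_event { TRACE_LWLOCK_ACQUIRED, TRACE_LWLOCK_RELEASED };

struct trace_record
{
    enum trace_event ev;
    int              tranche_id;
    const void      *lock;      /* only comparable within one address space,
                                 * as the DSM caveat above notes */
};

#define TRACE_CAP 64
static struct trace_record trace_buf[TRACE_CAP];
static int trace_len = 0;

static void
trace_emit(enum trace_event ev, int tranche_id, const void *lock)
{
    if (trace_len < TRACE_CAP)
        trace_buf[trace_len++] = (struct trace_record){ev, tranche_id, lock};
}

/* Single-path wrappers in the spirit of patch 3: every acquire and every
 * release, conditional or not, funnels through one tracepoint each, so a
 * tool only has to attach to two probe points. */
static void
lwlock_acquired(int tranche_id, const void *lock)
{
    trace_emit(TRACE_LWLOCK_ACQUIRED, tranche_id, lock);
}

static void
lwlock_released(int tranche_id, const void *lock)
{
    trace_emit(TRACE_LWLOCK_RELEASED, tranche_id, lock);
}
```

With both the tranche id and the pointer in every record, a consumer can pair each release with its acquire per lock, which is exactly the lock-order analysis the patch description says becomes feasible.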
CheckpointLock needed in CreateCheckPoint()?
Hi ALL,

Snip from CreateCheckPoint():
--
/*
 * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
 * (This is just pro forma, since in the present system structure there is
 * only one process that is allowed to issue checkpoints at any given
 * time.)
 */
LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
--

As per this comment, it seems to be that we don't really need this LW lock.
We could have something else instead if we are afraid of having multiple
checkpoints at any given time, which isn't possible, btw.

This CheckpointLock is acquired almost at the start of CreateCheckPoint()
and released at the end. The in-between code can take a while to complete.
All that time interrupts will be on hold, which doesn't seem to be a great
idea, but that wasn't problematic until now. I am having trouble due to
this interrupt hold for so long since I was trying to add code changes
where the checkpointer process is supposed to respond to the system
read-write state change request as soon as possible [1]. That cannot be
done if the interrupt is disabled, since the read-write state change uses
the global barrier mechanism to convey system state changes to other
running processes.

So my question is, can we completely get rid of the CheckpointLock need in
CreateCheckPoint()?

Thoughts/Comments/Suggestions?

Thank you!

Regards,
Amul

[1] http://postgr.es/m/CA+TgmoYexwDQjdd1=15KMz+7VfHVx8VHNL2qjRRK92P=csz...@mail.gmail.com
Re: [PATCH] Runtime control of CLOBBER_CACHE_ALWAYS
On Wed, 6 Jan 2021 at 18:23, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> On 2021-01-06 00:30, Craig Ringer wrote:
> > Perhaps debug_invalidate_system_caches_always? It's a bit long, but we
> > use the debug prefix elsewhere too.
>
> Committed with that name.

Thanks very much.

> In your original patch, this hunk in pg_config_manual.h:
>
> + * You can define these by editing pg_config_manual.h but it's usually
> + * sufficient to pass CFLAGS to configure, e.g.
> + *
> + *     ./configure --enable-debug --enable-cassert CFLAGS="-DUSE_VALGRIND"
> + *
> + * The flags are persisted in Makefile.global so they will be correctly
> + * applied to extensions, including those built by PGXS.
>
> I don't think that necessarily works for all settings. Maybe we should
> make a pass through it and ensure it all works sensibly, but that seems
> like a separate project.

It should work for everything, since we persist the CFLAGS string rather
than individual settings. But I didn't mean to leave that in the same
patch anyway; it's separate.
Re: Terminate the idle sessions
On Thu, Jan 7, 2021 at 4:51 PM Tom Lane wrote:
> Thomas Munro writes:
> > One of the strange things about these errors is that they're
> > asynchronous/unsolicited, but they appear to the client to be the
> > response to their next request (if it doesn't eat ECONNRESET instead).
>
> Right, which is what makes class 57 (operator intervention) seem
> attractive to me. From the client's standpoint these look little
> different from ERRCODE_ADMIN_SHUTDOWN or ERRCODE_CRASH_SHUTDOWN,
> which are in that category.

Yeah, that's a good argument.
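For readers following the SQLSTATE debate across this thread: a SQLSTATE is five characters, and the class being argued about (08 connection exception, 25 invalid transaction state, 53 insufficient resources, 57 operator intervention) is simply the first two. A small sketch of that decomposition:

```c
#include <assert.h>
#include <string.h>

/*
 * Extract the two-character class from a five-character SQLSTATE.
 * Codes using letters outside the standard's ranges (PostgreSQL's
 * "08P02", "55P03", etc. -- the "P" marks them implementation-defined)
 * keep the same class-prefix layout.
 */
static int
sqlstate_class(const char *sqlstate, char out[3])
{
    if (sqlstate == NULL || strlen(sqlstate) != 5)
        return -1;
    out[0] = sqlstate[0];
    out[1] = sqlstate[1];
    out[2] = '\0';
    return 0;
}
```

This is why "renumbering" an error code across classes (25P03 to something in 53 or 57) is visible to clients: drivers and applications commonly branch on the two-character class rather than the full code.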
Re: plpgsql variable assignment with union is broken
On Thu, 7 Jan 2021 at 4:20, Merlin Moncure wrote:
> On Tue, Jan 5, 2021 at 3:40 PM Tom Lane wrote:
> >
> > easter...@verfriemelt.org writes:
> > > i found that the behaviour of variable assignment in combination with
> > > union is not working anymore:
> > >
> > > DO $$
> > > DECLARE t bool;
> > > begin
> > >     t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a;
> > > END $$;
> > >
> > > is this an intended change or is it a bug?
> >
> > It's an intended change, or at least I considered the case and thought
> > that it was useless because assignment will reject any result with more
> > than one row. Do you have any non-toy example that wouldn't be as
> > clear or clearer without using UNION? The above sure seems like an
> > example of awful SQL code.
>
> What is the definition of broken here? What is the behavior of the
> query with the change and why?
>
> OP's query provably returns a single row and ought to always assign
> true as written. A real world example might evaluate multiple
> condition branches so that the assignment resolves true if any branch
> is true. It could be rewritten with 'OR' of course.
>
> Is this also "broken"?
>
> t := a FROM ( SELECT 'something' WHERE _Flag) t(a) UNION SELECT
> 'something else' AS a WHERE NOT _Flag;
>
> What about this?
>
> SELECT INTO t true WHERE false
> UNION select true;

ANSI SQL allows only SELECT INTO or "var := SQL expression", and the SQL
expression can be a (subquery) too:

do $$
declare t bool;
begin
  t := (SELECT true WHERE false UNION SELECT true);
end;
$$;

Regards

Pavel

> merlin
Re: Terminate the idle sessions
-- Best regards Japin Li On Jan 7, 2021, at 10:03 AM, Thomas Munro mailto:thomas.mu...@gmail.com>> wrote: * I'm not entirely comfortable with the name "idle_session_timeout", because it sounds like it applies to all idle states, but actually it only applies when we're not in a transaction. I left the name alone and tried to clarify the docs, but I wonder whether a different name would be better. (Alternatively, we could say that it does apply in all cases, making the effective timeout when in a transaction the smaller of the two GUCs. But that'd be more complex to implement and less flexible, so I'm not in favor of that answer.) Hmm, it is a bit confusing, but having them separate is indeed more flexible. Yes! It is a bit confusing. How about interactive_timeout? This is used by MySQL [1]. * The SQLSTATE you chose for the new error condition seems pretty random. I do not see it in the SQL standard, so using a code that's within the spec-reserved code range is certainly wrong. I went with 08P02 (the "P" makes it outside the reserved range), but I'm not really happy either with using class 08 ("Connection Exception", which seems to be mainly meant for connection-request failures), or the fact that ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT is practically identical but it's not even in the same major class. Now 25 ("Invalid Transaction State") is certainly not right for this new error, but I think what that's telling us is that 25 was a misguided choice for the other error. In a green field I think I'd put both of them in class 53 ("Insufficient Resources") or maybe class 57 ("Operator Intervention"). Can we get away with renumbering the older error at this point? In any case I'd be inclined to move the new error to 53 or 57 --- anybody have a preference which? I don't have a strong view here, but 08 with a P doesn't seem crazy to me. Unlike eg 57014 (statement_timeout), 57017 (deadlock_timeout), 55P03 (lock_timeout... 
huh, I just noticed that DB2 uses 57017 for that, distinguished from deadlock by another error field), after these timeouts you don't have a session/connection anymore.

The two are a bit different though: in the older one, you were in a transaction, and it seems to me quite newsworthy that your transaction has been aborted, information that is not conveyed quite so clearly with a connection-related error class.

Apologies! I just think it is a Connection Exception.

[1] https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_interactive_timeout
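For readers following along, a minimal sketch of the two GUCs being contrasted in this thread (the new GUC's name and its final SQLSTATE are both still under discussion here, so treat them as provisional):

```sql
-- Sketch only: idle_session_timeout is the name used by the patch under
-- discussion; idle_in_transaction_session_timeout already exists.
SET idle_in_transaction_session_timeout = '5min';  -- idle while inside a transaction
SET idle_session_timeout = '10min';                -- idle outside a transaction (proposed)
```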
Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion
On 2021/01/07 12:42, Masahiko Sawada wrote:

On Thu, Jan 7, 2021 at 10:59 AM Fujii Masao wrote:

On 2021/01/07 10:01, Masahiko Sawada wrote:

On Wed, Jan 6, 2021 at 3:37 PM wrote:

+#define Query_for_list_of_cursors \
+" SELECT name FROM pg_cursors"\

This query should be the following?

" SELECT pg_catalog.quote_ident(name) "\
" FROM pg_catalog.pg_cursors "\
" WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'"

+/* CLOSE */
+	else if (Matches("CLOSE"))
+		COMPLETE_WITH_QUERY(Query_for_list_of_cursors
+							" UNION ALL SELECT 'ALL'");

"UNION ALL" should be "UNION"?

Thank you for your advice, and I corrected them.

+		COMPLETE_WITH_QUERY(Query_for_list_of_cursors
+							" UNION SELECT 'ABSOLUTE'"
+							" UNION SELECT 'BACKWARD'"
+							" UNION SELECT 'FORWARD'"
+							" UNION SELECT 'RELATIVE'"
+							" UNION SELECT 'ALL'"
+							" UNION SELECT 'NEXT'"
+							" UNION SELECT 'PRIOR'"
+							" UNION SELECT 'FIRST'"
+							" UNION SELECT 'LAST'"
+							" UNION SELECT 'FROM'"
+							" UNION SELECT 'IN'");

This change makes psql unexpectedly output "FROM" and "IN" just after "FETCH".

I think "FROM" and "IN" can be placed just after "FETCH". According to the documentation, the direction can be empty. For instance, we can do something like:

Thank you!

I've attached the patch improving the tab completion for DECLARE as an example. What do you think?

BTW, according to the documentation, the options of the DECLARE statement (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive.

DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
    CURSOR [ { WITH | WITHOUT } HOLD ] FOR query

But I realized that these options are actually order-insensitive. For instance, we can declare a cursor like:

=# declare abc scroll binary cursor for select * from pg_class;
DECLARE CURSOR

Both the parser code and the documentation have been unchanged since 2003. Is it a documentation bug?

Thank you for your patch, and it is good. I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive."
I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in any order.", according to the documentation.

Thanks, you're right. I missed that sentence. I still think the syntax of the DECLARE statement in the documentation doesn't match its implementation, but I agree that it's order-insensitive.

I made a new patch, but the amount of code was large due to the order-insensitivity.

Thank you for updating the patch! Yeah, I'm also a bit afraid that conditions will increase exponentially when a new option is added to the DECLARE statement in the future.

Looking at the parser code for the DECLARE statement, we can put the same options multiple times (that's also why I don't think it matches). For instance,

postgres(1:44758)=# begin;
BEGIN
postgres(1:44758)=# declare test binary binary binary cursor for select * from pg_class;
DECLARE CURSOR

So how about simplifying the above code as follows?

@@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, int end)
 	else if (Matches("DECLARE", MatchAny))
 		COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
 					  "CURSOR");
+	/*
+	 * Complete DECLARE with one of BINARY, INSENSITIVE, SCROLL,
+	 * NO SCROLL, and CURSOR. The tail doesn't match any keywords for
+	 * DECLARE, assume we want options.
+	 */
+	else if (HeadMatches("DECLARE", MatchAny, "*") &&
+			 TailMatches(MatchAnyExcept("CURSOR|WITH|WITHOUT|HOLD|FOR")))
+		COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
+					  "CURSOR");

This change seems to cause "DECLARE ... CURSOR FOR SELECT " to unexpectedly output BINARY, INSENSITIVE, etc.

Indeed. Is the following not complete but much better?

Yes, I think that's better.

@@ -3002,11 +3011,18 @@ psql_completion(const char *text, int start, int end)
 					" UNION SELECT 'ALL'");
 	/* DECLARE */
-	else if (Matches("DECLARE", MatchAny))
-		COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
-					  "CURSOR");
+	/*
+	 * Complete DECLARE with one of BINARY, INSENSITIVE, SCROLL,
+	 * NO SCROLL, and CURSOR.
If the tail is any one of options, assume we
+	 * still want options.
+	 */
+	else if (Matches("DECLARE", MatchAny) ||
+			 TailMatches("BINARY|INSENSITI
Re: Parallel Inserts in CREATE TABLE AS
Thanks for the clarification. w.r.t. the commit message, maybe I was momentarily sidetracked by the phrase: With this change. It seems the scenarios you listed are known parallel safety constraints. Probably rephrase that sentence a little bit to make this clearer. Cheers On Wed, Jan 6, 2021 at 8:01 PM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu wrote: > > > > Hi, > > For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch : > > > > workers to Gather node to 0. With this change, there are chances > > that the planner may choose the parallel plan. > > > > It would be nice if the scenarios where a parallel plan is not chosen > are listed. > > There are many reasons, the planner may not choose a parallel plan for > the select part, for instance if there are temporary tables, parallel > unsafe functions, or the parallelism GUCs are not set properly, > foreign tables and so on. see > https://www.postgresql.org/docs/devel/parallel-safety.html. I don't > think so, we will add all the scenarios into the commit message. > > Having said that, we have extensive comments in the code(especially in > the function SetParallelInsertState) about when parallel inserts are > chosen. > > + * Parallel insertions are possible only if the upper node is Gather. > */ > +if (!IsA(gstate, GatherState)) > return; > > + * Parallelize inserts only when the upper Gather node has no > projections. > */ > +if (!gstate->ps.ps_ProjInfo) > +{ > +/* Okay to parallelize inserts, so mark it. */ > +if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS) > +((DR_intorel *) dest)->is_parallel = true; > + > +/* > + * For parallelizing inserts, we must send some information so > that the > + * workers can build their own dest receivers. For CTAS, this > info is > + * into clause, object id (to open the created table). 
> + * > + * Since the required information is available in the dest > receiver, > + * store a reference to it in the Gather state so that it will be > used > + * in ExecInitParallelPlan to pick the information. > + */ > +gstate->dest = dest; > +} > +else > +{ > +/* > + * Upper Gather node has projections, so parallel insertions are > not > + * allowed. > + */ > > > + if ((root->parse->parallelInsCmdTupleCostOpt & > > +PARALLEL_INSERT_SELECT_QUERY) && > > + (root->parse->parallelInsCmdTupleCostOpt & > > +PARALLEL_INSERT_CAN_IGN_TUP_COST)) > > + { > > + /* We are ignoring the parallel tuple cost, so mark it. */ > > + root->parse->parallelInsCmdTupleCostOpt |= > > + > PARALLEL_INSERT_TUP_COST_IGNORED; > > > > If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and > PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED > is implied. > > > > Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the > setting (1) of the first two bits should suffice. > > The way these flags work is as follows: before planning in CTAS, we > set PARALLEL_INSERT_SELECT_QUERY, before we go for generating upper > gather path we set PARALLEL_INSERT_CAN_IGN_TUP_COST, and when we > actually ignored the tuple cost in cost_gather we set > PARALLEL_INSERT_TUP_COST_IGNORED. There are chances that we set > PARALLEL_INSERT_CAN_IGN_TUP_COST before calling > generate_useful_gather_paths, and the function > generate_useful_gather_paths can return before reaching cost_gather, > see below snippets. So, we need the 3 flags. > > void > generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool > override_rows) > { > ListCell *lc; > doublerows; > double *rowsp = NULL; > List *useful_pathkeys_list = NIL; > Path *cheapest_partial_path = NULL; > > /* If there are no partial paths, there's nothing to do here. */ > if (rel->partial_pathlist == NIL) > return; > > /* Should we override the rel's rowcount estimate? 
*/ > if (override_rows) > rowsp = &rows; > > /* generate the regular gather (merge) paths */ > generate_gather_paths(root, rel, override_rows); > > void > generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool > override_rows) > { > Path *cheapest_partial_path; > Path *simple_gather_path; > ListCell *lc; > doublerows; > double *rowsp = NULL; > > /* If there are no partial paths, there's nothing to do here. */ > if (rel->partial_pathlist == NIL) > return; > > > With Regards, > Bharath Rupireddy. > EnterpriseDB: http://www.enterprisedb.com >
Re: Single transaction in the tablesync worker?
On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila wrote: > > On Wed, Jan 6, 2021 at 2:13 PM Peter Smith wrote: > > > > I think it makes sense. If there can be a race between the tablesync > > re-launching (after error), and the AlterSubscription_refresh removing > > some table’s relid from the subscription then there could be lurking > > slot/origin tablesync resources (of the removed table) which a > > subsequent DROP SUBSCRIPTION cannot discover. I will think more about > > how/if it is possible to make this happen. Anyway, I suppose I ought > > to refactor/isolate some of the tablesync cleanup code in case it > > needs to be commonly called from DropSubscription and/or from > > AlterSubscription_refresh. > > > > Fair enough. > I think before implementing, we should once try to reproduce this case. I understand this is a timing issue and can be reproduced only with the help of debugger but we should do that. > BTW, I have analyzed whether we need any modifications to > pg_dump/restore for this patch as this changes the state of one of the > fields in the system table and concluded that we don't need any > change. For subscriptions, we don't dump any of the information from > pg_subscription_rel, rather we just dump subscriptions with the > connect option as false which means users need to enable the > subscription and refresh publication after restore. I have checked > this in the code and tested it as well. The related information is > present in pg_dump doc page [1], see from "When dumping logical > replication subscriptions ". > I have further analyzed that we don't need to do anything w.r.t pg_upgrade as well because it uses pg_dump/pg_dumpall to dump the schema info of the old cluster and then restore it to the new cluster. And, we know that pg_dump ignores the info in pg_subscription_rel, so we don't need to change anything as our changes are specific to the state of one of the columns in pg_subscription_rel. 
I have not tested this but we should test it by having some relations in not_ready state and then allow the old cluster (<=PG13) to be upgraded to new (pg14) both with and without this patch and see if there is any change in behavior. -- With Regards, Amit Kapila.
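The dump/restore behavior described above can be sketched like this (object names are illustrative; the key point is the connect = false option that pg_dump emits, which is why the state in pg_subscription_rel never needs to be carried over):

```sql
-- What pg_dump effectively emits for a subscription (illustrative names):
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=primary dbname=postgres'
    PUBLICATION mypub
    WITH (connect = false);

-- After restore (or pg_upgrade) the user re-enables and refreshes,
-- rebuilding the per-relation sync state from scratch:
ALTER SUBSCRIPTION mysub ENABLE;
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
```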
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On 2021/01/05 16:56, Bharath Rupireddy wrote:

On Sat, Jan 2, 2021 at 11:19 AM Bharath Rupireddy wrote:

I'm sorry for the mess. I missed adding the new files into the v6-0001 patch. Please ignore the v6 patch set and consider the v7 patch set for further review. Note that the 0002 and 0003 patches have no difference from the v5 patch set.

It seems like the cf bot was failing on the v7 patches. On Linux, it fails while building documentation in the 0001 patch; I corrected that. On FreeBSD, it fails in one of the test cases I added; since it was unstable, I corrected it now. Attaching the v8 patch set. Hopefully, the cf bot will be happy with v8. Please consider the v8 patch set for further review.

Thanks for the patch!

-DATA = postgres_fdw--1.0.sql
+DATA = postgres_fdw--1.1.sql postgres_fdw--1.0--1.1.sql

Shouldn't we leave 1.0.sql as it is and create 1.0--1.1.sql so that we can run the following?

CREATE EXTENSION postgres_fdw VERSION "1.0";
ALTER EXTENSION postgres_fdw UPDATE TO "1.1";

+
+ Functions

The document format for functions should be consistent with that in other contrib modules like pgstattuple?

+ When called in the local session, it returns an array with each element as a
+ pair of the foreign server names of all the open connections that are
+ previously made to the foreign servers and true or
+ false to show whether or not the connection is valid.

We thought that the information about whether the connection is valid or not was useful to, for example, explicitly identify and close the long-lived invalid connections because they were useless. But thanks to the recent bug fix for the connection leak issue, that information would no longer be so helpful for us? False is returned only when the connection is used in this local transaction but it's marked as invalidated. In this case that connection cannot be explicitly closed because it's used in this transaction. It will be closed at the end of the transaction. Thoughts?
I guess that you made postgres_fdw_get_connections() return the array because the similar function dblink_get_connections() does that. But isn't it more convenient to make that return the set of records? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
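To make the set-of-records suggestion concrete, here is one hypothetical way the function could be declared instead of returning an array (a sketch only, not the patch's actual signature):

```sql
-- Hypothetical SETOF variant of postgres_fdw_get_connections():
CREATE FUNCTION postgres_fdw_get_connections(
    OUT server_name text,
    OUT valid boolean)
RETURNS SETOF record
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT PARALLEL RESTRICTED;

-- A record set is easier to filter than an array of name/flag pairs:
SELECT server_name FROM postgres_fdw_get_connections() WHERE NOT valid;
```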
Re: Incorrect allocation handling for cryptohash functions with OpenSSL
On Wed, Jan 06, 2021 at 03:58:22PM +0200, Heikki Linnakangas wrote: > contrib/pgcrypto/internal-sha2.c and > src/backend/utils/adt/cryptohashfuncs.c: the call to pg_cryptohash_create() > is missing check for NULL result. With your latest patch, that's OK because > the subsequent pg_cryptohash_update() calls will fail if passed a NULL > context. But seems sloppy. Is it? pg_cryptohash_create() will never return NULL for the backend. > contrib/pgcrypto/internal.c: all the calls to pg_cryptohash_* functions are > missing checks for error return codes. Indeed, these are incorrect, thanks. I'll go fix that separately. > contrib/uuid-ossp/uuid-ossp.c: uses pg_cryptohash for MD5, but borrows the > built-in implementation of SHA1 on some platforms. Should we add support for > SHA1 in pg_cryptohash and use that for consistency? Yeah, I have sent a separate patch for that: https://commitfest.postgresql.org/31/2868/ The cleanups produced by this patch are kind of nice. > src/backend/libpq/auth-scram.c: SHA256 is used in the mock authentication. > If the pg_cryptohash functions fail, we throw a distinct error "could not > encode salt" that reveals that it was a mock authentication. I don't think > this is a big deal in practice, it would be hard for an attacker to induce > the SHA256 computation to fail, and there are probably better ways to > distinguish mock authentication from real, like timing attacks. But still. This maps with the second error in the mock routine, so wouldn't it be better to change both and back-patch? The last place where this error message is used is pg_be_scram_build_secret() for the generation of what's stored in pg_authid. An idea would be to use "out of memory". That can be faced for any palloc() calls. > src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we > still need separate fields for the different sha variants. 
Using separate fields looked cleaner to me if it came to debugging, and the cleanup was rather minimal except if we use more switch/case to set up the various variables. What about something like the attached? -- Michael diff --git a/src/include/common/checksum_helper.h b/src/include/common/checksum_helper.h index ebdf1ccf44..cac7570ea1 100644 --- a/src/include/common/checksum_helper.h +++ b/src/include/common/checksum_helper.h @@ -42,10 +42,7 @@ typedef enum pg_checksum_type typedef union pg_checksum_raw_context { pg_crc32c c_crc32c; - pg_cryptohash_ctx *c_sha224; - pg_cryptohash_ctx *c_sha256; - pg_cryptohash_ctx *c_sha384; - pg_cryptohash_ctx *c_sha512; + pg_cryptohash_ctx *c_sha2; } pg_checksum_raw_context; /* diff --git a/src/common/checksum_helper.c b/src/common/checksum_helper.c index f6b49de405..2881b2c178 100644 --- a/src/common/checksum_helper.c +++ b/src/common/checksum_helper.c @@ -93,42 +93,42 @@ pg_checksum_init(pg_checksum_context *context, pg_checksum_type type) INIT_CRC32C(context->raw_context.c_crc32c); break; case CHECKSUM_TYPE_SHA224: - context->raw_context.c_sha224 = pg_cryptohash_create(PG_SHA224); - if (context->raw_context.c_sha224 == NULL) + context->raw_context.c_sha2 = pg_cryptohash_create(PG_SHA224); + if (context->raw_context.c_sha2 == NULL) return -1; - if (pg_cryptohash_init(context->raw_context.c_sha224) < 0) + if (pg_cryptohash_init(context->raw_context.c_sha2) < 0) { -pg_cryptohash_free(context->raw_context.c_sha224); +pg_cryptohash_free(context->raw_context.c_sha2); return -1; } break; case CHECKSUM_TYPE_SHA256: - context->raw_context.c_sha256 = pg_cryptohash_create(PG_SHA256); - if (context->raw_context.c_sha256 == NULL) + context->raw_context.c_sha2 = pg_cryptohash_create(PG_SHA256); + if (context->raw_context.c_sha2 == NULL) return -1; - if (pg_cryptohash_init(context->raw_context.c_sha256) < 0) + if (pg_cryptohash_init(context->raw_context.c_sha2) < 0) { -pg_cryptohash_free(context->raw_context.c_sha256); 
+pg_cryptohash_free(context->raw_context.c_sha2); return -1; } break; case CHECKSUM_TYPE_SHA384: - context->raw_context.c_sha384 = pg_cryptohash_create(PG_SHA384); - if (context->raw_context.c_sha384 == NULL) + context->raw_context.c_sha2 = pg_cryptohash_create(PG_SHA384); + if (context->raw_context.c_sha2 == NULL) return -1; - if (pg_cryptohash_init(context->raw_context.c_sha384) < 0) + if (pg_cryptohash_init(context->raw_context.c_sha2) < 0) { -pg_cryptohash_free(context->raw_context.c_sha384); +pg_cryptohash_free(context->raw_context.c_sha2); return -1; } break; case CHECKSUM_TYPE_SHA512: - context->raw_context.c_sha512 = pg_cryptohash_create(PG_SHA512); - if (context->raw_context.c_sha512 == NULL) + context->raw_context.c_sha2 = pg_cryptohash_create(PG_SHA512); + if (context->raw_context.c_sha2 == NULL) return -1; - if (pg_cryptohash_init(context->raw_context.c_sha512) < 0) + if (pg_cryptohash_init(context->raw_context.c_sha2) < 0)
Re: plpgsql variable assignment with union is broken
On Wed, Jan 6, 2021 at 9:39 PM Tom Lane wrote:
>
> Merlin Moncure writes:
> > On Tue, Jan 5, 2021 at 3:40 PM Tom Lane wrote:
> >> easter...@verfriemelt.org writes:
> >>> i found, that the behaviour of variable assignment in combination with
> >>> union is not working anymore:
> >>> DO $$
> >>> DECLARE t bool;
> >>> begin
> >>> t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a;
> >>> END $$;
> >>> is this an intended change or is it a bug?
>
> >> It's an intended change, or at least I considered the case and thought
> >> that it was useless because assignment will reject any result with more
> >> than one row. Do you have any non-toy example that wouldn't be as
> >> clear or clearer without using UNION? The above sure seems like an
> >> example of awful SQL code.
>
> > What is the definition of broken here? What is the behavior of the
> > query with the change and why?
>
> The OP is complaining that that gets a syntax error since c9d529848.
>
> > OP's query provably returns a single row and ought to always assign
> > true as written.
>
> My opinion is that (a) it's useless and (b) there has never been any
> documentation that claimed that you could do this.

Here is what the documentation says:

> variable { := | = } expression;
> As explained previously, the expression in such a statement is evaluated by
> means of an SQL SELECT command sent to the main database engine.

This is valid SQL:

SELECT a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a;

So I'd argue that OP's query *is* syntactically valid per the rules as I understand them, and my opinion is entirely consistent with the documentation in that it a) resolves exactly one row, and b) is made syntactically valid by prefixing the expression with SELECT.

Aesthetic considerations are irrelevant IMO.

merlin
Re: Parallel Inserts in CREATE TABLE AS
On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu wrote: > > Hi, > For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch : > > workers to Gather node to 0. With this change, there are chances > that the planner may choose the parallel plan. > > It would be nice if the scenarios where a parallel plan is not chosen are > listed. There are many reasons, the planner may not choose a parallel plan for the select part, for instance if there are temporary tables, parallel unsafe functions, or the parallelism GUCs are not set properly, foreign tables and so on. see https://www.postgresql.org/docs/devel/parallel-safety.html. I don't think so, we will add all the scenarios into the commit message. Having said that, we have extensive comments in the code(especially in the function SetParallelInsertState) about when parallel inserts are chosen. + * Parallel insertions are possible only if the upper node is Gather. */ +if (!IsA(gstate, GatherState)) return; + * Parallelize inserts only when the upper Gather node has no projections. */ +if (!gstate->ps.ps_ProjInfo) +{ +/* Okay to parallelize inserts, so mark it. */ +if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS) +((DR_intorel *) dest)->is_parallel = true; + +/* + * For parallelizing inserts, we must send some information so that the + * workers can build their own dest receivers. For CTAS, this info is + * into clause, object id (to open the created table). + * + * Since the required information is available in the dest receiver, + * store a reference to it in the Gather state so that it will be used + * in ExecInitParallelPlan to pick the information. + */ +gstate->dest = dest; +} +else +{ +/* + * Upper Gather node has projections, so parallel insertions are not + * allowed. + */ > + if ((root->parse->parallelInsCmdTupleCostOpt & > +PARALLEL_INSERT_SELECT_QUERY) && > + (root->parse->parallelInsCmdTupleCostOpt & > +PARALLEL_INSERT_CAN_IGN_TUP_COST)) > + { > + /* We are ignoring the parallel tuple cost, so mark it. 
*/ > + root->parse->parallelInsCmdTupleCostOpt |= > + PARALLEL_INSERT_TUP_COST_IGNORED; > > If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and > PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED is > implied. > > Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the setting > (1) of the first two bits should suffice. The way these flags work is as follows: before planning in CTAS, we set PARALLEL_INSERT_SELECT_QUERY, before we go for generating upper gather path we set PARALLEL_INSERT_CAN_IGN_TUP_COST, and when we actually ignored the tuple cost in cost_gather we set PARALLEL_INSERT_TUP_COST_IGNORED. There are chances that we set PARALLEL_INSERT_CAN_IGN_TUP_COST before calling generate_useful_gather_paths, and the function generate_useful_gather_paths can return before reaching cost_gather, see below snippets. So, we need the 3 flags. void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows) { ListCell *lc; doublerows; double *rowsp = NULL; List *useful_pathkeys_list = NIL; Path *cheapest_partial_path = NULL; /* If there are no partial paths, there's nothing to do here. */ if (rel->partial_pathlist == NIL) return; /* Should we override the rel's rowcount estimate? */ if (override_rows) rowsp = &rows; /* generate the regular gather (merge) paths */ generate_gather_paths(root, rel, override_rows); void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows) { Path *cheapest_partial_path; Path *simple_gather_path; ListCell *lc; doublerows; double *rowsp = NULL; /* If there are no partial paths, there's nothing to do here. */ if (rel->partial_pathlist == NIL) return; With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
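As a rough illustration of the conditions discussed above, a session where the planner could pick a parallel plan for the SELECT part of CTAS might look like this (a sketch assuming the patch is applied and the query is parallel safe; the cost GUCs are lowered here only to coax a parallel plan out of a small demo table):

```sql
-- The SELECT must be parallel safe: no temporary tables, no
-- parallel-unsafe functions, no foreign tables, etc.
SET max_parallel_workers_per_gather = 4;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

CREATE TABLE target AS SELECT * FROM source;  -- workers may insert in parallel
```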
RE: [Patch] Optimize dropping of relation buffers using dlist
>I'd like to take a look at them and redo some of the tests using my machine. I'll
>send my test results in a separate email after this.

I did the same tests with Kirk's scripts using the latest patch on my own machine. The results look pretty good and similar to Kirk's. Values are the average of 5 runs.

[VACUUM failover test for 1000 relations]
Unit is second, %reg = (patched - master) / master

| s_b    | Master | Patched | %reg    |
|--------|--------|---------|---------|
| 128 MB | 0.408  | 0.308   | -24.44% |
| 1 GB   | 0.809  | 0.308   | -61.94% |
| 20 GB  | 12.529 | 0.308   | -97.54% |
| 100 GB | 59.310 | 0.369   | -99.38% |

[TRUNCATE failover test for 1000 relations]
Unit is second, %reg = (patched - master) / master

| s_b    | Master | Patched | %reg    |
|--------|--------|---------|---------|
| 128 MB | 0.287  | 0.207   | -27.91% |
| 1 GB   | 0.688  | 0.208   | -69.84% |
| 20 GB  | 12.449 | 0.208   | -98.33% |
| 100 GB | 61.800 | 0.207   | -99.66% |

Besides, I did the test for the threshold value again. (I rechecked my test process and found out that I forgot to check the data synchronization state on the standby, which may introduce some noise into my results.) The following results show we can't get any optimization beyond NBuffers/32, just like Kirk's test results, so I agree with Kirk on the threshold value.
%regression:

| rel_size | 128MB | 1GB | 20GB | 100GB |
|----------|-------|-----|------|-------|
| NB/512   | 0%    | 0%  | 0%   | -48%  |
| NB/256   | 0%    | 0%  | 0%   | -33%  |
| NB/128   | 0%    | 0%  | 0%   | -9%   |
| NB/64    | 0%    | 0%  | 0%   | -5%   |
| NB/32    | 0%    | 0%  | -4%  | -3%   |
| NB/16    | 0%    | 0%  | -4%  | 1%    |
| NB/8     | 1%    | 0%  | 1%   | 3%    |

Optimization details (unit: second):

patched:
shared_buffers  NBuffers/512  NBuffers/256  NBuffers/128  NBuffers/64  NBuffers/32  NBuffers/16  NBuffers/8
128M            0.107         0.107         0.107         0.106        0.107        0.107        0.107
1G              0.107         0.107         0.107         0.107        0.107        0.107        0.107
20G             0.107         0.108         0.207         0.307        0.442        0.876        1.577
100G            0.208         0.308         0.559         1.060        1.961        4.567        7.922

master:
shared_buffers  NBuffers/512  NBuffers/256  NBuffers/128  NBuffers/64  NBuffers/32  NBuffers/16  NBuffers/8
128M            0.107         0.107         0.107         0.107        0.107        0.107        0.106
1G              0.107         0.107         0.107         0.107        0.107        0.107        0.107
20G             0.107         0.107         0.208         0.308        0.457        0.910        1.560
100G            0.308         0.409         0.608         1.110        2.011        4.516        7.721

[Specs]
CPU: 40 processors (Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz)
Memory: 128G
OS: CentOS 8

Any questions about my test are welcome.

Regards,
Tang
Re: Single transaction in the tablesync worker?
Thank you for the feedback.

On Thu, Jan 7, 2021 at 12:45 PM Hou, Zhijie wrote:
> > PSA the v11 patch for the Tablesync Solution1.
> >
> > Difference from v10:
> > - Addresses several recent review comments.
> > - pg_indent has been run
>
> Hi
>
> I took a look into the patch and have some comments.
>
> 1.
> * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
> - * CATCHUP -> SYNCDONE -> READY.
> + * CATCHUP -> (sync worker TCOPYDONE) -> SYNCDONE -> READY.
>
> I noticed the new state TCOPYDONE is commented between CATCHUP and SYNCDONE,
> but it seems SUBREL_STATE_TCOPYDONE is actually set before
> SUBREL_STATE_SYNCWAIT [1].
> Did I miss something here?
>
> [1]-
> + UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
> +   MyLogicalRepWorker->relid,
> +   SUBREL_STATE_TCOPYDONE,
> +   MyLogicalRepWorker->relstate_lsn);
> ...
> /*
> * We are done with the initial data synchronization, update the state.
> */
> SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT;

Thanks for reporting this mistake. I will correct the comment in the next patch (v12).

> 2.
> i = initialize,
> d = data is being copied,
> + C = table data has been copied,
> s = synchronized,
> r = ready (normal replication)
>
> +#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
> +  * (sublsn NULL) */
>
> The character representing 'data has been copied' in the catalog seems
> different from the macro definition.

Yes, the same issue was already reported [1]. It will be fixed in the next patch (v12).

[1] https://www.postgresql.org/message-id/CAA4eK1Kyi037XZzyrLE71MS2KoMmNSqa6RrQLdSCeeL27gnL%2BA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.
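The state characters under discussion live in pg_subscription_rel.srsubstate, so the reported mismatch (the macro uses 't' while the state-list comment documents 'C') would be directly visible in the catalog once the patch is applied. For example, against the existing catalog:

```sql
-- Per-relation sync states for all subscriptions; with the patch applied,
-- the new copy-done state would show up here as whichever letter the
-- macro finally uses.
SELECT srsubid, srrelid::regclass AS relation, srsubstate
  FROM pg_subscription_rel
 ORDER BY srsubstate;
```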
Re: Terminate the idle sessions
Thomas Munro writes: > One of the strange things about these errors is that they're > asynchronous/unsolicited, but they appear to the client to be the > response to their next request (if it doesn't eat ECONNRESET instead). Right, which is what makes class 57 (operator intervention) seem attractive to me. From the client's standpoint these look little different from ERRCODE_ADMIN_SHUTDOWN or ERRCODE_CRASH_SHUTDOWN, which are in that category. regards, tom lane
Re: vacuum_cost_page_miss default value and modern hardware
On Wed, Jan 6, 2021 at 5:39 PM Masahiko Sawada wrote: > > On Mon, Dec 28, 2020 at 5:17 AM Peter Geoghegan wrote: > > > > Simply decreasing vacuum_cost_page_dirty seems like a low risk way of > > making the VACUUM costing more useful within autovacuum workers. > > Halving vacuum_cost_page_dirty to 5 would be a good start, though I > > think that a value as low as 2 would be better. That would make it > > only 2x vacuum_cost_page_hit's default (i.e 2x the cost of processing > > a page that is in shared_buffers but did not need to be dirtied), > > which seems sensible to me when considered in the context in which the > > value is actually applied (and not some abstract theoretical context). > > Perhaps you meant to decrease vacuumm_cost_page_miss instead of > vacuum_cost_page_dirty? > > > > > There are a few reasons why this seems like a good idea now: > > > > * Throttling/delaying VACUUM is only useful as a way of smoothing the > > impact on production queries, which is an important goal, but > > currently we don't discriminate against the cost that we really should > > keep under control (dirtying new pages within VACUUM) very well. > > > > This is due to the aforementioned trends, the use of a strategy ring > > buffer by VACUUM, the fact that indexes are usually vacuumed in > > sequential physical order these days, and many other things that were > > not a factor in 2004. > > > > * There is a real downside to throttling VACUUM unnecessarily, and the > > effects are *non-linear*. On a large table, the oldest xmin cutoff may > > become very old by the time we're only (say) half way through the > > initial table scan in lazy_scan_heap(). There may be relatively little > > work to do because most of the dead tuples won't be before the oldest > > xmin cutoff by that time (VACUUM just cannot keep up). 
Excessive > > throttling for simple page misses may actually *increase* the amount > > of I/O that VACUUM has to do over time -- we should try to get to the > > pages that actually need to be vacuumed quickly, which are probably > > already dirty anyway (and thus are probably going to add little to the > > cost delay limit in practice). Everything is connected to everything > > else. > > > > * vacuum_cost_page_miss is very much not like random_page_cost, and > > the similar names confuse the issue -- this is not an optimization > > problem. Thinking about VACUUM as unrelated to the workload itself is > > obviously wrong. Changing the default is also an opportunity to clear > > that up. > > > > Even if I am wrong to suggest that a miss within VACUUM should only be > > thought of as 2x as expensive as a hit in some *general* sense, I am > > concerned about *specific* consequences. There is no question about > > picking the best access path here -- we're still going to have to > > access the same blocks in the same way sooner or later. In general I > > think that we should move in the direction of more frequent, cheaper > > VACUUM operations [1], though we've already done a lot of work in that > > direction (e.g. freeze map work). > > I agree with that direction. > > > > > * Some impact from VACUUM on query performance may in fact be a good thing. > > > > Deferring the cost of vacuuming can only make sense if we'll > > eventually be able to catch up because we're experiencing a surge in > > demand, which seems kind of optimistic -- it seems more likely that > > the GC debt will just grow and grow. Why should the DBA not expect to > > experience some kind of impact, which could be viewed as a natural > > kind of backpressure? The most important thing is consistent > > performance. 
> > > > * Other recent work such as the vacuum_cleanup_index_scale_factor > > patch has increased the relative cost of index vacuuming in some > > important cases: we don't have a visibility/freeze map for indexes, > > but index vacuuming that doesn't dirty any pages and has a TID kill > > list that's concentrated at the end of the heap/table is pretty cheap > > (the TID binary search is cache efficient/cheap). This change will > > help these workloads by better reflecting the way in which index > > vacuuming can be cheap for append-only tables with a small amount of > > garbage for recently inserted tuples that also got updated/deleted. > > > > * Lowering vacuum_cost_page_miss's default (as opposed to changing > > something else) is a simple and less disruptive way of achieving these > > goals. > > > > This approach seems unlikely to break existing VACUUM-related custom > > settings from current versions that get reused on upgrade. I expect > > little impact on small installations. > > > > I recalled the discussion decreasing the default value for > autovacuum_cost_delay from 20ms to 2ms on PostgreSQL 12. I re-read > through the discussion but there wasn't the discussion changing > hit/miss/dirty. > > Whereas the change we did for autovacuum_cost_delay affects every > installation, lowering vacuum_cost_page_miss would bring a different > i
Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion
On Thu, Jan 7, 2021 at 10:59 AM Fujii Masao wrote: > > > > On 2021/01/07 10:01, Masahiko Sawada wrote: > > On Wed, Jan 6, 2021 at 3:37 PM wrote: > >> > >>> +#define Query_for_list_of_cursors \ > >>> +" SELECT name FROM pg_cursors"\ > >>> > >>> This query should be the following? > >>> > >>> " SELECT pg_catalog.quote_ident(name) "\ > >>> " FROM pg_catalog.pg_cursors "\ > >>> " WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'" > >>> > >>> +/* CLOSE */ > >>> + else if (Matches("CLOSE")) > >>> + COMPLETE_WITH_QUERY(Query_for_list_of_cursors > >>> + " UNION ALL SELECT > >>> 'ALL'"); > >>> > >>> "UNION ALL" should be "UNION"? > >> > >> Thank you for your advice, and I corrected them. > >> > + COMPLETE_WITH_QUERY(Query_for_list_of_cursors > + " UNION SELECT > 'ABSOLUTE'" > + " UNION SELECT > 'BACKWARD'" > + " UNION SELECT > 'FORWARD'" > + " UNION SELECT > 'RELATIVE'" > + " UNION SELECT > 'ALL'" > + " UNION SELECT > 'NEXT'" > + " UNION SELECT > 'PRIOR'" > + " UNION SELECT > 'FIRST'" > + " UNION SELECT > 'LAST'" > + " UNION SELECT > 'FROM'" > + " UNION SELECT > 'IN'"); > > This change makes psql unexpectedly output "FROM" and "IN" just after > "FETCH". > >>> > >>> I think "FROM" and "IN" can be placed just after "FETCH". According to > >>> the documentation, the direction can be empty. For instance, we can do > >>> like: > >> > >> Thank you! > >> > >>> I've attached the patch improving the tab completion for DECLARE as an > >>> example. What do you think? > >>> > >>> BTW according to the documentation, the options of DECLARE statement > >>> (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive. > >>> > >>> DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ] > >>> CURSOR [ { WITH | WITHOUT } HOLD ] FOR query > >>> > >>> But I realized that these options are actually order-insensitive. 
For > >>> instance, we can declare a cursor like: > >>> > >>> =# declare abc scroll binary cursor for select * from pg_class; > >>> DECLARE CURSOR > >>> > >>> The both parser code and documentation has been unchanged from 2003. > >>> Is it a documentation bug? > >> > >> Thank you for your patch, and it is good. > >> I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO > >> SCROLL) are order-sensitive." > >> I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in any > >> order." , according to the documentation. > > > > Thanks, you're right. I missed that sentence. I still don't think the > > syntax of DECLARE statement in the documentation doesn't match its > > implementation but I agree that it's order-insensitive. > > > >> I made a new patch, but the amount of codes was large due to > >> order-insensitive. > > > > Thank you for updating the patch! > > > > Yeah, I'm also afraid a bit that conditions will exponentially > > increase when a new option is added to DECLARE statement in the > > future. Looking at the parser code for DECLARE statement, we can put > > the same options multiple times (that's also why I don't think it > > matches). For instance, > > > > postgres(1:44758)=# begin; > > BEGIN > > postgres(1:44758)=# declare test binary binary binary cursor for > > select * from pg_class; > > DECLARE CURSOR > > > > So how about simplify the above code as follows? > > > > @@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, int end) > > else if (Matches("DECLARE", MatchAny)) > > COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL", > >"CURSOR"); > > + /* > > +* Complete DECLARE with one of BINARY, INSENSITIVE, SCROLL, > > +* NO SCROLL, and CURSOR. The tail doesn't match any keywords for > > +* DECLARE, assume we want options. 
> > +*/ > > + else if (HeadMatches("DECLARE", MatchAny, "*") && > > +TailMatches(MatchAnyExcept("CURSOR|WITH|WITHOUT|HOLD|FOR"))) > > + COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL", > > + "CURSOR"); > > This change seems to cause "DECLARE ... CURSOR FOR SELECT " to > unexpectedly output BINARY, INSENSITIVE, etc. Indeed. Is the following not complete but much better? @@ -3002,11 +3011,18 @@ psql_completion(co
Re: Incorrect allocation handling for cryptohash functions with OpenSSL
On Wed, Jan 06, 2021 at 03:27:03PM +0200, Heikki Linnakangas wrote: > Looks fine to me. Thanks, I have been able to get this part done as of 55fe26a. -- Michael
Re: Some more hackery around cryptohashes (some fixes + SHA1)
On Mon, Dec 14, 2020 at 12:48:15PM +0900, Michael Paquier wrote: > This is a nice cleanup, so I have moved ahead and applied it. A > rebased version of the SHA1 business is attached. Rebased version attached to address the conflicts caused by 55fe26a. I have fixed three places in pgcrypto where this missed to issue an error if one of the init/update/final cryptohash calls failed for SHA1. -- Michael From fc0819047f18f664efa075a578af66c83854b82c Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 7 Jan 2021 12:31:10 +0900 Subject: [PATCH v3] Introduce SHA1 in cryptohash infrastructure --- src/include/common/cryptohash.h | 1 + src/include/common/sha1.h | 19 + src/common/Makefile | 1 + src/common/cryptohash.c | 11 + src/common/cryptohash_openssl.c | 3 + src/common/sha1.c | 369 ++ .../pgcrypto/sha1.h => src/common/sha1_int.h | 43 +- contrib/pgcrypto/Makefile | 2 +- contrib/pgcrypto/internal.c | 34 +- contrib/pgcrypto/sha1.c | 331 contrib/uuid-ossp/.gitignore | 1 - contrib/uuid-ossp/Makefile| 6 - contrib/uuid-ossp/uuid-ossp.c | 27 +- configure | 19 +- configure.ac | 24 +- src/Makefile.global.in| 1 - src/tools/msvc/Mkvcbuild.pm | 9 +- 17 files changed, 478 insertions(+), 423 deletions(-) create mode 100644 src/include/common/sha1.h create mode 100644 src/common/sha1.c rename contrib/pgcrypto/sha1.h => src/common/sha1_int.h (72%) delete mode 100644 contrib/pgcrypto/sha1.c diff --git a/src/include/common/cryptohash.h b/src/include/common/cryptohash.h index 3ecaf62113..32d7784ca5 100644 --- a/src/include/common/cryptohash.h +++ b/src/include/common/cryptohash.h @@ -19,6 +19,7 @@ typedef enum { PG_MD5 = 0, + PG_SHA1, PG_SHA224, PG_SHA256, PG_SHA384, diff --git a/src/include/common/sha1.h b/src/include/common/sha1.h new file mode 100644 index 00..a61bc47ded --- /dev/null +++ b/src/include/common/sha1.h @@ -0,0 +1,19 @@ +/*- + * + * sha1.h + * Constants related to SHA1. 
+ * + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/common/sha1.h + * + *- + */ +#ifndef PG_SHA1_H +#define PG_SHA1_H + +/* Size of result generated by SHA1 computation */ +#define SHA1_DIGEST_LENGTH 20 + +#endif /* PG_SHA1_H */ diff --git a/src/common/Makefile b/src/common/Makefile index f624977939..04a4da0576 100644 --- a/src/common/Makefile +++ b/src/common/Makefile @@ -88,6 +88,7 @@ else OBJS_COMMON += \ cryptohash.o \ md5.o \ + sha1.o \ sha2.o endif diff --git a/src/common/cryptohash.c b/src/common/cryptohash.c index efedadd626..eecf17081b 100644 --- a/src/common/cryptohash.c +++ b/src/common/cryptohash.c @@ -25,6 +25,7 @@ #include "common/cryptohash.h" #include "md5_int.h" +#include "sha1_int.h" #include "sha2_int.h" /* @@ -47,6 +48,7 @@ struct pg_cryptohash_ctx union { pg_md5_ctx md5; + pg_sha1_ctx sha1; pg_sha224_ctx sha224; pg_sha256_ctx sha256; pg_sha384_ctx sha384; @@ -97,6 +99,9 @@ pg_cryptohash_init(pg_cryptohash_ctx *ctx) case PG_MD5: pg_md5_init(&ctx->data.md5); break; + case PG_SHA1: + pg_sha1_init(&ctx->data.sha1); + break; case PG_SHA224: pg_sha224_init(&ctx->data.sha224); break; @@ -132,6 +137,9 @@ pg_cryptohash_update(pg_cryptohash_ctx *ctx, const uint8 *data, size_t len) case PG_MD5: pg_md5_update(&ctx->data.md5, data, len); break; + case PG_SHA1: + pg_sha1_update(&ctx->data.sha1, data, len); + break; case PG_SHA224: pg_sha224_update(&ctx->data.sha224, data, len); break; @@ -167,6 +175,9 @@ pg_cryptohash_final(pg_cryptohash_ctx *ctx, uint8 *dest) case PG_MD5: pg_md5_final(&ctx->data.md5, dest); break; + case PG_SHA1: + pg_sha1_final(&ctx->data.sha1, dest); + break; case PG_SHA224: pg_sha224_final(&ctx->data.sha224, dest); break; diff --git a/src/common/cryptohash_openssl.c b/src/common/cryptohash_openssl.c index 551ec392b6..006e867403 100644 --- a/src/common/cryptohash_openssl.c +++ b/src/common/cryptohash_openssl.c @@ -131,6 
+131,9 @@ pg_cryptohash_init(pg_cryptohash_ctx *ctx) case PG_MD5: status = EVP_DigestInit_ex(ctx->evpctx, EVP_md5(), NULL); break; + case PG_SHA1: + status = EVP_DigestInit_ex(ctx->evpctx, EVP_sha1(), NULL); + break; case PG_SHA224: status = EVP_DigestInit_ex(ctx->evpctx, EVP_sha224(), NULL); break; diff --git a/src/common/sha1.c b/src/common/sha1.c new fil
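The patch above defines SHA1_DIGEST_LENGTH as 20 bytes and routes PG_SHA1 through an init/update/final sequence (EVP_sha1() under OpenSSL, the in-tree fallback otherwise). As a quick sanity check of that constant, here is the same three-step flow in a standalone sketch (Python's hashlib standing in for the pg_cryptohash API; not PostgreSQL code):

```python
import hashlib

SHA1_DIGEST_LENGTH = 20  # mirrors the constant added in src/include/common/sha1.h

def sha1_digest(data: bytes) -> bytes:
    """Compute a SHA1 digest via the same init/update/final shape as
    pg_cryptohash_init/update/final with type PG_SHA1 (sketch only)."""
    ctx = hashlib.sha1()   # init
    ctx.update(data)       # update
    return ctx.digest()    # final

digest = sha1_digest(b"abc")
assert len(digest) == SHA1_DIGEST_LENGTH
print(digest.hex())  # → a9993e364706816aba3e25717850c26c9cd0d89d (standard test vector)
```

The fixed 20-byte output is why callers such as pgcrypto can size their buffers from the header constant alone.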
Re: plpgsql variable assignment with union is broken
Merlin Moncure writes: > On Tue, Jan 5, 2021 at 3:40 PM Tom Lane wrote: >> easter...@verfriemelt.org writes: >>> i found, that the behaviour of variable assignment in combination with >>> union is not working anymore: >>> DO $$ >>> DECLARE t bool; >>> begin >>> t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a; >>> END $$; >>> is this an intended change or is it a bug? >> It's an intended change, or at least I considered the case and thought >> that it was useless because assignment will reject any result with more >> than one row. Do you have any non-toy example that wouldn't be as >> clear or clearer without using UNION? The above sure seems like an >> example of awful SQL code. > What is the definition of broken here? What is the behavior of the > query with the change and why? The OP is complaining that that gets a syntax error since c9d529848. > OP's query provably returns a single row and ought to always assign > true as written. My opinion is that (a) it's useless and (b) there has never been any documentation that claimed that you could do this. As for (a), given the execution restriction that you can't return more than one row, I can't think of anything you could do with this that couldn't be done, both more clearly and far more cheaply, with a CASE or similar construct. As for (b), the docs have always given the syntax of plpgsql assignment as "variable := expression"; whatever you might think an "expression" is, I dispute that a syntactically-invalid fragment of a UNION query qualifies. The fact that it was accepted is a completely accidental artifact of the lax way that plpgsql assignment has been parsed up to now. (Having said that, I would've preserved it if I could, but it's exactly the fact that it *isn't* a syntactically valid UNION construct that makes it hard to do so. If it's possible at all, it would take some really ugly refactoring of the grammar.) 
One last point is that if you're really intent on writing things this way, you can still do it with SELECT INTO instead of assignment syntax. In case you're wondering, INTERSECT and EXCEPT are in the same boat. regards, tom lane
Re: Phrase search vs. multi-lexeme tokens
Hi! On Wed, Jan 6, 2021 at 8:18 PM Tom Lane wrote: > Alexander Korotkov writes: > > # select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class > > foo"'); > > ?column? > > -- > > f > > Yeah, surely this is wrong. Thank you for confirming my thoughts. I also felt that is wrong but doubted such a basic bug could exist for so long. > > # select to_tsquery('pg_class <-> foo'); > > to_tsquery > > -- > > ( 'pg' & 'class' ) <-> 'foo' > > > I think if a user writes 'pg_class <-> foo', then it's expected to > > match 'pg_class foo' independently on which lexemes 'pg_class' is > > split into. > > Indeed. It seems to me that this: > > regression=# select to_tsquery('pg_class'); >to_tsquery > > 'pg' & 'class' > (1 row) > > is wrong all by itself. Now that we have phrase search, a much > saner translation would be "'pg' <-> 'class'". If we fixed that > then it seems like the more complex case would just work. Nice idea! Fixing this way should be much easier than fixing only the case when we have the phrase operator on the upper level. > I read your patch over quickly and it seems like a reasonable > approach (but sadly underdocumented). Can we extend the idea > to fix the to_tsquery case? Sure, I'll provide a revised patch. -- Regards, Alexander Korotkov
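The proposed fix, translating a multi-lexeme token as 'pg' <-> 'class' rather than 'pg' & 'class', matters because the phrase operator checks lexeme adjacency while & does not. A rough simulation of the two semantics (hypothetical helper names, not the actual tsquery machinery):

```python
def and_match(doc_lexemes, query_lexemes):
    """'&' semantics: every lexeme present somewhere, positions ignored."""
    return all(q in doc_lexemes for q in query_lexemes)

def phrase_match(doc_lexemes, query_lexemes):
    """'<->' semantics: the lexemes must appear as an adjacent run."""
    n = len(query_lexemes)
    return any(doc_lexemes[i:i + n] == query_lexemes
               for i in range(len(doc_lexemes) - n + 1))

# 'pg_class foo' tokenizes to three lexemes; the phrase query for
# '"pg_class foo"' matches it only under adjacency-aware translation.
doc = ["pg", "class", "foo"]
assert and_match(doc, ["pg", "class", "foo"])
assert phrase_match(doc, ["pg", "class", "foo"])

# With the lexemes scattered, '&' still matches but '<->' does not,
# which is why embedding '&' inside a phrase gives the wrong answer
# reported at the top of this thread.
scattered = ["class", "other", "pg", "foo"]
assert and_match(scattered, ["pg", "class", "foo"])
assert not phrase_match(scattered, ["pg", "class", "foo"])
```

This is why Tom's suggestion to emit `<->` at the to_tsquery level makes the nested websearch_to_tsquery case fall out naturally.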
Re: Terminate the idle sessions
On Thu, Jan 7, 2021 at 3:03 PM Thomas Munro wrote: > On Thu, Jan 7, 2021 at 12:55 PM Tom Lane wrote: > > * The SQLSTATE you chose for the new error condition seems pretty > > random. I do not see it in the SQL standard, so using a code that's > > within the spec-reserved code range is certainly wrong. I went with > > 08P02 (the "P" makes it outside the reserved range), but I'm not > > really happy either with using class 08 ("Connection Exception", > > which seems to be mainly meant for connection-request failures), > > or the fact that ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT is > > practically identical but it's not even in the same major class. > > Now 25 ("Invalid Transaction State") is certainly not right for > > this new error, but I think what that's telling us is that 25 was a > > misguided choice for the other error. In a green field I think I'd > > put both of them in class 53 ("Insufficient Resources") or maybe class > > 57 ("Operator Intervention"). Can we get away with renumbering the > > older error at this point? In any case I'd be inclined to move the > > new error to 53 or 57 --- anybody have a preference which? > > I don't have a strong view here, but 08 with a P doesn't seem crazy to > me. Unlike eg 57014 (statement_timeout), 57017 (deadlock_timeout), > 55P03 (lock_timeout... huh, I just noticed that DB2 uses 57017 for > that, distinguished from deadlock by another error field), after these > timeouts you don't have a session/connection anymore. The two are a > bit different though: in the older one, you were in a transaction, and > it seems to me quite newsworthy that your transaction has been > aborted, information that is not conveyed quite so clearly with a > connection-related error class. Hmm, on further reflection it's still more similar than different and I'd probably have voted for 08xxx for the older one too if I'd been paying attention. 
One of the strange things about these errors is that they're asynchronous/unsolicited, but they appear to the client to be the response to their next request (if it doesn't eat ECONNRESET instead). That makes me think we should try to make it clear that it's a sort of lower level thing, and not really a response to your next request at all. Perhaps 08 does that. But it's not obvious... I see a couple of DB2 extension SQLSTATEs saying you have no connection: 57015 = "Connection to the local Db2 not established" and 51006 = "A valid connection has not been established", and then there's standard 08003 = "connection does not exist" which we're currently using to shout into the void when the *client* goes away (and also for dblink failure to find named connection, a pretty unrelated meaning).
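For reference, the SQL standard reserves SQLSTATE class and subclass values whose first character is a digit 0-4 or a letter A-H; values beginning with 5-9 or I-Z are implementation-defined. The sketch below encodes my reading of that rule (it is not PostgreSQL code) and shows why the 'P' in 08P02 keeps the code outside the reserved range:

```python
def spec_reserved(ch: str) -> bool:
    """Leading characters the SQL standard reserves for itself:
    digits 0-4 and letters A-H."""
    return ch in "01234" or "A" <= ch <= "H"

def sqlstate_kind(code: str) -> str:
    """Classify a 5-character SQLSTATE: the value is standard-reserved
    only if both the class (first 2 chars) and the subclass (last 3
    chars) start with a reserved character."""
    assert len(code) == 5
    if spec_reserved(code[0]) and spec_reserved(code[2]):
        return "standard-reserved"
    return "implementation-defined"

assert sqlstate_kind("08003") == "standard-reserved"       # connection does not exist
assert sqlstate_kind("08P02") == "implementation-defined"  # the 'P' is the escape hatch
```

Note that a "standard-reserved" classification here only means the value sits in the range the spec reserves; whether a given code is actually assigned by the standard has to be checked against the spec's tables.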
Re: plpgsql variable assignment with union is broken
On Tue, Jan 5, 2021 at 3:40 PM Tom Lane wrote: > > easter...@verfriemelt.org writes: > > i found, that the behaviour of variable assignment in combination with > > union is not working anymore: > > DO $$ > > DECLARE t bool; > > begin > > t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a; > > END $$; > > > is this an intended change or is it a bug? > > It's an intended change, or at least I considered the case and thought > that it was useless because assignment will reject any result with more > than one row. Do you have any non-toy example that wouldn't be as > clear or clearer without using UNION? The above sure seems like an > example of awful SQL code. What is the definition of broken here? What is the behavior of the query with the change and why? OP's query provably returns a single row and ought to always assign true as written. A real world example might evaluate multiple condition branches so that the assignment resolves true if any branch is true. It could be rewritten with 'OR' of course. Is this also "broken"? t := a FROM ( SELECT 'something' WHERE _Flag) t(a) UNION SELECT 'something else' AS a WHERE NOT _Flag; What about this? SELECT INTO t true WHERE false UNION select true; merlin
Re: A failure of standby to follow timeline switch
On 2021/01/05 17:26, Kyotaro Horiguchi wrote: At Mon, 4 Jan 2021 19:00:21 +0900, Fujii Masao wrote in On 2021/01/04 12:06, Kyotaro Horiguchi wrote: At Sat, 26 Dec 2020 02:15:06 +0900, Fujii Masao wrote in On 2020/12/25 12:03, Kyotaro Horiguchi wrote: The attached is the fixed version. Thanks for updating the patches! In the first patch, the test added to 001_stream_rep.pl involves two copied functions related to server-log investigation from 019_repslot_limit.pl. So you're planning to define them commonly in TestLib.pm or elsewhere? Yeah.. That's correct. Newly added as the first patch. While making that change, I extended the interface of slurp_file to allow reading from an arbitrary position. Is this extension really helpful for the current use case? At least I'd like to avoid back-patching this since it's an extension... OsFHandleOpen(my $fh = IO::Handle->new(), $fHandle, 'r') or croak "could not read \"$filename\": $^E\n"; + seek($fh, $from, 0) + or croak "could not seek \"$filename\" to $from: $^E\n"; I'm not familiar with this area, but isn't SetFilePointer() more suitable than seek()? Thanks. The attached is the revised patchset. Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
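The slurp_file extension being discussed boils down to seeking before reading, so a test can remember the log size, trigger some activity, and then inspect only the newly appended portion. The same idea in outline (Python here rather than the Perl of TestLib.pm, with hypothetical names):

```python
import tempfile

def slurp_file_from(path, offset=0):
    """Read a file's contents starting at `offset` bytes. A sketch of the
    extended slurp_file interface; the Perl version needs the extra
    seek($fh, $from, 0) (or SetFilePointer on the Win32 handle)."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.read().decode()

# Typical TAP-test use: mark the current end of the server log, do the
# interesting thing, then search only what was written afterwards.
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".log") as f:
    f.write("old log contents\n")
    mark = f.tell()               # position before the interesting activity
    f.write("new: timeline switch\n")
    path = f.name

assert slurp_file_from(path, mark) == "new: timeline switch\n"
```

Reading from an offset avoids re-scanning (and re-matching) everything the server logged before the test step under scrutiny.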
Re: Transactions involving multiple postgres foreign servers, take 2
Hi, For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch : However these functions are not neither committed nor aborted at I think the double negation was not intentional. Should be 'are neither ...' For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the return statement ? + fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part); For the function name, Fdw and Xact appear twice, each. Maybe one of them can be dropped ? +* we don't need to anything for this participant because all foreign 'need to' -> 'need to do' + else if (TransactionIdDidAbort(xid)) + return FDWXACT_STATUS_ABORTING; + the 'else' can be omitted since the preceding if would return. + if (max_prepared_foreign_xacts <= 0) I wonder when the value for max_prepared_foreign_xacts would be negative (and whether that should be considered an error). Cheers On Wed, Jan 6, 2021 at 5:45 PM Masahiko Sawada wrote: > On Mon, Dec 28, 2020 at 11:24 PM Masahiko Sawada > wrote: > > > > On Wed, Nov 25, 2020 at 9:50 PM Masahiko Sawada > wrote: > > > > > > Since the previous version conflicts with the current HEAD I've > > > attached the rebased version patch set. > > > > Rebased the patch set again to the current HEAD. > > > > The discussion of this patch is very long so here is a short summary > > of the current state: > > > > It’s still under discussion which approaches are the best for the > > distributed transaction commit as a building block of built-in sharing > > using foreign data wrappers. > > > > Since we’re considering that we use this feature for built-in > > sharding, the design depends on the architecture of built-in sharding. > > For example, with the current patch, the PostgreSQL node that received > > a COMMIT from the client works as a coordinator and it commits the > > transactions using 2PC on all foreign servers involved with the > > transaction. 
This approach would be good with the de-centralized > > sharding architecture but not with centralized architecture like the > > GTM node of Postgres-XC and Postgres-XL that is a dedicated component > > that is responsible for transaction management. Since we don't get a > > consensus on the built-in sharding architecture yet, it's still an > > open question that this patch's approach is really good as a building > > block of the built-in sharding. > > > > On the other hand, this feature is not necessarily dedicated to the > > built-in sharding. For example, the distributed transaction commit > > through FDW is important also when atomically moving data between two > > servers via FDWs. Using a dedicated process or server like GTM could > > be an over solution. Having the node that received a COMMIT work as a > > coordinator would be better and straight forward. > > > > There is no noticeable TODO in the functionality so far covered by > > this patch set. This patchset adds new FDW APIs to support 2PC, > > introduces the global transaction manager, and implement those FDW > > APIs to postgres_fdw. Also, it has regression tests and documentation. > > Transactions on foreign servers involved with the distributed > > transaction are committed using 2PC. Committing using 2PC is performed > > asynchronously and transparently to the user. Therefore, it doesn’t > > guarantee that transactions on the foreign server are also committed > > when the client gets an acknowledgment of COMMIT. The patch doesn't > > cover synchronous foreign transaction commit via 2PC is not covered by > > this patch as we still need a discussion on the design. > > > > I've attached the rebased patches to make cfbot happy. > > > Regards, > > -- > Masahiko Sawada > EnterpriseDB: https://www.enterprisedb.com/ >
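On the reviewer's MAXALIGN question above: shmem size functions conventionally MAXALIGN each sub-allocation so that whatever follows starts on a maximally aligned boundary, and a trailing MAXALIGN matters when the allocator places something after the returned region. A toy model of the arithmetic (assuming 8-byte MAXIMUM_ALIGNOF, as on typical 64-bit platforms; not the actual FdwXactShmemSize code):

```python
MAXIMUM_ALIGNOF = 8  # assumption: common 64-bit platform

def maxalign(size: int) -> int:
    """Round `size` up to the next multiple of MAXIMUM_ALIGNOF,
    like the MAXALIGN() macro in c.h."""
    return (size + MAXIMUM_ALIGNOF - 1) & ~(MAXIMUM_ALIGNOF - 1)

def shmem_size(header_bytes: int, nentries: int, entry_bytes: int) -> int:
    """Sketch of a *ShmemSize()-style computation: align the header so
    the entry array starts aligned, then align the grand total so the
    next shmem consumer is aligned too."""
    size = maxalign(header_bytes)
    size += nentries * maxalign(entry_bytes)
    return maxalign(size)

# An unaligned header (say 20 bytes) pads to 24 before the entries,
# and the total stays a multiple of 8.
assert maxalign(20) == 24
assert shmem_size(20, 4, 30) % MAXIMUM_ALIGNOF == 0
```

In this sketch the inner terms are already aligned, so the final maxalign() is a no-op; it becomes load-bearing the moment an unaligned variable-size tail is appended, which is presumably why the reviewer asks whether one is needed before the return.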
Re: vacuum_cost_page_miss default value and modern hardware
On Wed, Jan 6, 2021 at 5:39 PM Masahiko Sawada wrote: > Perhaps you meant to decrease vacuumm_cost_page_miss instead of > vacuum_cost_page_dirty? You're right. Evidently I didn't write this email very carefully. Sorry about that. To say it again: I think that a miss (without dirtying the page) should be cheaper than dirtying a page. This thread began because I wanted to discuss the relative cost of different kinds of I/O operations to VACUUM, without necessarily discussing the absolute costs/time delays. -- Peter Geoghegan
RE: Enhance traceability of wal_level changes for backup management
From: osumi.takami...@fujitsu.com > I wondered, couldn't backup management tools utilize the information > in the backup that is needed to be made when wal_level is changed to "none" > for example ? IIRC, someone proposed in the original thread that the change count can be recorded in pg_control. The change count is incremented when wal_level is changed from replica or higher to minimal or lower. Maybe you can do it easily in XLogReportParameters(). Then, the backup management tool compares the change counts of pg_control in a backup and that of the current pg_control. If the change count is different, the tool assumes that the backup cannot be used to recover the database up to date. Ideally, it'd be desirable for PostgreSQL core to have a backup catalog management capability like Oracle RMAN. Then, when the wal_level is changed, Postgres may be able to invalidate all backups in the backup catalog. > As I said before, existing backup management tools support > only wal_level=replica or logical at present. And, if they would wish to > alter the > status quo and want to cover the changes over wal_levels, I felt it's natural > that > they support a feature like taking a full backup, triggered by the wal_level > changes > (or server stop). In that regard, a feature like Oracle Server Alert would be useful. When important events occur, the database server records them in the alert queue. Administration tools read from the alert queue and act accordingly. A wal_level change can be recorded in the alert queue, and the backup management tool polls the queue and detects the change. Regards Takayuki Tsunakawa
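The change-count idea can be sketched end to end: the server bumps a counter in its control data whenever wal_level drops below replica, and the backup tool compares the counter captured at backup time against the current one. All names here are hypothetical; pg_control has no such field today:

```python
WAL_LEVELS = {"minimal": 0, "replica": 1, "logical": 2}

class ControlData:
    """Toy stand-in for pg_control carrying the proposed counter."""
    def __init__(self):
        self.wal_level = "replica"
        self.wal_level_drop_count = 0   # hypothetical new field

    def report_parameters(self, new_level):
        """What XLogReportParameters() could do: count each transition
        from replica-or-higher down to below replica."""
        if (WAL_LEVELS[self.wal_level] >= WAL_LEVELS["replica"]
                and WAL_LEVELS[new_level] < WAL_LEVELS["replica"]):
            self.wal_level_drop_count += 1
        self.wal_level = new_level

def backup_still_usable(backup_count: int, current: ControlData) -> bool:
    """Backup-tool side: a differing count means wal_level passed through
    minimal since the backup, so recovering up to date from it is unsafe."""
    return backup_count == current.wal_level_drop_count

ctl = ControlData()
at_backup = ctl.wal_level_drop_count   # recorded when the backup is taken
ctl.report_parameters("minimal")       # DBA briefly disables WAL logging
ctl.report_parameters("replica")
assert not backup_still_usable(at_backup, ctl)
```

Because the counter only ever moves on replica-to-minimal transitions, a replica/logical round trip leaves existing backups valid, which is the behavior the proposal wants.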
Re: Terminate the idle sessions
On Thu, Jan 7, 2021 at 12:55 PM Tom Lane wrote: > * Thomas' patch for improving timeout.c seems like a great idea, but > it did indeed have a race condition, and I felt the comments could do > with more work. Oops, and thanks! Very happy to see this one in the tree. > * I'm not entirely comfortable with the name "idle_session_timeout", > because it sounds like it applies to all idle states, but actually > it only applies when we're not in a transaction. I left the name > alone and tried to clarify the docs, but I wonder whether a different > name would be better. (Alternatively, we could say that it does > apply in all cases, making the effective timeout when in a transaction > the smaller of the two GUCs. But that'd be more complex to implement > and less flexible, so I'm not in favor of that answer.) Hmm, it is a bit confusing, but having them separate is indeed more flexible. > * The SQLSTATE you chose for the new error condition seems pretty > random. I do not see it in the SQL standard, so using a code that's > within the spec-reserved code range is certainly wrong. I went with > 08P02 (the "P" makes it outside the reserved range), but I'm not > really happy either with using class 08 ("Connection Exception", > which seems to be mainly meant for connection-request failures), > or the fact that ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT is > practically identical but it's not even in the same major class. > Now 25 ("Invalid Transaction State") is certainly not right for > this new error, but I think what that's telling us is that 25 was a > misguided choice for the other error. In a green field I think I'd > put both of them in class 53 ("Insufficient Resources") or maybe class > 57 ("Operator Intervention"). Can we get away with renumbering the > older error at this point? In any case I'd be inclined to move the > new error to 53 or 57 --- anybody have a preference which? I don't have a strong view here, but 08 with a P doesn't seem crazy to me. 
Unlike eg 57014 (statement_timeout), 57017 (deadlock_timeout), 55P03 (lock_timeout... huh, I just noticed that DB2 uses 57017 for that, distinguished from deadlock by another error field), after these timeouts you don't have a session/connection anymore. The two are a bit different though: in the older one, you were in a transaction, and it seems to me quite newsworthy that your transaction has been aborted, information that is not conveyed quite so clearly with a connection-related error class. > * I noted the discussion about dropping setitimer in place of some > newer kernel API. I'm not sure that that is worth the trouble in > isolation, but it might be worth doing if we can switch the whole > module over to relying on CLOCK_MONOTONIC, so as to make its behavior > less flaky if the sysadmin steps the system clock. Portability > might be a big issue here though, plus we'd have to figure out how > we want to define enable_timeout_at(), which is unlikely to want to > use CLOCK_MONOTONIC values. In any case, that's surely material > for a whole new thread. +1
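On the closing point about clock stepping: timeouts armed against the wall clock misfire when the sysadmin steps the system time, which is exactly what a monotonic clock avoids. A minimal illustration of the distinction (Python's time module standing in for clock_gettime(CLOCK_MONOTONIC)):

```python
import time

def deadline_in(seconds: float) -> float:
    """Arm a timeout against the monotonic clock; unaffected by
    settimeofday()-style steps, unlike gettimeofday()/time.time()."""
    return time.monotonic() + seconds

def expired(deadline: float) -> bool:
    return time.monotonic() >= deadline

d = deadline_in(0.05)
assert not expired(d)      # 50 ms have not elapsed yet
time.sleep(0.06)
assert expired(d)
```

The wrinkle Tom notes still applies: an interface like enable_timeout_at() takes an absolute wall-clock time, so switching timeout.c wholesale to a monotonic clock would need a decision about how such absolute deadlines are translated.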
Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion
On 2021/01/07 10:01, Masahiko Sawada wrote: On Wed, Jan 6, 2021 at 3:37 PM wrote: +#define Query_for_list_of_cursors \ +" SELECT name FROM pg_cursors"\ This query should be the following? " SELECT pg_catalog.quote_ident(name) "\ " FROM pg_catalog.pg_cursors "\ " WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'" +/* CLOSE */ + else if (Matches("CLOSE")) + COMPLETE_WITH_QUERY(Query_for_list_of_cursors + " UNION ALL SELECT 'ALL'"); "UNION ALL" should be "UNION"? Thank you for your advice, and I corrected them. + COMPLETE_WITH_QUERY(Query_for_list_of_cursors + " UNION SELECT 'ABSOLUTE'" + " UNION SELECT 'BACKWARD'" + " UNION SELECT 'FORWARD'" + " UNION SELECT 'RELATIVE'" + " UNION SELECT 'ALL'" + " UNION SELECT 'NEXT'" + " UNION SELECT 'PRIOR'" + " UNION SELECT 'FIRST'" + " UNION SELECT 'LAST'" + " UNION SELECT 'FROM'" + " UNION SELECT 'IN'"); This change makes psql unexpectedly output "FROM" and "IN" just after "FETCH". I think "FROM" and "IN" can be placed just after "FETCH". According to the documentation, the direction can be empty. For instance, we can do like: Thank you! I've attached the patch improving the tab completion for DECLARE as an example. What do you think? BTW according to the documentation, the options of DECLARE statement (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive. DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ] CURSOR [ { WITH | WITHOUT } HOLD ] FOR query But I realized that these options are actually order-insensitive. For instance, we can declare a cursor like: =# declare abc scroll binary cursor for select * from pg_class; DECLARE CURSOR The both parser code and documentation has been unchanged from 2003. Is it a documentation bug? Thank you for your patch, and it is good. I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive." I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in any order." , according to the documentation. Thanks, you're right. 
I missed that sentence. I still think the syntax of the DECLARE statement in the documentation doesn't match its implementation, but I agree that it's order-insensitive. I made a new patch, but the amount of code was large due to the order-insensitivity. Thank you for updating the patch! Yeah, I'm also afraid a bit that conditions will exponentially increase when a new option is added to DECLARE statement in the future. Looking at the parser code for DECLARE statement, we can put the same options multiple times (that's also why I don't think it matches). For instance, postgres(1:44758)=# begin; BEGIN postgres(1:44758)=# declare test binary binary binary cursor for select * from pg_class; DECLARE CURSOR So how about simplifying the above code as follows? @@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, int end) else if (Matches("DECLARE", MatchAny)) COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL", "CURSOR"); + /* +* Complete DECLARE  with one of BINARY, INSENSITIVE, SCROLL, +* NO SCROLL, and CURSOR. The tail doesn't match any keywords for +* DECLARE, assume we want options. +*/ + else if (HeadMatches("DECLARE", MatchAny, "*") && +TailMatches(MatchAnyExcept("CURSOR|WITH|WITHOUT|HOLD|FOR"))) + COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL", + "CURSOR"); This change seems to cause "DECLARE ... CURSOR FOR SELECT " to unexpectedly output BINARY, INSENSITIVE, etc. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
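The back-and-forth above reduces to a small matching problem: after DECLARE name, keep offering the order-insensitive options, but stop once a terminating keyword (CURSOR, WITH, WITHOUT, HOLD, FOR) has appeared anywhere, not merely as the last word. A sketch of both rules (hypothetical helpers, not psql's actual C matcher; whether the final fix takes this exact shape is a separate question):

```python
OPTIONS = ["BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL", "CURSOR"]
TERMINATORS = {"CURSOR", "WITH", "WITHOUT", "HOLD", "FOR"}

def suggest_tail_only(words):
    """The first proposal: look only at the last word (TailMatches with
    MatchAnyExcept).  Buggy: 'DECLARE c CURSOR FOR SELECT' ends in
    'SELECT', so cursor options come back after the query has begun."""
    if words[0] == "DECLARE" and len(words) >= 3 and words[-1] not in TERMINATORS:
        return OPTIONS
    return []

def suggest_scan_all(words):
    """A stricter rule: offer options only while no terminator has
    appeared anywhere after the cursor name."""
    if (words[0] == "DECLARE" and len(words) >= 3
            and not any(w in TERMINATORS for w in words[2:])):
        return OPTIONS
    return []

assert suggest_scan_all(["DECLARE", "c", "BINARY"]) == OPTIONS
# The regression Fujii-san reports: mid-query completion of options.
assert suggest_tail_only(["DECLARE", "c", "CURSOR", "FOR", "SELECT"]) == OPTIONS
assert suggest_scan_all(["DECLARE", "c", "CURSOR", "FOR", "SELECT"]) == []
```

Scanning the whole tail rather than just the last word also keeps the condition count flat as new order-insensitive options are added, which was the original worry.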
RE: When (and whether) should we improve the chapter on parallel query to accommodate parallel data updates?
From: Amit Kapila > I think each feature should develop the docs as part of feature > development but if we want to see some additional work like improving > overall docs for parallel execution as you are suggesting then it can > be done separately as well. > > I think you have a valid point but maybe it would be better to do the > improvements you are suggesting once the parallel inserts related work > is committed. OK. I think at least the words query, statement, execution, and/or operation should be used appropriately before RC (ideally, before beta). Maybe parallel copy should also be put in the chapter. Anyway, I agree that each developer focuses on their features for the moment (parallel insert select, parallel copy, parallel CTAS, etc), and then make the documentation consistent and well-organized after they've settled. (I just felt a bit uneasy to say "this patch looks good" while leaving the document consistency.) Regards Takayuki Tsunakawa
RE: Single transaction in the tablesync worker?
> PSA the v11 patch for the Tablesync Solution1. > > Difference from v10: > - Addresses several recent review comments. > - pg_indent has been run > Hi I took a look at the patch and have some comments. 1. * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT -> - * CATCHUP -> SYNCDONE -> READY. + * CATCHUP -> (sync worker TCOPYDONE) -> SYNCDONE -> READY. I noticed the new state TCOPYDONE is commented between CATCHUP and SYNCDONE, but it seems the SUBREL_STATE_TCOPYDONE is actually set before SUBREL_STATE_SYNCWAIT[1]. Did I miss something here? [1]- + UpdateSubscriptionRelState(MyLogicalRepWorker->subid, + MyLogicalRepWorker->relid, + SUBREL_STATE_TCOPYDONE, + MyLogicalRepWorker->relstate_lsn); ... /* * We are done with the initial data synchronization, update the state. */ SpinLockAcquire(&MyLogicalRepWorker->relmutex); MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT; -- 2. i = initialize, d = data is being copied, + C = table data has been copied, s = synchronized, r = ready (normal replication) +#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed +* (sublsn NULL) */ The character representing 'table data has been copied' in the comment ('C') seems different from the one used in the macro definition ('t'). Best regards, houzj
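To make the ordering question concrete, here is a toy model of the state progression as the quoted hunk actually sets it. The states mirror the discussion above, but the enum and helper are illustrative only, not the real catalog codes in pg_subscription_rel.h; the TCOPYDONE placement follows houzj's reading of the patch (set right after the copy, before SYNCWAIT).

```c
#include <assert.h>

/* Simplified model of the tablesync states under discussion.  The
 * TCOPYDONE state is hypothetical here, mirroring the patch. */
typedef enum
{
    STATE_INIT,                 /* 'i' */
    STATE_DATASYNC,             /* 'd' */
    STATE_TCOPYDONE,            /* 't' (patch): copy_table finished */
    STATE_SYNCWAIT,             /* 'w' */
    STATE_CATCHUP,              /* 'c' */
    STATE_SYNCDONE,             /* 's' */
    STATE_READY                 /* 'r' */
} SyncState;

/* Per the quoted hunk, the worker persists TCOPYDONE right after the
 * copy and only then advances to SYNCWAIT, i.e. TCOPYDONE sits between
 * DATASYNC and SYNCWAIT, not between CATCHUP and SYNCDONE. */
static SyncState
next_state(SyncState s)
{
    return (s == STATE_READY) ? STATE_READY : (SyncState) (s + 1);
}
```

This is only a linear sketch; the real worker also loops between CATCHUP and SYNCWAIT via shared memory, which the model ignores.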
Re: vacuum_cost_page_miss default value and modern hardware
On Mon, Dec 28, 2020 at 5:17 AM Peter Geoghegan wrote: > > Simply decreasing vacuum_cost_page_dirty seems like a low risk way of > making the VACUUM costing more useful within autovacuum workers. > Halving vacuum_cost_page_dirty to 5 would be a good start, though I > think that a value as low as 2 would be better. That would make it > only 2x vacuum_cost_page_hit's default (i.e 2x the cost of processing > a page that is in shared_buffers but did not need to be dirtied), > which seems sensible to me when considered in the context in which the > value is actually applied (and not some abstract theoretical context). Perhaps you meant to decrease vacuum_cost_page_miss instead of vacuum_cost_page_dirty? > > There are a few reasons why this seems like a good idea now: > > * Throttling/delaying VACUUM is only useful as a way of smoothing the > impact on production queries, which is an important goal, but > currently we don't discriminate against the cost that we really should > keep under control (dirtying new pages within VACUUM) very well. > > This is due to the aforementioned trends, the use of a strategy ring > buffer by VACUUM, the fact that indexes are usually vacuumed in > sequential physical order these days, and many other things that were > not a factor in 2004. > > * There is a real downside to throttling VACUUM unnecessarily, and the > effects are *non-linear*. On a large table, the oldest xmin cutoff may > become very old by the time we're only (say) half way through the > initial table scan in lazy_scan_heap(). There may be relatively little > work to do because most of the dead tuples won't be before the oldest > xmin cutoff by that time (VACUUM just cannot keep up). 
Excessive > throttling for simple page misses may actually *increase* the amount > of I/O that VACUUM has to do over time -- we should try to get to the > pages that actually need to be vacuumed quickly, which are probably > already dirty anyway (and thus are probably going to add little to the > cost delay limit in practice). Everything is connected to everything > else. > > * vacuum_cost_page_miss is very much not like random_page_cost, and > the similar names confuse the issue -- this is not an optimization > problem. Thinking about VACUUM as unrelated to the workload itself is > obviously wrong. Changing the default is also an opportunity to clear > that up. > > Even if I am wrong to suggest that a miss within VACUUM should only be > thought of as 2x as expensive as a hit in some *general* sense, I am > concerned about *specific* consequences. There is no question about > picking the best access path here -- we're still going to have to > access the same blocks in the same way sooner or later. In general I > think that we should move in the direction of more frequent, cheaper > VACUUM operations [1], though we've already done a lot of work in that > direction (e.g. freeze map work). I agree with that direction. > > * Some impact from VACUUM on query performance may in fact be a good thing. > > Deferring the cost of vacuuming can only make sense if we'll > eventually be able to catch up because we're experiencing a surge in > demand, which seems kind of optimistic -- it seems more likely that > the GC debt will just grow and grow. Why should the DBA not expect to > experience some kind of impact, which could be viewed as a natural > kind of backpressure? The most important thing is consistent > performance. 
> > * Other recent work such as the vacuum_cleanup_index_scale_factor > patch has increased the relative cost of index vacuuming in some > important cases: we don't have a visibility/freeze map for indexes, > but index vacuuming that doesn't dirty any pages and has a TID kill > list that's concentrated at the end of the heap/table is pretty cheap > (the TID binary search is cache efficient/cheap). This change will > help these workloads by better reflecting the way in which index > vacuuming can be cheap for append-only tables with a small amount of > garbage for recently inserted tuples that also got updated/deleted. > > * Lowering vacuum_cost_page_miss's default (as opposed to changing > something else) is a simple and less disruptive way of achieving these > goals. > > This approach seems unlikely to break existing VACUUM-related custom > settings from current versions that get reused on upgrade. I expect > little impact on small installations. > I recalled the discussion about decreasing the default value for autovacuum_cost_delay from 20ms to 2ms on PostgreSQL 12. I re-read through the discussion but there was no discussion about changing the hit/miss/dirty costs. Whereas the change we did for autovacuum_cost_delay affects every installation, lowering vacuum_cost_page_miss would have a different impact depending on workload, database size, etc. For example, the user could see a larger I/O spike in a case where the database doesn't fit in the server's RAM and VACUUM is processing cold tables/indexes, for example, when anti-wraparound va
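As a back-of-the-envelope illustration of how much the miss cost matters: VACUUM naps for vacuum_cost_delay once the accumulated cost reaches vacuum_cost_limit (200 by default), so for an all-miss scan the per-page cost directly determines how many pages fit between naps. A minimal sketch, using the stock defaults and the value proposed in this thread, and ignoring hits, dirtying, and fractional accounting:

```c
#include <assert.h>

/* Pages VACUUM can touch between cost-based naps, assuming every page
 * accrues the same cost.  Simplified model of vacuum_delay_point(). */
static int
pages_per_nap(int vacuum_cost_limit, int page_cost)
{
    return vacuum_cost_limit / page_cost;
}
```

With the default vacuum_cost_limit of 200, an all-miss scan gets 20 pages per nap at vacuum_cost_page_miss = 10, but 100 pages per nap at the proposed value of 2, a 5x reduction in throttling for the cache-cold case.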
Re: [PATCH] Simple progress reporting for COPY command
On Wed, Jan 06, 2021 at 10:44:49PM +0100, Tomas Vondra wrote: > On 1/5/21 11:02 AM, Josef Šimánek wrote: > > I'm attaching the whole patch since commitfest failed to ingest the > > last incremental on CI. > > > > Yeah, the whole patch needs to be attached for the commitfest tester to work > correctly - it can't apply pieces from multiple messages, etc. > > Anyway, I pushed this last version of patch, after a couple more tweaks, > mainly to the docs - one place used pg_stat_copy_progress, the section was > not indexed properly, and so on. Some more doc fixes. >From 0616f448057905ab9b3218ebddfdd3af52e62bac Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Wed, 6 Jan 2021 19:08:11 -0600 Subject: [PATCH] doc review: COPY progress: 8a4f618e7ae3cb11b0b37d0f06f05c8ff905833f --- doc/src/sgml/monitoring.sgml | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 43fe8ae383..3cdb1aff3c 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -6414,9 +6414,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, Whenever COPY is running, the pg_stat_progress_copy view will contain one row - for each backend that is currently running COPY command. - The table bellow describes the information that will be reported and provide - information how to interpret it. + for each backend that is currently running a COPY command. + The table below describes the information that will be reported and provides + information about how to interpret it. @@ -6445,7 +6445,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, - datid text + datid oid OID of the database to which this backend is connected. @@ -6467,7 +6467,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, OID of the table on which the COPY command is executed. - It is set to 0 if SELECT query is provided. + It is set to 0 if copying from a SELECT query. 
-- 2.17.0
Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion
On Wed, Jan 6, 2021 at 3:37 PM wrote: > > >+#define Query_for_list_of_cursors \ > >+" SELECT name FROM pg_cursors"\ > > > >This query should be the following? > > > >" SELECT pg_catalog.quote_ident(name) "\ > >" FROM pg_catalog.pg_cursors "\ > >" WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'" > > > >+/* CLOSE */ > >+ else if (Matches("CLOSE")) > >+ COMPLETE_WITH_QUERY(Query_for_list_of_cursors > >+ " UNION ALL SELECT > >'ALL'"); > > > >"UNION ALL" should be "UNION"? > > Thank you for your advice, and I corrected them. > > >> + COMPLETE_WITH_QUERY(Query_for_list_of_cursors > >> + " UNION SELECT > >> 'ABSOLUTE'" > >> + " UNION SELECT > >> 'BACKWARD'" > >> + " UNION SELECT > >> 'FORWARD'" > >> + " UNION SELECT > >> 'RELATIVE'" > >> + " UNION SELECT > >> 'ALL'" > >> + " UNION SELECT > >> 'NEXT'" > >> + " UNION SELECT > >> 'PRIOR'" > >> + " UNION SELECT > >> 'FIRST'" > >> + " UNION SELECT > >> 'LAST'" > >> + " UNION SELECT > >> 'FROM'" > >> + " UNION SELECT > >> 'IN'"); > >> > >> This change makes psql unexpectedly output "FROM" and "IN" just after > >> "FETCH". > > > >I think "FROM" and "IN" can be placed just after "FETCH". According to > >the documentation, the direction can be empty. For instance, we can do > >like: > > Thank you! > > >I've attached the patch improving the tab completion for DECLARE as an > >example. What do you think? > > > >BTW according to the documentation, the options of DECLARE statement > >(BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive. > > > >DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ] > >CURSOR [ { WITH | WITHOUT } HOLD ] FOR query > > > >But I realized that these options are actually order-insensitive. For > >instance, we can declare a cursor like: > > > >=# declare abc scroll binary cursor for select * from pg_class; > >DECLARE CURSOR > > > >The both parser code and documentation has been unchanged from 2003. > >Is it a documentation bug? > > Thank you for your patch, and it is good. 
> I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO SCROLL) > are order-sensitive." > I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in any > order." , according to the documentation. Thanks, you're right. I missed that sentence. I still think the syntax of the DECLARE statement in the documentation doesn't match its implementation, but I agree that it's order-insensitive. > I made a new patch, but the amount of codes was large due to > order-insensitive. Thank you for updating the patch! Yeah, I'm also a bit afraid that the conditions will increase exponentially when a new option is added to the DECLARE statement in the future. Looking at the parser code for the DECLARE statement, we can put the same options multiple times (that's also why I don't think it matches). For instance, postgres(1:44758)=# begin; BEGIN postgres(1:44758)=# declare test binary binary binary cursor for select * from pg_class; DECLARE CURSOR So how about simplifying the above code as follows? @@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, int end) else if (Matches("DECLARE", MatchAny)) COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL", "CURSOR"); + /* +* Complete DECLARE with one of BINARY, INSENSITIVE, SCROLL, +* NO SCROLL, and CURSOR. The tail doesn't match any keywords for +* DECLARE, assume we want options. +*/ + else if (HeadMatches("DECLARE", MatchAny, "*") && +TailMatches(MatchAnyExcept("CURSOR|WITH|WITHOUT|HOLD|FOR"))) + COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL", + "CURSOR"); + /* +* Complete DECLARE ... CURSOR with one of WITH HOLD, WITHOUT HOLD, +* and FOR. +*/ else if (HeadMatches("DECLARE") && TailMatches("CURSOR")) COMPLETE_WITH("WITH HOLD", "WITHOUT HOLD", "FOR"); + else if (HeadMatches("DECLARE") && TailMatches("HOLD")) + COMPLETE_WITH("FOR"); Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
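A toy model of the proposed exclusion-list rule shows how far it reaches: it offers the DECLARE options for any tail word that is not one of the terminating keywords, including words that appear after CURSOR FOR. This is a simplified stand-in for psql's Matches/TailMatches machinery, not the real code.

```c
#include <assert.h>
#include <string.h>

/* Toy model of the proposed rule above: offer the DECLARE options
 * whenever the tail word is none of the keywords that end the option
 * list (the MatchAnyExcept("CURSOR|WITH|WITHOUT|HOLD|FOR") set). */
static int
offers_declare_options(const char *tail_word)
{
    const char *stop[] = {"CURSOR", "WITH", "WITHOUT", "HOLD", "FOR"};

    for (size_t i = 0; i < sizeof(stop) / sizeof(stop[0]); i++)
        if (strcmp(tail_word, stop[i]) == 0)
            return 0;
    return 1;
}
```

So after typing "DECLARE c CURSOR FOR SELECT ", the tail word SELECT is not in the stop set and the option list would be offered again, which is the misfire noted elsewhere in the thread.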
Re: list of extended statistics on psql
On 1/7/21 1:46 AM, Tatsuro Yamada wrote: Hi Tomas, Thanks, and Happy new year to you too. I almost pushed this, but I have one more question. listExtendedStats first checks the server version, and errors out for pre-10 servers. Shouldn't the logic building the query check the server version too, to decide whether to check the MCV stats? Otherwise it won't work on 10 and 11, which do not support the "mcv" stats. I don't recall what exactly is our policy regarding new psql with older server versions, but it seems strange to check for 10.0 and then fail anyway for "supported" versions. Thanks for your comments. I overlooked the check for MCV in the logic building the query because I created the patch as a new feature on PG14. I'm not sure whether we should back-patch or not. However, I'll add the check in the next patch because it is useful if you decide to back-patch to PG 10, 11, 12, and 13. +1 BTW perhaps a quick look at the other \d commands would show if there are precedents. I didn't have time for that. I wonder whether the column names added by \dX+ are fine? For example, "Ndistinct_size" and "Dependencies_size". They seem like long names, but are they acceptable? Seems acceptable - I don't have a better idea. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: list of extended statistics on psql
Hi Tomas, Thanks, and Happy new year to you too. I almost pushed this, but I have one more question. listExtendedStats first checks the server version, and errors out for pre-10 servers. Shouldn't the logic building the query check the server version too, to decide whether to check the MCV stats? Otherwise it won't work on 10 and 11, which do not support the "mcv" stats. I don't recall what exactly is our policy regarding new psql with older server versions, but it seems strange to check for 10.0 and then fail anyway for "supported" versions. Thanks for your comments. I overlooked the check for MCV in the logic building the query because I created the patch as a new feature on PG14. I'm not sure whether we should back-patch or not. However, I'll add the check in the next patch because it is useful if you decide to back-patch to PG 10, 11, 12, and 13. I wonder whether the column names added by \dX+ are fine? For example, "Ndistinct_size" and "Dependencies_size". They seem like long names, but are they acceptable? Regards, Tatsuro Yamada
Re: logical replication worker accesses catalogs in error context callback
On Wed, Jan 6, 2021 at 11:42 AM Masahiko Sawada wrote: > > On Wed, Jan 6, 2021 at 11:02 AM Andres Freund wrote: > > > > Hi, > > > > Due to a debug ereport I just noticed that worker.c's > > slot_store_error_callback is doing something quite dangerous: > > > > static void > > slot_store_error_callback(void *arg) > > { > > SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg; > > LogicalRepRelMapEntry *rel; > > char *remotetypname; > > Oid remotetypoid, > > localtypoid; > > > > /* Nothing to do if remote attribute number is not set */ > > if (errarg->remote_attnum < 0) > > return; > > > > rel = errarg->rel; > > remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum]; > > > > /* Fetch remote type name from the LogicalRepTypMap cache */ > > remotetypname = logicalrep_typmap_gettypname(remotetypoid); > > > > /* Fetch local type OID from the local sys cache */ > > localtypoid = get_atttype(rel->localreloid, errarg->local_attnum + > > 1); > > > > errcontext("processing remote data for replication target relation > > \"%s.%s\" column \"%s\", " > >"remote type %s, local type %s", > >rel->remoterel.nspname, rel->remoterel.relname, > >rel->remoterel.attnames[errarg->remote_attnum], > >remotetypname, > >format_type_be(localtypoid)); > > } > > > > > > that's not code that can run in an error context callback. It's > > theoretically possible (but unlikely) that > > logicalrep_typmap_gettypname() is safe to run in an error context > > callback. But get_atttype() definitely isn't. > > > > get_attype() may do catalog accesses. That definitely can't happen > > inside an error context callback - the entire transaction might be > > borked at this point! > > You're right. Perhaps calling to format_type_be() is also dangerous > since it does catalog access. We should have added the local type > names to SlotErrCallbackArg so we avoid catalog access in the error > context. > > I'll try to fix this. Attached the patch that fixes this issue. 
Since logicalrep_typmap_gettypname() could search the sys cache by calling format_type_be(), I stored both local and remote type names in SlotErrCallbackArg so that we can just set the names in the error callback without a syscache lookup. Please review it. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/ fix_slot_store_error_callback.patch Description: Binary data
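The shape of the fix can be sketched as follows: anything that needs a catalog or syscache lookup is resolved ahead of time and stashed in the callback argument, so the callback itself only formats strings it already has. The struct and function names here are illustrative, not the actual patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
    const char *remotetypname;  /* resolved before the risky section */
    const char *localtypname;   /* resolved before the risky section */
    int         remote_attnum;  /* < 0 means "nothing to report" */
} CallbackArg;

/* The callback must not touch the syscache: the transaction may be
 * aborted when it runs.  It only formats what was stashed in the arg. */
static void
format_errcontext(const CallbackArg *arg, char *buf, size_t buflen)
{
    if (arg->remote_attnum < 0)
    {
        buf[0] = '\0';
        return;
    }
    snprintf(buf, buflen, "remote type %s, local type %s",
             arg->remotetypname, arg->localtypname);
}
```

The design choice is the same one the patch describes: move the lookups to the point where the arg is set up, where failure is safe, and keep the error path lookup-free.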
Re: Terminate the idle sessions
Li Japin writes: > [ v9-0001-Allow-terminating-the-idle-sessions.patch ] I've reviewed and pushed this. A few notes: * Thomas' patch for improving timeout.c seems like a great idea, but it did indeed have a race condition, and I felt the comments could do with more work. * I'm not entirely comfortable with the name "idle_session_timeout", because it sounds like it applies to all idle states, but actually it only applies when we're not in a transaction. I left the name alone and tried to clarify the docs, but I wonder whether a different name would be better. (Alternatively, we could say that it does apply in all cases, making the effective timeout when in a transaction the smaller of the two GUCs. But that'd be more complex to implement and less flexible, so I'm not in favor of that answer.) * The SQLSTATE you chose for the new error condition seems pretty random. I do not see it in the SQL standard, so using a code that's within the spec-reserved code range is certainly wrong. I went with 08P02 (the "P" makes it outside the reserved range), but I'm not really happy either with using class 08 ("Connection Exception", which seems to be mainly meant for connection-request failures), or the fact that ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT is practically identical but it's not even in the same major class. Now 25 ("Invalid Transaction State") is certainly not right for this new error, but I think what that's telling us is that 25 was a misguided choice for the other error. In a green field I think I'd put both of them in class 53 ("Insufficient Resources") or maybe class 57 ("Operator Intervention"). Can we get away with renumbering the older error at this point? In any case I'd be inclined to move the new error to 53 or 57 --- anybody have a preference which? * I think the original intent in timeout.h was to have 10 slots available for run-time-defined timeout reasons. 
This is the third predefined one we've added since the header was created, so it's starting to look a little tight. I adjusted the code to hopefully preserve 10 free slots going forward. * I noted the discussion about dropping setitimer in place of some newer kernel API. I'm not sure that that is worth the trouble in isolation, but it might be worth doing if we can switch the whole module over to relying on CLOCK_MONOTONIC, so as to make its behavior less flaky if the sysadmin steps the system clock. Portability might be a big issue here though, plus we'd have to figure out how we want to define enable_timeout_at(), which is unlikely to want to use CLOCK_MONOTONIC values. In any case, that's surely material for a whole new thread. regards, tom lane
Re: Parallel Inserts in CREATE TABLE AS
Hi, For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch : workers to Gather node to 0. With this change, there is a chance that the planner chooses a parallel plan. It would be nice if the scenarios where a parallel plan is not chosen were listed. + if ((root->parse->parallelInsCmdTupleCostOpt & +PARALLEL_INSERT_SELECT_QUERY) && + (root->parse->parallelInsCmdTupleCostOpt & +PARALLEL_INSERT_CAN_IGN_TUP_COST)) + { + /* We are ignoring the parallel tuple cost, so mark it. */ + root->parse->parallelInsCmdTupleCostOpt |= + PARALLEL_INSERT_TUP_COST_IGNORED; If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED is implied. Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED flag - setting the first two bits should suffice. Cheers On Mon, Jan 4, 2021 at 7:59 PM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy > wrote: > > > > > + if > (IS_PARALLEL_CTAS_DEST(gstate->dest) && > > > + ((DR_intorel *) > gstate->dest)->into->rel && > > > + ((DR_intorel *) > gstate->dest)->into->rel->relname) > > > why would rel and relname not be there? if no rows have been inserted? > > > because it seems from the intorel_startup function that that would be > > > set as soon as startup was done, which i assume (wrongly?) is always > done? > > > > Actually, that into clause rel variable is always being set in the > gram.y for CTAS, Create Materialized View and SELECT INTO (because > qualified_name non-terminal is not optional). My bad. I just added it as a > sanity check. Actually, it's not required. 
> > > > create_as_target: > > qualified_name opt_column_list table_access_method_clause > > OptWith OnCommitOption OptTableSpace > > { > > $$ = makeNode(IntoClause); > > $$->rel = $1; > > create_mv_target: > > qualified_name opt_column_list table_access_method_clause > opt_reloptions OptTableSpace > > { > > $$ = makeNode(IntoClause); > > $$->rel = $1; > > into_clause: > > INTO OptTempTableName > > { > > $$ = makeNode(IntoClause); > >$$->rel = $2; > > > > I will change the below code: > > +if (GetParallelInsertCmdType(gstate->dest) == > > +PARALLEL_INSERT_CMD_CREATE_TABLE_AS && > > +((DR_intorel *) gstate->dest)->into && > > +((DR_intorel *) gstate->dest)->into->rel && > > +((DR_intorel *) > gstate->dest)->into->rel->relname) > > +{ > > > > to: > > +if (GetParallelInsertCmdType(gstate->dest) == > > +PARALLEL_INSERT_CMD_CREATE_TABLE_AS) > > +{ > > > > I will update this in the next version of the patch set. > > Attaching v20 patch set that has above change in 0001 patch, note that > 0002 to 0004 patches have no changes from v19. Please consider the v20 > patch set for further review. > > > With Regards, > Bharath Rupireddy. > EnterpriseDB: http://www.enterprisedb.com >
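The bitmask redundancy pointed out in the review above can be sketched in a few lines: if the third bit is set exactly when the other two are, it can be derived on demand rather than stored. The flag values here are hypothetical, not the patch's actual definitions.

```c
#include <assert.h>

/* Hypothetical flag values mirroring the ones in the review. */
#define PARALLEL_INSERT_SELECT_QUERY      (1 << 0)
#define PARALLEL_INSERT_CAN_IGN_TUP_COST  (1 << 1)

/* If TUP_COST_IGNORED would be set exactly when both bits above are
 * set, this predicate carries the same information without a third
 * bit that must be kept in sync. */
static int
tuple_cost_ignored(unsigned int opt)
{
    return (opt & PARALLEL_INSERT_SELECT_QUERY) &&
           (opt & PARALLEL_INSERT_CAN_IGN_TUP_COST);
}
```

Dropping the derived bit removes one source of inconsistency: there is no state where the first two bits are set but the "ignored" marker was forgotten.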
Re: list of extended statistics on psql
On 1/5/21 5:26 AM, Tatsuro Yamada wrote: Hi, I rebased the patch set on the master (7e5e1bba03), and the regression tests pass. Therefore, I changed the status of the patch: "needs review". Happy New Year! I rebased my patches on HEAD. Please find attached files. :-D Thanks, and Happy new year to you too. I almost pushed this, but I have one more question. listExtendedStats first checks the server version, and errors out for pre-10 servers. Shouldn't the logic building the query check the server version too, to decide whether to check the MCV stats? Otherwise it won't work on 10 and 11, which do not support the "mcv" stats. I don't recall what exactly is our policy regarding new psql with older server versions, but it seems strange to check for 10.0 and then fail anyway for "supported" versions. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
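For the MCV question, the usual describe.c pattern is to branch on the server version when assembling the query. A minimal sketch of that idea is below; the helper name and column expression are illustrative, the only fact taken from the discussion being that MCV extended statistics exist only on server version 12 and later.

```c
#include <assert.h>
#include <string.h>

/* Pick the select-list fragment for the MCV column depending on the
 * connected server's version number (pset.sversion style, e.g.
 * 120000 for 12.0).  Illustrative only, not the real \dX query. */
static const char *
mcv_column(int server_version_num)
{
    return (server_version_num >= 120000)
        ? "'m' = any(s.stxkind) AS mcv_enabled"
        : "NULL AS mcv_enabled";
}
```

Emitting NULL for the column on older servers keeps the output shape identical across versions, so the rest of the \dX display code needs no version-specific branches.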
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Wed, Jan 6, 2021 at 1:29 PM Michael Banck wrote: > That one seems to be 5min everywhere, and one can change it except on > Azure. Okay, thanks for clearing that up. Looks like all of the big 3 cloud providers use Postgres checksums in a straightforward way. I don't have much more to say on this thread. I am -1 on the current proposal to enable page-level checksums by default. I may change my mind on this in the future, perhaps when underlying architectural details change, but right now this seems like a net negative. -- Peter Geoghegan
Re: [PATCH] Simple progress reporting for COPY command
On 1/5/21 11:02 AM, Josef Šimánek wrote: I'm attaching the whole patch since commitfest failed to ingest the last incremental on CI. Yeah, the whole patch needs to be attached for the commitfest tester to work correctly - it can't apply pieces from multiple messages, etc. Anyway, I pushed this last version of patch, after a couple more tweaks, mainly to the docs - one place used pg_stat_copy_progress, the section was not indexed properly, and so on. I see Matthias proposed to change "lines" to "tuples" - I only saw the message after pushing, but I probably wouldn't make that change anyway. The CSV docs seem to talk about lines, newlines etc. so it seems fine. If not, we can change that. One more question, though - I now realize the lines_processed ignores rows skipped because of BEFORE INSERT triggers. I wonder if that's the right thing to do? Imagine you know the number of lines in a file. You can't really use (lines_processed / total_lines) to measure progress, because that may ignore many "skipped" rows. So maybe this should be changed to count all rows. OTOH we still have bytes_processed. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Hi, On Wednesday, 2021-01-06 at 13:19 -0800, Peter Geoghegan wrote: > On Wed, Jan 6, 2021 at 1:08 PM Peter Geoghegan wrote: > > So you tested this using "show data_checksums;" in psql in each case? > > > > What does "show full_page_writes;" show you? > > Another consideration is checkpoint_timeout That one seems to be 5min everywhere, and one can change it except on Azure. > and max_wal_size. I think this one is usually based on instance size, but I only have access to two differently sized instances on Azure right now, where it is 1GB for the smallest possible one and 32GB for a production instance. It's 1GB for the small Google Cloud SQL Postgres and 2GB for the small RDS test instance. It is user-changeable everywhere (at least to some degree). Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Wed, Jan 6, 2021 at 1:08 PM Peter Geoghegan wrote: > So you tested this using "show data_checksums;" in psql in each case? > > What does "show full_page_writes;" show you? Another consideration is checkpoint_timeout and max_wal_size. -- Peter Geoghegan
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Wednesday, 2021-01-06 at 13:08 -0800, Peter Geoghegan wrote: > On Wed, Jan 6, 2021 at 1:04 PM Michael Banck > wrote: > > At least data_checksums=on for Azure Managed Postgres, Amazon RDS and > > Google Cloud SQL Postgres. It might not be on for all types (I just > > checked the basic one each earlier today), but I guess it is. > So you tested this using "show data_checksums;" in psql in each case? Yes. > What does "show full_page_writes;" show you? It's also 'on' for all three (wal_log_hints is 'off' everywhere). Michael
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Wed, Jan 6, 2021 at 1:04 PM Michael Banck wrote: > At least data_checksums=on for Azure Managed Postgres, Amazon RDS and > Google Cloud SQL Postgres. It might not be on for all types (I just > checked the basic one each earlier today), but I guess it is. So you tested this using "show data_checksums;" in psql in each case? What does "show full_page_writes;" show you? -- Peter Geoghegan
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Hi, On Wednesday, 2021-01-06 at 12:56 -0800, Peter Geoghegan wrote: > On Wed, Jan 6, 2021 at 12:30 PM Stephen Frost wrote: > > As already mentioned, it's also, at least today, far > > simpler to disable checksums than to enable them, which is something > > else to consider when thinking about what the default should be. > That is a valid concern. I just don't think that it's good enough on > its own, given the overwhelming downside of enabling checksums given > the WAL architecture that we have today. > > > That the major cloud providers all have checksums enabled (at least by > > default, though I wonder if they would even let you turn them off..), I don't think so, and it would be very weird if they did, not just due to the fact that shutting down the instance and running pg_checksums is only possible since v13 (and only Google Cloud SQL Postgres supports that so far), but also because this is the kind of decision cloud providers tend to take for their clients and not allow the users any say in (just like how they do backups or failovers). > > even when we don't have them on by default, strikes me as pretty telling > > that this is something that we should have on by default. > > Please provide supporting evidence. I know that EBS itself uses > checksums at the block device level, so I'm sure that RDS "uses > checksums" in some sense. But does RDS use --data-checksums during > initdb? At least data_checksums=on for Azure Managed Postgres, Amazon RDS and Google Cloud SQL Postgres. It might not be on for all types (I just checked the basic one each earlier today), but I guess it is. Michael
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Wed, Jan 6, 2021 at 12:30 PM Stephen Frost wrote: > As already mentioned, it's also, at least today, far > simpler to disable checksums than to enable them, which is something > else to consider when thinking about what the default should be. That is a valid concern. I just don't think that it's good enough on its own, given the overwhelming downside of enabling checksums given the WAL architecture that we have today. > That the major cloud providers all have checksums enabled (at least by > default, though I wonder if they would even let you turn them off..), > even when we don't have them on by default, strikes me as pretty telling > that this is something that we should have on by default. Please provide supporting evidence. I know that EBS itself uses checksums at the block device level, so I'm sure that RDS "uses checksums" in some sense. But does RDS use --data-checksums during initdb? > Certainly there's a different risk profile between the two and there may > be times when someone is fine with running without fsync, or fine > running without checksums, but those are, in my view, exceptions made > once you understand exactly what risk you're willing to accept, and not > what the default or typical deployment should be. Okay, I'll bite. Here is the important difference: Enabling checksums doesn't actually make data corruption less likely, it just makes it easier to detect. Whereas disabling fsync will reliably produce corruption before too long in almost any installation. It may occasionally be appropriate to disable fsync in a very controlled environment, but it's rare, and not much faster than disabling synchronous commits in any case. It barely ever happens. We added page-level checksums in 9.3. Can you imagine a counterfactual history in which Postgres had page checksums since the 1990s, but only added the fsync feature in 9.3? Please answer this non-rhetorical question. -- Peter Geoghegan
Re: psql \df choose functions by their arguments
On Sat, Jan 2, 2021 at 1:56 AM Thomas Munro wrote: > ... > It looks like there is a collation dependency here that causes the > test to fail on some systems: > Thanks for pointing that out. I tweaked the function definitions to hopefully sidestep the ordering issue - attached is v4. Cheers, Greg v4-psql-df-pick-function-by-type.patch Description: Binary data
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Greetings, * Peter Geoghegan (p...@bowt.ie) wrote: > On Wed, Jan 6, 2021 at 12:03 PM Stephen Frost wrote: > > Do you really believe it to be wrong? Do we stop performing the correct > > write calls in the correct order to the kernel with fsync being off? If > > the kernel actually handles all of our write calls correctly and we > > cleanly shut down and the kernel cleanly shuts down and syncs the disks > > before a reboot, will there be corruption from running with fsync off? > > This is a total straw man. Everyone understands the technical issues > with fsync perfectly well, and everyone understands that everyone > understands the issue, so spare me the "I'm just a humble country > lawyer" style explanation. > > What you seem to be arguing is that the difference between disabling > checksums and disabling fsync is basically quantitative, and so making > a qualitative distinction between those two things is meaningless, and > that it logically follows that disagreeing with you is essentially > irresponsible. This is a tactic that would be an embarrassment to a > high school debate team. It's below you. I can agree that there's a usefulness in making a qualitative distinction between them, but we're talking about a default here, not about whether we should even have these options or these capabilities, or whether we should force them upon everyone, or whether one is somehow better or worse than the other. As already mentioned, it's also, at least today, far simpler to disable checksums than to enable them, which is something else to consider when thinking about what the default should be. That the major cloud providers all have checksums enabled (at least by default, though I wonder if they would even let you turn them off..), even when we don't have them on by default, strikes me as pretty telling that this is something that we should have on by default. 
Certainly there's a different risk profile between the two and there may be times when someone is fine with running without fsync, or fine running without checksums, but those are, in my view, exceptions made once you understand exactly what risk you're willing to accept, and not what the default or typical deployment should be. Thanks, Stephen signature.asc Description: PGP signature
Re: [HACKERS] [PATCH] Generic type subscripting
Hi On Tue, Jan 5, 2021 at 8:32 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > On Mon, Jan 04, 2021 at 06:56:17PM +0100, Pavel Stehule wrote: > > On Mon, Jan 4, 2021 at 2:58 PM Dmitry Dolgov <9erthali...@gmail.com> > > wrote: > > postgres=# update foo set a['c']['c'][10] = '10'; > > postgres=# update foo set a['c'][10][10] = '10'; > > Yeah, there was one clumsy memory allocation. On the way I've found and > fixed another issue with jsonb generation, right now I don't see any > other problems. But as my imagination, despite all the sci-fi I've read > this year, is apparently not so versatile, I'll rely on yours, could you > please check this version again? > This case should raise an exception - either the value should be changed or an error should be raised: postgres=# insert into foo values('{}'); INSERT 0 1 postgres=# update foo set a['a'] = '100'; UPDATE 1 postgres=# select * from foo; ┌────────────┐ │     a      │ ╞════════════╡ │ {"a": 100} │ └────────────┘ (1 row) postgres=# update foo set a['a'][1] = '-1'; UPDATE 1 postgres=# select * from foo; ┌────────────┐ │     a      │ ╞════════════╡ │ {"a": 100} │ └────────────┘ (1 row) Regards Pavel
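For comparison with the behaviour above, subscript assignment through a scalar in Python raises an error rather than reporting success while silently changing nothing (an analogy only, not code from the patch):

```python
doc = {"a": 100}

# doc["a"] holds a scalar, so indexing into it for assignment is an error.
# The jsonb case above instead reports UPDATE 1 while leaving the value
# untouched, which is why an error (or an actual update) is expected.
try:
    doc["a"][1] = -1
except TypeError as err:
    print("rejected:", err)

# the document is unchanged either way
print(doc)  # {'a': 100}
```

Either outcome (error or update) is defensible; what is surprising is the silent no-op reported as a successful update.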
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Wed, Jan 6, 2021 at 12:03 PM Stephen Frost wrote: > Do you really believe it to be wrong? Do we stop performing the correct > write calls in the correct order to the kernel with fsync being off? If > the kernel actually handles all of our write calls correctly and we > cleanly shut down and the kernel cleanly shuts down and syncs the disks > before a reboot, will there be corruption from running with fsync off? This is a total straw man. Everyone understands the technical issues with fsync perfectly well, and everyone understands that everyone understands the issue, so spare me the "I'm just a humble country lawyer" style explanation. What you seem to be arguing is that the difference between disabling checksums and disabling fsync is basically quantitative, and so making a qualitative distinction between those two things is meaningless, and that it logically follows that disagreeing with you is essentially irresponsible. This is a tactic that would be an embarrassment to a high school debate team. It's below you. Your argument may be logically consistent, but it's still nonsense. -- Peter Geoghegan
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Greetings, * Peter Geoghegan (p...@bowt.ie) wrote: > On Wed, Jan 6, 2021 at 11:44 AM Stephen Frost wrote: > > Having fsync off won't actually cause corruption unless you have an OS > > crash or don't sync the disks when you reboot the system, though - so it's > > a hedge against certain failure conditions, as are checksums. > > I find this argument baffling. Do you really believe this? Do you really believe it to be wrong? Do we stop performing the correct write calls in the correct order to the kernel with fsync being off? If the kernel actually handles all of our write calls correctly and we cleanly shut down and the kernel cleanly shuts down and syncs the disks before a reboot, will there be corruption from running with fsync off? If that's the case, I'd certainly be curious to hear under what conditions, when everything works, we'll end up with corruption simply from running with fsync off. I don't mean to imply that I advocate for such - I'd hope that it would be clear from this discussion that I'm not suggesting that we turn fsync off, and rather the opposite, that we have both fsync and data checksums be on by default - but to claim that having fsync off will always, in every situation, cause corruption is overstating the case. Thanks, Stephen signature.asc Description: PGP signature
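The mechanical point here can be made concrete with a toy sketch: with fsync off, the same writes are issued in the same order, and only the durability barrier is skipped (plain os.fsync on a file, not PostgreSQL's actual WAL code):

```python
import os
import tempfile

def write_record(path: str, data: bytes, use_fsync: bool) -> None:
    # The write itself is identical either way; fsync only forces the
    # kernel to flush its caches to stable storage before we proceed.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
        if use_fsync:
            os.fsync(fd)  # durability barrier: survives an OS crash or power loss
    finally:
        os.close(fd)

# After a clean shutdown both files are intact; the difference only
# shows up if the OS crashes before the kernel flushes on its own.
with tempfile.TemporaryDirectory() as d:
    write_record(os.path.join(d, "a"), b"record", use_fsync=True)
    write_record(os.path.join(d, "b"), b"record", use_fsync=False)
```

This is exactly the "certain failure conditions" framing: the risk window is between the write entering the page cache and the kernel flushing it.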
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Wed, Jan 6, 2021 at 11:44 AM Stephen Frost wrote: > Having fsync off won't actually cause corruption unless you have an OS > crash or don't sync the disks when you reboot the system, though - so it's > a hedge against certain failure conditions, as are checksums. I find this argument baffling. Do you really believe this? -- Peter Geoghegan
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2021-01-06 13:01:59 -0500, Stephen Frost wrote: > > * Andres Freund (and...@anarazel.de) wrote: > > > > imv, enabling page checksums is akin to having fsync enabled by default. > > > > Does it impact performance? Yes, surely quite a lot, but it's also the > > > > safe and sane choice when it comes to defaults. > > > > > > Oh for crying out loud. > > > > Not sure what you're hoping to gain from such comments, but it doesn't > > do anything to change my opinion. > It seems so facetious to compare fsync=off (will cause corruption) with > data_checksums=off (will not cause corruption) that I find the > comparison to be insulting. Having fsync off won't actually cause corruption unless you have an OS crash or don't sync the disks when you reboot the system, though - so it's a hedge against certain failure conditions, as are checksums. Yes, having fsync off on a system and then rebooting it (ungracefully..) will likely cause corruption and, no, having data checksums turned off won't cause corruption in that way or at all in its own right - but there's a decent chance that if there does end up being latent corruption it'll at least be detected, which is why so many (including, apparently, the popular cloud providers) enable it and why we should have it on by default. I don't agree that they are as different as you make them out to be. I do appreciate that the chances of a random reboot happening are higher than the chance of a disk failure being detected by a PG checksum (and not something else first). Thanks, Stephen signature.asc Description: PGP signature
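The "detection, not prevention" nature of checksums can be illustrated with a toy page store (zlib.crc32 for brevity; this is not PostgreSQL's actual page-checksum algorithm or page layout):

```python
import zlib

def store_page(data: bytes) -> bytes:
    # Compute a checksum at write time and keep it alongside the page
    # (toy layout: 4-byte CRC prefix).
    return zlib.crc32(data).to_bytes(4, "little") + data

def load_page(stored: bytes) -> bytes:
    # Verify at read time: latent corruption is detected, not prevented.
    expected = int.from_bytes(stored[:4], "little")
    data = stored[4:]
    if zlib.crc32(data) != expected:
        raise ValueError("checksum mismatch: latent corruption detected")
    return data

page = store_page(b"hello" * 100)
flipped = page[:-1] + bytes([page[-1] ^ 0x01])  # one bit flips "on disk"
```

Reading `page` back succeeds; reading `flipped` raises, which is the entire value proposition: the bad page is reported instead of silently served.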
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Wed, Jan 6, 2021 at 9:55 AM Andres Freund wrote: > Vacuum performance is one of *THE* major complaints about > postgres. Making it run slower by a lot obviously exacerbates that > problem significantly. I think it'd be prohibitively expensive if it > were 1.5x, not to even speak of 15x. +1. I am *strongly* opposed to enabling checksums by default for this reason. I think that it's a total non-starter, unless and until the overhead can be dramatically reduced. The fact that it isn't the fault of the checksum mechanism in some abstract sense is 100% irrelevant. -- Peter Geoghegan
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Hi, On 2021-01-06 13:01:59 -0500, Stephen Frost wrote: > * Andres Freund (and...@anarazel.de) wrote: > > > imv, enabling page checksums is akin to having fsync enabled by default. > > > Does it impact performance? Yes, surely quite a lot, but it's also the > > > safe and sane choice when it comes to defaults. > > > > Oh for crying out loud. > > Not sure what you're hoping to gain from such comments, but it doesn't > do anything to change my opinion. It seems so facetious to compare fsync=off (will cause corruption) with data_checksums=off (will not cause corruption) that I find the comparison to be insulting. Greetings, Andres Freund
Re: set_config() documentation clarification
> > > >> SET >> g = year % 19, >> c = year / 100, >> h = (c - c/4 - (8*c + 13)/25 + 19*g + 15) % 30, >> i = h - (h/28)*(1 - (h/28)*(29/(h + 1))*((21 - g)/11)), >> j = (year + year/4 + i + 2 - c + c/4) % 7, >> p = i - j, >> easter_month = 3 + (p + 26)/30, >> easter_day = 1 + (p + 27 + (p + 6)/40) % 31 >> SELECT make_date(year, easter_month, easter_day) >> >> or maybe even WITH like this: >> >> WITH >> year % 19 AS g, >> year / 100 AS c, >> (c - c/4 - (8*c + 13)/25 + 19*g + 15) % 30 AS h, >> h - (h/28)*(1 - (h/28)*(29/(h + 1))*((21 - g)/11)) AS i, >> (year + year/4 + i + 2 - c + c/4) % 7 AS j, >> i - j AS p, >> 3 + (p + 26)/30 AS easter_month, >> 1 + (p + 27 + (p + 6)/40) % 31 AS easter_day >> SELECT make_date(year, easter_month, easter_day) >> > > I do not think this clause is necessary (because we have PL/pgSQL or C), > but other people can have different opinions (and it is true that this > feature can have some performance benefit, because it enhances the > possibilities of inlined expressions, and custom (own) extensions are > prohibited in cloud environments (and will be)). Theoretically the > implementation of this feature should not be hard, because these variables > are very local (the scope is just the row), so this is just a game for the > parser and the expression interpreter. But if you introduce this feature, > then it is better to use syntax that is used by some other well-known > systems (Oracle or others). > The name for this feature could be "row scope variables", and yes, in OLAP queries there are repeated expressions where this feature can be useful. 
postgres=# explain verbose select make_date(year, easter_month, easter_day) from (select year, 3 + (p + 26)/30 AS easter_month, 1 + (p + 27 + (p + 6)/40) % 31 AS easter_day from ( select year, i - j AS p from (select year, i, (year + year/4 + i + 2 - c + c/4) % 7 AS j from (select year, c, h - (h/28)*(1 - (h/28)*(29/(h + 1))*((21 - g)/11)) AS i from (select year, g, c, (c - c/4 - (8*c + 13)/25 + 19*g + 15) % 30 AS h from (select year, year % 19 as g, year / 100 as c from generate_series(2019,2020) g(year) offset 0) s1 offset 0) s2 offset 0) s3 offset 0) s4 offset 0) s5 offset 0) s6; ┌─┐ │ QUERY PLAN │ ╞═╡ │ Subquery Scan on s6 (cost=0.00..0.35 rows=2 width=4) │ │ Output: make_date(s6.year, s6.easter_month, s6.easter_day) │ │ -> Subquery Scan on s5 (cost=0.00..0.33 rows=2 width=12) │ │ Output: s5.year, (3 + ((s5.p + 26) / 30)), (1 + (((s5.p + 27) + ((s5.p + 6) / 40)) % 31)) │ │ -> Subquery Scan on s4 (cost=0.00..0.26 rows=2 width=8) │ │ Output: s4.year, (s4.i - s4.j) │ │ -> Subquery Scan on s3 (cost=0.00..0.24 rows=2 width=12) │ │ Output: s3.year, s3.i, ((s3.year + (s3.year / 4)) + s3.i) + 2) - s3.c) + (s3.c / 4)) % 7) │ │ -> Subquery Scan on s2 (cost=0.00..0.18 rows=2 width=12) │ │ Output: s2.year, s2.c, (s2.h - ((s2.h / 28) * (1 - (((s2.h / 28) * (29 / (s2.h + 1))) * ((21 - s2.g) / 11)│ │ -> Subquery Scan on s1 (cost=0.00..0.10 rows=2 width=16)│ │ Output: s1.year, s1.g, s1.c, (s1.c - (s1.c / 4)) - (((8 * s1.c) + 13) / 25)) + (19 * s1.g)) + 15) % 30) │ │ -> Function Scan on pg_catalog.generate_series g (cost=0.00..0.03 rows=2 width=12) │ │ Output: g.year, (g.year % 19), (g.year / 100) │ │ Function Call: generate_series(2019, 2020) │ └─┘ (15 rows) Pavel
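For reference, the quoted arithmetic is the well-known "anonymous Gregorian" Easter computus; the same steps in Python, with integer division throughout, reproduce the expected dates for the two years in the example:

```python
def easter(year: int):
    # Anonymous Gregorian computus, mirroring the quoted SQL step by step.
    g = year % 19
    c = year // 100
    h = (c - c // 4 - (8 * c + 13) // 25 + 19 * g + 15) % 30
    i = h - (h // 28) * (1 - (h // 28) * (29 // (h + 1)) * ((21 - g) // 11))
    j = (year + year // 4 + i + 2 - c + c // 4) % 7
    p = i - j
    month = 3 + (p + 26) // 30
    day = 1 + (p + 27 + (p + 6) // 40) % 31
    return month, day

# The two years from the quoted generate_series(2019, 2020):
print(easter(2019))  # (4, 21) -> April 21, 2019
print(easter(2020))  # (4, 12) -> April 12, 2020
```

All intermediate operands stay non-negative, so Python's floor division matches SQL's truncating integer division here.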
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Wed, Jan 6, 2021 at 7:07 PM +0100, Michael Banck wrote: > On Wed, Jan 6, 2021 at 9:58 AM -0800, Andres Freund wrote: > > It still requires running a binary locally on the DB server, no? Which > > means it'll not be an option on most cloud providers... > > At least Azure and RDS seem to have data_checksums on anyway, I don't > have a GCP test instance around handily right now to check. Well, I was curious: GCP SQL Postgres also has checksums enabled. Michael
Re: set_config() documentation clarification
On Tue, Jan 5, 2021 at 10:15 PM Joel Jacobson wrote: > On Tue, Jan 5, 2021, at 21:34, Pavel Stehule wrote: > > yes, it is supported. Moreover, the schema variables support RESET to > default on transaction end, > > and support access rights for usage in security definer functions. > > Nice. > > > Maybe - I don't know what "Statement variables" are :). Other databases do > not have similarly named features. > > I know, I made that name up just to make the connection, > the name used by other databases is "LET clause", > and in functional languages such as OCaml and Haskell, > this concept is called "let expressions". > > > There are two concepts (these concepts can be mixed). Maybe - you can > see there how non-tabular objects can be > > accessed in queries with. > > ... > > Thank you for a detailed explanation, very useful. > > >> Also, do you know if Schema variables are part of the SQL standard? > > ANSI SQL defines modules, and some variables can be declared in module > scope. Modules are like our schemas with the > > possibility to define private objects. But I don't know of any > implementation of this part of the standard in a widely used > > database. It is like a mix of the package concept (Oracle) with schemas, > because modules can hold private database objects > > like tables or temporary tables. So my proposed schema variables are not > part of the SQL standard, because the related features > > depend on modules. Functionally it is similar +/-. Personally I don't > like the concept of modules (or packages) too much. The > > schemas are a good replacement for 90%, and the current system of > qualified names and search path, which is the same for > > tables and procedures, is very simple and good enough. So > instead of introducing modules, I prefer enhancing > > schemas with some features like private objects. But it is in the > category "nice to have" rather than a necessary feature. 
> > This is encouraging to hear, then I will pray there might be hope for the LET > clauses I need, > even though they are not part of the SQL standard. > > In another attempt to sell the LET clause feature, imagine OCaml/Haskell > *without* let expressions, > where users would be advised to write functions in a different language > like C, > to do their complex computations reused in many places, and then return > the result back to OCaml/Haskell. > That wouldn't be a very nice user experience for the OCaml/Haskell user. > > I really think a lot of real-life complex SQL code could be simplified > and written much more clearly and concisely with LET clauses. > I have no idea - all my life I have used procedural languages, where this is not a problem. > Since using "SET" as the command for Schema variables, > maybe using SET for the LET clause would make the idea less controversial: > The schema variables patch introduced the LET statement because the SET keyword is already used in Postgres for GUC setting and works with GUCs. But this fact doesn't block using LET as a new clause. 
> SET > g = year % 19, > c = year / 100, > h = (c - c/4 - (8*c + 13)/25 + 19*g + 15) % 30, > i = h - (h/28)*(1 - (h/28)*(29/(h + 1))*((21 - g)/11)), > j = (year + year/4 + i + 2 - c + c/4) % 7, > p = i - j, > easter_month = 3 + (p + 26)/30, > easter_day = 1 + (p + 27 + (p + 6)/40) % 31 > SELECT make_date(year, easter_month, easter_day) > > or maybe even WITH like this: > > WITH > year % 19 AS g, > year / 100 AS c, > (c - c/4 - (8*c + 13)/25 + 19*g + 15) % 30 AS h, > h - (h/28)*(1 - (h/28)*(29/(h + 1))*((21 - g)/11)) AS i, > (year + year/4 + i + 2 - c + c/4) % 7 AS j, > i - j AS p, > 3 + (p + 26)/30 AS easter_month, > 1 + (p + 27 + (p + 6)/40) % 31 AS easter_day > SELECT make_date(year, easter_month, easter_day) > I do not think this clause is necessary (because we have PL/pgSQL or C), but other people can have different opinions (and it is true that this feature can have some performance benefit, because it enhances the possibilities of inlined expressions, and custom (own) extensions are prohibited in cloud environments (and will be)). Theoretically the implementation of this feature should not be hard, because these variables are very local (the scope is just the row), so this is just a game for the parser and the expression interpreter. But if you introduce this feature, then it is better to use syntax that is used by some other well-known systems (Oracle or others). > I will study SQL code in the wild on GitHub written by other users to see > what percentage > could benefit from this feature. > I am sure it can be a very good task for learning PostgreSQL internals - the parser and executor - and it can be fun work (when I started with Postgres, I had to modify the same parts). Regards Pavel > Maybe I'm wrong, but my gut feeling says this would be a really good thing, > and just like Schema variables, I didn't really know I needed them > before I saw them. > > Best regards, > > Joel >
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Wed, Jan 6, 2021 at 9:58 AM -0800, Andres Freund wrote: > It still requires running a binary locally on the DB server, no? Which > means it'll not be an option on most cloud providers... At least Azure and RDS seem to have data_checksums on anyway, I don't have a GCP test instance around handily right now to check. Michael
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Hi, On Wed, Jan 06, 2021 at 09:55:08AM -0800, Andres Freund wrote: > On 2021-01-06 12:02:40 -0500, Stephen Frost wrote: > > * Andres Freund (and...@anarazel.de) wrote: > > > On 2021-01-04 19:11:43 +0100, Michael Banck wrote: > > > > This looks much better from the WAL size perspective, there's now almost > > > > no additional WAL. However, that is because pgbench doesn't do TOAST, so > > > > in a real-world example it might still be quite a bit larger. Also, the vacuum > > > > runtime is still 15x longer. > > > > > > That's obviously an issue. > > > > It'd certainly be nice to figure out a way to improve the VACUUM run but > > I don't think the impact on the time to run VACUUM is really a good > > reason to not move forward with changing the default. > > Vacuum performance is one of *THE* major complaints about > postgres. Making it run slower by a lot obviously exacerbates that > problem significantly. I think it'd be prohibitively expensive if it > were 1.5x, not to even speak of 15x. To clarify, the vacuum slowdown is just as large in my (somewhat contrived, worst-case-scenario) tests when wal_log_hints is on and data_checksums is not; I just omitted those numbers because they are basically identical (or maybe even a bit worse): |data_checksums=off, wal_log_hints=off: | |done in 10.24 s (vacuum 3.31 s, primary keys 6.92 s). |done in 8.81 s (vacuum 2.72 s, primary keys 6.09 s). |done in 8.35 s (vacuum 2.32 s, primary keys 6.03 s). | |data_checksums=off, wal_log_hints=on: | |1,5G data1/pg_wal |1,5G data1/base |2,5G data1_archive/ | |done in 87.89 s (vacuum 69.67 s, primary keys 18.23 s). |done in 73.71 s (vacuum 60.19 s, primary keys 13.52 s). |done in 75.12 s (vacuum 62.49 s, primary keys 12.62 s). | |data_checksums=on, wal_log_hints=off: | |done in 67.42 s (vacuum 54.57 s, primary keys 12.85 s). |done in 65.03 s (vacuum 53.25 s, primary keys 11.78 s). |done in 77.57 s (vacuum 62.64 s, primary keys 14.94 s). 
Of course, wal_log_hints is not the default either and can be turned off easily. You mostly lose the ability to run pg_rewind I think, are there other use-cases for it? Michael
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Wed, Jan 6, 2021 at 6:58 PM Andres Freund wrote: > > Hi, > > On 2021-01-06 18:27:48 +0100, Magnus Hagander wrote: > > The other argument is that admins can cheaply and quickly turn off > > checksums if they don't want them. > > > > The same cannot be said for turning them *on* again, that's a very > > slow offline operation at this time. > > > > Turning off checksums doesn't take noticeably more time than say > > changing the shared_buffers from the default, which is also almost > > guaranteed to be wrong for most installations. > > It still requires running a binary locally on the DB server, no? Which It does. So does changing shared_buffers -- for example you need to run "systemctl" if you're on systemd, or just pg_ctl if you're using unpackaged Postgres. > means it'll not be an option on most cloud providers... I really don't see why. They've implemented the ability to restart Postgres. Surely they can implement the ability to run a single command in between. Or if that's too complicated, they are more than capable of passing a parameter to initdb to change what the default is on their platform. They already do so for other things (such as not using trust or peer auth by default, or by actually not having a superuser, etc.). -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2021-01-06 18:27:48 +0100, Magnus Hagander wrote: > > The other argument is that admins can cheaply and quickly turn off > > checksums if they don't want them. > > > > The same cannot be said for turning them *on* again, that's a very > > slow offline operation at this time. > > > > Turning off checksums doesn't take noticeably more time than say > > changing the shared_buffers from the default, which is also almost > > guaranteed to be wrong for most installations. > > It still requires running a binary locally on the DB server, no? Which > means it'll not be an option on most cloud providers... ... unless they choose to make it an option, which is entirely up to them and certainly well within what they're capable of doing. I'd also mention that, at least according to some cloud providers I've talked to, they specifically wouldn't support PG until data checksums were available, making me not really feel like having them enabled by default would be such an issue (not to mention that, clearly, cloud providers could choose to change the default for their deployments if they wished to). Thanks, Stephen signature.asc Description: PGP signature
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2021-01-06 12:02:40 -0500, Stephen Frost wrote: > > * Andres Freund (and...@anarazel.de) wrote: > > > On 2021-01-04 19:11:43 +0100, Michael Banck wrote: > > > > On Sat, Jan 2, 2021 at 10:47 AM -0500, Stephen Frost wrote: > > > > > I agree with this, but I'd also like to propose, again, as has been > > > > > discussed a few times, making it the default too. > > > > > > FWIW, I am quite doubtful we're there performance-wise. Besides the WAL > > > logging overhead, the copy we do via PageSetChecksumCopy() shows up > > > quite significantly in profiles here. Together with the checksums > > > computation that's *halving* write throughput on fast drives in my aio > > > branch. > > > > Our defaults are not going to win any performance trophies and so I > > don't see the value in stressing over it here. > > Meh^3. There's a difference between defaults that are about resource > usage (e.g. shared_buffers) and defaults that aren't. fsync isn't about resource usage. > > > > This looks much better from the WAL size perspective, there's now almost > > > > no additional WAL. However, that is because pgbench doesn't do TOAST, so > > > > in a real-world example it might still be quite a bit larger. Also, the vacuum > > > > runtime is still 15x longer. > > > > > > That's obviously an issue. > > > > It'd certainly be nice to figure out a way to improve the VACUUM run but > > I don't think the impact on the time to run VACUUM is really a good > > reason to not move forward with changing the default. > > Vacuum performance is one of *THE* major complaints about > postgres. Making it run slower by a lot obviously exacerbates that > problem significantly. I think it'd be prohibitively expensive if it > were 1.5x, not to even speak of 15x. We already make vacuum, when run out of autovacuum, relatively slow, quite intentionally. 
If someone's having trouble with vacuum run times they're going to be adjusting the configuration anyway. > > imv, enabling page checksums is akin to having fsync enabled by default. > > Does it impact performance? Yes, surely quite a lot, but it's also the > > safe and sane choice when it comes to defaults. > > Oh for crying out loud. Not sure what you're hoping to gain from such comments, but it doesn't do anything to change my opinion. Thanks, Stephen signature.asc Description: PGP signature
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Hi, On 2021-01-06 18:27:48 +0100, Magnus Hagander wrote: > The other argument is that admins can cheaply and quickly turn off > checksums if they don't want them. > > The same cannot be said for turning them *on* again, that's a very > slow offline operation at this time. > > Turning off checksums doesn't take noticeably more time than say > changing the shared_buffers from the default, which is also almost > guaranteed to be wrong for most installations. It still requires running a binary locally on the DB server, no? Which means it'll not be an option on most cloud providers... Greetings, Andres Freund
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Hi, On 2021-01-06 12:02:40 -0500, Stephen Frost wrote: > * Andres Freund (and...@anarazel.de) wrote: > > On 2021-01-04 19:11:43 +0100, Michael Banck wrote: > > > On Sat, Jan 2, 2021 at 10:47 AM -0500, Stephen Frost wrote: > > > > I agree with this, but I'd also like to propose, again, as has been > > > > discussed a few times, making it the default too. > > > > FWIW, I am quite doubtful we're there performance-wise. Besides the WAL > > logging overhead, the copy we do via PageSetChecksumCopy() shows up > > quite significantly in profiles here. Together with the checksums > > computation that's *halving* write throughput on fast drives in my aio > > branch. > > Our defaults are not going to win any performance trophies and so I > don't see the value in stressing over it here. Meh^3. There's a difference between defaults that are about resource usage (e.g. shared_buffers) and defaults that aren't. > > > This looks much better from the WAL size perspective, there's now almost > > > no additional WAL. However, that is because pgbench doesn't do TOAST, so > > > in a real-world example it might still be quite a bit larger. Also, the vacuum > > > runtime is still 15x longer. > > > > That's obviously an issue. > > It'd certainly be nice to figure out a way to improve the VACUUM run but > I don't think the impact on the time to run VACUUM is really a good > reason to not move forward with changing the default. Vacuum performance is one of *THE* major complaints about postgres. Making it run slower by a lot obviously exacerbates that problem significantly. I think it'd be prohibitively expensive if it were 1.5x, not to even speak of 15x. > imv, enabling page checksums is akin to having fsync enabled by default. > Does it impact performance? Yes, surely quite a lot, but it's also the > safe and sane choice when it comes to defaults. Oh for crying out loud. Greetings, Andres Freund
Re: Enhance traceability of wal_level changes for backup management
Greetings, * osumi.takami...@fujitsu.com (osumi.takami...@fujitsu.com) wrote: > You said > > The use case I imagined is that the user temporarily > > changes wal_level to 'none' from 'replica' or 'logical' to speed up loading > > and > > changes back to the normal. In this case, the backups taken before the > > wal_level change cannot be used to restore the database to the point after > > the > > wal_level change. So I think backup management tools would want to > > recognize the time or LSN when/where wal_level is changed to ‘none’ in order > > to do some actions such as invalidating older backups, re-calculating backup > > redundancy etc. > > Actually the same is true when the user changes to ‘minimal’. The tools > > would > > need to recognize the time or LSN in this case too. Since temporarily > > changing > > wal_level has been an uncommon use case some tools perhaps are not aware > > of that yet. But since introducing wal_level = ’none’ could make the change > > common, I think we need to provide a way for the tools. I continue to be against the idea of introducing another wal_level. If there's additional things we can do to reduce WAL traffic while we continue to use it to keep the system in a consistent state then we should implement those for the 'minimal' case and be done with it. Adding another wal_level is just going to be confusing and increase complexity, and the chances that someone will end up in a bad state. > I wondered, couldn't backup management tools utilize the information > in the backup that is needed to be made when wal_level is changed to "none" > for example ? What information is that, exactly? If there's a way to detect that the wal_level has been flipped to 'minimal' and then back to 'replica', other than scanning the WAL, we'd certainly like to hear of it, so we can implement logic in pgbackrest to detect that happening. 
I'll admit that I've not gone hunting for such of late, but I don't recall seeing anything that would change that either. The idea proposed above about having the LSN and time recorded when a wal_level change to minimal happens, presumably in pg_control, seems like it could work as a way to allow external tools to more easily figure out if the wal_level's been changed to minimal. Perhaps there's a reason to track changes between replica and logical but I can't think of any offhand, and backup tools would still need to know if the wal level was set to *minimal* independently of changes between replica and logical. Then again, once we add support for scanning the WAL to pgbackrest, we'll almost certainly track it that way since that'll work for older and released versions of PG, so I'm not really sure it's worth it to add this to pg_control unless there's other reasons to. > As I said before, existing backup management tools support > only wal_level=replica or logical at present. And, if they would wish to > alter the > status quo and want to cover the changes over wal_levels, I felt it's natural > that > they support a feature like taking a full backup, triggered by the wal_level > changes (or server stop). Sure, but there needs to be a way to actually do that.. > This is because taking a backup is a must for wal_level=none, > as I described in the patch of wal_level=none. > For example, they could prepare an easy way to > take an offline physical backup when the server stops for changing the > wal_level. > (Here, they can support the change to minimal, too.) pgbackrest does support offline physical backups and it's pretty easy (just pass in --no-online). That doesn't really help with the issue of detecting a wal_level change though. Thanks, Stephen signature.asc Description: PGP signature
Re: Add Information during standby recovery conflicts
On 2020/12/15 0:20, Fujii Masao wrote: On 2020/12/14 21:31, Fujii Masao wrote: On 2020/12/05 12:38, Masahiko Sawada wrote: On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand wrote: Hi, On 12/4/20 2:21 AM, Fujii Masao wrote: On 2020/12/04 9:28, Masahiko Sawada wrote: On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao wrote: On 2020/12/01 17:29, Drouvot, Bertrand wrote: Hi, On 12/1/20 12:35 AM, Masahiko Sawada wrote: CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera wrote: On 2020-Dec-01, Fujii Masao wrote: + if (proc) + { + if (nprocs == 0) + appendStringInfo(&buf, "%d", proc->pid); + else + appendStringInfo(&buf, ", %d", proc->pid); + + nprocs++; What happens if all the backends in wait_list have gone? In other words, how should we handle the case where nprocs == 0 (i.e., nprocs has not been incrmented at all)? This would very rarely happen, but can happen. In this case, since buf.data is empty, at least there seems no need to log the list of conflicting processes in detail message. Yes, I noticed this too; this can be simplified by changing the condition in the ereport() call to be "nprocs > 0" (rather than wait_list being null), otherwise not print the errdetail. (You could test buf.data or buf.len instead, but that seems uglier to me.) +1 Maybe we can also improve the comment of this function from: + * This function also reports the details about the conflicting + * process ids if *wait_list is not NULL. to " This function also reports the details about the conflicting process ids if exist" or something. Thank you all for the review/remarks. They have been addressed in the new attached patch version. Thanks for updating the patch! I read through the patch again and applied the following chages to it. Attached is the updated version of the patch. Could you review this version? 
If there is no issue in it, I'm thinking to commit this version. Thank you for updating the patch! I have one question. + timeouts[cnt].id = STANDBY_TIMEOUT; + timeouts[cnt].type = TMPARAM_AFTER; + timeouts[cnt].delay_ms = DeadlockTimeout; Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here? I changed the code that way. As the comment of ResolveRecoveryConflictWithLock() says the following, a deadlock is detected by the ordinary backend process: * Deadlocks involving the Startup process and an ordinary backend process * will be detected by the deadlock detector within the ordinary backend. If we use STANDBY_DEADLOCK_TIMEOUT, SendRecoveryConflictWithBufferPin() will be called after DeadlockTimeout passed, but I think it's not necessary for the startup process in this case. Thanks for pointing this! You are right. If we want to just wake up the startup process maybe we can use STANDBY_TIMEOUT here? Thanks for the patch updates! Except what we are still discussing below, it looks good to me. When STANDBY_TIMEOUT happens, a request to release conflicting buffer pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT there? Agree Or, first of all, we don't need to enable the deadlock timer at all? Since what we'd like to do is to wake up after deadlock_timeout passes, we can do that by changing ProcWaitForSignal() so that it can accept the timeout and giving the deadlock_timeout to it. If we do this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from ResolveRecoveryConflictWithLock(). Thought? Where do we enable deadlock timeout in hot standby case? You meant to enable it in ProcWaitForSignal() or where we set a timer for not hot standby case, in ProcSleep()? No, what I tried to say is to change ResolveRecoveryConflictWithLock() so that it does 1. calculate the "minimum" timeout from deadlock_timeout and max_standby_xxx_delay 2. give the calculated timeout value to ProcWaitForSignal() 3. 
wait for signal and timeout on ProcWaitForSignal() Since ProcWaitForSignal() calls WaitLatch(), it seems it's not so difficult to make ProcWaitForSignal() handle the timeout. If we do this, I was thinking that we can get rid of enable_timeouts() from ResolveRecoveryConflictWithLock(). Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it triggers a call to StandbyLockTimeoutHandler() which does nothing, except waking up. That's what we want, right?) Right, what I wanted to mean is STANDBY_LOCK_TIMEOUT. The startup process can wake up and do nothing. Thank you for pointing out. Okay, understood! Firstly I was thinking that enabling the same type of timeout (i.e., STANDBY_LOCK_TIMEOUT) twice doesn't work properly, but as far as I read the code, it works. In that case, only the shorter timeout would b
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Wed, Jan 6, 2021 at 8:31 AM Michael Banck wrote: > > Hi, > > Am Mittwoch, den 06.01.2021, 10:52 +0900 schrieb Michael Paquier: > > On Mon, Jan 04, 2021 at 07:11:43PM +0100, Michael Banck wrote: > > > So maybe we should switch on wal_compression if we enable data checksums > > > by default. > > > > I don't agree with this assumption. In some CPU-bounded workloads, I > > have seen that wal_compression = on leads to performance degradation > > with or without checksums enabled. > > I meant just flipping the default, admins could still turn off > wal_compression if they think it'd help their performance. But it might > be tricky to implement, not sure. The other argument is that admins can cheaply and quickly turn off checksums if they don't want them. The same cannot be said for turning them *on* again, that's a very slow offline operation at this time. Turning off checksums doesn't take noticeably more time than say changing the shared_buffers from the default, which is also almost guaranteed to be wrong for most installations. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Phrase search vs. multi-lexeme tokens
Alexander Korotkov writes:
> # select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class foo"');
>  ?column?
> ----------
>  f

Yeah, surely this is wrong.

> # select to_tsquery('pg_class <-> foo');
>          to_tsquery
> -----------------------------
>  ( 'pg' & 'class' ) <-> 'foo'
>
> I think if a user writes 'pg_class <-> foo', then it's expected to
> match 'pg_class foo' independently of which lexemes 'pg_class' is
> split into.

Indeed. It seems to me that this:

regression=# select to_tsquery('pg_class');
   to_tsquery
----------------
 'pg' & 'class'
(1 row)

is wrong all by itself. Now that we have phrase search, a much saner translation would be "'pg' <-> 'class'". If we fixed that then it seems like the more complex case would just work.

I read your patch over quickly and it seems like a reasonable approach (but sadly underdocumented). Can we extend the idea to fix the to_tsquery case?

regards, tom lane
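The saner translation Tom suggests — joining the lexemes produced for one multi-lexeme source token with the phrase operator rather than AND — can be sketched in miniature. The real fix manipulates tsquery nodes, not strings; this only illustrates the intended result:

```c
#include <assert.h>
#include <string.h>

/*
 * Join the lexemes produced for a single source token with the phrase
 * operator, e.g. {"pg", "class"} -> "'pg' <-> 'class'".  A string-level
 * sketch of the suggested translation only.
 */
static void
join_lexemes_as_phrase(const char **lexemes, int n, char *out, size_t outsz)
{
    out[0] = '\0';
    for (int i = 0; i < n; i++)
    {
        if (i > 0)
            strncat(out, " <-> ", outsz - strlen(out) - 1);
        strncat(out, "'", outsz - strlen(out) - 1);
        strncat(out, lexemes[i], outsz - strlen(out) - 1);
        strncat(out, "'", outsz - strlen(out) - 1);
    }
}
```

With this translation, 'pg_class <-> foo' would become ( 'pg' <-> 'class' ) <-> 'foo' and match 'pg_class foo' regardless of how the token is split.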
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Wed, Jan 6, 2021 at 12:02:40PM -0500, Stephen Frost wrote: > > > It unfortunately also hurts other workloads. If we moved towards a saner > > > compression algorithm that'd perhaps not be an issue anymore... > > > > I agree that improving compression performance would be good but I don't > > see that as relevant to the question of what our defaults should be. > > > > imv, enabling page checksums is akin to having fsync enabled by default. > > Does it impact performance? Yes, surely quite a lot, but it's also the > > safe and sane choice when it comes to defaults. > > Well, you know fsyncs are required to recover from an OS crash, which is > more likely than detecting data corruption. Yes, I do know that. That doesn't change my feeling that we should have checksums enabled by default. Thanks, Stephen signature.asc Description: PGP signature
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Wed, Jan 6, 2021 at 12:02:40PM -0500, Stephen Frost wrote: > > It unfortunately also hurts other workloads. If we moved towards a saner > > compression algorithm that'd perhaps not be an issue anymore... > > I agree that improving compression performance would be good but I don't > see that as relevant to the question of what our defaults should be. > > imv, enabling page checksums is akin to having fsync enabled by default. > Does it impact performance? Yes, surely quite a lot, but it's also the > safe and sane choice when it comes to defaults. Well, you know fsyncs are required to recover from an OS crash, which is more likely than detecting data corruption. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2021-01-04 19:11:43 +0100, Michael Banck wrote: > > Am Samstag, den 02.01.2021, 10:47 -0500 schrieb Stephen Frost: > > > * Michael Paquier (mich...@paquier.xyz) wrote: > > > > On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote: > > > > > I think enough people use data checksums these days that it warrants > > > > > to > > > > > be moved into the "normal part", like in the attached. > > > > > > > > +1. Let's see first what others think about this change. > > > > > > I agree with this, but I'd also like to propose, again, as has been > > > discussed a few times, making it the default too. > > FWIW, I am quite doubtful we're there performance-wise. Besides the WAL > logging overhead, the copy we do via PageSetChecksumCopy() shows up > quite significantly in profiles here. Together with the checksums > computation that's *halving* write throughput on fast drives in my aio > branch. Our defaults are not going to win any performance trophies and so I don't see the value in stressing over it here. > > This looks much better from the WAL size perspective, there's now almost > > no additional WAL. However, that is because pgbench doesn't do TOAST, so > > in a real-world example it might still be quite larger. Also, the vacuum > > runtime is still 15x longer. > > That's obviously an issue. It'd certainly be nice to figure out a way to improve the VACUUM run but I don't think the impact on the time to run VACUUM is really a good reason to not move forward with changing the default. > > So maybe we should switch on wal_compression if we enable data checksums > > by default. That does seem like a good idea to me, +1 to also changing that. > It unfortunately also hurts other workloads. If we moved towards a saner > compression algorithm that'd perhaps not be an issue anymore... 
I agree that improving compression performance would be good but I don't see that as relevant to the question of what our defaults should be. imv, enabling page checksums is akin to having fsync enabled by default. Does it impact performance? Yes, surely quite a lot, but it's also the safe and sane choice when it comes to defaults. Thanks, Stephen signature.asc Description: PGP signature
Re: PoC/WIP: Extended statistics on expressions
Looking over the statscmds.c changes, there are a few XXX's and FIXME's that need resolving, and I had a couple of other minor comments: + /* +* An expression using mutable functions is probably wrong, +* since if you aren't going to get the same result for the +* same data every time, it's not clear what the index entries +* mean at all. +*/ + if (CheckMutability((Expr *) expr)) + ereport(ERROR, That comment is presumably copied from the index code, so needs updating. + /* +* Disallow data types without a less-than operator +* +* XXX Maybe allow this, but only for EXPRESSIONS stats and +* prevent building e.g. MCV etc. +*/ + atttype = exprType(expr); + type = lookup_type_cache(atttype, TYPECACHE_LT_OPR); + if (type->lt_opr == InvalidOid) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), +errmsg("expression cannot be used in statistics because its type %s has no default btree operator class", + format_type_be(atttype; As the comment suggests, it's probably worth skipping this check if numcols is 1 so that single-column stats can be built for more types of expressions. (I'm assuming that it's basically no more effort to make that work, so I think it falls into the might-as-well-do-it category.) + /* +* Parse the statistics kinds. Firstly, check that this is not the +* variant building statistics for a single expression, in which case +* we don't allow specifying any statistis kinds. The simple variant +* only has one expression, and does not allow statistics kinds. +*/ + if ((list_length(stmt->exprs) == 1) && (list_length(stxexprs) == 1)) + { Typo: "statistis" Nit-picking, this test could just be: + if ((numcols == 1) && (list_length(stxexprs) == 1)) which IMO is a little more readable, and matches a similar test a little further down. + /* +* If there are no simply-referenced columns, give the statistics an +* auto dependency on the whole table. 
In most cases, this will +* be redundant, but it might not be if the statistics expressions +* contain no Vars (which might seem strange but possible). +* +* XXX This is copied from index_create, not sure if it's applicable +* to extended statistics too. +*/ Seems right to me. + /* +* FIXME use 'expr' for expressions, which have empty column names. +* For indexes this is handled in ChooseIndexColumnNames, but we +* have no such function for stats. +*/ + if (!name) + name = "expr"; In theory, this function could be made to duplicate the logic used for indexes, creating names like "expr1", "expr2", etc. To be honest though, I don't think it's worth the effort. The code for indexes isn't really bulletproof anyway -- for example there might be a column called "expr" that is or isn't included in the index, which would make the generated name ambiguous. And in any case, a name like "tbl_cola_expr_colb_expr1_colc_stat" isn't really any more useful than "tbl_cola_expr_colb_expr_colc_stat". So I'd be tempted to leave that code as it is. + +/* + * CheckMutability + * Test whether given expression is mutable + * + * FIXME copied from indexcmds.c, maybe use some shared function? + */ +static bool +CheckMutability(Expr *expr) +{ As the comment says, it's quite messy duplicating this code, but I'm wondering whether it would be OK to just skip this check entirely. I think someone else suggested that elsewhere, and I think it might not be a bad idea. For indexes, it could easily lead to wrong query results, but for stats the most likely problem is that the stats would get out of date (which they tend to do all by themselves anyway) and need rebuilding. If you ignore intentionally crazy examples (which are still possible even with this check), then there are probably many legitimate cases where someone might want to use non-immutable functions in stats, and this check just forces them to create an immutable wrapper function. Regards, Dean
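The ChooseIndexColumnNames-style disambiguation mentioned in the review — the first expression getting "expr", later clashes getting "expr1", "expr2", and so on — is simple enough to sketch. This toy version only checks against previously chosen names; the real index code also has to consider plain column names, which is exactly the ambiguity the review points out:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Pick a name for the next expression column: "expr" if free, otherwise
 * "expr1", "expr2", ...  A sketch of the naming scheme discussed above,
 * not the actual ChooseIndexColumnNames() code.
 */
static void
choose_expr_name(char names[][16], int nchosen, char *out)
{
    int suffix = 0;

    for (;;)
    {
        if (suffix == 0)
            strcpy(out, "expr");
        else
            sprintf(out, "expr%d", suffix);

        /* linear scan over the names already assigned */
        int clash = 0;
        for (int i = 0; i < nchosen; i++)
            if (strcmp(names[i], out) == 0)
                clash = 1;
        if (!clash)
            return;
        suffix++;
    }
}
```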
RE: [Patch] Optimize dropping of relation buffers using dlist
Hi Kirk, >And if you want to test, I have already indicated the detailed steps including >the scripts I used. Have fun testing! Thank you for your sharing of test steps and scripts. I'd like take a look at them and redo some of the tests using my machine. I'll send my test reults in a separate email after this. Regards, Tang
Re: Incorrect allocation handling for cryptohash functions with OpenSSL
On 25/12/2020 02:57, Michael Paquier wrote: On Mon, Dec 21, 2020 at 04:28:26PM -0500, Robert Haas wrote: TBH, I think there's no point in returning an error here at all, because it's totally non-specific. You have no idea what failed, just that something failed. Blech. If we want to check that ctx is non-NULL, we should do that with an Assert(). Complicating the code with error checks that have to be added in multiple places that are far removed from where the actual problem was detected stinks. You could technically do that, but only for the backend at the cost of painting the code of src/common/ with more #ifdef FRONTEND. Even if we do that, enforcing an error in the backend could be a problem when it comes to some code paths. Yeah, you would still need to remember to check for the error in frontend code. Maybe it would still be a good idea, not sure. It would be a nice backstop, if you forget to check for the error. I had a quick look at the callers:

contrib/pgcrypto/internal-sha2.c and src/backend/utils/adt/cryptohashfuncs.c: the call to pg_cryptohash_create() is missing a check for the NULL result. With your latest patch, that's OK because the subsequent pg_cryptohash_update() calls will fail if passed a NULL context. But it seems sloppy.

contrib/pgcrypto/internal.c: all the calls to pg_cryptohash_* functions are missing checks for error return codes.

contrib/uuid-ossp/uuid-ossp.c: uses pg_cryptohash for MD5, but borrows the built-in implementation of SHA1 on some platforms. Should we add support for SHA1 in pg_cryptohash and use that for consistency?

src/backend/libpq/auth-scram.c: SHA256 is used in the mock authentication. If the pg_cryptohash functions fail, we throw a distinct error "could not encode salt" that reveals that it was a mock authentication. 
I don't think this is a big deal in practice, it would be hard for an attacker to induce the SHA256 computation to fail, and there are probably better ways to distinguish mock authentication from real, like timing attacks. But still. src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we still need separate fields for the different sha variants. - Heikki
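The backstop behaviour discussed above — create may return NULL on allocation failure, and every later call tolerates a NULL context by failing cleanly rather than crashing — can be sketched with hypothetical toy_hash_* names (this is not the actual pg_cryptohash API):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical context type; stands in for pg_cryptohash_ctx. */
typedef struct toy_hash_ctx
{
    unsigned int state;
} toy_hash_ctx;

static toy_hash_ctx *
toy_hash_create(void)
{
    /* May return NULL on OOM, like the frontend allocation path. */
    return calloc(1, sizeof(toy_hash_ctx));
}

static int
toy_hash_update(toy_hash_ctx *ctx, const unsigned char *data, size_t len)
{
    if (ctx == NULL)
        return -1;              /* backstop: NULL context fails, not crashes */
    for (size_t i = 0; i < len; i++)
        ctx->state = ctx->state * 31 + data[i];   /* toy mixing, not a hash */
    return 0;
}

static void
toy_hash_free(toy_hash_ctx *ctx)
{
    free(ctx);                  /* free(NULL) is a no-op, matching the style */
}
```

A caller that forgets to check create's result still gets a clean error from the first update call, which is the "nice backstop" mentioned in the thread.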
Re: Moving other hex functions to /common
On Wed, Jan 6, 2021 at 01:10:28PM +0900, Michael Paquier wrote: > > It was very confusing, and this attached patch fixes all of that. I > > also added the pg_ prefix you suggested. If we want to add dstlen to > > all the functions, we have to do it for all types --- not sure it is > > worth it, now that things are much clearer. > > Overflow protection is very important in my opinion here once we > expose more of those functions. Not touching ECPG at all and not making > this refactoring pluggable into shared libs is fine by me at the end > because we don't have a case for libpq yet, but I object to the lack > of protection against overflows. Fine. Do you want to add the overflow protection to the patch I posted, for all encoding types? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
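The overflow protection Michael is asking for amounts to passing the destination buffer size along and refusing to write past it. A hedged sketch with a hypothetical name and signature — not the interface that was eventually committed to src/common:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hex-encode srclen bytes of src into dst.  Returns the number of bytes
 * written, or -1 if dst (of size dstlen) is too small.  The dstlen check
 * is the overflow protection being discussed; without it a caller with an
 * undersized buffer gets silent memory corruption instead of an error.
 */
static int64_t
hex_encode_safe(const char *src, size_t srclen, char *dst, size_t dstlen)
{
    static const char hextbl[] = "0123456789abcdef";

    if (dstlen < srclen * 2)    /* fail before writing anything */
        return -1;

    for (size_t i = 0; i < srclen; i++)
    {
        uint8_t b = (uint8_t) src[i];

        dst[i * 2] = hextbl[b >> 4];
        dst[i * 2 + 1] = hextbl[b & 0x0f];
    }
    return (int64_t) (srclen * 2);
}
```

The same pattern (take dstlen, return -1 on overflow) would apply to each of the encoding types mentioned in the thread.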
Re: Incorrect allocation handling for cryptohash functions with OpenSSL
On 06/01/2021 13:42, Michael Paquier wrote: On Sat, Dec 19, 2020 at 03:13:50PM +0900, Michael Paquier wrote: At the same time, I have taken care of your comment from upthread to return a failure if the caller passes NULL for the context, and adjusted some comments. What do you think of the attached? I have looked again at this thread with a fresher mind and I did not see a problem with the previous patch, except some indentation issues. So if there are no objections, I'd like to commit the attached. Looks fine to me. - Heikki
[PATCH] Partial foreign key updates in referential integrity triggers
Hey, hackers! I've created a patch to better support referential integrity constraints when using composite primary and foreign keys. This patch allows creating a foreign key using the syntax:

FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL (fk_id)

which means that only the fk_id column will be set to NULL when the referenced row is deleted, rather than both the tenant_id and fk_id columns.

In multi-tenant applications, it is common to denormalize a "tenant_id" column across every table, and use composite primary keys of the form (tenant_id, id) and composite foreign keys of the form (tenant_id, fk_id), reusing the tenant_id column in both the primary and foreign key. This is often done initially for performance reasons, but has the added benefit of making it impossible for data from one tenant to reference data from another tenant, also making this a good decision from a security perspective.

Unfortunately, one downside of using composite foreign keys in such a manner is that commonly used referential actions, such as ON DELETE SET NULL, no longer work, because Postgres tries to set all of the referencing columns to NULL, including the columns that overlap with the primary key:

CREATE TABLE tenants (id serial PRIMARY KEY);

CREATE TABLE users (
    tenant_id int REFERENCES tenants ON DELETE CASCADE,
    id serial,
    PRIMARY KEY (tenant_id, id)
);

CREATE TABLE posts (
    tenant_id int REFERENCES tenants ON DELETE CASCADE,
    id serial,
    author_id int,
    PRIMARY KEY (tenant_id, id),
    FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL
);

INSERT INTO tenants VALUES (1);
INSERT INTO users VALUES (1, 101);
INSERT INTO posts VALUES (1, 201, 101);

DELETE FROM users WHERE id = 101;
ERROR:  null value in column "tenant_id" violates not-null constraint
DETAIL:  Failing row contains (null, 201, null). 
This can be resolved by manually creating triggers on the referenced table, but this is clunky and adds a significant amount of extra work when adding (or removing!) foreign keys. Users shouldn't have to compromise on maintenance overhead when using composite foreign keys.

I have implemented a simple extension to the syntax for foreign keys that makes it just as easy to support referential integrity constraints for composite foreign keys as it is for single column foreign keys. The SET NULL and SET DEFAULT referential actions can now be optionally followed by a column list:

key_action:
      NO ACTION
    | RESTRICT
    | CASCADE
    | SET NULL [ ( column_name [, ...] ) ]
    | SET DEFAULT [ ( column_name [, ...] ) ]

When a column list is provided, only the specified columns, which must be a subset of the referencing columns, will be updated in the associated trigger.

Note that use of SET NULL (col1, ...) on a composite foreign key with MATCH FULL specified will still raise an error. In such a scenario we could raise an error when the user tries to define the foreign key, but we don't raise a similar error when a user tries to use SET NULL on a non-nullable column, so I don't think this is critical. (I haven't added this check in my patch.) While this feature would mostly be used with the default MATCH SIMPLE, I could imagine using SET DEFAULT (col, ...) even for MATCH FULL foreign keys.

To store this additional data, I've added two columns to the pg_constraint catalog entry:

confupdsetcols int2[]
confdelsetcols int2[]

These columns store the attribute numbers of the columns to update in the ON UPDATE and ON DELETE triggers respectively. If the arrays are empty, then all of the referencing columns should be updated.

I previously proposed this feature about a year ago [1], but I don't feel like the arguments against it were very strong. 
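The confdelsetcols semantics described above — an empty array meaning "all referencing columns" — reduce to a small membership test when the trigger decides which columns to null out. A sketch with illustrative names, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Given a referencing column's attribute number and the (possibly empty)
 * SET NULL column list from the catalog, decide whether the ON DELETE
 * trigger should null this column.  An empty list means every referencing
 * column, matching the proposed confdelsetcols representation.
 */
static bool
column_is_set_null(int16_t attnum, const int16_t *setcols, int nsetcols)
{
    if (nsetcols == 0)
        return true;            /* empty array: plain SET NULL behaviour */
    for (int i = 0; i < nsetcols; i++)
        if (setcols[i] == attnum)
            return true;
    return false;
}
```

For the posts example above, the FK covers (tenant_id, author_id) but SET NULL (author_id) stores only author_id's attribute number, so tenant_id survives the delete and the NOT NULL violation disappears.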
Wanting to get more familiar with the Postgres codebase I decided to implement the feature over this holiday break, and I've gotten everything working and put together a complete patch including tests and updates to documentation. Hopefully if people find it useful it can make its way into the next commitfest! Visual diff: https://github.com/postgres/postgres/compare/master...PaulJuliusMartinez:on-upd-del-set-cols Here's a rough outline of the changes: src/backend/parser/gram.y | 122 src/include/nodes/parsenodes.h| 3 src/backend/nodes/copyfuncs.c | 2 src/backend/nodes/equalfuncs.c| 2 src/backend/nodes/outfuncs.c | 47 - Modify grammar to add opt_column_list after SET NULL and SET DEFAULT - Add fk_{upd,del}_set_cols fields to Constraint struct - Add proper node support, as well as outfuncs support for AlterTableStmt, which I used while debugging src/include/catalog/pg_constraint.h | 20 src/backend/catalog/pg_constraint.c | 80 src/include/catalog/catversion.h | 2 - Add confupdsetcols and confdelsetcols to pg_constraint catalog entry src/backend/commands/tablecmds.c | 142 - Pass along data from parsed Constraint node to Creat
RE: [Patch] Optimize dropping of relation buffers using dlist
On Wed, January 6, 2021 7:04 PM (JST), I wrote:
> I will resume the test similar to Tang, because she also executed the original
> failover test which I have been doing before.
> To avoid confusion and to check if the results from mine and Tang are
> consistent, I also did the recovery/failover test for VACUUM on single relation,
> which I will send in a separate email after this.

A. Test to find the right THRESHOLD

So below are the procedures and results of the VACUUM recovery performance test on single relation. I followed the advice below and applied the supplementary patch on top of V39: Test-for-threshold.patch This will ensure that we'll always enter the optimized path. We're gonna use the threshold then as the relation size.

> > One idea could be to remove "nBlocksToInvalidate <
> > BUF_DROP_FULL_SCAN_THRESHOLD" part of check "if (cached &&
> > nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)" so that it
> > always use optimized path for the tests. Then use the relation size
> > as NBuffers/128, NBuffers/256, NBuffers/512 for different values of
> > shared buffers as 128MB, 1GB, 20GB, 100GB.

Each relation size is NBuffers/XXX, so I used the attached "rel.sh" script to test from NBuffers/512 until NBuffers/8 relation size per shared_buffers. I did not go further beyond 8 because it took too much time, and I could already observe significant results until that.

[Vacuum Recovery Performance on Single Relation]
1. Setup synchronous streaming replication. I used the configuration written at the bottom of this email.
2. [Primary] Create 1 table. (rel.sh create)
3. [Primary] Insert data of NBuffers/XXX size. Make sure to use the correct size for the set shared_buffers by commenting out the right size in "insert" of rel.sh script. (rel.sh insert)
4. [Primary] Delete table. (rel.sh delete)
5. [Standby] Optional: To double-check that DELETE is reflected on standby. SELECT count(*) FROM tableXXX; Make sure it returns 0.
6. [Standby] Pause WAL replay. 
(rel.sh pause) (This script will execute SELECT pg_wal_replay_pause(); .)
7. [Primary] VACUUM the single relation. (rel.sh vacuum)
8. [Primary] After the vacuum finishes, stop the server. (rel.sh stop) (The script will execute pg_ctl stop -D $PGDATA -w -mi)
9. [Standby] Resume WAL replay and promote the standby. (rel.sh resume) It basically prints a timestamp when resuming WAL replay, and prints another timestamp when the promotion is done. Compute the time difference.

[Results for VACUUM on single relation]
Average of 5 runs.

1. % REGRESSION
% Regression: (patched - master)/master

| rel_size | 128MB  | 1GB    | 20GB   | 100GB    |
|----------|--------|--------|--------|----------|
| NB/512   | 0.000% | 0.000% | 0.000% | -32.680% |
| NB/256   | 0.000% | 0.000% | 0.000% | 0.000%   |
| NB/128   | 0.000% | 0.000% | 0.000% | -16.502% |
| NB/64    | 0.000% | 0.000% | 0.000% | -9.841%  |
| NB/32    | 0.000% | 0.000% | 0.000% | -6.219%  |
| NB/16    | 0.000% | 0.000% | 0.000% | 3.323%   |
| NB/8     | 0.000% | 0.000% | 0.000% | 8.178%   |

For 100GB shared_buffers, we can observe regression beyond NBuffers/32. So with this, we can conclude that NBuffers/32 is the right threshold. For NBuffers/16 and beyond, the patched performs worse than master. In other words, the cost of finding the buffers to be invalidated gets higher in the optimized path than the traditional path. So in the attached V39 patches, I have updated the threshold BUF_DROP_FULL_SCAN_THRESHOLD to NBuffers/32.

2. [PATCHED]
Units: Seconds

| rel_size | 128MB | 1GB   | 20GB  | 100GB |
|----------|-------|-------|-------|-------|
| NB/512   | 0.106 | 0.106 | 0.106 | 0.206 |
| NB/256   | 0.106 | 0.106 | 0.106 | 0.306 |
| NB/128   | 0.106 | 0.106 | 0.206 | 0.506 |
| NB/64    | 0.106 | 0.106 | 0.306 | 0.907 |
| NB/32    | 0.106 | 0.106 | 0.406 | 1.508 |
| NB/16    | 0.106 | 0.106 | 0.706 | 3.109 |
| NB/8     | 0.106 | 0.106 | 1.307 | 6.614 |

3. 
MASTER
Units: Seconds

| rel_size | 128MB | 1GB   | 20GB  | 100GB |
|----------|-------|-------|-------|-------|
| NB/512   | 0.106 | 0.106 | 0.106 | 0.306 |
| NB/256   | 0.106 | 0.106 | 0.106 | 0.306 |
| NB/128   | 0.106 | 0.106 | 0.206 | 0.606 |
| NB/64    | 0.106 | 0.106 | 0.306 | 1.006 |
| NB/32    | 0.106 | 0.106 | 0.406 | 1.608 |
| NB/16    | 0.106 | 0.106 | 0.706 | 3.009 |
| NB/8     | 0.106 | 0.106 | 1.307 | 6.114 |

I used the following configurations:

[postgresql.conf]
shared_buffers = 100GB #20GB,1GB,128MB
autovacuum = off
full_page_writes = off
checkpoint_timeout = 30min
max_locks_per_transaction = 1
max_wal_size = 20GB

# For streaming replication from primary. Don't uncomment on Standby.
synchronous_commit = remote_write
synchronous_standby_names = 'walreceiver'

# For Standby. Don't uncomment on Primary.
# hot_standby = on
#primary_conninfo = 'host=... user=... port=... application_name=walreceiver'

--

B. Regression Test using the NBuffers/32 Threshold (V39 Patches)

For this one,
Re: New Table Access Methods for Multi and Single Inserts
On Wed, Jan 6, 2021 at 12:56 PM Luc Vlaming wrote:
> The main reason for me for wanting a single API is that I would like the
> decision of using single or multi inserts to move to inside the tableam.
> For e.g. a heap insert we might want to put the threshold at e.g. 100
> rows so that the overhead of buffering the tuples is actually
> compensated. For other tableam this logic might also be quite different,
> and I think therefore that it shouldn't be e.g. COPY or CTAS deciding
> whether or not multi inserts should be used. Because otherwise the thing
> we'll get is that there will be tableams that will ignore this flag and
> do their own thing anyway. I'd rather have an API that gives all
> necessary information to the tableam and then make the tableam do "the
> right thing".
>
> Another reason I'm suggesting this API is that I would expect that the
> begin is called in a different place in the code for the (multiple)
> inserts than the actual insert statement.
> To me conceptually the begin and end are like e.g. the executor begin
> and end: you prepare the inserts with the knowledge you have at that
> point. I assumed (wrongly?) that during the start of the statement one
> knows best how many rows are coming; and then the actual insertion of
> the row doesn't have to deal anymore with multi/single inserts, choosing
> when to buffer or not, because that information has already been given
> during the initial phase. One of the reasons this is appealing to me is
> that e.g. in [1] there was discussion on when to switch to a multi
> insert state, and imo this should be up to the tableam.

Agree that whether to go with multi or single inserts should be completely left to the tableam implementation; we, as callers of those APIs, just need to inform it whether we expect single or multiple rows, and it should be left to the tableam implementation whether to actually go with buffering or single inserts.
ISTM that it's an elegant way of making the API generic and abstracting everything from the callers. What I wonder is how we can know in advance the expected row count that we need to pass in to heap_insert_begin(). IIUC, we cannot estimate the upcoming rows in COPY, Insert Into Select, Refresh Mat View, or some other insert queries. Of course, we can look at the planner's estimated row count for the selects in COPY, Insert Into Select, or Refresh Mat View after planning, but to me that's not something we can depend on to pass the row count to the insert APIs.

When we don't know the expected row count, why can't we (as callers of the APIs) tell the APIs something like, "I'm intending to perform multi inserts, so if possible and if you have a mechanism to buffer the slots, do it; otherwise insert the tuples one by one, or else do whatever you want to do with the tuples I give to you"? So, in the case of COPY we can ask the API for multi inserts and call heap_insert_begin() and heap_insert_v2(). Given the above explanation, I still feel bool is_multi would suffice. Thoughts?

On dynamically switching from single to multi inserts, this can be done by heap_insert_v2 itself. The way I think it's possible is that, say we have some threshold row count, 1000 (can be a macro); after inserting that many tuples, heap_insert_v2 can switch to buffering mode. Thoughts?
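The dynamic switch sketched in the previous paragraph can be modeled as a toy, self-contained example. Everything here is hypothetical: the struct name, the 1000-row threshold macro, and the switch point are placeholders for whatever the final tableam API settles on, not the actual patch code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of dynamically switching from single-insert to buffered
 * multi-insert mode once a threshold number of tuples has been seen.
 * All names are hypothetical sketches of the API under discussion.
 */
#define MULTI_INSERT_SWITCH_THRESHOLD 1000

typedef struct TableInsertStateSketch
{
	long		ntuples;		/* tuples inserted so far */
	bool		buffering;		/* have we switched to multi-insert mode? */
} TableInsertStateSketch;

static void
insert_one(TableInsertStateSketch *state)
{
	state->ntuples++;

	/* the insert routine itself decides when to start buffering */
	if (!state->buffering &&
		state->ntuples >= MULTI_INSERT_SWITCH_THRESHOLD)
		state->buffering = true;
}
```

The point of the sketch is that the caller never chooses the mode; it just calls the same insert function every time, and the state machine inside flips to buffering on its own.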
> Which would make the code something like:
>
> void
> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> {
>     TupleTableSlot *batchslot;
>     HeapMultiInsertState *mistate = (HeapMultiInsertState *) state->mistate;
>     Size sz;
>
>     Assert(mistate && mistate->slots);
>
>     if (mistate->slots[mistate->cur_slots] == NULL)
>         mistate->slots[mistate->cur_slots] =
>             table_slot_create(state->rel, NULL);
>
>     batchslot = mistate->slots[mistate->cur_slots];
>
>     ExecClearTuple(batchslot);
>     ExecCopySlot(batchslot, slot);
>
>     /*
>      * Calculate the tuple size after the original slot is copied, because
>      * the copied slot type and the tuple size may change.
>      */
>     sz = GetTupleSize(batchslot, mistate->max_size);
>
>     Assert(sz > 0);
>
>     mistate->cur_slots++;
>     mistate->cur_size += sz;
>
>     if (mistate->cur_slots >= mistate->max_slots ||
>         mistate->cur_size >= mistate->max_size)
>         heap_multi_insert_flush(state);
> }

I think clearing tuples before copying the slot as you suggested may work without the need of the clear_slots flag.

> > > Also, why do we want to do ExecClearTuple() anyway? Isn't
> > > it good enough that the next call to ExecCopySlot will effectively clear
> > > it out?
> >
> > For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the
> > slot before copying. But, for buffer heap slots, the
> > tts_buffer_heap_copyslot does not always clear the destination slot, see
> >
Re: Track replica origin progress for Rollback Prepared
On Wed, Jan 6, 2021 at 5:18 PM Michael Paquier wrote: > > On Tue, Jan 05, 2021 at 04:24:21PM +0530, Amit Kapila wrote: > > There are already tests [1] in one of the upcoming patches for logical > > decoding of 2PC which covers this code using which I have found this > > problem. So, I thought those would be sufficient. I have not checked > > the feasibility of using test_decoding because I thought adding more > > using test_decoding will unnecessarily duplicate the tests. > > Hmm. This stuff does not check after replication origins even if it > stresses 2PC, so that looks incomplete when seen from here. > I think it does. Let me try to explain in a bit more detail. Internally, the apply worker uses replication origins to track the progress of apply, see the code near ApplyWorkerMain->replorigin_create. We will store the progress (WAL LSN) for each commit (prepared)/ rollback prepared with this origin. If the server crashes and restarts, we will use the origin's LSN as the start decoding point (the subscriber sends the last LSN to the publisher). The bug here is that after restart the origin was not advanced for rollback prepared which I have fixed with this patch. Now, let us see how the tests mentioned by me cover this code. In the first test (check that 2PC gets replicated to subscriber then ROLLBACK PREPARED), we do below on publisher and wait for it to be applied on the subscriber. BEGIN; INSERT INTO tab_full VALUES (12); PREPARE TRANSACTION 'test_prepared_tab_full'; ROLLBACK PREPARED 'test_prepared_tab_full'; Note that we would have WAL logged the LSN (replication_origin_lsn) corresponding to ROLLBACK PREPARED on the subscriber during apply. Now, in the second test(Check that ROLLBACK PREPARED is decoded properly on crash restart (publisher and subscriber crash)), we prepare a transaction and crash the server. 
After the restart, because we have not advanced the replication origin during recovery of Rollback Prepared, the subscriber won't consider that transaction applied, so it requests that transaction again. Strictly speaking, we don't need the second test to reproduce this exact problem; restarting after the first test would also reproduce it, but I was hitting the problem consistently with the way the tests are currently written. However, we can change it slightly to restart after the first test if we want. -- With Regards, Amit Kapila.
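The failure mode described above can be illustrated with a self-contained toy model. The types and function names below are invented for illustration only; the real logic lives in the server's two-phase redo and replication-origin code. The idea: if recovery advances the origin only for COMMIT PREPARED, a rolled-back prepared transaction leaves the origin LSN stale, so after restart the subscriber asks for that transaction again.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long XLogRecPtrSketch;	/* stand-in for XLogRecPtr */

/* toy replication-origin progress tracker (global for simplicity) */
static XLogRecPtrSketch origin_lsn = 0;

/* advance origin progress, never moving backwards */
static void
origin_advance(XLogRecPtrSketch lsn)
{
	if (lsn > origin_lsn)
		origin_lsn = lsn;
}

/*
 * Toy redo of a prepared-transaction end record.  With the bug, only the
 * commit path advances the origin; with the fix, rollback prepared
 * advances it as well.
 */
static void
redo_twophase_end(XLogRecPtrSketch end_lsn, bool is_commit, bool fixed)
{
	if (is_commit || fixed)
		origin_advance(end_lsn);
}
```

With `fixed = false`, a ROLLBACK PREPARED redo leaves `origin_lsn` where it was, which models the subscriber restarting from a stale position and re-requesting the rolled-back transaction.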
Re: Track replica origin progress for Rollback Prepared
On Tue, Jan 05, 2021 at 04:24:21PM +0530, Amit Kapila wrote: > There are already tests [1] in one of the upcoming patches for logical > decoding of 2PC which covers this code using which I have found this > problem. So, I thought those would be sufficient. I have not checked > the feasibility of using test_decoding because I thought adding more > using test_decoding will unnecessarily duplicate the tests. Hmm. This stuff does not check after replication origins even if it stresses 2PC, so that looks incomplete when seen from here. > How about something like "Dump transaction origin information only for > abort prepared. We need this during recovery to update the replication > origin progress."? That sounds fine. -- Michael signature.asc Description: PGP signature
Re: Incorrect allocation handling for cryptohash functions with OpenSSL
On Sat, Dec 19, 2020 at 03:13:50PM +0900, Michael Paquier wrote: > At the same time, I have taken care of your comment from upthread to > return a failure if the caller passes NULL for the context, and > adjusted some comments. What do you think of the attached? I have looked again at this thread with a fresher mind and I did not see a problem with the previous patch, except some indentation issues. So if there are no objections, I'd like to commit the attached. -- Michael diff --git a/src/include/common/cryptohash.h b/src/include/common/cryptohash.h index 4c3df8e5ae..3ecaf62113 100644 --- a/src/include/common/cryptohash.h +++ b/src/include/common/cryptohash.h @@ -25,12 +25,8 @@ typedef enum PG_SHA512 } pg_cryptohash_type; -typedef struct pg_cryptohash_ctx -{ - pg_cryptohash_type type; - /* private area used by each hash implementation */ - void *data; -} pg_cryptohash_ctx; +/* opaque context, private to each cryptohash implementation */ +typedef struct pg_cryptohash_ctx pg_cryptohash_ctx; extern pg_cryptohash_ctx *pg_cryptohash_create(pg_cryptohash_type type); extern int pg_cryptohash_init(pg_cryptohash_ctx *ctx); diff --git a/src/common/cryptohash.c b/src/common/cryptohash.c index 1921a33b34..efedadd626 100644 --- a/src/common/cryptohash.c +++ b/src/common/cryptohash.c @@ -39,6 +39,21 @@ #define FREE(ptr) free(ptr) #endif +/* Internal pg_cryptohash_ctx structure */ +struct pg_cryptohash_ctx +{ + pg_cryptohash_type type; + + union + { + pg_md5_ctx md5; + pg_sha224_ctx sha224; + pg_sha256_ctx sha256; + pg_sha384_ctx sha384; + pg_sha512_ctx sha512; + } data; +}; + /* * pg_cryptohash_create * @@ -50,38 +65,18 @@ pg_cryptohash_create(pg_cryptohash_type type) { pg_cryptohash_ctx *ctx; + /* + * Note that this always allocates enough space for the largest hash. A + * smaller allocation would be enough for md5, sha224 and sha256, but the + * small extra amount of memory does not make it worth complicating this + * code. 
+ */ ctx = ALLOC(sizeof(pg_cryptohash_ctx)); if (ctx == NULL) return NULL; - + memset(ctx, 0, sizeof(pg_cryptohash_ctx)); ctx->type = type; - switch (type) - { - case PG_MD5: - ctx->data = ALLOC(sizeof(pg_md5_ctx)); - break; - case PG_SHA224: - ctx->data = ALLOC(sizeof(pg_sha224_ctx)); - break; - case PG_SHA256: - ctx->data = ALLOC(sizeof(pg_sha256_ctx)); - break; - case PG_SHA384: - ctx->data = ALLOC(sizeof(pg_sha384_ctx)); - break; - case PG_SHA512: - ctx->data = ALLOC(sizeof(pg_sha512_ctx)); - break; - } - - if (ctx->data == NULL) - { - explicit_bzero(ctx, sizeof(pg_cryptohash_ctx)); - FREE(ctx); - return NULL; - } - return ctx; } @@ -95,24 +90,24 @@ int pg_cryptohash_init(pg_cryptohash_ctx *ctx) { if (ctx == NULL) - return 0; + return -1; switch (ctx->type) { case PG_MD5: - pg_md5_init((pg_md5_ctx *) ctx->data); + pg_md5_init(&ctx->data.md5); break; case PG_SHA224: - pg_sha224_init((pg_sha224_ctx *) ctx->data); + pg_sha224_init(&ctx->data.sha224); break; case PG_SHA256: - pg_sha256_init((pg_sha256_ctx *) ctx->data); + pg_sha256_init(&ctx->data.sha256); break; case PG_SHA384: - pg_sha384_init((pg_sha384_ctx *) ctx->data); + pg_sha384_init(&ctx->data.sha384); break; case PG_SHA512: - pg_sha512_init((pg_sha512_ctx *) ctx->data); + pg_sha512_init(&ctx->data.sha512); break; } @@ -123,30 +118,31 @@ pg_cryptohash_init(pg_cryptohash_ctx *ctx) * pg_cryptohash_update * * Update a hash context. Note that this implementation is designed - * to never fail, so this always returns 0. + * to never fail, so this always returns 0 except if the caller has + * given a NULL context. 
*/ int pg_cryptohash_update(pg_cryptohash_ctx *ctx, const uint8 *data, size_t len) { if (ctx == NULL) - return 0; + return -1; switch (ctx->type) { case PG_MD5: - pg_md5_update((pg_md5_ctx *) ctx->data, data, len); + pg_md5_update(&ctx->data.md5, data, len); break; case PG_SHA224: - pg_sha224_update((pg_sha224_ctx *) ctx->data, data, len); + pg_sha224_update(&ctx->data.sha224, data, len); break; case PG_SHA256: - pg_sha256_update((pg_sha256_ctx *) ctx->data, data, len); + pg_sha256_update(&ctx->data.sha256, data, len); break; case PG_SHA384: - pg_sha384_update((pg_sha384_ctx *) ctx->data, data, len); + pg_sha384_update(&ctx->data.sha384, data, len); break; case PG_SHA512: - pg_sha512_update((pg_sha512_ctx *) ctx->data, data, len); + pg_sha512_update(&ctx->data.sha512, data, len); break; } @@ -157,30 +153,31 @@ pg_cryptohash_update(pg_cryptohash_ctx *ctx, const uint8 *data, size_t len) * pg_cryptohash_final * * Finalize a hash context. Note that this implementation is designed - * to never fail, so this always returns 0. + * to never fail, so this always returns 0 except if the caller has + * given a NULL context. */ int pg_cryptohash_final(pg_cryptohash_ctx *ctx, uint8 *dest) { if (ctx ==
Re: [PATCH] Runtime control of CLOBBER_CACHE_ALWAYS
On 2021-01-06 00:30, Craig Ringer wrote: Perhaps debug_invalidate_system_caches_always ? It's a bit long but we use the debug prefix elsewhere too. Committed with that name. In your original patch, this hunk in pg_config_manual.h: + * You can define these by editing pg_config_manual.h but it's usually + * sufficient to pass CFLAGS to configure, e.g. + * + * ./configure --enable-debug --enable-cassert CFLAGS="-DUSE_VALGRIND" + * + * The flags are persisted in Makefile.global so they will be correctly + * applied to extensions, including those built by PGXS. I don't think that necessarily works for all settings. Maybe we should make a pass through it and ensure it all works sensibly, but that seems like a separate project. -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/
Re: Single transaction in the tablesync worker?
On Wed, Jan 6, 2021 at 2:13 PM Peter Smith wrote: > > I think it makes sense. If there can be a race between the tablesync > re-launching (after error), and the AlterSubscription_refresh removing > some table’s relid from the subscription then there could be lurking > slot/origin tablesync resources (of the removed table) which a > subsequent DROP SUBSCRIPTION cannot discover. I will think more about > how/if it is possible to make this happen. Anyway, I suppose I ought > to refactor/isolate some of the tablesync cleanup code in case it > needs to be commonly called from DropSubscription and/or from > AlterSubscription_refresh. > Fair enough. BTW, I have analyzed whether we need any modifications to pg_dump/restore for this patch as this changes the state of one of the fields in the system table and concluded that we don't need any change. For subscriptions, we don't dump any of the information from pg_subscription_rel, rather we just dump subscriptions with the connect option as false which means users need to enable the subscription and refresh publication after restore. I have checked this in the code and tested it as well. The related information is present in pg_dump doc page [1], see from "When dumping logical replication subscriptions ". [1] - https://www.postgresql.org/docs/devel/app-pgdump.html -- With Regards, Amit Kapila.
RE: [Patch] Optimize dropping of relation buffers using dlist
On Sunday, January 3, 2021 10:35 PM (JST), Amit Kapila wrote:
> On Sat, Jan 2, 2021 at 7:47 PM k.jami...@fujitsu.com wrote:
> >
> > Happy new year. The V38 LGTM.
> > Apologies for a bit of delay on posting the test results, but since
> > it's the start of commitfest, here it goes and the results were
> > interesting.
> >
> > I executed a VACUUM test using the same approach that Tsunakawa-san
> > did in [1], but this time I measured the total time that
> > DropRelFileNodeBuffers() took.
> >
> Please specify the exact steps like did you deleted all the rows from a
> table or some of it or none before performing Vacuum? How did you measure
> this time, did you removed the cached check? It would be better if you
> share the scripts and or the exact steps so that the same can be used by
> others to reproduce.

Basically, I used the TimestampDifference function in DropRelFileNodeBuffers(). I also executed DELETE before VACUUM, removed the nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD check, and used the relation size as the threshold.

> Hmm, in the tests done by Tang, the results indicate that in some cases
> the patched version is slower at even NBuffers/32, so not sure if we can
> go to values shown by you unless she is doing something wrong. I think
> the difference in results could be because both of you are using different
> techniques to measure the timings, so it might be better if both of you
> can share scripts or exact steps used to measure the time and other can
> use the same technique and see if we are getting consistent results.

Right, since we want consistent results, please disregard the approach that I did. I will resume testing the way Tang did, because she also executed the original failover test which I have been doing before. To avoid confusion and to check whether my results and Tang's are consistent, I also did the recovery/failover test for VACUUM on a single relation, which I will send in a separate email after this.

Regards,
Kirk Jamison
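For reference, the threshold check that was removed for this measurement looks roughly like the following self-contained sketch. The macro and function names here are hypothetical stand-ins; the actual patch defines BUF_DROP_FULL_SCAN_THRESHOLD inside the buffer manager.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the decision being benchmarked in this thread:
 * take the optimized per-block invalidation path only when the number of
 * blocks to invalidate is below the threshold (NBuffers/32 in the V39
 * patches); otherwise fall back to the traditional full scan of shared
 * buffers.
 */
#define BUF_DROP_FULL_SCAN_THRESHOLD(nbuffers) ((nbuffers) / 32)

static int
use_optimized_drop(size_t nblocks_to_invalidate, size_t nbuffers)
{
	return nblocks_to_invalidate < BUF_DROP_FULL_SCAN_THRESHOLD(nbuffers);
}
```

Removing this check, as described for the measurement above, forces the optimized path regardless of relation size, which is one way to expose the crossover point near NBuffers/32.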
RE: Enhance traceability of wal_level changes for backup management
Hello, Sawada-san

I'll continue the discussion of [2]. We talked there about how to recognize the time or LSN when/where wal_level is changed to 'none'. You said:

> The use case I imagined is that the user temporarily changes wal_level to
> 'none' from 'replica' or 'logical' to speed up loading and changes back to
> the normal. In this case, the backups taken before the wal_level change
> cannot be used to restore the database to the point after the wal_level
> change. So I think backup management tools would want to recognize the
> time or LSN when/where wal_level is changed to 'none' in order to do some
> actions such as invalidating older backups, re-calculating backup
> redundancy etc.
> Actually the same is true when the user changes to 'minimal'. The tools
> would need to recognize the time or LSN in this case too. Since
> temporarily changing wal_level has been an uncommon use case some tools
> perhaps are not aware of that yet. But since introducing wal_level =
> 'none' could make the change common, I think we need to provide a way for
> the tools.

I wondered: couldn't backup management tools utilize, for example, the backup that must be taken when wal_level is changed to 'none'? As I said before, existing backup management tools support only wal_level=replica or logical at present. If they wish to change the status quo and cover wal_level changes, I feel it's natural for them to support a feature like taking a full backup triggered by a wal_level change (or a server stop). This is because taking a backup is a must for wal_level=none, as I described in the patch for wal_level=none. For example, they could provide an easy way to take an offline physical backup when the server stops to change the wal_level. (Here, they can support the change to 'minimal', too.)

What do you think?
[2] - https://www.postgresql.org/message-id/CAD21AoCotoAxxCmMVz6niwg4j6c3Er_-GboTLmHBft8pALpOGA%40mail.gmail.com Best Regards, Takamichi Osumi
Enhance traceability of wal_level changes for backup management
Hi,

This thread came from another thread about wal_level [1]. Mainly from the perspective of backup management tools such as pgBackRest, EDB's BART, and pg_probackup, it seems worth talking comprehensively about a way to trace and recognize wal_level changes, for purposes such as knowing when old backups must be invalidated. In the thread [1], I talk about wal_level='none', but this kind of topic also applies to changing wal_level to 'minimal' from a higher level. Accordingly, I made this topic a new, independent thread.

Currently, the backup management tools described above work only when wal_level is higher than minimal, because they use physical online backups or WAL archiving. Any useful ideas for backup management related to wal_level changes are welcome.

[1] - https://www.postgresql.org/message-id/TYAPR01MB29901EBE5A3ACCE55BA99186FE320%40TYAPR01MB2990.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi