On Wed, Sep 20, 2023 at 7:20 PM Hayato Kuroda (Fujitsu) <kuroda.hay...@fujitsu.com> wrote: > > Good catch, I could not notice because it worked well in my RHEL. Here is the > updated version.
Thanks for the patch. I have some comments on v42: 1. +{ oid => '8046', descr => 'for use by pg_upgrade', + proname => 'binary_upgrade_validate_wal_records', proisstrict => 'f', + provolatile => 'v', proparallel => 'u', prorettype => 'bool', + if (PG_ARGISNULL(0)) + elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed"); Can proisstrict => 'f' be removed so that there's no need for explicit PG_ARGISNULL check? Any specific reason to keep it? And, the before the ISNULL check the arg is read, which isn't good. 2. +Datum +binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS) The function name looks too generic in the sense that it validates WAL records for correctness/corruption, but it is not. Can it be something like binary_upgrade_{check_for_wal_logical_end, check_for_logical_end_of_wal} or such? 3. + /* Quick exit if the given lsn is larger than current one */ + if (start_lsn >= GetFlushRecPtr(NULL)) + PG_RETURN_BOOL(false); + An LSN that doesn't exists yet is an error IMO, may be an error better here? 4. + * This function is used to verify that there are no WAL records (except some + * types) after confirmed_flush_lsn of logical slots, which means all the + * changes were replicated to the subscriber. There is a possibility that some + * WALs are inserted during upgrade, so such types would be ignored. + * This comment before the function better be at the callsite of the function, because as far as this function is concerned, it checks if there are any WAL records that are not "certain" types after the given LSN, it doesn't know logical slots or confirmed_flush_lsn or such. 5. Trying to understand the interaction of this feature with custom WAL records that a custom WAL resource manager puts in. Is it okay to have custom WAL records after the "logical WAL end"? + /* + * There is a possibility that following records may be generated + * during the upgrade. + */ 6. + if (PQntuples(res) != 1) + pg_fatal("could not count the number of logical replication slots"); + Not existing a single logical replication slot an error? I think it must be if (PQntuples(res) == 0) return;? 7. A nit: + nslots_on_new = atoi(PQgetvalue(res, 0, 0)); + + if (nslots_on_new) Just do if(atoi(PQgetvalue(res, 0, 0)) > 0) and get rid of nslots_on_new? 8. + if (nslots_on_new) + pg_fatal("expected 0 logical replication slots but found %d", + nslots_on_new); How about "New cluster database is containing logical replication slots", note that the some of the fatal messages are starting with an upper-case letter. 9. + res = executeQueryOrDie(conn, "SHOW wal_level;"); + res = executeQueryOrDie(conn, "SHOW max_replication_slots;"); Instead of 2 queries to determine required parameters, isn't it better with a single query like the following? select setting from pg_settings where name in ('wal_level', 'max_replication_slots') order by name; 10. Why just wal_level and max_replication_slots, why not max_worker_processes and max_wal_senders too? I'm looking at RecoveryRequiresIntParameter and if they are different on the upgraded instance, chances that the logical replication won't work, no? 11. +# 2. Generate extra WAL records. Because these WAL records do not get consumed +# it will cause the upcoming pg_upgrade test to fail. +$old_publisher->safe_psql('postgres', + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;" +); +$old_publisher->stop; This might be a recipie for sporadic test failures - how is it guaranteed that the newly generated WAL records aren't consumed. May be stop subscriber or temporarily disable the subscription and then generate WAL records? 12. +extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn); +extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader); + Why not these functions be defined in xlogreader.h with elog/ereport in #ifndef FRONTEND #endif blocks? IMO, xlogreader.h seems right location for these functions. 13. +LogicalReplicationSlotInfo Where is this structure defined? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com