Dear Euler, Here are comments for v21.
01. main ``` /* rudimentary check for a data directory. */ ... /* subscriber PID file. */ ``` Initial char must be upper, and period is not needed. 02. check_data_directory ``` snprintf(versionfile, MAXPGPATH, "%s/PG_VERSION", datadir); ``` You removed the version checking from PG_VERSION, but I think it is still needed. Indeed v21 can detect the pg_ctl/pg_resetwal/pg_createsubscriber has different verson, but this cannot ditect the started instance has the differnet version. I.e., v20-0010 is partially needed. 03. store_pub_sub_info() ``` SimpleStringListCell *cell; ``` This definition can be in loop variable. 04. get_standby_sysid() ``` pfree(cf); ``` This can be pg_pfree(). 05. check_subscriber ``` /* The target server must be a standby */ if (server_is_in_recovery(conn) == 0) { pg_log_error("The target server is not a standby"); return false; } ``` What if the the function returns -1? Should we ditect (maybe the disconnection) here? 06. server_is_in_recovery ``` ret = strcmp("t", PQgetvalue(res, 0, 0)); PQclear(res); if (ret == 0) return 1; else if (ret > 0) return 0; else return -1; /* should not happen */ ``` But strcmp may return a negative value, right? Based on the comment atop function, we should not do it. I think we can use ternary operator instead. 07. server_is_in_recovery As the fisrt place, no one consider this returns -1. So can we change the bool function and raise pg_fatal() in case of the error? 08. check_subscriber ``` if (strcmp(PQgetvalue(res, 0, 1), "t") != 0) { pg_log_error("permission denied for function \"%s\"", "pg_catalog.pg_replication_origin_advance(text, pg_lsn)"); return false; } ``` I think the third argument must be 2. 09. check_subscriber ``` pg_log_debug("subscriber: primary_slot_name: %s", primary_slot_name); ``` The output seems strange if the primary_slot_name is not set. 10. setup_publisher() ``` PGconn *conn; PGresult *res; ``` Definitions can be in the loop. 11. create_publication() ``` if (PQntuples(res) == 1) { /* * If publication name already exists and puballtables is true, let's * use it. A previous run of pg_createsubscriber must have created * this publication. Bail out. */ ``` Hmm, but pre-existing publications may not send INSERT/UPDATE/DELETE/TRUNCATE. They should be checked if we really want to reuse. (I think it is OK to just raise ERROR) 12. create_publication() Based on above, we do not have to check before creating publicatios. The publisher can detect the duplication. I prefer it. 13. create_logical_replication_slot() ``` if (PQresultStatus(res) != PGRES_TUPLES_OK) { pg_log_error("could not create replication slot \"%s\" on database \"%s\": %s", slot_name, dbinfo->dbname, PQresultErrorMessage(res)); return lsn; } ``` I know lsn is always NULL, but can we use `return NULL`? 14. setup_subscriber() ``` PGconn *conn; ``` This definition can be in the loop. 15. You said in case of failure, cleanups is not needed if the process exits soon [1]. But some functions call PQfinish() then exit(1) or pg_fatal(). Should we follow? 16. Some places refer PGresult or PGConn even after the cleanup. They must be fixed. ``` PQclear(res); disconnect_database(conn); pg_fatal("could not get system identifier: %s", PQresultErrorMessage(res)); ``` I think this is a root cause why sometimes the wrong error message has output. 17. Some places call PQerrorMessage() and other places call PQresultErrorMessage(). I think it PQerrorMessage() should be used only after the connection establishment functions. Thought? 18. 041_pg_createsubscriber_standby.pl ``` use warnings; ``` We must set "FATAL = all"; 19. ``` my $node_p; my $node_f; my $node_s; my $node_c; my $result; my $slotname; ``` I could not find forward declarations in perl file. The node name might be bit a consuging, but I could not find better name. 20. ``` # On node P # - create databases # - create test tables # - insert a row # - create a physical replication slot $node_p->safe_psql( 'postgres', q( CREATE DATABASE pg1; CREATE DATABASE pg2; )); $node_p->safe_psql('pg1', 'CREATE TABLE tbl1 (a text)'); $node_p->safe_psql('pg1', "INSERT INTO tbl1 VALUES('first row')"); $node_p->safe_psql('pg2', 'CREATE TABLE tbl2 (a text)'); my $slotname = 'physical_slot'; $node_p->safe_psql('pg2', "SELECT pg_create_physical_replication_slot('$slotname')"); ``` I think setting of the same node can be gathered into one place. Also, any settings and definitions should be done just before they are used. 21. ``` $node_s->append_conf( 'postgresql.conf', qq[ log_min_messages = debug2 primary_slot_name = '$slotname' ]); ``` I could not find a reason why we set to debug2. 22. ``` command_fails ``` command_checks_all() can check returned value and outputs. Should we use it? 23. Can you add headers for each testcases? E.g., ``` # ------------------------------ # Check pg_createsubscriber fails when the target server is not a # standby of the source. ... # ------------------------------ # Check pg_createsubscriber fails when the target server is not running ... # ------------------------------ # Check pg_createsubscriber fails when the target server is a member of # the cascading standby. ... # ------------------------------ # Check successful dry-run ... # ------------------------------ # Check successful conversion ``` 24. ``` # Stop node C $node_c->teardown_node; ... $node_p->stop; $node_s->stop; $node_f->stop; ``` Why you choose the teardown? 25. The creation of subscriptions are not directory tested. @subnames contains the name of subscriptions, but it just assumes the number of them is more than two. Since it may be useful, I will post top-up patch on Monday, if there are no updating. [1]: https://www.postgresql.org/message-id/89ccf72b-59f9-4317-b9fd-1e6d20a0c3b1%40app.fastmail.com [2]: https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/global/