On Mon, Mar 18, 2024, at 2:43 AM, Hayato Kuroda (Fujitsu) wrote: > Thanks for updating the patch. Here are my comments. > I used Grammarly to proofread sentences. > (The tool strongly recommends to use active voice, but I can ignore for now)
Thanks for another review. I posted a new patch (v32) that hopefully addresses these points. > 01. > > "After a successful run, the state of the target server is analagous to a > fresh > logical replication setup." > a/analagous/analogous Fixed. > 02. > > "The main difference between the logical replication setup and > pg_createsubscriber > is the initial data copy." > > Grammarly suggests: > "The initial data copy is the main difference between the logical replication > setup and pg_createsubscriber." Not fixed. > 03. > > "Only the synchronization phase is done, which ensures each table is brought > up > to a synchronized state." > > This sentence is not very clear to me. How about: > "pg_createsubscriber does only the synchronization phase, ensuring each > table's > replication state is ready." I avoided pg_createsubscriber at the beginning because it is already used in the previous sentence. I kept the last part of the sentence because it is similar to one in the logical replication [1]. > 04. > > "The pg_createsubscriber targets large database systems because most of the > execution time is spent making the initial data copy." > > Hmm, but initial data sync by logical replication also spends most of time to > make the initial data copy. IIUC bottlenecks are a) this application must stop > and start server several times, and b) only the serial copy works. Can you > clarify them? Reading the sentence again, it is not clear. When I said "most of execution time" I was referring to the actual logical replication setup. > 05. > > It is better to say the internal difference between pg_createsubscriber and > the > initial sync by logical replication. For example: > pg_createsubscriber uses a physical replication mechanism to ensure the > standby > catches up until a certain point. Then, it converts to the standby to the > subscriber by promoting and creating subscriptions. Isn't it better to leave these details to "How It Works"? > 06. > > "If these are not met an error will be reported." > > Grammarly suggests: > "If these are not met, an error will be reported." Fixed. > 07. > > "The given target data directory must have the same system identifier than the > source data directory." > > Grammarly suggests: > "The given target data directory must have the same system identifier as the > source data directory." Fixed. > 08. > > "If a standby server is running on the target data directory or it is a base > backup from the source data directory, system identifiers are the same." > > This line is not needed if bullet-style is not used. The line is just a > supplement, > not prerequisite. Fixed. > 09. > > "The source server must accept connections from the target server. The source > server must not be in recovery." > > Grammarly suggests: > "The source server must accept connections from the target server and not be > in recovery." Not fixed. > 10. > > "Publications cannot be created in a read-only cluster." > > Same as 08, this line is not needed if bullet-style is not used. Fixed. > 11. > > "pg_createsubscriber usually starts the target server with different > connection > settings during the transformation steps. Hence, connections to target server > might fail." > > Grammarly suggests: > "pg_createsubscriber usually starts the target server with different > connection > settings during transformation. Hence, connections to the target server might > fail." Fixed. > 12. > > "During the recovery process," > > Grammarly suggests: > "During recovery," Not fixed. Our documentation uses "recovery process". > 13. > > "replicated so an error would occur." > > Grammarly suggests: > "replicated, so an error would occur." I didn't find this one. Maybe you checked a previous version. > 14. > > "It would avoid situations in which WAL files from the source server might be > used by the target server." > > Grammarly suggests: > "It would avoid situations in which the target server might use WAL files from > the source server." Fixed. > 15. > > "a LSN" > > s/a/an Fixed. > 16. > > "of write-ahead" > > s/of/of the/ Fixed. > 17. > > "specifies promote" > > We can do double-quote for the word promote. Why? It is referring to recovery_target_action. If you check this GUC, you will notice that it also uses literal tag. > 18. > > "are also added so it avoids" > > Grammarly suggests: > "are added to avoid" Fixed. > 19. > > "is accepting read-write transactions" > > Grammarly suggests: > "accepts read-write transactions" Not fixed. > 20. > > New options must be also documented as well. This helps not only users but > also > reviewers. > (Sometimes we cannot identify that the implementation is intentinal or not.) I don't know what are you referring to? If the new options are --publication, --subscription and --replication-slot, they are documented. Are you checking the latest patch? > 21. > > Also, not sure the specification is good. I preferred to specify them by > format > string. Because it can reduce the number of arguments and I cannot find use > cases > which user want to control the name of objects. > > However, your approach has a benefit which users can easily identify the > generated > objects by pg_createsubscriber. How do other think? I prefer explicit options. We can always expand it later if people think it is a good idea to provide a format string. > 22. > > ``` > #define BASE_OUTPUT_DIR "pg_createsubscriber_output.d" > ``` > > No one refers the define. It was removed in v30. > 23. > > ``` > } CreateSubscriberOptions; > ... > } LogicalRepInfo; > ``` > > Declarations after the "{" are not needed, because we do not do typedef. It is a leftover when I removed the typedef. > 22. > > While seeing definitions of functions, I found that some pointers are declared > as const, but others are not. E.g., "char *lsn" in setup_recovery() won' be > changed but not the constant. Is it just missing or is there another rule? It slipped my mind. Peter's fixups improves it. > 23. > > ``` > static int num_dbs = 0; > static int num_pubs = 0; > static int num_subs = 0; > static int num_replslots = 0; > ``` > > I think the name is bit confusing. The number of generating > publications/subscriptions/replication slots > are always same as the number of databases. They just indicate the number of > specified. > > My idea is num_custom_pubs or something. Thought? What does "custom" add to make the name clear? I added comments saying so. > 24. > > ``` > /* standby / subscriber data directory */ > static char *subscriber_dir = NULL; > ``` > > It is bit strange that only subscriber_dir is a global variable. Caller > requires > the CreateSubscriberOptions as an argument, except cleanup_objects_atexit() > and > main. So, how about makeing `CreateSubscriberOptions opt` to global one? I avoided turning all the options global variables. Since the cleanup routine required the target data directory to be a global variable, I just did it and left the others alone. > 25. > > ``` > * Replication slots, publications and subscriptions are created. Depending on > * the step it failed, it should remove the already created objects if it is > * possible (sometimes it won't work due to a connection issue). > ``` > > I think it should be specified here that subscriptions won't be removed with > the > reason. I rephrased this comment. > 26. > > ``` > > /* > * If the server is promoted, there is no way to use the current setup > * again. Warn the user that a new replication setup should be done before > * trying again. > */ > ``` > > Per comment 25, we can add a reference like "See comments atop the function" It is a few lines above. I don't think you have to point it out. If you are unsure about this decision, you should check the whole function. > 27. > > usage() was not updated based on recent changes. Check v30. > 28. > > ``` > if (strcmp(conn_opt->keyword, "dbname") == 0 && conn_opt->val != NULL) > { > if (dbname) > *dbname = pg_strdup(conn_opt->val); > continue; > } > ``` > > There is a memory-leak if multiple dbname are specified in the conninfo. It is not a worrying or critical memory leak. > 29. > > ``` > pg_prng_seed(&prng_state, (uint64) (getpid() ^ time(NULL))); > ``` > > No need to initialize the seed every time. Can you reuse pg_prng_state? Sure. > 30. > > ``` > if (num_replslots == 0) > dbinfo[i].replslotname = pg_strdup(genname); > ``` > > I think the straightforward way is to use the name of subscription if no name > is specified. This follows the rule for CREATE SUBSCRIPTION. Agreed. > 31. > > ``` > /* Create replication slot on publisher */ > if (lsn) > pg_free(lsn); > ``` > > I think allocating/freeing memory is not so efficient. > Can we add a flag to create_logical_replication_slot() for controlling the > returning value (NULL or duplicated string)? We can use the condition (i == > num_dbs-1) > as flag. It is not. This code path is not critical. You are suggesting to add complexity here. Efficiency is a good goal but in this case it only adds complexity with small return. > 32. > > ``` > /* > * Close the connection. If exit_on_error is true, it has an undesired > * condition and it should exit immediately. > */ > static void > disconnect_database(PGconn *conn, bool exit_on_error) > ``` > > In case of disconnect_database(), the second argument should have different > name. > If it is true, the process exits unconditionally. > Also, comments atop the function must be fixed. I choose a short name. The comment seems ok to me. > > 33. > > ``` > wal_level = strdup(PQgetvalue(res, 0, 0)); > ``` > > pg_strdup should be used here. Fixed. > 34. > > ``` > {"config-file", required_argument, NULL, 1}, > {"publication", required_argument, NULL, 2}, > {"replication-slot", required_argument, NULL, 3}, > {"subscription", required_argument, NULL, 4}, > ``` > > The ordering looks strange for me. According to pg_upgarade and pg_basebackup, > options which do not have short notation are listed behind. Fixed. > 35. > > ``` > opt.sub_port = palloc(16); > ``` > > Per other lines, pg_alloc() should be used. I think you meant pg_malloc. Fixed. > 36. > > ``` > pg_free(opt.sub_port); > ``` > > You said that the leak won't be concerned here. If so, why only 'p' has > pg_free()? Fixed. > 37. > > ``` > /* Register a function to clean up objects in case of failure */ > atexit(cleanup_objects_atexit); > ``` > > Sorry if we have already discussed. I think the registration can be moved just > before the boot of the standby. Before that, the callback will be no-op. The main reason is to catch future cases added *before* the point you want to move this call that requires a cleanup. As you said it is a no-op. My preference for atexit() calls is to add it as earlier as possible to avoid leaving cases that it should trigger. > 38. > > ``` > /* Subscriber PID file */ > snprintf(pidfile, MAXPGPATH, "%s/postmaster.pid", subscriber_dir); > > /* > * If the standby server is running, stop it. Some parameters (that can > * only be set at server start) are informed by command-line options. > */ > if (stat(pidfile, &statbuf) == 0) > ``` > > Hmm. pidfile is used only here, but it is declared in main(). Can it be > separated into another funtion like is_standby_started()? It is so small that I didn't bother adding a new function for it. > 39. > > Or, we may able to introcue "restart_standby_if_needed" or something. > > 40. > > ``` > * XXX this code was extracted from BootStrapXLOG(). > ``` > > So, can we extract the common part to somewhere? Since system identifier is > related > with the controldata file, I think it can be located in controldata_util.c. I added this comment here as a reference from where I extracted the code. The referred function is from backend. Feel free to propose a separate patch for it. > 41. > > You said like below in [1], but I could not find the related fix. Can you > clarify? > > > That's a good point. We should state in the documentation that GUCs > > specified in > > the command-line options are ignored during the execution. I added a sentence for it. See "How It Works". [1] https://www.postgresql.org/docs/current/logical-replication-architecture.html#LOGICAL-REPLICATION-SNAPSHOT -- Euler Taveira EDB https://www.enterprisedb.com/