On Sun, Nov 27, 2022 at 9:43 AM houzj.f...@fujitsu.com <houzj.f...@fujitsu.com> wrote: > > Attach the new version patch which addressed all comments so far. >
Few comments on v52-0001* ======================== 1. pa_free_worker() { ... + /* Free the worker information if the worker exited cleanly. */ + if (!winfo->error_mq_handle) + { + pa_free_worker_info(winfo); + + if (winfo->in_use && + !hash_search(ParallelApplyWorkersHash, &xid, HASH_REMOVE, NULL)) + elog(ERROR, "hash table corrupted"); pa_free_worker_info() pfrees the winfo, so how is it legal to winfo->in_use in the above check? Also, why is this check (!winfo->error_mq_handle) required in the first place in the patch? The worker exits cleanly only when the leader apply worker sends a SIGINT signal and in that case, we already detach from the error queue and clean up other worker information. 2. +HandleParallelApplyMessages(void) +{ ... ... + foreach(lc, ParallelApplyWorkersList) + { + shm_mq_result res; + Size nbytes; + void *data; + ParallelApplyWorkerInfo *winfo = (ParallelApplyWorkerInfo *) lfirst(lc); + + if (!winfo->error_mq_handle) + continue; Similar to the previous comment, it is not clear whether we need this check. If required, can we add a comment to indicate the case where it happens to be true? Note, there is a similar check for winfo->error_mq_handle in pa_wait_for_xact_state(). Please add some comments if that is required. 3. Why is there apply_worker_clean_exit() at the end of ParallelApplyWorkerMain()? Normally either the leader worker stops parallel apply, or parallel apply gets stopped because of a parameter change, or exits because of error, and in none of those cases it can hit this code path unless I am missing something. Additionally, I think in LogicalParallelApplyLoop, we will never receive zero-length messages so that is also wrong and should be converted to elog(ERROR,..). 4. I think in logicalrep_worker_detach(), we should detach from the shm error queue so that the parallel apply worker won't try to send a termination message back to the leader worker. 5. pa_send_data() { ... + if (startTime == 0) + startTime = GetCurrentTimestamp(); ... What is the use of getting the current timestamp before waitlatch logic, if it is not used before that? It seems that is for the time logic to look correct. We can probably reduce the 10s interval to 9s for that. In this function, we need to add some comments to indicate why the current logic is used, and also probably we can refer to the comments atop this file. 6. I think it will be better if we keep stream_apply_worker local to applyparallelworker.c by exposing functions to cache/resetting the required info. 7. Apart from the above, I have made a few changes in the comments and some miscellaneous cosmetic changes in the attached. Kindly include these in the next version unless you see a problem with any change. -- With Regards, Amit Kapila.
changes_amit_v52_1.patch
Description: Binary data
changes_amit_v52_1.patch
Description: Binary data