On Wed, May 17, 2023 at 7:18 AM Zhijie Hou (Fujitsu)
<houzj.f...@fujitsu.com> wrote:
>
> Currently, the main loop of apply worker looks like below[1]. Since there are
> two loops, the inner loop will keep receiving and applying message from
> publisher until no more message left. The worker only reloads the
> configuration in the outer loop. This means if the publisher keeps sending
> messages (it could keep sending multiple transactions), the apply worker
> won't get a chance to update the GUCs.
>
Apart from that, I think that in rare cases the reload can happen after the
apply worker has finished waiting for data but just before it receives the
new replication data/message. The worker then won't get a chance to process
the reload before it processes the new message. I think such a theory can
explain the rare BF failure you pointed out later in the thread. Does that
make sense?

> [1]
> for(;;) /* outer loop */
> {
>         for(;;) /* inner loop */
>         {
>                 len = walrcv_receive()
>                 if (len == 0)
>                         break;
>                 ...
>                 apply change
>         }
>
>         ...
>         if (ConfigReloadPending)
>         {
>                 ConfigReloadPending = false;
>                 ProcessConfigFile(PGC_SIGHUP);
>         }
>         ...
> }
>
> I think it would be better that the apply worker can reflect user's
> configuration changes sooner. To achieve this, we can add one more
> ProcessConfigFile() call in the inner loop. Attach the patch for the same.
> What do you think ?
>

I think this somewhat matches what Tom said in the third point of his
email [1].

> BTW, I saw one BF failure[2] (it's very rare and only happened once in 4
> months) which I think is due to the low frequent reload in apply worker.
>
> The attached tap test shows how the failure happened.
>

I haven't tried to reproduce it yet but will try sometime later. Thanks
for your analysis.

[1] - https://www.postgresql.org/message-id/2138662.1623460441%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.
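To make the timing argument concrete, here is a small standalone simulation
of the two loops. It is only a sketch and not the attached patch:
reload_pending, guc_value, maybe_reload() and count_stale_applies() are
hypothetical stand-ins for ConfigReloadPending, a GUC, ProcessConfigFile()
and the worker loop, and the placement of the extra check (after receiving a
message and before applying it) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for ConfigReloadPending (set when SIGHUP arrives). */
static bool reload_pending;
/* Hypothetical stand-in for a GUC: 0 = old setting, 1 = new setting. */
static int  guc_value;

/* Hypothetical stand-in for the ConfigReloadPending/ProcessConfigFile block. */
static void
maybe_reload(void)
{
    if (reload_pending)
    {
        reload_pending = false;
        guc_value = 1;              /* adopt the new setting */
    }
}

/*
 * Simulate one pass of the outer loop. The publisher has nmsgs messages
 * queued, and a reload request arrives just before message number
 * reload_at (0-based) is received. Returns how many messages were applied
 * while the worker still used the old setting.
 */
int
count_stale_applies(int nmsgs, int reload_at, bool check_in_inner_loop)
{
    int stale = 0;

    assert(nmsgs >= 0);
    reload_pending = false;
    guc_value = 0;

    for (int i = 0; i < nmsgs; i++)     /* inner loop */
    {
        if (i == reload_at)
            reload_pending = true;      /* reload requested here */

        /* message i has been "received" at this point */
        if (check_in_inner_loop)
            maybe_reload();             /* assumed extra check */

        if (guc_value == 0)
            stale++;                    /* applied with the old setting */
    }

    maybe_reload();                     /* existing outer-loop check */
    return stale;
}
```

With five queued messages and a reload arriving before the third one,
count_stale_applies(5, 2, false) returns 5 and
count_stale_applies(5, 2, true) returns 2. Without the inner-loop check,
every queued message is applied with the old setting.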