On Friday, April 28, 2023 2:18 PM Masahiko Sawada <[email protected]> wrote:
>
> On Fri, Apr 28, 2023 at 11:51 AM Amit Kapila <[email protected]> wrote:
> >
> > On Wed, Apr 26, 2023 at 4:11 PM Zhijie Hou (Fujitsu)
> > <[email protected]> wrote:
> > >
> > > On Wednesday, April 26, 2023 5:00 PM Alexander Lakhin <[email protected]> wrote:
> > > >
> > > > IIUC, that assert will fail in case of any error raised between
> > > > ApplyWorkerMain()->logicalrep_worker_attach()->before_shmem_exit() and
> > > > ApplyWorkerMain()->InitializeApplyWorker()->BackgroundWorkerInitializeConnectionByOid()->InitPostgres().
> > >
> > > Thanks for reporting the issue.
> > >
> > > I think the problem is that it tried to release locks in
> > > logicalrep_worker_onexit() before the initialization of the process is complete
> > > because this callback function was registered before the init phase. So I think we
> > > can add a conditional statement before releasing locks. Please find an attached
> > > patch.
> > >
> >
> > Alexander, does the proposed patch fix the problem you are facing?
> > Sawada-San, and others, do you see any better way to fix it than what
> > has been proposed?
>
> I'm concerned that the idea of relying on IsNormalProcessingMode()
> might not be robust since if we change the meaning of
> IsNormalProcessingMode() some day it would silently break again. So I
> prefer using something like InitializingApplyWorker, or another idea
> would be to do cleanup work (e.g., fileset deletion and lock release)
> in a separate callback that is registered after connecting to the
> database.
Thanks for the review. I agree that it's better to use a new variable here.
Attached is an updated patch that does so.
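For illustration only, here is a minimal, self-contained sketch of the flag-based approach. It is not the code in the attached patch. The names `initializing_worker`, `release_worker_resources()`, `worker_onexit()` and `simulate_worker()` are hypothetical, and an explicit call stands in for the before_shmem_exit() callback machinery. The flag stays set until initialization finishes, so an ERROR raised during startup makes the exit callback skip the cleanup that assumes a fully initialized backend.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an InitializingApplyWorker-style flag. */
static bool initializing_worker = false;

/* Hypothetical counter of how many times cleanup actually ran. */
static int cleanup_runs = 0;

/*
 * Hypothetical stand-in for fileset deletion and lock release. Like the
 * real cleanup, it assumes backend initialization has completed; the
 * assert mirrors the kind of check that failed in the reported issue.
 */
static void release_worker_resources(void)
{
    assert(!initializing_worker);
    cleanup_runs++;
}

/*
 * Exit callback, registered early in the same way that
 * logicalrep_worker_onexit() is registered via before_shmem_exit().
 */
static void worker_onexit(void)
{
    /* Startup did not finish, so skip cleanup that needs a full backend. */
    if (initializing_worker)
        return;
    release_worker_resources();
}

/*
 * Simulates one worker lifetime. fail_during_init models an ERROR raised
 * between callback registration and the end of InitPostgres().
 * Returns the total number of cleanup runs so far.
 */
static int simulate_worker(bool fail_during_init)
{
    initializing_worker = true;
    /* ... the exit callback would be registered here ... */
    if (!fail_during_init)
        initializing_worker = false;    /* initialization finished */
    worker_onexit();                    /* process exit fires the callback */
    return cleanup_runs;
}
```

A worker that fails during startup exits without touching locks or filesets. A worker that completed initialization runs the cleanup normally. Compared with checking IsNormalProcessingMode(), a dedicated flag keeps the condition tied to this worker's own startup, which is the robustness concern raised above.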
>
> FWIW, we might need to be careful about the timing when we call
> logicalrep_worker_detach() in the worker's termination process. Since
> we rely on IsLogicalParallelApplyWorker() for the parallel apply
> worker to send ERROR messages to the leader apply worker, if an ERROR
> happens after logicalrep_worker_detach(), we will end up with the
> assertion failure.
>
> if (IsLogicalParallelApplyWorker())
> SendProcSignal(pq_mq_parallel_leader_pid,
> PROCSIG_PARALLEL_APPLY_MESSAGE,
> pq_mq_parallel_leader_backend_id);
> else
> {
> Assert(IsParallelWorker());
>
> It normally would be a should-not-happen case, though.
Yes. Currently, the parallel apply (PA) worker sends the ERROR message
before exiting, so the exit callbacks always fire after the above code,
which looks fine to me.
Best Regards,
Hou zj
v2-0001-Fix-assert-failure-in-logical-replication-apply-w.patch
