Re: non-exclusive backup cleanup is mildly broken

Robert Haas Tue, 17 Dec 2019 19:20:24 -0800

On Tue, Dec 17, 2019 at 6:40 PM Michael Paquier <mich...@paquier.xyz> wrote:
> So that's how you prevent piling up multiple registrations of this
> callback compared to v1.  FWIW, I think that it is a cleaner approach
> to remove the callback once a non-exclusive backup is done, because a
> session has no need for it once it is done with its non-exclusive
> backup, and this session may remain around for some time.


The fact that the session may remain around for some time isn't really
relevant, because the callback isn't consuming any resources. It does
not increase memory usage by a single byte, nor CPU consumption
either. It does consume a few CPU cycles when the backend finally
exits, but the number of such cycles is very small and unrelated to
the length of the session. And removing the callback isn't entirely
free, either.

I think the real point for me is that it's bad to have two sources of
truth. We have the sessionBackupState variable that tells us whether
we're in a backup, and then we separately have whether or not the
callback is registered. If those two things ever get out of sync, as
they do in the test cases that I've proposed, then we have problems --
so it's better not to maintain the state in two separate ways.

The way it's set up right now actually seems quite fragile even apart
from the problem with cancel_before_shmem_exit(). do_pg_stop_backup()
sets sessionBackupState to SESSION_BACKUP_NONE and then does things
that might fail. If they do, then the cancel_before_shmem_exit()
callback will leave the callback installed, which can lead to a
spurious warning or assertion failure later as in the original report.
The only way to avoid that problem would be to move the
cancel_before_shmem_exit() callback so that it happens right next to
setting sessionBackupState to SESSION_BACKUP_NONE -- note the comments
there saying we can't even do WALInsertLockRelease() between updating
XLogCtl and updating sessionBackupState. But that would be very ugly,
because we'd then have to pass a flag to do_pg_stop_backup() saying
whether to remove the callback, since only one caller wants that
behavior.

And, we'd have to change cancel_before_shmem_exit() to search the
whole array, which would almost certainly cost more cycles than
leaving a do-nothing callback around.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: non-exclusive backup cleanup is mildly broken

Reply via email to