On 18.02.26 at 16:45, Fiona Ebner wrote:
> If the lock directory is not removed after failing because of a
> signal, it won't be possible to acquire the lock anymore before the
> 120 second timeout imposed on the lock by pmxcfs. This can easily
> happen by a second, unrelated task in production and is quite
> surprising. Install a signal handler that releases the lock if it was
> already acquired. If an old handler is defined, it is invoked,
> otherwise the signal is raised again. Just using 'die' would change
> the execution flow compared to before the change.
>
> Signed-off-by: Fiona Ebner <[email protected]>
> ---
> src/PVE/Cluster.pm | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
> index bdb465f..7165d1c 100644
> --- a/src/PVE/Cluster.pm
> +++ b/src/PVE/Cluster.pm
> @@ -615,6 +615,22 @@ my $cfs_lock = sub {
>
> my $is_code_err = 0;
> eval {
> + # catch signals to release the lock - further defer to old handler if one was set
> + my $old_sig;
> + $old_sig->{$_} = $SIG{$_} for qw(INT TERM QUIT HUP PIPE);
Really a non-issue in practice and basically the same thing under the hood, but
this could probably just be a map, something like (untested):
my $old_sig = { map { $_ => $SIG{$_} } qw(INT TERM QUIT HUP PIPE) };
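A minimal standalone version of that map-based snapshot, assuming it runs under
`use strict`. The TERM handler installed first is only a stand-in for a handler
set by the caller. It also shows that a %SIG entry is not necessarily a code
ref, which might be worth keeping in mind for the truthiness check on
`$old_sig->{$signame}` in the handler.

```perl
use strict;
use warnings;

# For illustration only: pretend the caller already installed a TERM handler.
$SIG{TERM} = sub { print "caller's TERM handler\n" };

# Snapshot the current handlers of all signals we are about to override.
my $old_sig = { map { $_ => $SIG{$_} } qw(INT TERM QUIT HUP PIPE) };

for my $name (sort keys %$old_sig) {
    my $handler = $old_sig->{$name};
    # A %SIG entry can be undef, a string such as 'DEFAULT' or 'IGNORE', or a
    # code ref; under strict refs only a code ref can be called via ->(...).
    my $kind = ref($handler) eq 'CODE' ? 'code ref' : ($handler // 'undef');
    print "$name: $kind\n";
}
```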
> +
> + local $SIG{INT} = local $SIG{TERM} = local $SIG{QUIT} = local $SIG{HUP} =
> + local $SIG{PIPE} = sub {
> + my $signame = $_[0];
> + rmdir $filename if $got_lock; # if we held the lock always unlock again
Might be nice to output a warning if the rmdir above fails?
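Something along these lines, as an untested sketch: `release_lock` is a
hypothetical helper, and the temporary directory plus `$got_lock = 1` only
stand in for `$filename` and `$got_lock` from `$cfs_lock`.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-ins for the variables used in $cfs_lock.
my $filename = tempdir(CLEANUP => 1) . '/example-lock';
mkdir($filename) or die "mkdir '$filename' failed - $!\n";
my $got_lock = 1;

# Hypothetical helper: release the lock directory and warn instead of failing
# silently, so a leftover lock is at least visible in the task log.
sub release_lock {
    my ($dir) = @_;
    return 1 if rmdir($dir);
    warn "failed to remove lock directory '$dir' - $!\n";
    return 0;
}

release_lock($filename) if $got_lock;
```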
> + if ($old_sig->{$signame}) {
> + $old_sig->{$signame}->(@_);
> + } else {
> + $SIG{$signame} = 'DEFAULT';
> + POSIX::raise($signame);
Hmm, this reads alright, but it makes me wonder whether it should be added
elsewhere too. I could not find a single "POSIX::raise" or "raise\(" instance in
our Perl code inside the /usr/share/perl5/{PVE,Proxmox} directories on a recent
PVE 9 system, but we have quite a few signal overrides, and while I did not
check those, I believe I remember that some of them fall back to the handler
defined by the calling site.
Describing how exactly the code flow changes would be nice in any case.
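A standalone sketch of the difference that might help for the commit message.
It is not the patch's code: `run_child` is a hypothetical helper, and it
re-raises with Perl's built-in `kill`, which accepts signal names, instead of
`POSIX::raise`. With `die`, the surrounding `eval` catches the error and
execution continues after it; with the default disposition restored and the
signal re-raised, the process is terminated just as it would have been without
the handler.

```perl
use strict;
use warnings;
use POSIX qw(SIGTERM);

# Hypothetical helper: run $body in a forked child, send it SIGTERM once the
# child reports that its handler is installed, and return the wait status ($?).
sub run_child {
    my ($body) = @_;
    pipe(my $reader, my $writer) or die "pipe failed - $!\n";
    my $pid = fork() // die "fork failed - $!\n";
    if ($pid == 0) {
        close($reader);
        $body->($writer);
        POSIX::_exit(1); # not reached in either variant below
    }
    close($writer);
    my $ready = <$reader>; # blocks until the child closes its end of the pipe
    close($reader);
    kill(TERM => $pid);
    waitpid($pid, 0);
    return $?;
}

# Variant 1: the handler dies. The error is caught by the surrounding eval
# and the child keeps running after it, like after any other error.
my $status_die = run_child(sub {
    my ($writer) = @_;
    eval {
        local $SIG{TERM} = sub { die "interrupted by signal\n" };
        print {$writer} "ready\n";
        close($writer);
        while (1) { sleep(1) } # stands in for the code running under the lock
    };
    POSIX::_exit($@ eq "interrupted by signal\n" ? 42 : 2);
});

# Variant 2: the handler restores the default disposition and re-raises the
# signal, so the child is terminated by SIGTERM and nothing after the eval runs.
my $status_raise = run_child(sub {
    my ($writer) = @_;
    eval {
        local $SIG{TERM} = sub {
            my ($signame) = @_;
            $SIG{$signame} = 'DEFAULT';
            kill($signame => $$);
        };
        print {$writer} "ready\n";
        close($writer);
        while (1) { sleep(1) }
    };
    POSIX::_exit(2); # not reached: the re-raised SIGTERM kills the child
});

printf "die:      exit code %d, signal %d\n", $status_die >> 8, $status_die & 127;
printf "re-raise: exit code %d, signal %d\n", $status_raise >> 8, $status_raise & 127;
```

With the first variant, code after the `eval` in `$cfs_lock` (and in its
callers) would run for a signal that previously terminated the process, which
is presumably the change in flow the commit message refers to.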
> + }
> + die "interrupted by signal\n";
> + };
>
> mkdir $lockdir;
>