On 18.02.26 at 7:33 PM, Thomas Lamprecht wrote:
> On 18.02.26 at 16:45, Fiona Ebner wrote:
>> If the lock directory is not removed after failing because of a
>> signal, it won't be possible to acquire the lock anymore before the
>> 120 second timeout imposed on the lock by pmxcfs. This can easily
>> happen by a second, unrelated task in production and is quite
>> surprising. Install a signal handler that releases the lock if it was
>> already acquired. If an old handler is defined, it is invoked,
>> otherwise the signal is raised again. Just using 'die' would change
>> the execution flow compared to before the change.
>>
>> Signed-off-by: Fiona Ebner <[email protected]>
>> ---
>>  src/PVE/Cluster.pm | 16 ++++++++++++++++
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
>> index bdb465f..7165d1c 100644
>> --- a/src/PVE/Cluster.pm
>> +++ b/src/PVE/Cluster.pm
>> @@ -615,6 +615,22 @@ my $cfs_lock = sub {
>>  
>>      my $is_code_err = 0;
>>      eval {
>> +        # catch signals to release the lock - further defer to old handler if one was set
>> +        my $old_sig;
>> +        $old_sig->{$_} = $SIG{$_} for qw(INT TERM QUIT HUP PIPE);
> 
> really a non-issue in practice and basically the same thing under the hood,
> but this could probably just be a map, something like (untested):
> 
> my $old_sig = { map { $_ => $SIG{$_} } qw(INT TERM QUIT HUP PIPE) };

Will do!

>> +
>> +        local $SIG{INT} = local $SIG{TERM} = local $SIG{QUIT} = local $SIG{HUP} =
>> +            local $SIG{PIPE} = sub {
>> +                my $signame = $_[0];
>> +                rmdir $filename if $got_lock; # if we held the lock always unlock again
> 
> Could be nice to output a warning if above rmdir fails?

Good point! Will also add it to the original line I copied this from.
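
A minimal sketch of what that could look like. The helper name
release_lock_dir() and the exact warning text are hypothetical and only
for illustration; in the patch this would more likely stay inline in the
handler:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PVE::Cluster): remove the lock directory
# and warn, rather than die, if that fails, so that the caller can still
# continue with its own signal handling afterwards.
sub release_lock_dir {
    my ($filename) = @_;

    return 1 if rmdir $filename;

    warn "unable to remove lock directory '$filename' - $!\n";
    return 0;
}
```

Using warn rather than die here keeps the rest of the handler (old handler
or re-raise) unaffected by a failed cleanup.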

>> +                if ($old_sig->{$signame}) {
>> +                    $old_sig->{$signame}->(@_);
>> +                } else {
>> +                    $SIG{$signame} = 'DEFAULT';
>> +                    POSIX::raise($signame);
> 
> hmm, this reads alright, but then I'm wondering if it should be added
> elsewhere? I did not find a single "POSIX::raise" or "raise\(" instance in
> our perl code inside the /usr/share/perl5/{PVE,Proxmox} directories on a
> recent PVE 9 system, but we have quite a few signal overrides, and while I
> did not check those, I do seem to remember that some of them fall back to
> the handler defined by the calling site.

The only ones I found that do invoke the previous handler are in
PVE::Daemon. They also do not use raise, but terminate the server.

For some other ones it's most likely intentional to convert the signal
to a simple die. For example PVE::VZDump::QemuServer, where it makes
sense to just catch the signal and proceed with aborting the backup
rather than raise it again.

Compared to those, cfs_lock() is quite low in the call chains and there
are callers that just warn about an error from cfs_lock(). So while it
is essential to not convert a signal to a simple die in cfs_lock(), it
might not be for other current signal overrides.
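
To illustrate the difference, here is a rough, self-contained sketch. It
is not the actual cfs_lock() code: guarded_section() and run_caller() are
hypothetical names, the signal is sent by the process to itself, and the
caller is reduced to one that only warns about the error:

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical stand-in for a low-level helper such as cfs_lock(): the
# handler either converts the signal into a plain die ('die'), or restores
# the default disposition and re-sends the signal first ('reraise').
sub guarded_section {
    my ($mode) = @_;
    eval {
        local $SIG{TERM} = sub {
            if ($mode eq 'reraise') {
                $SIG{TERM} = 'DEFAULT';
                kill 'TERM', $$;
            }
            die "interrupted by signal\n";
        };
        kill 'TERM', $$; # simulate a signal arriving inside the section
        sleep 1 for 1 .. 3; # normally not reached, the handler dies first
    };
    return $@;
}

# Run a caller that merely warns about the error in a child process and
# report how the child ended: (terminating signal number, exit code).
sub run_caller {
    my ($mode) = @_;
    my $pid = fork() // die "fork failed - $!\n";
    if ($pid == 0) {
        my $err = guarded_section($mode);
        warn "ignoring lock error: $err" if $err;
        POSIX::_exit(42); # the caller carries on as if nothing happened
    }
    waitpid($pid, 0);
    return ($? & 127, $? >> 8);
}
```

With 'die', I would expect the child to swallow the signal and exit
normally via the caller's own code path, while with 'reraise' it should be
reported as terminated by SIGTERM, which matches the behavior from before
a handler was installed.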

> Describing how exactly the code flow changes would be nice in any case.

Do you mean expanding on the sentence mentioning "code flow" in the
commit message or something else?

>> +                }
>> +                die "interrupted by signal\n";
>> +            };
>>  
>>          mkdir $lockdir;
>>  



