On 10/10/19 10:29 AM, Fabian Grünbichler wrote:
On October 10, 2019 8:55 am, Fabian Ebner wrote:
On 10/1/19 12:28 PM, Fabian Grünbichler wrote:
On October 1, 2019 12:17 pm, Fabian Ebner wrote:
Seems like 'zfs destroy' can take longer than 5 seconds, see [0].
I changed the timeout to 15 seconds and also changed the default
timeout to 10 instead of 5 seconds, to be on the safe side
for other commands like 'zfs create'.

[0]: 
https://forum.proxmox.com/threads/timeout-beim-l%C3%B6schen-des-entfernen-replikats-bei-entfernung-der-replikation.58467/
NAK, we have a 30s timeout for synchronous API requests that call this,
and they might do more than 2 zfs_requests. we'd rather time out and
have time to do error handling, than finish successfully sometimes and
die without cleanup other times.

the real solution for this is to convert the remaining synchronous API
calls that trigger storage operations to async ones, and switch the GUI
over to use the new variants. we had previous discussions about this
issue already. to implement it really nicely, we'd need two things:
- tasks with structured return values (to implement async content
  listing and similar operations, and make conversion of sync to async
  API calls easier)
- light-weight / ephemeral tasks (not included in regular task
  lists/log, cleaned up after final poll / 1 day after finishing / ...)

in case of the mentioned report, please investigate whether this call in
'pvesr' context is wrongly treated as a sync API call (i.e., should
is_worker maybe return true for pvesr calls?)
I looked at setting the worker flag for the (whole) pvesr environment,
but it seems a bit much. E.g. in prepare we need to delete stale
snapshots before starting the replication, and we probably don't want to
wait too long for that to complete, i.e. we should not have the worker
flag set when calling volume_snapshot_delete.

The situation is different with the vdisk_free call, since there we
don't need to sit and wait for the result, so it would make sense to
treat it as a worker. We could set the worker flag locally before that
call and use our own timeout, but that feels like a bad workaround.
Also, we can't use run_with_timeout to set our own timeout, since
run_command is called inside vdisk_free, which resets the alarm.
but we could use run_fork_with_timeout ;)
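
A minimal sketch of that, assuming $storecfg and $volid come from the
replication job (the 30 second value is just illustrative):

    use PVE::Tools;
    use PVE::Storage;

    # run the blocking removal in a forked child with its own timeout;
    # the alarm() that run_command resets inside vdisk_free stays in
    # the child, so the parent's timeout handling keeps working
    PVE::Tools::run_fork_with_timeout(30, sub {
        PVE::Storage::vdisk_free($storecfg, $volid);
    });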

Since we already have a mechanism to spawn a worker in vdisk_free
(currently only used by LVM with saferemove), wouldn't it make sense to
use that for ZFS as well? That is, have free_image in the zfs plugin
return a subroutine instead of executing zfs destroy directly. Other
callers of vdisk_free should also be fine with such a change, since it
can already happen that vdisk_free spawns a worker (but I haven't looked
in detail).
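
To illustrate the idea (a rough, untested sketch - vdisk_free already
runs a returned code ref in an 'imgdel' worker for LVM's saferemove
path, so only the plugin side would change):

    # hypothetical variant of free_image in PVE/Storage/ZFSPoolPlugin.pm
    sub free_image {
        my ($class, $storeid, $scfg, $volname, $isBase) = @_;

        my (undef, $name, undef) = $class->parse_volname($volname);

        # returning a sub instead of destroying directly makes
        # vdisk_free run the removal in a worker, where the long
        # worker timeout applies instead of the short default
        return sub {
            $class->zfs_delete_zvol($scfg, $name);
        };
    }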
but that is the non-default behaviour, not the default one. I'd rather
not switch all ZFS vdisk removals to forking a task if there are other
solutions.

AFAICT, this vdisk_free only happens for previously, but no longer,
replicated volumes? a simple solution might be to re-factor 139ff of
pvesr.pm to fork a worker if any vdisks need to be removed, and remove
them all asynchronously in a single task?
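
roughly like this (hypothetical sketch, with @to_remove standing in for
the stale volumes collected there):

    # fork a single worker that removes all stale volumes at once,
    # so only one task is created per replication run
    if (scalar(@to_remove)) {
        my $rpcenv = PVE::RPCEnvironment::get();
        $rpcenv->fork_worker('imgdel', undef, $rpcenv->get_user(), sub {
            # inside the worker is_worker() is true, so zfs_request
            # falls back to the long 60*60 timeout
            PVE::Storage::vdisk_free($storecfg, $_) for @to_remove;
        });
    }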

OTOH, if vdisk_free runs into this problem, it's only a question of load
until other zfs_requests (like listing volumes, listing snapshots,
creating snapshots, removing snapshots) also run into the low timeouts.
so the big question still remains - do we want to increase those
timeouts in general for pvesr, and if we do, how do we do it?
We might think of it as a feature ;) since on a ZFS pool which is
already under high load, pvesr "knows" that now is not a good time to do
a replication.

Contrary to most of the other zfs_requests here, pvesr doesn't actually
need to wait for the successful removal of stale disks, and isn't that
also the operation most likely to time out? So I like your proposed
simple solution.

It would introduce a bit of noise, since it would create a task on every
vdisk_free for a ZFS volume. The light-weight tasks you mentioned would
be ideal there, but for now we don't have those. And maybe the timeout
is hit so rarely that it's not worth the change - what do you think?

Signed-off-by: Fabian Ebner <f.eb...@proxmox.com>
---
   PVE/Storage/ZFSPoolPlugin.pm | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/PVE/Storage/ZFSPoolPlugin.pm b/PVE/Storage/ZFSPoolPlugin.pm
index f66b277..3ce06be 100644
--- a/PVE/Storage/ZFSPoolPlugin.pm
+++ b/PVE/Storage/ZFSPoolPlugin.pm
@@ -182,7 +182,7 @@ sub zfs_request {
     my $msg = '';
     my $output = sub { $msg .= "$_[0]\n" };
 
-    $timeout = PVE::RPCEnvironment->is_worker() ? 60*60 : 5 if !$timeout;
+    $timeout = PVE::RPCEnvironment->is_worker() ? 60*60 : 10 if !$timeout;
 
     run_command($cmd, errmsg => "zfs error", outfunc => $output, timeout => $timeout);
 
@@ -346,7 +346,7 @@ sub zfs_delete_zvol {
 
     for (my $i = 0; $i < 6; $i++) {
 
-        eval { $class->zfs_request($scfg, undef, 'destroy', '-r', "$scfg->{pool}/$zvol"); };
+        eval { $class->zfs_request($scfg, 15, 'destroy', '-r', "$scfg->{pool}/$zvol"); };
         if ($err = $@) {
             if ($err =~ m/^zfs error:(.*): dataset is busy.*/) {
                 sleep(1);
--
2.20.1


_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

