I continue to get spurious errors like:

        Subject: Replication Job: 121-0 failed

  command 'zfs snapshot rpool-data/vm-121-disk-0@__replicate_121-0_1666288805__' failed: got timeout

I'm convinced that these errors:

1) are I/O-bound, not network-bound; if I limit the replication bandwidth
  to some absurdly low value (e.g., 5 Mbit/s) they still happen.

2) are totally self-healing and benign.


In practice, when I/O is under stress (for example, during a running backup),
the PVE Perl code times out waiting for a reply to an operation that does in
fact succeed, just not within the allotted time.
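
To make the suspected mechanism concrete, here is a minimal shell sketch of a
caller that stops waiting before the work has finished. It does not reproduce
what PVE's Perl code does internally; slow_op, the marker file and the 1 s / 2 s
timings are all hypothetical stand-ins for a zfs call on a busy pool.

```shell
#!/bin/sh
# Illustration only: a caller that gives up before the operation is done.
marker=$(mktemp -u)      # hypothetical stand-in for "the snapshot exists"

# Stand-in for a zfs operation that is slow because the pool is busy.
slow_op() { sleep 2; : > "$marker"; }

slow_op &                # start the operation in the background
pid=$!

sleep 1                  # the caller is only willing to wait 1 second
if [ -e "$marker" ]; then
    caller_verdict="ok"
else
    caller_verdict="got timeout"    # what the caller would report
fi

wait "$pid"              # the operation itself keeps running to the end
if [ -e "$marker" ]; then
    final_state="completed"
else
    final_state="missing"
fi
rm -f "$marker"

echo "caller: $caller_verdict, operation: $final_state"
```

The caller reports a timeout, yet the work is finished a moment later, which
would match the "failed" mails followed by a perfectly healthy replication
state.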


Looking at the logs, I've also found:

        Oct 21 02:30:25 pppve2 pvesr[19291]: command 'zfs destroy rpool-data/vm-128-disk-1@__replicate_128-0_1666297807__' failed: got timeout

so destroy operations time out as well, but PVE does not send an email
complaining about them. And the snapshot does get deleted correctly:

 root@pppve2:~# zfs list -t snapshot | grep _128
 rpool-data/vm-128-disk-1@__replicate_128-0_1666312205__       378K      -     2.02G  -
 rpool/data/vm-128-disk-0@__replicate_128-0_1666312205__      50.2M      -     19.7G  -
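
The same check can be scripted instead of done by eye. The sketch below pulls
the timed-out zfs commands out of log lines in the format quoted above and
asks ZFS whether each snapshot exists now. The function names are made up for
illustration, and it assumes each log entry is a single unwrapped line.

```shell
#!/bin/sh
# Read log lines on stdin; print "<operation> <dataset@snapshot>" for every
# zfs snapshot/destroy command that was reported as "got timeout".
timed_out_zfs_ops() {
    sed -E -n "s/.*command 'zfs (snapshot|destroy) ([^']+)' failed: got timeout.*/\1 \2/p"
}

# Succeed if the given snapshot currently exists.
snapshot_exists() {
    zfs list -H -t snapshot -o name "$1" >/dev/null 2>&1
}

# Compare what the log claimed with the actual state of the pool:
# a timed-out snapshot should exist, a timed-out destroy should be gone.
report_timeouts() {
    timed_out_zfs_ops | while read -r op snap; do
        if snapshot_exists "$snap"; then state="present"; else state="absent"; fi
        echo "$op $snap -> $state"
    done
}
```

Possible usage, assuming the pvesr messages end up in the systemd journal
under the identifier shown in the log line above:

 journalctl -t pvesr | report_timeouts

A "destroy ... -> absent" line means the operation completed despite the
reported timeout. For "snapshot" entries the result is only meaningful right
after the event, since later replication runs remove older snapshots.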


Am I right? Can I file a bug for that?


Thanks.

-- 
  ...il ponte di Messina unirà «non due coste ma due cosche».
                                                        (Niki Vendola)



_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
