Re: [pve-devel] [RFC container v2 22/25] backup: implement backup for external providers

2024-09-16 Thread Fiona Ebner
On 13.09.24 at 08:19, Fabian Grünbichler wrote:
>> Fiona Ebner wrote on 12.09.2024 15:38 CEST:
>>  
>> On 12.09.24 at 14:43, Fabian Grünbichler wrote:
>>> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>>>> +  $info->{'firewall-config'} = $firewall_file if -e $firewall_file;
>>>> +  $info->{'bandwidth-limit'} = $opts->{bwlimit} * 1024 if $opts->{bwlimit};
>>>> +  $backup_provider->backup_container($vmid, $config_file, $id_map, $findexcl, $info);
>>>
>>> it might be easier to hide the idmapping from the backup provider? e.g.,
>>> hand it an idmapped bindmount or something like that?
>>>
>>
>> Yes, that would be nicer. But could that potentially lead to permission
>> issues? A mid/long term plan is to have the backup provider code run
>> with lower privileges. I suppose to later implement that, the subroutine
>> for the provider could run within a matching user namespace too?
> 
> yeah, I think there are a few options here
> - run the provider as root-in-user-ns, give it access to the mapped FS (this 
> is how we do regular backups, but requires some glue code/forking)

Gave this a try. The issue is that the backup provider also needs access
to the backup target etc. Can network access also be an issue (I guess
it is not for PBS)?

E.g. the directory example plugin fails with
> ERROR: Backup of VM 112 failed - unable to open file '/mnt/pve/sparschwein/112/lxc-1726484790/guest.conf.tmp.125275' - Permission denied
and the Borg plugin fails with
> ERROR: Backup of VM 112 failed - mkdir /run/pve-storage-borg-plugin: Permission denied at /usr/share/perl5/PVE/BackupProvider/Plugin/Borg.pm line 41
or, after switching to /tmp, with
> ERROR: Backup of VM 112 failed - file '/etc/pve/priv/storage/borg.pw' exists but open for reading failed - Permission denied

Less coupling with the associated storage plugin, or a special kind of
"unprivileged" storage plugin, would help. For PBS we do the
storage-plugin-related stuff first with root privileges and only run the
final pbs-client command in the user namespace. Maybe we need something
like that here too: a preparatory method run as root that prepares for
the unprivileged backup operation? But that makes life more complicated
for provider implementers (and also for us).
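
For illustration, a rough sketch of what such a split could look like in
the provider API (the method names and the handed-over data are made up
here, not part of the proposed series):

    package PVE::BackupProvider::Plugin::Example;

    use strict;
    use warnings;

    # Hypothetical preparatory phase: still runs as root, resolves anything
    # that needs elevated privileges (credentials, target handles) and hands
    # over only what the unprivileged phase may use.
    sub backup_container_prepare {
        my ($self, $vmid, $info) = @_;

        open(my $pw_fh, '<', '/etc/pve/priv/storage/borg.pw')
            or die "opening password file failed - $!\n";

        return { password_fh => $pw_fh };
    }

    # Hypothetical unprivileged phase: runs as root-in-user-ns and only uses
    # the handles from the prepare step, so it never touches privileged paths.
    sub backup_container {
        my ($self, $vmid, $config_file, $findexcl, $info, $prepared) = @_;

        my $password = do { local $/; readline($prepared->{password_fh}) };
        # ... perform the actual backup using $password ...
    }

    1;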

> - run the provider as root-on-host, give it access to a reverse-mapped FS 
> somehow (well, it would be nicer to run the backup code in the userns instead 
> of as root)

I'd try and go with this option for now if that is okay.

> - run the provider as root-on-host, give it access to the mapped FS and let 
> it handle the (un)mapping itself (if they are not familiar with namespaces, 
> this might go wrong)
> 
> so if we find a generic way to do the first variant, we are both closer to 
> how we do backups, and err on the side of caution w.r.t. context of execution.




Re: [pve-devel] [RFC container v2 23/25] backup: implement restore for external providers

2024-09-13 Thread Fiona Ebner
On 13.09.24 at 08:35, Fabian Grünbichler wrote:
> 
>> Fiona Ebner wrote on 12.09.2024 16:08 CEST:
>>
>>  
>> On 12.09.24 at 15:56, Fiona Ebner wrote:
>>> On 12.09.24 at 14:43, Fabian Grünbichler wrote:
>>>> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>>>>> + } elsif ($mechanism eq 'directory') {
>>>>> + my $directory = $info->{'archive-directory'}
>>>>> + or die "did not get path to archive directory from backup provider\n";
>>>>> + die "not a directory '$directory'" if !-d $directory;
>>>>> +
>>>>> + my $rsync = ['rsync', '--stats', '-h', '-X', '-A', '--numeric-ids', '-aH', '--delete',
>>>>> + '--no-whole-file', '--sparse', '--one-file-system', '--relative'];
>>>>> + push $rsync->@*, '--bwlimit', $bwlimit if $bwlimit;
>>>>> + push $rsync->@*, "${directory}/./", $rootdir;
>>>>
>>>> and this as well?
>>>>
>>>
>>> Good catch, will fix!
>>>
>>
>> Hmm, then rsync won't be able to access the source (for my Borg example)
>> anymore :/
>>
>> WARN: rsync: [sender] change_dir
>> "/run/pve-storage-borg-plugin/pve-lxc-111-2024-08-13T09:34:25Z.restore-container/filesystem"
>> failed: Permission denied (13)
>>
>> With restore_tar_archive we stream the contents via stdin; we can't do
>> that here. But maybe some kind of bind mount to make it accessible?
> 
> or rsync-on-host piped to rsync-in-ns ? haven't tried though

Would that require setting up an rsync daemon process? Or how would you
achieve the split? The man page says that --server/sender should not be
used:

> INTERNAL OPTIONS
>The options --server and --sender are used internally by rsync, and
> should never be typed by a user under normal circumstances. Some awareness
> of these options may be needed in certain scenarios, such as when setting
> up a login that can only run an rsync command. For instance, the support
> directory of the rsync distribution has an example script named rrsync
> (for restricted rsync) that can be used with a restricted ssh login.




Re: [pve-devel] Bug 2582 roadmap

2024-09-13 Thread Fiona Ebner
Hi Pavel,

On 10.09.24 at 12:18, Pavel Tide via pve-devel wrote:
> Hi Proxmox team,
> 
> Would you provide any delivery estimates on this item?
> https://bugzilla.proxmox.com/show_bug.cgi?id=2582
> 
> As far as I understand it's been implemented already, but currently stays in 
> the development branch - our lab is up to date and yet we don't see how we 
> can create a separate non-root account to work with PVE cluster.
> 

a patch series had been proposed, but the implementation was not
finished AFAIK, see Fabian's review[0]. Since Oguz left the company,
nobody else has picked up work on the series yet. Maybe you can describe
what exactly your use case is, i.e. which privileges you'd need in
particular. Of course, proposing patches for what you need (or a rebased
version of Oguz's full series) is welcome too :)

[0]:
https://lore.proxmox.com/pve-devel/1658908014.zeyifvlr1o.astr...@nora.none/

Best Regards,
Fiona





Re: [pve-devel] [RFC container v2 23/25] backup: implement restore for external providers

2024-09-12 Thread Fiona Ebner
On 12.09.24 at 15:56, Fiona Ebner wrote:
> On 12.09.24 at 14:43, Fabian Grünbichler wrote:
>> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>>> +   } elsif ($mechanism eq 'directory') {
>>> +   my $directory = $info->{'archive-directory'}
>>> +   or die "did not get path to archive directory from backup provider\n";
>>> +   die "not a directory '$directory'" if !-d $directory;
>>> +
>>> +   my $rsync = ['rsync', '--stats', '-h', '-X', '-A', '--numeric-ids', '-aH', '--delete',
>>> +   '--no-whole-file', '--sparse', '--one-file-system', '--relative'];
>>> +   push $rsync->@*, '--bwlimit', $bwlimit if $bwlimit;
>>> +   push $rsync->@*, "${directory}/./", $rootdir;
>>
>> and this as well?
>>
> 
> Good catch, will fix!
> 

Hmm, then rsync won't be able to access the source (for my Borg example)
anymore :/

WARN: rsync: [sender] change_dir
"/run/pve-storage-borg-plugin/pve-lxc-111-2024-08-13T09:34:25Z.restore-container/filesystem"
failed: Permission denied (13)

With restore_tar_archive we stream the contents via stdin; we can't do
that here. But maybe some kind of bind mount to make it accessible?
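
One possible shape (just an untested sketch, assuming $directory and
$rootdir from the code above; whether a read-only bind mount is enough
here would still need to be verified):

    # sketch: make the provider's directory reachable for an rsync running
    # inside the container's user namespace (assumes PVE::Tools is loaded)
    my $bind_target = "$rootdir/.backup-provider-restore";
    mkdir($bind_target) or die "mkdir '$bind_target' failed - $!\n";
    PVE::Tools::run_command(['mount', '--bind', '-o', 'ro', $directory, $bind_target]);
    eval {
        # ... run rsync inside the namespace, reading from the bind mount ...
    };
    my $err = $@;
    PVE::Tools::run_command(['umount', $bind_target]);
    rmdir($bind_target);
    die $err if $err;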




Re: [pve-devel] [RFC container v2 23/25] backup: implement restore for external providers

2024-09-12 Thread Fiona Ebner
On 12.09.24 at 14:43, Fabian Grünbichler wrote:
> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>> @@ -118,6 +136,55 @@ sub restore_tar_archive {
>>  die $err if $err && !$no_unpack_error;
>>  }
>>  
>> +sub restore_external_archive {
>> +my ($backup_provider, $storeid, $volname, $rootdir, $conf, $no_unpack_error, $bwlimit) = @_;
>> +
>> +my ($mechanism, $vmtype) = $backup_provider->restore_get_mechanism($volname, $storeid);
>> +die "cannot restore non-LXC guest of type '$vmtype'\n" if $vmtype ne 'lxc';
>> +
>> +my $info = $backup_provider->restore_container_init($volname, $storeid, {});
>> +eval {
>> +if ($mechanism eq 'tar') {
>> +my $tar_path = $info->{'tar-path'}
>> +or die "did not get path to tar file from backup provider\n";
>> +die "not a regular file '$tar_path'" if !-f $tar_path;
>> +restore_tar_archive($tar_path, $rootdir, $conf, $no_unpack_error, $bwlimit);
> 
> shouldn't this be `lxc-userns-exec`-ed?
> 

The restore_tar_archive() function does that AFAICS.

>> +} elsif ($mechanism eq 'directory') {
>> +my $directory = $info->{'archive-directory'}
>> +or die "did not get path to archive directory from backup provider\n";
>> +die "not a directory '$directory'" if !-d $directory;
>> +
>> +my $rsync = ['rsync', '--stats', '-h', '-X', '-A', '--numeric-ids', '-aH', '--delete',
>> +'--no-whole-file', '--sparse', '--one-file-system', '--relative'];
>> +push $rsync->@*, '--bwlimit', $bwlimit if $bwlimit;
>> +push $rsync->@*, "${directory}/./", $rootdir;
> 
> and this as well?
> 

Good catch, will fix!

> also, for both tar and rsync we probably need to think about how to
> prevent bogus input here (which might be user-creatable if they have
> write access to the backup storage) from violating our assumptions..
> 
What assumptions do you mean exactly?




Re: [pve-devel] [RFC container v2 22/25] backup: implement backup for external providers

2024-09-12 Thread Fiona Ebner
On 12.09.24 at 14:43, Fabian Grünbichler wrote:
> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>> +$info->{'firewall-config'} = $firewall_file if -e $firewall_file;
>> +$info->{'bandwidth-limit'} = $opts->{bwlimit} * 1024 if $opts->{bwlimit};
>> +$backup_provider->backup_container($vmid, $config_file, $id_map, $findexcl, $info);
> 
> it might be easier to hide the idmapping from the backup provider? e.g.,
> hand it an idmapped bindmount or something like that?
>

Yes, that would be nicer. But could that potentially lead to permission
issues? A mid/long term plan is to have the backup provider code run
with lower privileges. I suppose to later implement that, the subroutine
for the provider could run within a matching user namespace too?




Re: [pve-devel] [RFC qemu-server v2 21/25] backup: implement restore for external providers

2024-09-12 Thread Fiona Ebner
On 12.09.24 at 14:44, Fabian Grünbichler wrote:
> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>> +
>> +# allocate volumes
>> +my $map = $restore_allocate_devices->($storecfg, $virtdev_hash, $vmid);
>> +
>> +for my $virtdev (sort keys $virtdev_hash->%*) {
>> +my $d = $virtdev_hash->{$virtdev};
>> +next if $d->{is_cloudinit}; # no need to restore cloudinit
>> +
>> +my $info =
>> +$backup_provider->restore_vm_volume_init($volname, $storeid, $d->{devname}, {});
>> +my $source_path = $info->{'qemu-img-path'}
>> +or die "did not get source image path from backup provider\n";
>> +eval {
>> +qemu_img_convert(
>> +$source_path, $d->{volid}, $d->{size}, undef, 0, $options->{bwlimit});
>> +};
> 
> this definitely needs to get a call to file_size_info with import
> hardening patches applied ;)
> 

Sure, added a reminder to myself for now :)




Re: [pve-devel] [RFC storage v2 10/25] plugin: introduce new_backup_provider() method

2024-09-12 Thread Fiona Ebner
On 12.09.24 at 14:43, Fabian Grünbichler wrote:
> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>>
>> Container mechanism 'directory':
>>
>> The backup provider gives the path to a directory with the full
>> filesystem structure of the container.
>>
>> Container mechanism 'directory':
>>
>> The backup provider gives the path to a (potentially compressed) tar
>> archive with the full filesystem structure of the container.
> 
> same as in the cover letter ;) based on the code here I assume the second
> one should be tar. I wonder whether just tar wouldn't be enough (easier
> to not get ACLs/xattrs/ownership/.. right)?
> 

Yes, should be tar, will fix!

I'd guess it is more convenient for many providers to expose (a FUSE
mount of) a directory; using tar can mean more work. E.g. with Borg,
mounting the archive seems much cheaper than creating the tar. There's
also the fact that the archive looks like:
guest.config
firewall.config
filesystem/
and while borg has "export-tar", where one can specify specific paths,
the tar will still contain the "filesystem/" prefix. Not sure if there
is an easy way to get rid of that (see the sketch below).
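
If the consuming side does the extraction with GNU tar, the prefix could
be dropped there via --strip-components. A sketch (untested; the
repository/archive names and $rootdir are placeholders):

    # stream borg's tar and drop the leading "filesystem/" path component
    my $cmd = "borg export-tar 'repo::archive' - filesystem"
        . " | tar -xp --strip-components=1 -C " . PVE::Tools::shellquote($rootdir);
    system($cmd) == 0 or die "extracting archive failed\n";

Whether that is acceptable depends on where the extraction happens, of
course; it doesn't help if the provider is supposed to hand over a
prefix-free tar stream itself.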

>> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
>> index 6444390..d5b76ae 100644
>> --- a/src/PVE/Storage/Plugin.pm
>> +++ b/src/PVE/Storage/Plugin.pm
>> @@ -1755,6 +1755,21 @@ sub rename_volume {
>>  return "${storeid}:${base}${target_vmid}/${target_volname}";
>>  }
>>  
>> +# Used by storage plugins for external backup providers. See PVE::BackupProvider::Plugin for the API
>> +# the provider needs to implement.
>> +#
>> +# $scfg - the storage configuration
>> +# $storeid - the storage ID
>> +# $log_function($log_level, $message) - this log function can be used to write to the backup task
>> +#   log in Proxmox VE. $log_level is 'info', 'warn' or 'err', $message is the message to be printed.
>> +#
>> +# Returns a blessed reference to the backup provider class.
>> +sub new_backup_provider {
>> +my ($class, $scfg, $storeid, $log_function) = @_;
>> +
>> +return;
>> +}
> 
> would it maybe make sense to make this a "die implement me" and make the
> opt-in via the storage plugin features? it would be more in line with
> what we do in other parts and less subtle..
> 

We don't have a method for storage plugin features yet, only
volume_has_feature() and the stand-alone storage_can_replicate(). We
could generalize (and deprecate) the latter though.
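
A generalized variant could look roughly like the following (hypothetical
sketch, not an agreed-upon interface):

    # In the base class PVE::Storage::Plugin - plugins without special
    # features simply inherit this.
    sub storage_has_feature {
        my ($class, $scfg, $feature) = @_;

        return 0;
    }

    # A plugin shipping a backup provider would then opt in:
    sub storage_has_feature {
        my ($class, $scfg, $feature) = @_;

        return 1 if $feature eq 'backup-provider';
        return 0;
    }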




[pve-devel] [PATCH v2 container] fix #3367: prevent adding bind and device mountpoints to template

2024-09-12 Thread Fiona Ebner
There is a check in the clone API call that prohibits cloning when a
bind or device mountpoint is present. Converting to a template also
will fail when such a mountpoint is present. In particular, that
failure will also trigger at the end of a restore operation, because
restore will also try to convert to a template again. Add a safeguard
so that users won't get into the problematic situation in the first
place.

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* different approach with safeguard instead of silently dropping
  bind mounts during conversion to template

Previous discussion:
https://lore.proxmox.com/pve-devel/20210402123636.27037-1-f.eb...@proxmox.com/T/#u

 src/PVE/LXC/Config.pm | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
index ce64c4c..09687ab 100644
--- a/src/PVE/LXC/Config.pm
+++ b/src/PVE/LXC/Config.pm
@@ -1181,7 +1181,13 @@ sub update_pct_config {
if ($opt =~ m/^mp(\d+)$/ || $opt eq 'rootfs') {
$class->check_protection($conf, "can't update CT $vmid drive '$opt'");
my $mp = $class->parse_volume($opt, $value);
-   $check_content_type->($mp) if ($mp->{type} eq 'volume');
+   if ($mp->{type} eq 'volume') {
+   $check_content_type->($mp);
+   } elsif ($class->is_template($conf)) {
+   # TODO allow bind mounts once admin-defined bind mount sources and corresponding ACL
+   # support is implemented
+   die "cannot add mountpoint '$opt' of type '$mp->{type}' to template\n";
+   }
} elsif ($opt eq 'hookscript') {
PVE::GuestHelpers::check_hookscript($value);
} elsif ($opt eq 'nameserver') {
-- 
2.39.2






[pve-devel] [RFC v2 qemu] savevm-async: reuse migration blocker check for snapshots/hibernation

2024-09-11 Thread Fiona Ebner
Same rationale as with upstream QEMU commit 5aaac46793 ("migration:
savevm: consult migration blockers"), migration and (async) snapshot
are essentially the same operation and thus snapshot also needs to
check for migration blockers. For example, this catches passed-through
PCI devices, where the driver does not support migration.

However, the commit notes:

> There is really no difference between live migration and savevm, except
> that savevm does not require bdrv_invalidate_cache to be implemented
> by all disks.  However, it is unlikely that savevm is used with anything
> except qcow2 disks, so the penalty is small and worth the improvement
> in catching bad usage of savevm.

and for Proxmox VE, suspend-to-disk with VMDK does use savevm-async
and would be broken by simply using migration_is_blocked(). To keep
this working, introduce a new helper that filters blockers with the
prefix used by the VMDK migration blocker.

The function qemu_savevm_state_blocked() is called as part of
savevm_async_is_blocked() so no check is lost with this
patch. The helper is declared in migration/migration.c to be able to
access the 'migration_blockers'.

The VMDK blocker message is declared via a '#define', because using a
'const char*' led the linker to complain about multiple
definitions. The message does not include the reference to the block
node anymore, but it would've just been a generated one like
'#block334' in any case, which didn't help users identify the VMDK
disk before the change either.

Note, this also "breaks" snapshot and hibernate with VNC clipboard by
preventing it. Previously, this would "work", because the Proxmox VE
API has no check yet, but the clipboard will be broken after rollback,
in the sense that it cannot be used anymore, not just lost contents.
So some users might consider adding the check here a breaking change
even if it's technically correct to prevent snapshot and hibernate
with VNC clipboard. But other users might rightfully complain about
broken clipboard. And again, the check also prevents blockers from
passed-through PCI devices, etc. so it seems worth tolerating that
breakage.

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* hard-code blocker message to get a conflict if it changes
  upstream.
* avoid making helper more general than it needs to be (had a
  parameter for matching the prefix of blocker messages).

 block/vmdk.c|  4 +---
 include/migration/blocker.h |  2 ++
 migration/migration.c   | 24 
 migration/migration.h   |  1 +
 migration/savevm-async.c|  2 +-
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 3b82979fdf..eb15109620 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1403,9 +1403,7 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, int flags,
 qemu_co_mutex_init(&s->lock);
 
 /* Disable migration when VMDK images are used */
-error_setg(&s->migration_blocker, "The vmdk format used by node '%s' "
-   "does not support live migration",
-   bdrv_get_device_or_node_name(bs));
+error_setg(&s->migration_blocker, "%s", MIGRATION_BLOCKER_VMDK);
 ret = migrate_add_blocker_normal(&s->migration_blocker, errp);
 if (ret < 0) {
 goto fail;
diff --git a/include/migration/blocker.h b/include/migration/blocker.h
index a687ac0efe..f36bfb2df1 100644
--- a/include/migration/blocker.h
+++ b/include/migration/blocker.h
@@ -18,6 +18,8 @@
 
 #define MIG_MODE_ALL MIG_MODE__MAX
 
+#define MIGRATION_BLOCKER_VMDK "The vmdk format used by a disk does not support live migration"
+
 /**
  * @migrate_add_blocker - prevent all modes of migration from proceeding
  *
diff --git a/migration/migration.c b/migration/migration.c
index b8d7e471a4..b18c3720b2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1913,6 +1913,30 @@ bool migration_is_blocked(Error **errp)
 return false;
 }
 
+bool savevm_async_is_blocked(Error **errp)
+{
+GSList *blockers = migration_blockers[migrate_mode()];
+
+if (qemu_savevm_state_blocked(errp)) {
+return true;
+}
+
+/*
+ * The limitation for VMDK images only applies to live-migration, not
+ * snapshots, see commit 5aaac46793 ("migration: savevm: consult migration
+ * blockers").
+ */
+while (blockers) {
+if (strcmp(error_get_pretty(blockers->data), MIGRATION_BLOCKER_VMDK)) {
+error_propagate(errp, error_copy(blockers->data));
+return true;
+}
+blockers = g_slist_next(blockers);
+}
+
+return false;
+}
+
 /* Returns true if continue to migrate, or false if error detected */
 static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
  

Re: [pve-devel] [PATCH v2 common] REST environment: warn helpers: also log messages to syslog

2024-09-11 Thread Fiona Ebner
Ping

On 12.06.24 at 11:51, Fiona Ebner wrote:
> Ping
> 
> On 09.02.24 at 15:15, Fiona Ebner wrote:
>> for better visibility. When not in a task, warnings from these helpers
>> are only logged to STDERR, which is particularly unhelpful in case of
>> daemons. This is the main motivation behind this change.
>>
>> For tasks, warnings from these helpers are already more visible on the
>> UI side, but when looking at the syslog, one can only see the warning
>> count from the task right now, not the actual messages. This is
>> another reason in favor of the change.
>>
>> Reported-by: Friedrich Weber 
>> Suggested-by: Thomas Lamprecht 
>> Signed-off-by: Fiona Ebner 
>> ---
>>
>> Changes in v2:
>> * Go with the approach suggested by Thomas.
>>
>>  src/PVE/RESTEnvironment.pm | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
>> index 191c6eb..bfde7e6 100644
>> --- a/src/PVE/RESTEnvironment.pm
>> +++ b/src/PVE/RESTEnvironment.pm
>> @@ -723,6 +723,7 @@ sub log_warn {
>>  $rest_env->warn($message);
>>  } else {
>>  chomp($message);
>> +syslog('warning', "%s", $message);
>>  print STDERR "WARN: $message\n";
>>  }
>>  }
>> @@ -732,6 +733,7 @@ sub warn {
>>  
>>  chomp($message);
>>  
>> +syslog('warning', "%s", $message);
>>  print STDERR "WARN: $message\n";
>>  
>>  $self->{warning_count}++;
> 





Re: [pve-devel] [PATCH pve-storage] qcow2 format: enable subcluster allocation by default

2024-09-11 Thread Fiona Ebner
On 03.07.24 at 16:24, Alexandre Derumier via pve-devel wrote:
> 
> 
> extended_l2 is an optimisation to reduce write amplification.
> Currently,without it, when a vm write 4k, a full 64k cluster

s/write/writes/

> need to be writen.

needs to be written.

> 
> When enabled, the cluster is splitted in 32 subclusters.

s/splitted/split/

> 
> We use a 128k cluster by default, to have 32 * 4k subclusters
> 
> https://blogs.igalia.com/berto/2020/12/03/subcluster-allocation-for-qcow2-images/
> https://static.sched.com/hosted_files/kvmforum2020/d9/qcow2-subcluster-allocation.pdf
> 
> some stats for 4k randwrite benchmark

Can you please share the exact command you used? What kind of underlying
disks do you have?

> 
> Cluster size   Without subclusters With subclusters
> 16 KB  5859 IOPS   8063 IOPS
> 32 KB  5674 IOPS   11107 IOPS
> 64 KB  2527 IOPS   12731 IOPS
> 128 KB 1576 IOPS   11808 IOPS
> 256 KB 976 IOPS 9195 IOPS
> 512 KB 510 IOPS 7079 IOPS
> 1 MB   448 IOPS 3306 IOPS
> 2 MB   262 IOPS 2269 IOPS
> 

How does read performance compare for you (with 128 KiB cluster size)?

I don't see any noticeable difference in my testing with an ext4
directory storage on an SSD, attaching the qcow2 images as SCSI disks to
the VM, neither for reading nor writing. I only tested without your
change and with your change using 4k (rand)read and (rand)write.

I'm not sure we should enable this for everybody, there's always a risk
to break stuff with added complexity. Maybe it's better to have a
storage configuration option that people can opt-in to, e.g.

qcow2-create-opts extended_l2=on,cluster_size=128k

If we get enough positive feedback, we can still change the default in a
future (major) release.
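
For example (hypothetical option name, following the suggestion above),
an opt-in storage configuration could look like:

    dir: local
        path /var/lib/vz
        content images
        qcow2-create-opts extended_l2=on,cluster_size=128k

which would then end up as '-o extended_l2=on,cluster_size=128k' in the
'qemu-img create' invocation for new qcow2 images on that storage.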

> Signed-off-by: Alexandre Derumier 
> ---
>  src/PVE/Storage/Plugin.pm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
> index 6444390..31b20fe 100644
> --- a/src/PVE/Storage/Plugin.pm
> +++ b/src/PVE/Storage/Plugin.pm
> @@ -561,7 +561,7 @@ sub preallocation_cmd_option {
>   die "preallocation mode '$prealloc' not supported by format '$fmt'\n"
>   if !$QCOW2_PREALLOCATION->{$prealloc};
>  
> - return "preallocation=$prealloc";
> + return "preallocation=$prealloc,extended_l2=on,cluster_size=128k";

Also, it doesn't really fit here in the preallocation helper as the
helper is specific to that setting.

>  } elsif ($fmt eq 'raw') {
>   $prealloc = $prealloc // 'off';
>   $prealloc = 'off' if $prealloc eq 'metadata';
> -- 
> 2.39.2
> 
> 





Re: [pve-devel] qcow2 internal snapshot: 4k write speed reduce to 100~200 iops on ssd ?

2024-09-11 Thread Fiona Ebner
Hi,

On 30.08.24 at 18:01, DERUMIER, Alexandre via pve-devel wrote:
> Hi,
> 
> I was doing tests with gfs2 && ocfs2,
> 
> and I have notice 4k randwrite iops going from 2 iops to
> 100~200iops when a snapshot is present.
> 
> I thinked it was related to gfs2 && ocfs2 allocation, 
> but I can reproduce too with a simple qcow2 file on
> a local ssd drive.
> 

Is the performance drop also this big with the local storage?

> is it expected ?
> 
> (I don't have use qcow2 snapshot since 10year, so I really don't 
>  remember performance).
> 

A performance drop is of course expected, because AFAIU it needs to do
COW for the sectors that the snapshot references, but the drop does
sound rather dramatic in your case. The performance should be better
again once the COW is done, e.g. when running another test afterwards. I
quickly tested in a nested VM with qcow2 disk on a directory storage and
--numjobs=4:

ran test, got ~33k IOPS
made a snapshot
ran test, got ~20k IOPS (it starts out very slow and then recovers)
ran test, got ~33k IOPS again

Best Regards,
Fiona

> 
> Using external snapshot file, I'm around 1iops.
> 
> test inside the vm:
> 
> fio --filename=/dev/sdX --ioengine=libaio --direct=1 --bs=4k --rw=randwrite --numjobs=64
> 
> 





Re: [pve-devel] [PATCH storage 1/2] fix #3873: btrfs: check for correct subvolume taking snapshot

2024-09-10 Thread Fiona Ebner
On 09.07.24 at 13:51, Maximiliano Sandoval wrote:
> Suppose we are doing a snapshot of disk 0 for VM 100. The
> dir_glob_foreach runs over $path=/subvolume/images/100, lists all
> snapshot names and appends their names to the path of the disk, e.g.
> /subvolume/images/vm-100-disk-0@SNAP_NAME, but the original directory
> $path might contain a second disk `vm-100-disk-1` which is also listed
> by the dir_glib_foreach.
> 
> The patch skips images which reference other disks.
> 
> Signed-off-by: Maximiliano Sandoval 
> ---
>  src/PVE/Storage/BTRFSPlugin.pm | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/PVE/Storage/BTRFSPlugin.pm b/src/PVE/Storage/BTRFSPlugin.pm
> index 42815cb..dc0420d 100644
> --- a/src/PVE/Storage/BTRFSPlugin.pm
> +++ b/src/PVE/Storage/BTRFSPlugin.pm
> @@ -767,6 +767,9 @@ sub volume_export {
>   push @$cmd, (map { "$path\@$_" } ($with_snapshots // [])->@*), $path;
>  } else {
>   dir_glob_foreach(dirname($path), $BTRFS_VOL_REGEX, sub {

I feel like this would be a bit nicer (and slightly more robust) if it
used foreach_subvol() and checked the basename like done in
free_image(). Or even better, introduce a new helper that iterates over
all snapshots of a specific volume, to not repeat ourselves here and in
free_image(); a rough sketch follows below the quoted diff. Since
foreach_subvol() is only used once (i.e. in free_image()), we could even
replace that. There's another pre-existing thing that's a bit confusing,
which is that BTRFS_VOL_REGEX only matches snapshot volumes and not the
actual volume itself AFAICS.

> + if (index($path, $_[1]) < 0) {
> + return
> + }
> + push @$cmd, "$path\@$_[2]" if !(defined($snapshot) && $_[2] eq $snapshot);
>   });
>  }
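
Such a helper could look roughly like this (hypothetical, assuming
dir_glob_foreach() passes the volume name and snapshot name as captures
of $BTRFS_VOL_REGEX, like the code above does, and that File::Basename
is loaded):

    # iterate over all snapshots belonging to one specific volume
    sub foreach_snapshot_of_volume {
        my ($path, $code) = @_;

        my $name = basename($path);
        dir_glob_foreach(dirname($path), $BTRFS_VOL_REGEX, sub {
            my ($subvol, $vol, $snap) = @_;
            return if $vol ne $name; # skip snapshots of other disks
            $code->($snap);
        });
    }

Both volume_export() and free_image() could then share it.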





Re: [pve-devel] [PATCH storage v3 2/2] d/control: dependencies: add bzip2, gzip, lzop, zstd

2024-09-10 Thread Fiona Ebner
On 10.09.24 at 14:52, Fabian Grünbichler wrote:
>> Fiona Ebner wrote on 10.09.2024 14:24 CEST:
>> On 12.08.24 at 13:44, Maximiliano Sandoval wrote:
>>> The decompressor_info method calls binaries provided by these packages
>>> so they are (alphabetically) added explicitly as dependencies.
>>>
>>> To avoid a build-time error
>>>
>>> E: libpve-storage-perl: depends-on-essential-package-without-using-version Depends: gzip
>>>
>>> the current minor version available in bullseye was set for gzip.
>>>
>>
>> Since I didn't get that error, I'm interested: what command are you
>> building with? Why the version for Bullseye?
> 
> it's wrong in any case - gzip is Essential (installed on all Debian systems). 
> you don't ever need a dependency (build or otherwise) for it. this is what 
> lintian is trying to tell you - it's wrong to depend on gzip without a 
> version constraint (and okay to use it without a dependency). *only* if you 
> have a certain version requirement you need to add the dependency (with the 
> proper version qualifier) to be able to encode that version requirement.
>  

I see, thank you for the explanation!

>> I guess many systems already have bzip2 installed, but do we really want
>> to require it for everybody? Or should we rather keep it optional (could
>> be a Recommends or Suggests dependency) and add a clean error if not
>> installed?
> 
> I am not sure about *how* many (most dev machines yes, since it's 
> build-essential and required by a lot of build-related packages ;)). it's 
> only 124kB though, and libbz2-1.0 is always installed (it's transitively 
> Essential via dpkg), so it doesn't pull in anything that isn't installed on 
> all standard Debian-based systems anyway since it only depends on that and 
> libc6. I don't really see a reason to use Recommends or Suggests here, even 
> if bz2 compress isos are not that common.

Okay :)




Re: [pve-devel] [PATCH storage v3 1/2] fix #5267: storage: add bzip2 support

2024-09-10 Thread Fiona Ebner
On 12.08.24 at 13:44, Maximiliano Sandoval wrote:
> A popular ISO compressed exclusively with bz2 is OPNsense [2].
> 
> Since this requires adding `bz2` to the list of known compression
> formats we add decompression methods for vmz a tar.

Typo here I presume.

> diff --git a/src/test/list_volumes_test.pm b/src/test/list_volumes_test.pm
> index d155cb9..ba959ff 100644
> --- a/src/test/list_volumes_test.pm
> +++ b/src/test/list_volumes_test.pm
> @@ -189,6 +189,7 @@ my @tests = (
>   "$storage_dir/dump/vzdump-lxc-16112-2020_03_30-21_49_30.tar.gz",
>   "$storage_dir/dump/vzdump-lxc-16112-2020_03_30-21_49_30.tar.zst",
>   "$storage_dir/dump/vzdump-lxc-16112-2020_03_30-21_59_30.tgz",
> + "$storage_dir/dump/vzdump-openvz-16112-2020_03_30-21_39_30.tar.bz2",
>   ],
>   expected => [
>   {
> @@ -237,6 +238,15 @@ my @tests = (
>   'vmid'=> '16112',
>   'volid'   => 'local:backup/vzdump-lxc-16112-2020_03_30-21_59_30.tgz',
>   },
> + {
> + 'content' => 'backup',
> + 'ctime'   => 1585604370,
> + 'format'  => 'tar.bz2',
> + 'size'=> DEFAULT_SIZE,
> + 'subtype' => 'openvz',
> + 'vmid'=> '16112',
> + 'volid'   => 'local:backup/vzdump-openvz-16112-2020_03_30-21_39_30.tar.bz2',
> + },
>   ],
>  },
>  {
> @@ -440,7 +450,6 @@ my @tests = (
>   "$storage_dir/template/cache/debian-10.0-standard_10.0-1_amd64.zip.gz",
>   "$storage_dir/template/cache/debian-10.0-standard_10.0-1_amd64.tar.bz2",

Why is this one still listed here in the non-matching test cases?
Shouldn't this be recognized too now?

>   "$storage_dir/private/subvol-19254-disk-0/19254",
> - "$storage_dir/dump/vzdump-openvz-16112-2020_03_30-21_39_30.tar.bz2",
>   "$storage_dir/dump/vzdump-openvz-16112-2020_03_30-21_39_30.zip.gz",
>   "$storage_dir/dump/vzdump-openvz-16112-2020_03_30-21_39_30.tgz.lzo",
>   "$storage_dir/dump/vzdump-qemu-16110-2020_03_30-21_12_40.vma.xz",





Re: [pve-devel] [PATCH storage v3 2/2] d/control: dependencies: add bzip2, gzip, lzop, zstd

2024-09-10 Thread Fiona Ebner
On 12.08.24 at 13:44, Maximiliano Sandoval wrote:
> The decompressor_info method calls binaries provided by these packages
> so they are (alphabetically) added explicitly as dependencies.
> 
> To avoid a build-time error
> 
> E: libpve-storage-perl: depends-on-essential-package-without-using-version Depends: gzip
> 
> the current minor version available in bullseye was set for gzip.
> 

Since I didn't get that error, I'm interested: what command are you
building with? Why the version for Bullseye?

I guess many systems already have bzip2 installed, but do we really want
to require it for everybody? Or should we rather keep it optional (could
be a Recommends or Suggests dependency) and add a clean error if not
installed?





Re: [pve-devel] [PATCH guest-common 1/1] storage tunnel: check just-imported image files

2024-09-10 Thread Fiona Ebner
On 09.08.24 at 13:22, Fabian Grünbichler wrote:
> remote migration requires elevated privileges already and can thus only be
> triggered by trusted sources, but an additional safeguard of checking the 
> image
> for external references doesn't hurt.
> 
> Signed-off-by: Fabian Grünbichler 
> ---
> 
> Notes:
> requires pve-storage change to actually have an effect
> 
>  src/PVE/StorageTunnel.pm | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/src/PVE/StorageTunnel.pm b/src/PVE/StorageTunnel.pm
> index c880889..21780bd 100644
> --- a/src/PVE/StorageTunnel.pm
> +++ b/src/PVE/StorageTunnel.pm
> @@ -280,6 +280,13 @@ sub handle_query_disk_import {
>   delete $state->{sockets}->{$unix};
>   delete $state->{disk_import};
>   $state->{cleanup}->{volumes}->{$volid} = 1;
> + my $cfg = PVE::Storage::config();
> + my ($storage, $volume) = PVE::Storage::parse_volume_id($volid);
> + my $scfg = PVE::Storage::storage_config($cfg, $storage);
> + # check imported image for bad references
> + if ($scfg->{path}) {

Seems a bit like a plugin method would be nicest here. But I guess we
can still add that if/when it becomes necessary for non-path-based
plugins to do such checks.

> + PVE::Storage::file_size_info(PVE::Storage::path($cfg, $volid), undef, 1);

Isn't this broken because PVE::Storage::path() uses wantarray?

At least a small test program does "break" in the same scenario:

> febner@enia ~ % cat asdf.pm
> use strict;
> use warnings;
>
> sub a {
>   return wantarray ? (1, 2) : 1;
> }
>
> sub b {
>   my ($a, $b, $c) = @_;
>
>   print "$a $b $c\n";
> }
>
> b(a(), 3, 4);
> febner@enia ~ % perl asdf.pm
> 1 2 3
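
A minimal fix at the call site would be to force scalar context, e.g.
(assuming only the first return value is wanted):

    b(scalar(a()), 3, 4); # now prints "1 3 4"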




Re: [pve-devel] [PATCH storage 1/1] file_size_info: implement untrusted mode

2024-09-10 Thread Fiona Ebner
On 09.08.24 at 13:22, Fabian Grünbichler wrote:
> this allows checking some extra attributes for images which come from a
> potentially malicious source.
> 
> since file_size_info is not part of the plugin API, no API bump is needed. if
> desired, a similar check could also be implemented in volume_size_info, which
> would entail bumping both APIVER and APIAGE (since the additional parameter
> would make checking untrusted volumes opt-in for external plugins).
> 
> Signed-off-by: Fabian Grünbichler 

Reviewed-by: Fiona Ebner 

> @@ -977,12 +987,27 @@ sub file_size_info {
>  

There's a new early return now because of commit 851cc07 ("base plugin:
do not decode the empty string") where we should also die if $untrusted.
Although it is very unlikely that we'll reach that since you do not set
a timeout at the call sites with $untrusted set.

>  my $info = eval { decode_json($json) };
>  if (my $err = $@) {
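
i.e. something along these lines at that early return (sketch only,
based on the description of commit 851cc07; the exact message and return
shape would need to match the actual code):

    if (!$json) {
        die "failed to query file information for '$filename'\n" if $untrusted;
        return undef; # keep current behavior for trusted callers
    }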




Re: [pve-devel] [PATCH qemu-server 1/1] disk import: add additional safeguards for imported image files

2024-09-10 Thread Fiona Ebner
On 09.08.24 at 13:22, Fabian Grünbichler wrote:
> creating non-raw disk images with arbitrary content is only possible with raw
> access to the storage, but checking for references to external files doesn't
> hurt.
> 
> Signed-off-by: Fabian Grünbichler 
> ---
> 
> Notes:
> requires pve-storage change to actually have an effect
> 
>  PVE/API2/Qemu.pm | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index d25a79fe..30839745 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -412,12 +412,14 @@ my sub create_disks : prototype($$) {
>  
>   $needs_creation = $live_import;
>  
> - if (PVE::Storage::parse_volume_id($source, 1)) { # PVE-managed volume
> + if (my ($source_storage, $source_volid) = PVE::Storage::parse_volume_id($source, 1)) { # PVE-managed volume

Nit: line too long and honestly, becomes a bit hard to read.

>   if ($live_import && $ds ne 'efidisk0') {
>   my $path = PVE::Storage::path($storecfg, $source)
>   or die "failed to get a path for '$source'\n";
>   $source = $path;
> - ($size, my $source_format) = PVE::Storage::file_size_info($source);
> + # check potentially untrusted image file!
> + ($size, my $source_format) = PVE::Storage::file_size_info($source, undef, 1);
> +
>   die "could not get file size of $source\n" if !$size;
>   $live_import_mapping->{$ds} = {
>   path => $source,
> @@ -431,6 +433,12 @@ my sub create_disks : prototype($$) {
>   format => $disk->{format},
>   };
>  
> + my $scfg = PVE::Storage::storage_config($storecfg, $source_storage);
> + # check potentially untrusted image file!
> + if ($scfg->{path}) {
> + PVE::Storage::file_size_info(PVE::Storage::path($storecfg, $source), undef, 1);

This call is probably broken, see reply to guest-common patch.

Nit: I'd move this to the beginning of the branch to avoid having it
between $dest_info assignments.

> + }
> +
>   $dest_info->{efisize} = 
> PVE::QemuServer::get_efivars_size($conf, $disk)
>   if $ds eq 'efidisk0';
>  
> @@ -441,7 +449,8 @@ my sub create_disks : prototype($$) {
>   }
>   } else {
> - $source = PVE::Storage::abs_filesystem_path($storecfg, $source, 1);
> - ($size, my $source_format) = PVE::Storage::file_size_info($source);
> + # check potentially untrusted image file!
> + ($size, my $source_format) = PVE::Storage::file_size_info($source, undef, 1);
>   die "could not get file size of $source\n" if !$size;
>  
>   if ($live_import && $ds ne 'efidisk0') {




[pve-devel] applied: [PATCH manager] ui: qemu: hardware: fix permission check for adding tpmstate volume

2024-09-10 Thread Fiona Ebner
On 05.08.24 at 15:33, Friedrich Weber wrote:
> Previously, the "Add -> TPM State" menu item in the GUI was disabled
> if the user did not have Sys.Console privileges. This deviated from
> the permission check in the backend, which does not require
> Sys.Console but (among others) VM.Config.Disk.
> 
> Fix this inconsistency by checking for VM.Config.Disk in the frontend
> instead of Sys.Console.
> 
> Reported in enterprise support.
> 
> Signed-off-by: Friedrich Weber 

applied, thanks!





Re: [pve-devel] [PATCH storage] base plugin: do not decode the empty string

2024-09-10 Thread Fiona Ebner
On 02.09.24 at 14:47, Maximiliano Sandoval wrote:
> If the json was empty, for example if the qemu-img command times out, a
> message
> 
> warn "could not parse qemu-img info command output for '$filename' - 
> $err\n";
> 
> would have been printed.
> 
> This message could lead one to think the issue lies in the contents of
> the json, even if the previous warning said that there was a timeout.
> 

I know it's already applied, so mentioning it for the future: the commit
title (and actually also message) really should've included the context,
e.g. "when querying file size".





Re: [pve-devel] [PATCH common v4 3/7] inotify: interfaces: make sure bridge_vids use space as separator

2024-09-10 Thread Fiona Ebner
On 29.07.24 at 13:55, Aaron Lauterer wrote:
> Because the API accepts multiple possible list separators we need to
> make sure that we write the bridge_vids with space as separator, no
> matter which separator was used when passing it to the API.
> 

Nit: The reason why it's necessary to write out with space as the
separator is missing. Who gets confused otherwise ;)?

Should we do the conversion earlier, i.e. in the API endpoint when
writing the interface parameters? Like that we don't end up carrying the
invalid value around.
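
E.g. something like this in the endpoint (sketch, assuming the parameter
is called bridge_vids there):

    $param->{bridge_vids} = join(' ', PVE::Tools::split_list($param->{bridge_vids}))
        if defined($param->{bridge_vids});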

> Signed-off-by: Aaron Lauterer 
> Tested-By: Stefan Hanreich 
> Reviewed-by: Shannon Sterz 

Still:

Reviewed-by: Fiona Ebner 





Re: [pve-devel] [PATCH common v4 2/7] fix #3893: network: add vlan id and range parameter definitions

2024-09-10 Thread Fiona Ebner
On 29.07.24 at 13:55, Aaron Lauterer wrote:
> @@ -595,6 +601,34 @@ sub pve_verify_iface {
>  return $id;
>  }
>  
> +# vlan id (vids), single or range
> +register_format('pve-vlan-id-or-range', \&pve_verify_vlan_id_or_range);
> +sub pve_verify_vlan_id_or_range {
> +my ($vlan, $noerr) = @_;
> +
> +my $check_vid = sub {
> + my $tag = shift;
> + if ( $tag < 2 || $tag > 4094) {

Style nit: extra space after parenthesis

> + return undef if $noerr;
> + die "invalid VLAN tag '$tag'\n";
> + }
> +};
> +
> +if ($vlan !~ m/^(\d+)(?:-(\d+))?$/) {
> + return undef if $noerr;
> + die "invalid VLAN configuration '$vlan'\n";
> +}
> +my $start = $1;
> +my $end = $2;
> +return undef if (!defined $check_vid->($start));

check_vid does not explicitly return anything in the success case, so
this check seems very brittle and I guess relies on implicit return?

Style nits:
parentheses for post-if
missing parentheses for defined()
perlcritic prefers 'return' over 'return undef'

> +if ($end) {
> + return undef if (!defined $check_vid->($end));
> + die "VLAN range must go from lower to higher tag '$vlan'" if $start >= 
> $end && !$noerr;

What if $noerr is set, but $start >= $end? You want to have a 'return'
then, right?

> +}
> +
> +return $vlan;
> +}
> +
>  # general addresses by name or IP
>  register_format('address', \&pve_verify_address);
>  sub pve_verify_address {
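
Putting the points above together, the checks could be restructured
roughly like this (sketch only, same intended semantics as the patch):

    sub pve_verify_vlan_id_or_range {
        my ($vlan, $noerr) = @_;

        # returns 1 if $tag is a valid VLAN ID, dies or returns 0 otherwise
        my $check_vid = sub {
            my $tag = shift;
            if ($tag < 2 || $tag > 4094) {
                die "invalid VLAN tag '$tag'\n" if !$noerr;
                return 0;
            }
            return 1;
        };

        if ($vlan !~ m/^(\d+)(?:-(\d+))?$/) {
            die "invalid VLAN configuration '$vlan'\n" if !$noerr;
            return;
        }
        my ($start, $end) = ($1, $2);
        return if !$check_vid->($start);
        if (defined($end)) {
            return if !$check_vid->($end);
            if ($start >= $end) {
                die "VLAN range must go from lower to higher tag '$vlan'\n" if !$noerr;
                return;
            }
        }

        return $vlan;
    }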





Re: [pve-devel] [PATCH common v4 1/7] tools: add check_list_empty function

2024-09-09 Thread Fiona Ebner
On 29.07.24 at 13:55, Aaron Lauterer wrote:
> In some situations we don't want a total empty list. I opted for a
> dedicated function instead of integrating it as error in the
> `split_list` function. It is used in many places and the potential
> fallout from unintended behavior changes is too big.
> 
> Signed-off-by: Aaron Lauterer 
> Tested-By: Stefan Hanreich 
> Reviewed-by: Shannon Sterz 
> ---
> changes since: v3: none
> v2: newly added
> 
>  src/PVE/Tools.pm | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/src/PVE/Tools.pm b/src/PVE/Tools.pm
> index bd305bd..f796bd0 100644
> --- a/src/PVE/Tools.pm
> +++ b/src/PVE/Tools.pm
> @@ -718,6 +718,14 @@ sub split_list {
>  return @data;
>  }
>  
> +sub check_list_empty {
> +my ($list) = @_;
> +if (scalar(PVE::Tools::split_list($list)) < 1) {
> + return 0;
> +}
> +return 1;
> +}

This can be very confusing IMHO. Intuitively, I'd expect the expression
check_list_empty($list) to be truthy if $list is empty. I'd rather call
it list_not_empty. But looking at the caller you introduce later, it
might be better to avoid the double negative, flip the truth table and
call it list_is_empty.

> +
>  sub trim {
>  my $txt = shift;
>  





[pve-devel] [PATCH v2 proxmox-apt-hook] initial commit

2024-09-09 Thread Fiona Ebner
Many people will use 'upgrade' instead of 'full-upgrade' or
'dist-upgrade' (e.g. [0][1]) despite the documentation explicitly
mentioning 'dist-upgrade' [3]. Proxmox projects use different
packaging guarantees than Debian (necessary for a rolling release
model) and using 'upgrade' can lead to the system being stuck on
outdated versions, or in rare cases, even break the system [2].

The match is kept simple, to not accidentally catch things like
> -o 'foo=bar upgrade baz'
and trip up advanced users.

It does not catch invocations with '-y' either, making it less likely
to break automated user scripts. Although they should not use
'upgrade' either, it still would be bad to break them. If the risk is
still considered too high, this change should wait until a major or
at least point release.

To avoid false positives, it would be necessary to properly parse
options, which is likely not worth the effort.

A downside is that the hook is only invoked after the user confirms
the upgrade and fetching the packages, but there doesn't seem to be an
early enough hook entry that provides access to the command line.
Since this is just an additional safety warning to guide new users, it
should still be good enough.

It is intended that meta-packages for Proxmox projects recommend this
package.

The same postinst/postrm logic for the hook like in proxmox-ve and
apt-listchanges is used to not have disable/re-enable the hook upon
removal/re-install of the package.

[0]: https://forum.proxmox.com/threads/150217/post-680158
[1]: https://forum.proxmox.com/threads/140580/post-630419
[2]: https://www.reddit.com/r/Proxmox/comments/ujqig9/use_apt_distupgrade_or_the_gui_not_apt_upgrade/
[3]: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#system_software_updates

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Mention that actual breakage is rare, being stuck on outdated
versions is much more common.
* Do not ask for confirmation, only log the warning.
* Split into own package, so it can be re-used by different products.

 .gitignore |  7 
 Makefile   | 47 +++
 debian/apt-hook/10proxmox-apt-hook |  4 +++
 debian/apt-hook/proxmox-apt-hook   | 52 ++
 debian/changelog   |  5 +++
 debian/control | 17 ++
 debian/copyright   | 14 
 debian/docs|  1 +
 debian/install |  2 ++
 debian/postrm  | 35 
 debian/preinst | 13 
 debian/rules   |  8 +
 debian/source/format   |  1 +
 13 files changed, 206 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 Makefile
 create mode 100644 debian/apt-hook/10proxmox-apt-hook
 create mode 100755 debian/apt-hook/proxmox-apt-hook
 create mode 100644 debian/changelog
 create mode 100644 debian/control
 create mode 100644 debian/copyright
 create mode 100644 debian/docs
 create mode 100644 debian/install
 create mode 100644 debian/postrm
 create mode 100644 debian/preinst
 create mode 100755 debian/rules
 create mode 100644 debian/source/format

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000..5e6053d
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,7 @@
+/*.build
+/*.buildinfo
+/*.changes
+/*.deb
+/*.dsc
+/*.tar*
+/proxmox-apt-hook-*/
diff --git a/Makefile b/Makefile
new file mode 100644
index 000..e0cd704
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,47 @@
+include /usr/share/dpkg/default.mk
+
+PACKAGE=proxmox-apt-hook
+
+GITVERSION:=$(shell git rev-parse HEAD)
+
+BUILDDIR ?= $(PACKAGE)-$(DEB_VERSION)
+DSC=$(PACKAGE)_$(DEB_VERSION).dsc
+
+DEB=$(PACKAGE)_$(DEB_VERSION_UPSTREAM_REVISION)_all.deb
+
+all: $(DEB)
+
+$(BUILDDIR): debian
+   rm -rf $@ $@.tmp
+   mkdir -p $@.tmp/debian
+   cp -a debian/ $@.tmp/
+   echo "git clone git://git.proxmox.com/git/proxmox-apt-hook.git\\ngit 
checkout $(GITVERSION)" > $@.tmp/debian/SOURCE
+   mv $@.tmp $@
+
+.PHONY: deb
+deb: $(DEB)
+$(DEB): $(BUILDDIR)
+   cd $(BUILDDIR); dpkg-buildpackage -b -us -uc
+   lintian $(DEB)
+
+.PHONY: dsc
+dsc: $(DSC)
+$(DSC): $(BUILDDIR)
+   cd $(BUILDDIR); dpkg-buildpackage -S -us -uc -d
+   lintian $(DSC)
+
+.PHONY: sbuild
+sbuild: $(DSC)
+   sbuild $(DSC)
+
+.PHONY: upload
+upload: UPLOAD_DIST ?= $(DEB_DISTRIBUTION)
+upload: $(DEB)
+   tar cf - $(DEB)|ssh repo...@repo.proxmox.com -- upload --product pve --dist $(UPLOAD_DIST)
+
+.PHONY: distclean
+distclean: clean
+
+.PHONY: clean
+clean:
+   rm -rf *~ $(PACKAGE)-[0-9]*/ $(PACKAGE)*.tar.* *.deb *.dsc *.changes *.build *.buildinfo
diff --git a/debian/apt-hook/10proxmox-apt-hook b/debian/apt-hook/10proxmox-apt-hook
new file mode 100644
index 000..c4e6978
--- /dev/null
+++ b/debian/apt-h

Re: [pve-devel] [RFC qemu-server] apt hook: warn against using 'upgrade' command

2024-09-09 Thread Fiona Ebner
On 09.09.24 at 09:48, Fabian Grünbichler wrote:
>> Thomas Lamprecht wrote on 06.09.2024 18:58 CEST:
>> On 06/09/2024 at 12:40, Fiona Ebner wrote:
>>> Many people will use 'upgrade' instead of 'full-upgrade' or
>>> 'dist-upgrade' (e.g. [0][1]) despite the documentation explicitly
>>> mentioning 'dist-upgrade' [3]. Proxmox VE uses different packaging
>>> guarantees than Debian and using 'upgrade' can lead to a broken
>>> system [2].
> 
> just a slight nit here: you should only end up with a broken system if we 
> miss properly tracking some inter-package relationship. it can happen happen 
> (and probably does, from time to time), but in the vast majority of cases 
> "apt[-get] upgrade" should at most leave you stuck with an outdated system 
> (with APT telling you that there are still packages to be upgraded), not a 
> broken one. we did get a lot better about accounting for these things over 
> the past few years (but of course, we don't have anywhere close to the 
> infrastructure that Debian has for automated tracking and testing).

Okay, will change the commit message in v2.

> 
>>> The match is kept simple, to not accidentally catch things like
>>>> -o 'foo=bar upgrade baz'
>>> and trip up advanced users.
>>>
>>> It does not catch invocations with '-y' either, making it less likely
>>> to break automated user scripts. Although they should not use
>>> 'upgrade' either, it still would be bad to break them. If the risk is
>>> still considered too high, this change should wait until a major or
>>> at least point release.
>>>
>>> To avoid false positives, it would be necessary to properly parse
>>> options, which is likely not worth the effort.
>>>
>>> A downside is that the hook is only invoked after the user confirms
>>> the upgrade, but there doesn't seem to be an early enough hook entry
>>> (DPkg::Pre-Invoke is also too late). Since this is just an additional
>>> safety warning to guide new users, it should still be good enough.
>>>
>>> [0]: https://forum.proxmox.com/threads/150217/post-680158
>>> [1]: https://forum.proxmox.com/threads/140580/post-630419
>>> [2]: https://www.reddit.com/r/Proxmox/comments/ujqig9/use_apt_distupgrade_or_the_gui_not_apt_upgrade/
>>> [3]: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#system_software_updates
>>>
>>
>> yeah, it's something I considered here and then but never pulled through,
>> as it just somehow doesn't feel right...
>>
>> But it's definitively a real problem, and so I surely won't block this on
>> the basis of some gut feeling, I'd rather like to hear Fabian's opinion on
>> it.
> 
> given that I also use `apt upgrade` from time to time (habit from being an 
> unstable user ;)), and that it might alienate power users coming from Debian, 
> I'd prefer this to be a non-interactive warning with the text "disarmed" a 
> bit?
> 
> something like
> 
> !! WARNING !!
> Since Proxmox VE follows a rolling release model, using 'upgrade' can lead to 
> a system being stuck on outdated versions, or in rare cases, break upon 
> upgrading. Use 'dist-upgrade' or 'full-upgrade' instead.
> !! WARNING !!
> 
> with or without a prompt (it's a pity that the hook is not executed with the 
> config before the regular confirmation prompt, else we could just depend on 
> that)?

Sounds good. And since it is so rare, I'd just leave out the additional
confirmation prompt then.




Re: [pve-devel] [PATCH qemu-server 10/10] test: cfg2cmd: don't use QEMU binary version less than 8.0 for tests

2024-09-09 Thread Fiona Ebner
On 06.09.24 at 16:39, Daniel Kral wrote:
> Another small misspelling that could be corrected here:
> 
> s/depreacation/deprecation
> 
> On Tue, 23 Jul 2024 17:25:40 +0200, Fiona Ebner wrote:
>> diff --git a/test/cfg2cmd/cputype-icelake-client-deprecation.conf 
>> b/test/cfg2cmd/cputype-icelake-client-deprecation.conf
>> index 523dd275..2f595162 100644
>> --- a/test/cfg2cmd/cputype-icelake-client-deprecation.conf
>> +++ b/test/cfg2cmd/cputype-icelake-client-deprecation.conf
>> @@ -1,5 +1,4 @@
>>  # TEST: test CPU type depreacation for Icelake-Client (never existed in the 
>> wild)
> 
> Otherwise, I haven't found anything to add to the changes and the test
> cases are all passing on my machine as expected.
> 
> Therefore:
> 
> Reviewed-by: Daniel Kral 
> Tested-by: Daniel Kral 
> 

Thank you for the review/testing!

Note (for the future): If the tags here are supposed to be for the whole
series, it would be customary to add them in a reply to the cover letter.

Feel free to send patches for the pre-existing typos you found. You
might want to wait until my series got applied to avoid merge-conflicts.
If there's going to be a v2 of my series for some reason, I can also add
them as follow-ups myself.





Re: [pve-devel] [PATCH container/manager v2 0/2] add deny read/write options for device passthrough

2024-09-06 Thread Fiona Ebner
On 24.07.24 at 19:18, Filip Schauer wrote:
> Add the deny_read and deny_write options for device passthrough, to
> restrict container access to devices.
> 
> This allows for passing through a device in read-only mode without
> giving the container full access to it.
> 
> Up until now a container with a device passed through to it was granted
> full access to that device without an option to restrict that access as
> pointed out by @Fiona.
> 
> Changes since v1:
> * set default values for deny_read & deny_write
> * remove the deny_read checkbox from the UI, since it is expected to
>   only have a very niche use case.
> 

We could also use dashes instead of underscores, i.e.
"deny-read"/"deny-write" as we often do for new properties.

Still not fully sure we need deny_read in the backend until somebody
complains with a sensible use case, but I guess it doesn't hurt if it's
already there.

In any case:

Reviewed-by: Fiona Ebner 
Tested-by: Fiona Ebner 





[pve-devel] applied: [PATCH docs] passthrough: viommu: replace host requirement with reference to pcie passthrough

2024-09-06 Thread Fiona Ebner
For reference, this already got applied:
https://git.proxmox.com/?p=pve-docs.git;a=commit;h=2173def64373a50a13390ee342fb95abc901f203





Re: [pve-devel] [RFC qemu-server] apt hook: warn against using 'upgrade' command

2024-09-06 Thread Fiona Ebner
Sorry, messed up the subject-prefix, since I copy-pasted it in the
.git/config and forgot to change. This is for proxmox-ve of course.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC qemu-server] apt hook: warn against using 'upgrade' command

2024-09-06 Thread Fiona Ebner
Many people will use 'upgrade' instead of 'full-upgrade' or
'dist-upgrade' (e.g. [0][1]) despite the documentation explicitly
mentioning 'dist-upgrade' [3]. Proxmox VE uses different packaging
guarantees than Debian and using 'upgrade' can lead to a broken
system [2].

The match is kept simple, to not accidentally catch things like
> -o 'foo=bar upgrade baz'
and trip up advanced users.

It does not catch invocations with '-y' either, making it less likely
to break automated user scripts. Although they should not use
'upgrade' either, it still would be bad to break them. If the risk is
still considered too high, this change should wait until a major or
at least point release.

To avoid false positives, it would be necessary to properly parse
options, which is likely not worth the effort.
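
For illustration, this is roughly what the pattern used below does and
does not catch (standalone sketch, not part of the patch):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $re = qr/^CommandLine::AsString=apt(-get)?%20upgrade$/;

    for my $line (
        'CommandLine::AsString=apt%20upgrade',      # caught
        'CommandLine::AsString=apt-get%20upgrade',  # caught
        'CommandLine::AsString=apt%20upgrade%20-y', # extra option, not caught
        'CommandLine::AsString=apt%20-o%20foo=bar%20upgrade%20baz', # not caught
    ) {
        printf("%s: %s\n", $line =~ $re ? 'warn' : 'pass', $line);
    }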

A downside is that the hook is only invoked after the user confirms
the upgrade, but there doesn't seem to be an early enough hook entry
(DPkg::Pre-Invoke is also too late). Since this is just an additional
safety warning to guide new users, it should still be good enough.

[0]: https://forum.proxmox.com/threads/150217/post-680158
[1]: https://forum.proxmox.com/threads/140580/post-630419
[2]: 
https://www.reddit.com/r/Proxmox/comments/ujqig9/use_apt_distupgrade_or_the_gui_not_apt_upgrade/
[3]: 
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#system_software_updates

Signed-off-by: Fiona Ebner 
---
 debian/apthook/pve-apt-hook | 8 
 1 file changed, 8 insertions(+)

diff --git a/debian/apthook/pve-apt-hook b/debian/apthook/pve-apt-hook
index 47629bc..b41d0fe 100755
--- a/debian/apthook/pve-apt-hook
+++ b/debian/apthook/pve-apt-hook
@@ -55,6 +55,14 @@ my $blank;
 while (my $line = <$fh>) {
   chomp $line;
 
+  if ($line && $line =~ m/^CommandLine::AsString=apt(-get)?%20upgrade$/) {
+$log->("!! WARNING !!\n");
+$log->("Using 'upgrade' can violate packaging assumptions made by Proxmox 
VE. Use 'dist-upgrade' instead.\n");
+$log->("\n");
+$log->("Press enter to continue, or C^c to abort.\n");
+$cleanup->(0, 1);
+  }
+
   if (!defined($blank)) {
 $blank = 1 if !$line;
 next;
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH v3 storage] fix #5191: api, cli: implement moving a volume between storages

2024-09-05 Thread Fiona Ebner
Am 03.07.24 um 14:59 schrieb Filip Schauer:
> Add the ability to move a backup, ISO, container template or snippet
> between storages and nodes via an API method. Moving a VMA backup to a
> Proxmox Backup Server requires the proxmox-vma-to-pbs package to be
> installed. Currently only VMA backups can be moved to a Proxmox Backup
> Server and moving backups from a Proxmox Backup Server is not yet
> supported. 

Can we split this? I.e. first add the "easy moves", export/import
preparations, then support moving backups without PBS, introduce the
decompress_archive_into_pipe() helper, finally support moving VMA to
PBS. API/CLI could be separate too.

> @@ -483,15 +485,173 @@ __PACKAGE__->register_method ({
>   return $upid;
>  }});
>  
> +sub volume_move {

Should this even be a new top-level method? Or can/should we extend
export/import instead, to not only cover guest images? Because with this
top-level method we block the way for external storage plugins to
support this functionality too if they don't adhere to our assumptions.

> +my ($cfg, $source_volid, $target_storeid, $delete) = @_;
> +
> +my ($source_storeid, $source_volname) = 
> PVE::Storage::parse_volume_id($source_volid, 0);
> +
> +die "source and target storage cannot be the same\n" if ($source_storeid 
> eq $target_storeid);

Style nit: superfluous parentheses for post-if (there are other
instances of the same below)

---snip 8<---

> +
> + if ($vtype eq 'backup' && $target_scfg->{type} eq 'pbs') {

IMHO this branch should be factored out into a helper. Either inside
PBSPlugin or inside PBSClient, not sure which one is the best fit.

---snip 8<---

> + if (my $err = $@) {
> + for my $created_file (@created_files) {
> + eval { unlink($created_file) };
> + warn $@ if $@;

I think you need to check the return value for unlink. And if it failed,
use $! (don't log anything if it's already ENOENT).

> + }
> + die $err;
> + }
> + }
> +
> +PVE::Storage::archive_remove($source_path) if $delete;
> +} elsif ($vtype eq 'images') {
> + die "use pct move-volume or qm disk move\n";
> +} elsif ($vtype eq 'rootdir') {
> + die "cannot move OpenVZ rootdir\n";

Maybe put these on top as early exits? Then you could save one level of
indentation. And I'd also catch the case when you encounter a new
content type with something like "not implemented yet".

Note that content type 'rootdir' is nowadays used for all container
images (that's how it's used in the storage configuration and
list_images()). It's just that the default implementation of
parse_volname() (wrongly) classifies those as 'images'.

> +}
> +
> +return;
> +}
> +
>  __PACKAGE__->register_method ({
> -name => 'copy',
> +name => 'move',
>  path => '{volume}',
>  method => 'POST',
> -description => "Copy a volume. This is experimental code - do not use.",
> +description => "Move a volume.",
> +permissions => {
> + description => "You need the 'Datastore.Allocate' privilege on the 
> storages.",

The documentation says:

Datastore.Allocate: create/modify/remove a datastore and delete volumes
Datastore.AllocateSpace: allocate space on a datastore

and the DELETE method for volumes has:

You need 'Datastore.Allocate' privilege on the storage (or
'Datastore.AllocateSpace' for backup volumes if you have VM.Backup
privilege on the VM)

I think we can use that too here. Require Datastore.AllocateSpace if not
using --delete and require Datastore.Allocate on the source if using
--delete, as well as special casing the backup case.

And writing this later, after looking at the actual checks below: the
documentation could be much more precise here ;)

> + user => 'all',
> +},
>  protected => 1,
>  proxyto => 'node',
>  parameters => {
> - additionalProperties => 0,
> + additionalProperties => 0,
>   properties => {
>   node => get_standard_option('pve-node'),
>   storage => get_standard_option('pve-storage-id', { optional => 1}),

---snip 8<---

> + if ($delete) {
> + $rpcenv->check($user, "/storage/$src_storeid", 
> ["Datastore.Allocate"]);

Aha, so this is only required when using delete ;)

> + } else {
> + $rpcenv->check($user, "/storage/$src_storeid", ["Datastore.Audit"]);

But Datastore.Audit does not entail permission to read the volume
contents. There is check_volume_access() for that.
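
Putting both together, the checks could look something like this (a rough
sketch; exact helper arguments are assumptions):

    # target side: allocating space is always required
    $rpcenv->check($user, "/storage/$target_storeid", ['Datastore.AllocateSpace']);

    # source side: removing the volume is an allocation-level operation
    $rpcenv->check($user, "/storage/$src_storeid", ['Datastore.Allocate'])
        if $delete;

    # reading the volume contents needs its own check, Datastore.Audit is not enough
    PVE::Storage::check_volume_access($rpcenv, $user, $cfg, undef, $src_volid);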

> + }
>  
> - my ($target_sid, $target_volname) = 
> PVE::Storage::parse_volume_id($dst_volid);
> - #my $target_ip = PVE::Cluster::remote_node_ip($target_node);
> + my ($vtype) = PVE::Storage::parse_volname($cfg, $src_volid);
> + die "use pct move-volume or qm disk move" if $vtype eq 'images';

|| $vtype eq 'rootdir'

---snip 8<---

> @@ -1661,16 +1677,21 @@ sub volume_import {
>  my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $file_format) =
>  

[pve-devel] [PATCH qemu 2/2] pick up stable fixes for 9.0

2024-09-05 Thread Fiona Ebner
Includes fixes for VirtIO-net, ARM and x86(_64) emulation, CVEs to
harden NBD server against malicious clients, as well as a few others
(VNC, physmem, Intel IOMMU, ...).

Signed-off-by: Fiona Ebner 
---
 ...net-Ensure-queue-index-fits-with-RSS.patch |  35 ++
 ...etwork-stall-at-the-host-side-waitin.patch | 338 ++
 ...t-nic-model-help-output-as-documente.patch |  70 
 ...net-nic-model-for-non-help-arguments.patch |  32 ++
 ...-assert-for-128-bit-tile-accesses-wh.patch |  57 +++
 ...arm-Fix-UMOPA-UMOPS-of-16-bit-values.patch |  59 +++
 ...-shifts-by-1-in-tszimm_shr-and-tszim.patch |  62 
 ...e-SMCR_EL2.LEN-and-SVCR_EL2.LEN-if-E.patch |  41 +++
 ...e-denormals-correctly-for-FMOPA-wide.patch | 164 +
 ...el_iommu-fix-FRCD-construction-macro.patch |  39 ++
 ...386-Do-not-apply-REX-to-MMX-operands.patch |  33 ++
 ...rash-by-resetting-local_err-in-modul.patch |  42 +++
 ...-Plumb-in-new-args-to-nbd_client_add.patch | 164 +
 ...024-7409-Cap-default-max-connections.patch | 172 +
 ...024-7409-Drop-non-negotiating-client.patch | 123 +++
 ...024-7409-Close-stray-clients-at-serv.patch | 161 +
 ...c-fix-crash-when-no-console-attached.patch |  47 +++
 ...024-7409-Avoid-use-after-free-when-c.patch |  89 +
 ...fix-memory-leak-in-dirty_memory_exte.patch | 134 +++
 ...ckup-Proxmox-backup-patches-for-QEMU.patch |   4 +-
 .../0046-PVE-backup-add-fleecing-option.patch |   4 +-
 debian/patches/series |  19 +
 22 files changed, 1885 insertions(+), 4 deletions(-)
 create mode 100644 
debian/patches/extra/0018-virtio-net-Ensure-queue-index-fits-with-RSS.patch
 create mode 100644 
debian/patches/extra/0019-virtio-net-Fix-network-stall-at-the-host-side-waitin.patch
 create mode 100644 
debian/patches/extra/0020-net-Reinstate-net-nic-model-help-output-as-documente.patch
 create mode 100644 
debian/patches/extra/0021-net-Fix-net-nic-model-for-non-help-arguments.patch
 create mode 100644 
debian/patches/extra/0022-target-arm-Don-t-assert-for-128-bit-tile-accesses-wh.patch
 create mode 100644 
debian/patches/extra/0023-target-arm-Fix-UMOPA-UMOPS-of-16-bit-values.patch
 create mode 100644 
debian/patches/extra/0024-target-arm-Avoid-shifts-by-1-in-tszimm_shr-and-tszim.patch
 create mode 100644 
debian/patches/extra/0025-target-arm-Ignore-SMCR_EL2.LEN-and-SVCR_EL2.LEN-if-E.patch
 create mode 100644 
debian/patches/extra/0026-target-arm-Handle-denormals-correctly-for-FMOPA-wide.patch
 create mode 100644 
debian/patches/extra/0027-intel_iommu-fix-FRCD-construction-macro.patch
 create mode 100644 
debian/patches/extra/0028-target-i386-Do-not-apply-REX-to-MMX-operands.patch
 create mode 100644 
debian/patches/extra/0029-module-Prevent-crash-by-resetting-local_err-in-modul.patch
 create mode 100644 
debian/patches/extra/0030-nbd-server-Plumb-in-new-args-to-nbd_client_add.patch
 create mode 100644 
debian/patches/extra/0031-nbd-server-CVE-2024-7409-Cap-default-max-connections.patch
 create mode 100644 
debian/patches/extra/0032-nbd-server-CVE-2024-7409-Drop-non-negotiating-client.patch
 create mode 100644 
debian/patches/extra/0033-nbd-server-CVE-2024-7409-Close-stray-clients-at-serv.patch
 create mode 100644 
debian/patches/extra/0034-vnc-fix-crash-when-no-console-attached.patch
 create mode 100644 
debian/patches/extra/0035-nbd-server-CVE-2024-7409-Avoid-use-after-free-when-c.patch
 create mode 100644 
debian/patches/extra/0036-softmmu-physmem-fix-memory-leak-in-dirty_memory_exte.patch

diff --git 
a/debian/patches/extra/0018-virtio-net-Ensure-queue-index-fits-with-RSS.patch 
b/debian/patches/extra/0018-virtio-net-Ensure-queue-index-fits-with-RSS.patch
new file mode 100644
index 000..1dcb129
--- /dev/null
+++ 
b/debian/patches/extra/0018-virtio-net-Ensure-queue-index-fits-with-RSS.patch
@@ -0,0 +1,35 @@
+From  Mon Sep 17 00:00:00 2001
+From: Akihiko Odaki 
+Date: Mon, 1 Jul 2024 20:58:04 +0900
+Subject: [PATCH] virtio-net: Ensure queue index fits with RSS
+
+Ensure the queue index points to a valid queue when software RSS
+enabled. The new calculation matches with the behavior of Linux's TAP
+device with the RSS eBPF program.
+
+Fixes: 4474e37a5b3a ("virtio-net: implement RX RSS processing")
+Reported-by: Zhibin Hu 
+Cc: qemu-sta...@nongnu.org
+Signed-off-by: Akihiko Odaki 
+Reviewed-by: Michael S. Tsirkin 
+Signed-off-by: Jason Wang 
+(cherry picked from commit f1595ceb9aad36a6c1da95bcb77ab9509b38822d)
+Signed-off-by: Fiona Ebner 
+---
+ hw/net/virtio-net.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
+index 3644bfd91b..f48588638d 100644
+--- a/hw/net/virtio-net.c
 b/hw/net/virtio-net.c
+@@ -1949,7 +1949,8 @@ static ssize_t virtio_net_receive_rcu(NetClientState 
*nc, const uint8_t *buf,
+ if (!no_rss && n->rss_data.enabled && n->rss_data.enabled_software_rss) {
+ int index = virtio_net_pro

[pve-devel] [PATCH qemu 1/2] pick up fix for VirtIO PCI regressions

2024-09-05 Thread Fiona Ebner
Commit f06b222 ("fixes for QEMU 9.0") included a revert for the QEMU
commit 2ce6cff94d ("virtio-pci: fix use of a released vector"). That
commit caused some regressions which sounded just as bad as the fix.
Those regressions have now been addressed upstream, so pick up the fix
and drop the revert. Dropping the revert fixes the original issue that
commit 2ce6cff94d ("virtio-pci: fix use of a released vector")
addressed.

Signed-off-by: Fiona Ebner 
---
 ...tio-pci-fix-use-of-a-released-vector.patch | 87 ---
 ...ck-copy-before-write-fix-permission.patch} |  0
 ...-write-support-unligned-snapshot-di.patch} |  0
 ...-write-create-block_copy-bitmap-in-.patch} |  0
 ...backup-add-discard-source-parameter.patch} |  0
 ...-de-initialization-of-vhost-user-de.patch} |  0
 ...se-float_status-copy-in-sme_fmopa_s.patch} |  0
 ...Use-FPST_F16-for-SME-FMOPA-widening.patch} |  0
 ...on-and-honor-bootindex-again-for-le.patch} |  0
 ...a-bump-instruction-limit-in-scripts.patch} |  0
 ...5-block-copy-Fix-missing-graph-lock.patch} |  0
 ...do-not-operate-on-sources-from-fina.patch} |  0
 ...ix-the-use-of-an-uninitialized-irqfd.patch | 77 
 debian/patches/series | 24 ++---
 14 files changed, 89 insertions(+), 99 deletions(-)
 delete mode 100644 
debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch
 rename debian/patches/extra/{0007-block-copy-before-write-fix-permission.patch 
=> 0006-block-copy-before-write-fix-permission.patch} (100%)
 rename 
debian/patches/extra/{0008-block-copy-before-write-support-unligned-snapshot-di.patch
 => 0007-block-copy-before-write-support-unligned-snapshot-di.patch} (100%)
 rename 
debian/patches/extra/{0009-block-copy-before-write-create-block_copy-bitmap-in-.patch
 => 0008-block-copy-before-write-create-block_copy-bitmap-in-.patch} (100%)
 rename 
debian/patches/extra/{0010-qapi-blockdev-backup-add-discard-source-parameter.patch
 => 0009-qapi-blockdev-backup-add-discard-source-parameter.patch} (100%)
 rename 
debian/patches/extra/{0011-hw-virtio-Fix-the-de-initialization-of-vhost-user-de.patch
 => 0010-hw-virtio-Fix-the-de-initialization-of-vhost-user-de.patch} (100%)
 rename 
debian/patches/extra/{0012-target-arm-Use-float_status-copy-in-sme_fmopa_s.patch
 => 0011-target-arm-Use-float_status-copy-in-sme_fmopa_s.patch} (100%)
 rename 
debian/patches/extra/{0013-target-arm-Use-FPST_F16-for-SME-FMOPA-widening.patch 
=> 0012-target-arm-Use-FPST_F16-for-SME-FMOPA-widening.patch} (100%)
 rename 
debian/patches/extra/{0014-scsi-fix-regression-and-honor-bootindex-again-for-le.patch
 => 0013-scsi-fix-regression-and-honor-bootindex-again-for-le.patch} (100%)
 rename 
debian/patches/extra/{0015-hw-scsi-lsi53c895a-bump-instruction-limit-in-scripts.patch
 => 0014-hw-scsi-lsi53c895a-bump-instruction-limit-in-scripts.patch} (100%)
 rename debian/patches/extra/{0016-block-copy-Fix-missing-graph-lock.patch => 
0015-block-copy-Fix-missing-graph-lock.patch} (100%)
 rename 
debian/patches/extra/{0017-Revert-qemu-char-do-not-operate-on-sources-from-fina.patch
 => 0016-Revert-qemu-char-do-not-operate-on-sources-from-fina.patch} (100%)
 create mode 100644 
debian/patches/extra/0017-virtio-pci-Fix-the-use-of-an-uninitialized-irqfd.patch

diff --git 
a/debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch
 
b/debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch
deleted file mode 100644
index d2de6d1..000
--- 
a/debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch
+++ /dev/null
@@ -1,87 +0,0 @@
-From 00000000 Mon Sep 17 00:00:00 2001
-From: Fiona Ebner 
-Date: Thu, 16 May 2024 12:59:52 +0200
-Subject: [PATCH] Revert "virtio-pci: fix use of a released vector"
-
-This reverts commit 2ce6cff94df2650c460f809e5ad263f1d22507c0.
-
-The fix causes some issues:
-https://gitlab.com/qemu-project/qemu/-/issues/2321
-https://gitlab.com/qemu-project/qemu/-/issues/2334
-
-The CVE fixed by commit 2ce6cff94d ("virtio-pci: fix use of a released
-vector") is CVE-2024-4693 [0] and allows a malicious guest that
-controls the boot process in the guest to crash its QEMU process.
-
-The issues sound worse than the CVE, so revert until there is a proper
-fix.
-
-[0]: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-4693
-
-Signed-off-by: Fiona Ebner 

- hw/virtio/virtio-pci.c | 37 ++---
- 1 file changed, 2 insertions(+), 35 deletions(-)
-
-diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
-index e04218a9fb..fd66713848 100644
 a/hw/virtio/virtio-pci.c
-+++ b/hw/virtio/virtio-pci.c
-@@ -1410,38 +1410,6 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
- return offset;
- }
- 
--static void virtio_pci_set_vector(VirtIODevice *vdev,
--  VirtIOPCI

Re: [pve-devel] [PATCH v2 qemu-server] remote migration: fix online migration via API clients

2024-09-04 Thread Fiona Ebner
Am 03.09.24 um 11:37 schrieb Fabian Grünbichler:
> On August 13, 2024 10:42 am, Fiona Ebner wrote:
>> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
>> index e71face4..d31589e7 100644
>> --- a/PVE/QemuMigrate.pm
>> +++ b/PVE/QemuMigrate.pm
>> @@ -1095,7 +1095,10 @@ sub phase2 {
>>  die "only UNIX sockets are supported for remote migration\n"
>>  if $tunnel_info->{proto} ne 'unix';
>>  
>> -my $remote_socket = $tunnel_info->{addr};
>> +# untaint
>> +my ($remote_socket) =
>> +$tunnel_info->{addr} =~ 
>> m|^(/run/qemu-server/(?:(?!\.\./).)+\.migrate)$|;
> 
> should we just switch to `\d+`, like we do for regular migration? in
> phase2_start_local_cluster we have:
> 
>   elsif ($line =~ m!^migration listens on 
> (unix):(/run/qemu-server/(\d+)\.migrate)$!) {
>   $tunnel_info->{addr} = $2;
>   die "Destination UNIX sockets VMID does not match source VMID" if 
> $vmid ne $3;
>   $tunnel_info->{proto} = $1;
>   }
> 
> and I don't really see a reason to deviate from that scheme any time
> soon?
> 

Sounds good, did so in v3:
https://lore.proxmox.com/pve-devel/20240904111231.106570-1-f.eb...@proxmox.com/T/#u


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


[pve-devel] [PATCH v3 qemu-server] remote migration: fix online migration via API clients

2024-09-04 Thread Fiona Ebner
As reported in the community forum [0], when a remote migration
request comes in via an API client, the -T flag for Perl is set, so an
insecure dependency in a call like unlink() in forward_unix_socket()
will fail with:

> failed to write forwarding command - Insecure dependency in unlink while 
> running with -T switch

To fix it, untaint the problematic socket addresses coming from the
remote side. Require that all sockets are below '/run/qemu-server/'
and end with '.migrate' with the main socket being matched more
strictly. This allows extensions in the future while still being quite
strict.

[0]: https://forum.proxmox.com/threads/123048/post-691958
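
As a minimal standalone illustration of the taint behavior (not part of
the patch):

    #!/usr/bin/perl -T
    use strict;
    use warnings;

    my $addr = $ARGV[0]; # data from the outside is tainted under -T

    # A successful regex capture is Perl's designated way to untaint:
    # only data matching the expected pattern gets through.
    my ($socket) = $addr =~ m|^(/run/qemu-server/\d+\.migrate)$|
        or die "unexpected socket address '$addr'\n";

    unlink($socket) or warn "unlink failed - $!\n"; # no taint error anymore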

Signed-off-by: Fiona Ebner 
---

Changes in v3:
* Match main socket address more strictly as suggested by Fabian.

 PVE/QemuMigrate.pm | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index e71face4..6591f3f7 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -1095,7 +1095,9 @@ sub phase2 {
die "only UNIX sockets are supported for remote migration\n"
if $tunnel_info->{proto} ne 'unix';
 
-   my $remote_socket = $tunnel_info->{addr};
+   # untaint
+   my ($remote_socket) = $tunnel_info->{addr} =~ 
m|^(/run/qemu-server/\d+\.migrate)$|
+   or die "unexpected socket address '$tunnel_info->{addr}'\n";
my $local_socket = $remote_socket;
$local_socket =~ s/$remote_vmid/$vmid/g;
$tunnel_info->{addr} = $local_socket;
@@ -1104,6 +1106,9 @@ sub phase2 {
PVE::Tunnel::forward_unix_socket($self->{tunnel}, $local_socket, 
$remote_socket);
 
foreach my $remote_socket (@{$tunnel_info->{unix_sockets}}) {
+   # untaint
+   ($remote_socket) = $remote_socket =~ 
m|^(/run/qemu-server/(?:(?!\.\./).)+\.migrate)$|
+   or die "unexpected socket address '$remote_socket'\n";
my $local_socket = $remote_socket;
$local_socket =~ s/$remote_vmid/$vmid/g;
next if $self->{tunnel}->{forwarded}->{$local_socket};
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] applied: [PATCH docs] pve-network: correct language errors

2024-08-13 Thread Fiona Ebner
Am 13.08.24 um 15:57 schrieb Fiona Ebner:
> Am 13.08.24 um 12:23 schrieb Alexander Zeidler:
>> Signed-off-by: Alexander Zeidler 
>> ---
>>  pve-network.adoc | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/pve-network.adoc b/pve-network.adoc
>> index acdcf39..434430d 100644
>> --- a/pve-network.adoc
>> +++ b/pve-network.adoc
>> @@ -447,7 +447,7 @@ slaves in the single logical bonded interface such that 
>> different
>>  network-peers use different MAC addresses for their network packet
>>  traffic.
>>  
>> -If your switch support the LACP (IEEE 802.3ad) protocol then we recommend 
>> using
>> +If your switch supports the LACP (IEEE 802.3ad) protocol then we recommend 
>> using
>>  the corresponding bonding mode (802.3ad). Otherwise you should generally 
>> use the
>>  active-backup mode.
>>  
>> @@ -490,7 +490,7 @@ iface vmbr0 inet static
>>  
>>  
>>  [thumbnail="default-network-setup-bond.svg"]
>> -Another possibility it to use the bond directly as bridge port.
>> +Another possibility is to use the bond directly as bridge port.
>>  This can be used to make the guest network fault-tolerant.
>>  
>>  .Example: Use a bond as bridge port
> 
> applied, thanks! While adding the missing comma Lukas noted and his R-b.

I also added a follow-up using "as the bridge port" instead of "as
bridge port" while at it.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] applied: [PATCH docs] local-zfs: correct language errors

2024-08-13 Thread Fiona Ebner
Am 13.08.24 um 12:23 schrieb Alexander Zeidler:
> Signed-off-by: Alexander Zeidler 
> ---
>  local-zfs.adoc | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 

applied, thanks! Squashed in a change to wrap lines that would've been
longer than 80 characters.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] applied: [PATCH docs] pve-network: correct language errors

2024-08-13 Thread Fiona Ebner
Am 13.08.24 um 12:23 schrieb Alexander Zeidler:
> Signed-off-by: Alexander Zeidler 
> ---
>  pve-network.adoc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/pve-network.adoc b/pve-network.adoc
> index acdcf39..434430d 100644
> --- a/pve-network.adoc
> +++ b/pve-network.adoc
> @@ -447,7 +447,7 @@ slaves in the single logical bonded interface such that 
> different
>  network-peers use different MAC addresses for their network packet
>  traffic.
>  
> -If your switch support the LACP (IEEE 802.3ad) protocol then we recommend 
> using
> +If your switch supports the LACP (IEEE 802.3ad) protocol then we recommend 
> using
>  the corresponding bonding mode (802.3ad). Otherwise you should generally use 
> the
>  active-backup mode.
>  
> @@ -490,7 +490,7 @@ iface vmbr0 inet static
>  
>  
>  [thumbnail="default-network-setup-bond.svg"]
> -Another possibility it to use the bond directly as bridge port.
> +Another possibility is to use the bond directly as bridge port.
>  This can be used to make the guest network fault-tolerant.
>  
>  .Example: Use a bond as bridge port

applied, thanks! While adding the missing comma Lukas noted and his R-b.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC container v2 23/25] backup: implement restore for external providers

2024-08-13 Thread Fiona Ebner
First, the provider is asked what restore mechanism to use.
Currently, 'directory' and 'tar' are possible, for restoring either
from a directory containing the full filesystem structure (for which
rsync is used) or a potentially compressed tar file containing the
same.

The new functions are copied and adapted from the existing ones for
PBS or tar, and it might be worth factoring out the common parts.

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Adapt to API changes.

 src/PVE/LXC/Create.pm | 141 ++
 1 file changed, 141 insertions(+)

diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm
index 117103c..9d1c337 100644
--- a/src/PVE/LXC/Create.pm
+++ b/src/PVE/LXC/Create.pm
@@ -25,6 +25,24 @@ sub restore_archive {
if ($scfg->{type} eq 'pbs') {
return restore_proxmox_backup_archive($storage_cfg, $archive, 
$rootdir, $conf, $no_unpack_error, $bwlimit);
}
+   my $log_function = sub {
+   my ($log_level, $message) = @_;
+   my $prefix = $log_level eq 'err' ? 'ERROR' : uc($log_level);
+   print "$prefix: $message\n";
+   };
+   my $backup_provider =
+   PVE::Storage::new_backup_provider($storage_cfg, $storeid, 
$log_function);
+   if ($backup_provider) {
+   return restore_external_archive(
+   $backup_provider,
+   $storeid,
+   $volname,
+   $rootdir,
+   $conf,
+   $no_unpack_error,
+   $bwlimit,
+   );
+   }
 }
 
 $archive = PVE::Storage::abs_filesystem_path($storage_cfg, $archive) if 
$archive ne '-';
@@ -118,6 +136,55 @@ sub restore_tar_archive {
 die $err if $err && !$no_unpack_error;
 }
 
+sub restore_external_archive {
+my ($backup_provider, $storeid, $volname, $rootdir, $conf, 
$no_unpack_error, $bwlimit) = @_;
+
+my ($mechanism, $vmtype) = 
$backup_provider->restore_get_mechanism($volname, $storeid);
+die "cannot restore non-LXC guest of type '$vmtype'\n" if $vmtype ne 'lxc';
+
+my $info = $backup_provider->restore_container_init($volname, $storeid, 
{});
+eval {
+   if ($mechanism eq 'tar') {
+   my $tar_path = $info->{'tar-path'}
+   or die "did not get path to tar file from backup provider\n";
+   die "not a regular file '$tar_path'" if !-f $tar_path;
+   restore_tar_archive($tar_path, $rootdir, $conf, $no_unpack_error, 
$bwlimit);
+   } elsif ($mechanism eq 'directory') {
+   my $directory = $info->{'archive-directory'}
+   or die "did not get path to archive directory from backup 
provider\n";
+   die "not a directory '$directory'" if !-d $directory;
+
+   my $rsync = ['rsync', '--stats', '-h', '-X', '-A', '--numeric-ids', 
'-aH', '--delete',
+   '--no-whole-file', '--sparse', '--one-file-system', 
'--relative'];
+   push $rsync->@*, '--bwlimit', $bwlimit if $bwlimit;
+   push $rsync->@*, "${directory}/./", $rootdir;
+
+   my $transferred = '';
+   my $outfunc = sub {
+   return if $_[0] !~ /^Total transferred file size: (.+)$/;
+   $transferred = $1;
+   };
+   my $errfunc = sub { log_warn($_[0]); };
+
+   my $starttime = time();
+   PVE::Tools::run_command($rsync, outfunc => $outfunc, errfunc => 
$errfunc);
+   my $delay = time () - $starttime;
+
+   print "sync finished - transferred ${transferred} in ${delay}s\n";
+   } else {
+   die "mechanism '$mechanism' requested by backup provider is not 
supported for LXCs\n";
+   }
+};
+my $err = $@;
+eval { $backup_provider->restore_container_cleanup($volname, $storeid, 
{}); };
+if (my $cleanup_err = $@) {
+   die $cleanup_err if !$err;
+   warn $cleanup_err;
+}
+die $err if $err;
+
+}
+
 sub recover_config {
 my ($storage_cfg, $volid, $vmid) = @_;
 
@@ -126,6 +193,8 @@ sub recover_config {
my $scfg = PVE::Storage::storage_check_enabled($storage_cfg, $storeid);
if ($scfg->{type} eq 'pbs') {
return recover_config_from_proxmox_backup($storage_cfg, $volid, 
$vmid);
+   } elsif (PVE::Storage::new_backup_provider($storage_cfg, $storeid, sub 
{})) {
+   return recover_config_from_external_backup($storage_cfg, $volid, 
$vmid);
}
 }
 
@@ -200,6 +269,26 @@ sub recover_config_from_tar {
 return wantarray ? ($conf, $mp_param) : $conf;
 }
 
+sub recover_config

[pve-devel] [RFC qemu-server v2 19/25] backup: implement backup for external providers

2024-08-13 Thread Fiona Ebner
The state of the VM's disk images at the time the backup is started is
preserved via a snapshot-access block node. Old data is moved to the
fleecing image when new guest writes come in. The snapshot-access
block node, as well as the associated bitmap in case of incremental
backup, will be made available to the external provider. They are
exported via NBD: for the 'nbd' mechanism, the NBD socket path is
passed to the provider, while for the 'block-device' mechanism, the NBD
export is made accessible as a regular block device first and the
bitmap information is made available via a $next_dirty_region->()
function. For 'block-device', the 'nbdinfo' binary is required.

The provider can indicate that it wants to do an incremental backup by
returning the bitmap ID that was used for a previous backup and it
will then be told if the bitmap was newly created (either first backup
or old bitmap was invalid) or if the bitmap can be reused.

The provider then reads the parts of the NBD or block device it needs,
either the full disk for full backup, or the dirty parts according to
the bitmap for incremental backup. The bitmap has to be respected,
reads to other parts of the image will return an error. After backing
up each part of the disk, it should be discarded in the export to
avoid unnecessary space usage in the fleecing image (requires the
storage underlying the fleecing image to support discard too).
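
To sketch what a provider using the 'block-device' mechanism might do,
assuming $next_dirty_region->() returns an (offset, length) pair and an
empty list once all dirty regions were read (store_chunk() is a
hypothetical stand-in for the provider's actual write-out):

    use Fcntl qw(O_RDONLY);

    sub read_dirty_regions {
        my ($device_path, $next_dirty_region) = @_;

        sysopen(my $fh, $device_path, O_RDONLY)
            or die "unable to open '$device_path' - $!\n";

        while (my ($offset, $length) = $next_dirty_region->()) {
            sysseek($fh, $offset, 0) or die "seek to $offset failed - $!\n";
            my $done = 0;
            while ($done < $length) {
                my $to_read = $length - $done;
                $to_read = 4 * 1024 * 1024 if $to_read > 4 * 1024 * 1024;
                my $count = sysread($fh, my $buf, $to_read);
                die "read failed - $!\n" if !defined($count);
                die "unexpected end of device\n" if $count == 0;
                store_chunk($offset + $done, $buf);
                $done += $count;
            }
            # afterwards, discard the region in the export to free up space
            # in the fleecing image
        }
        close($fh);
    }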

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Support 'block-device' mechanism, by exposing the NBD export as a
  block device via qemu-nbd.
* Adapt to API changes, i.e. pass all volumes as well as configuration
  files to the provider at once.

 PVE/VZDump/QemuServer.pm | 308 ++-
 1 file changed, 307 insertions(+), 1 deletion(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 4ad4a154..9daebbc2 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -20,7 +20,7 @@ use PVE::QMPClient;
 use PVE::Storage::Plugin;
 use PVE::Storage::PBSPlugin;
 use PVE::Storage;
-use PVE::Tools;
+use PVE::Tools qw(run_command);
 use PVE::VZDump;
 use PVE::Format qw(render_duration render_bytes);
 
@@ -291,6 +291,8 @@ sub archive {
 
 if ($self->{vzdump}->{opts}->{pbs}) {
$self->archive_pbs($task, $vmid);
+} elsif ($self->{vzdump}->{'backup-provider'}) {
+   $self->archive_external($task, $vmid);
 } else {
$self->archive_vma($task, $vmid, $filename, $comp);
 }
@@ -1136,6 +1138,23 @@ sub cleanup {
 
 # If VM was started only for backup, it is already stopped now.
 if (PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
+   if ($task->{cleanup}->{'nbd-stop'}) {
+   eval { PVE::QemuServer::QMPHelpers::nbd_stop($vmid); };
+   $self->logerr($@) if $@;
+   }
+
+   if (my $info = $task->{cleanup}->{'backup-access-teardown'}) {
+   my $params = {
+   'target-id' => $info->{'target-id'},
+   timeout => 60,
+   success => $info->{success} ? JSON::true : JSON::false,
+   };
+
+   $self->loginfo("tearing down backup-access");
+   eval { mon_cmd($vmid, "backup-access-teardown", $params->%*) };
+   $self->logerr($@) if $@;
+   }
+
$detach_tpmstate_drive->($task, $vmid);
detach_fleecing_images($task->{disks}, $vmid) if 
$task->{'use-fleecing'};
 }
@@ -1147,4 +1166,291 @@ sub cleanup {
 }
 }
 
+my sub block_device_backup_cleanup {
+my ($self, $paths, $cpids) = @_;
+
+for my $path ($paths->@*) {
+   eval { run_command(["qemu-nbd", "-d", $path ]); };
+   $self->log('warn', "unable to disconnect NBD backup source '$path' - 
$@") if $@;
+}
+
+my $waited;
+my $wait_limit = 5;
+for ($waited = 0; $waited < $wait_limit && scalar(keys $cpids->%*); 
$waited++) {
+   while ((my $cpid = waitpid(-1, POSIX::WNOHANG)) > 0) {
+   delete($cpids->{$cpid});
+   }
+   if ($waited == 0) {
+   kill 15, $_ for keys $cpids->%*;
+   }
+   sleep 1;
+}
+if ($waited == $wait_limit && scalar(keys $cpids->%*)) {
+   kill 9, $_ for keys $cpids->%*;
+   sleep 1;
+   while ((my $cpid = waitpid(-1, POSIX::WNOHANG)) > 0) {
+   delete($cpids->{$cpid});
+   }
+   $self->log('warn', "unable to collect nbdinfo child process '$_'") for 
keys $cpids->%*;
+}
+}
+
+my sub block_device_backup_prepare {
+my ($self, $devicename, $size, $nbd_path, $bitmap_name, $count) = @_;
+
+my $nbd_info_uri = "nbd+unix:///${devicename}?socket=${nbd_path}";
+my $qemu_nbd_uri = "nbd:unix:$

[pve-devel] [POC storage v2 13/25] Borg plugin

2024-08-13 Thread Fiona Ebner
Archive names start with the guest type and ID, followed by the same
timestamp format that PBS uses.

Container archives have the following structure:
guest.config
firewall.config
filesystem/ # containing the whole filesystem structure

VM archives have the following structure
guest.config
firewall.config
volumes/ # containing a raw file for each device

A bind mount (or symlinks, respectively) is used to achieve this
structure, because Borg doesn't seem to support renaming on-the-fly.
(Prefix stripping via the "slashdot hack" would have helped slightly,
but is only in Borg >= 1.4
https://github.com/borgbackup/borg/actions/runs/7967940995)

NOTE: running via SSH has not been tested yet. The bandwidth limit is not yet
honored and the task size is not calculated yet. Discard for VM
backups would also be nice to have, but it's not entirely clear how
(parsing progress and discarding according to that is one idea).
There is no dirty bitmap support, not sure if that is feasible to add.
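
For illustration, usage of the naming helpers added below:

    # archive_name('qemu', 112, $backup_time) yields something like
    # 'pve-qemu-112-2024-09-16T10:00:00Z'
    my $name = archive_name('qemu', 112, time());

    if ($name =~ $ARCHIVE_RE_3) {
        my ($vmtype, $vmid, $time_string) = ($1, $2, $3);
        # ('qemu', '112', '2024-09-16T10:00:00Z')
    }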

Signed-off-by: Fiona Ebner 
---

New in v2.

 src/PVE/BackupProvider/Plugin/Borg.pm  | 373 ++
 src/PVE/BackupProvider/Plugin/Makefile |   2 +-
 src/PVE/Storage.pm |   2 +
 src/PVE/Storage/BorgBackupPlugin.pm| 506 +
 src/PVE/Storage/Makefile   |   1 +
 5 files changed, 883 insertions(+), 1 deletion(-)
 create mode 100644 src/PVE/BackupProvider/Plugin/Borg.pm
 create mode 100644 src/PVE/Storage/BorgBackupPlugin.pm

diff --git a/src/PVE/BackupProvider/Plugin/Borg.pm 
b/src/PVE/BackupProvider/Plugin/Borg.pm
new file mode 100644
index 000..b65321c
--- /dev/null
+++ b/src/PVE/BackupProvider/Plugin/Borg.pm
@@ -0,0 +1,373 @@
+package PVE::BackupProvider::Plugin::Borg;
+
+use strict;
+use warnings;
+
+use File::chdir;
+use File::Basename qw(basename);
+use File::Path qw(make_path remove_tree);
+use POSIX qw(strftime);
+
+use PVE::Tools;
+
+# ($vmtype, $vmid, $time_string)
+our $ARCHIVE_RE_3 = 
qr!^pve-(lxc|qemu)-([0-9]+)-([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z)$!;
+
+sub archive_name {
+my ($vmtype, $vmid, $backup_time) = @_;
+
+return "pve-${vmtype}-${vmid}-" . strftime("%FT%TZ", gmtime($backup_time));
+}
+
+# remove_tree can be very verbose by default, do explicit error handling and 
limit to one message
+my sub _remove_tree {
+my ($path) = @_;
+
+remove_tree($path, { error => \my $err });
+if ($err && @$err) { # empty array if no error
+   for my $diag (@$err) {
+   my ($file, $message) = %$diag;
+   die "cannot remove_tree '$path': $message\n" if $file eq '';
+   die "cannot remove_tree '$path': unlinking $file failed - 
$message\n";
+   }
+}
+}
+
+my sub prepare_run_dir {
+my ($archive, $operation) = @_;
+
+my $run_dir = "/run/pve-storage-borg-plugin/${archive}.${operation}";
+_remove_tree($run_dir);
+make_path($run_dir);
+die "unable to create directory $run_dir\n" if !-d $run_dir;
+
+return $run_dir;
+}
+
+my sub log_info {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('info', $message);
+}
+
+my sub log_warning {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('warn', $message);
+}
+
+my sub log_error {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('err', $message);
+}
+
+my sub file_contents_from_archive {
+my ($self, $archive, $file) = @_;
+
+my $run_dir = prepare_run_dir($archive, "file-contents");
+
+my $raw;
+
+eval {
+   local $CWD = $run_dir;
+
+   $self->{'storage-plugin'}->borg_cmd_extract(
+   $self->{scfg},
+   $self->{storeid},
+   $archive,
+   [$file],
+   );
+
+   $raw = PVE::Tools::file_get_contents("${run_dir}/${file}");
+};
+my $err = $@;
+eval { _remove_tree($run_dir); };
+log_warning($self, $@) if $@;
+die $err if $err;
+
+return $raw;
+}
+
+# Plugin implementation
+
+sub new {
+my ($class, $storage_plugin, $scfg, $storeid, $log_function) = @_;
+
+my $self = bless {
+   scfg => $scfg,
+   storeid => $storeid,
+   'storage-plugin' => $storage_plugin,
+   'log-function' => $log_function,
+}, $class;
+
+return $self;
+}
+
+sub provider_name {
+my ($self) = @_;
+
+return "Borg";
+}
+
+sub job_hook {
+my ($self, $phase, $info) = @_;
+
+if ($phase eq 'start') {
+   $self->{'job-id'} = $info->{'start-time'};
+}
+
+return;
+}
+
+sub backup_hook {
+my ($self, $phase, $vmid, $vmtype, $info) = @_;
+
+if ($phase eq 'start') {
+   $self->{$vmid}->{'task-size'} = 0;
+}
+
+return;
+}
+
+sub backup_get_mechanism {

[pve-devel] [RFC qemu v2 09/25] PVE backup: implement bitmap support for external backup access

2024-08-13 Thread Fiona Ebner
There can be one dirty bitmap for each backup target ID (which are
tracked in the backup_access_bitmaps hash table). The QMP user can
specify the ID of the bitmap it likes to use. This ID is then compared
to the current one for the given target. If they match, the bitmap is
re-used (should it still exist on the drive, otherwise re-created). If
there is a mismatch, the old bitmap is removed and a new one is
created.

The return value of the QMP command includes information about what
bitmap action was taken. Similar to what the query-backup QMP command
returns for regular backup. It also includes the bitmap name and
associated block node, so the management layer can then set up an NBD
export with the bitmap.

While the backup access is active, a background bitmap is also
required. This is necessary to implement bitmap handling according to
the original reference [0]. In particular:

- in the error case, new writes since the backup access was set up are
  in the background bitmap. Because of failure, the previously tracked
  writes from the backup access bitmap are still required too. Thus,
  the bitmap is merged with the background bitmap to get all new
  writes since the last backup.

- in the success case, continue tracking for the next incremental
  backup in the backup access bitmap. New writes since the backup
  access was set up are in the background bitmap. Because the backup
was successful, clear the backup access bitmap and merge back the
  background bitmap to get only the new writes.

Since QEMU cannot know if the backup was successful or not (except if
failure already happens during the setup QMP command), the management
layer needs to tell it via the teardown QMP command.

The bitmap action is also recorded in the device info now.

[0]: 
https://lore.kernel.org/qemu-devel/b68833dd-8864-4d72-7c61-c134a9835...@ya.ru/
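
From the management layer's perspective, the handling might look roughly
like this (parameter and value names here are assumptions based on the
description; the authoritative names are in the QAPI schema below):

    my $res = mon_cmd($vmid, 'backup-access-setup',
        'target-id' => $target_id,
        devlist => $devlist,
        'bitmap-name' => $previous_bitmap_id, # as returned by the provider
    );

    for my $info ($res->@*) {
        if (($info->{'bitmap-action'} // '') eq 'used') {
            # bitmap was re-used, only the dirty parts need to be read
        } else {
            # no bitmap, or it was newly (re-)created, so a full read is needed
        }
    }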

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 pve-backup.c | 175 ++-
 pve-backup.h |   2 +-
 qapi/block-core.json |  22 +-
 system/runstate.c|   2 +-
 4 files changed, 193 insertions(+), 8 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index d3370d6744..5f8dd396d5 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -15,6 +15,7 @@
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/cutils.h"
+#include "qemu/error-report.h"
 
 #if defined(CONFIG_MALLOC_TRIM)
 #include 
@@ -41,6 +42,7 @@
  */
 
 const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
+const char *BACKGROUND_BITMAP_NAME = "backup-access-background-bitmap";
 
 static struct PVEBackupState {
 struct {
@@ -72,6 +74,7 @@ static struct PVEBackupState {
 CoMutex backup_mutex;
 CoMutex dump_callback_mutex;
 char *target_id;
+GHashTable *backup_access_bitmaps; // key=target_id, value=bitmap_name
 } backup_state;
 
 static void pvebackup_init(void)
@@ -99,6 +102,8 @@ typedef struct PVEBackupDevInfo {
 char* device_name;
 int completed_ret; // INT_MAX if not completed
 BdrvDirtyBitmap *bitmap;
+BdrvDirtyBitmap *background_bitmap; // used for external backup access
+PBSBitmapAction bitmap_action;
 BlockDriverState *target;
 BlockJob *job;
 } PVEBackupDevInfo;
@@ -362,6 +367,67 @@ static void coroutine_fn pvebackup_co_complete_stream(void 
*opaque)
 qemu_co_mutex_unlock(&backup_state.backup_mutex);
 }
 
+/*
+ * New writes since the backup access was set up are in the background bitmap. 
Because of failure,
+ * the previously tracked writes in di->bitmap are still required too. Thus, 
merge with the
+ * background bitmap to get all new writes since the last backup.
+ */
+static void handle_backup_access_bitmaps_in_error_case(PVEBackupDevInfo *di)
+{
+Error *local_err = NULL;
+
+if (di->bs && di->background_bitmap) {
+bdrv_drained_begin(di->bs);
+if (di->bitmap) {
+bdrv_enable_dirty_bitmap(di->bitmap);
+if (!bdrv_merge_dirty_bitmap(di->bitmap, di->background_bitmap, 
NULL, &local_err)) {
+warn_report("backup access: %s - could not merge bitmaps in 
error path - %s",
+di->device_name,
+local_err ? error_get_pretty(local_err) : "unknown 
error");
+/*
+ * Could not merge, drop original bitmap too.
+ */
+bdrv_release_dirty_bitmap(di->bitmap);
+}
+} else {
+warn_report("backup access: %s - expected bitmap not present", 
di->device_name);
+}
+bdrv_release_dirty_bitmap(di->background_bitmap);
+bdrv_drained_end(di->bs);
+}
+}
+
+/*
+ * Continue tracking for next incremental backup in di->bitmap. New writes 
since the backup access
+ * was set up are in the background

[pve-devel] [RFC qemu-server v2 18/25] backup: allow adding fleecing images also for EFI and TPM

2024-08-13 Thread Fiona Ebner
For the external backup API, it will be necessary to add a fleecing
image even for small disks like EFI and TPM, because there is no other
place the old data could be copied to when a new guest write comes in.

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 PVE/VZDump/QemuServer.pm | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 98685127..4ad4a154 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -548,7 +548,7 @@ my sub cleanup_fleecing_images {
 }
 
 my sub allocate_fleecing_images {
-my ($self, $disks, $vmid, $fleecing_storeid, $format) = @_;
+my ($self, $disks, $vmid, $fleecing_storeid, $format, $all_images) = @_;
 
 die "internal error - no fleecing storage specified\n" if 
!$fleecing_storeid;
 
@@ -559,7 +559,8 @@ my sub allocate_fleecing_images {
my $n = 0; # counter for fleecing image names
 
for my $di ($disks->@*) {
-   next if $di->{virtdev} =~ m/^(?:tpmstate|efidisk)\d$/; # too small 
to be worth it
+   # EFI/TPM are usually too small to be worth it, but it's required 
for external providers
+   next if !$all_images && $di->{virtdev} =~ 
m/^(?:tpmstate|efidisk)\d$/;
if ($di->{type} eq 'block' || $di->{type} eq 'file') {
my $scfg = PVE::Storage::storage_config($self->{storecfg}, 
$fleecing_storeid);
my $name = "vm-$vmid-fleece-$n";
@@ -623,7 +624,7 @@ my sub attach_fleecing_images {
 }
 
 my sub check_and_prepare_fleecing {
-my ($self, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support) = 
@_;
+my ($self, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support, 
$all_images) = @_;
 
 # Even if the VM was started specifically for fleecing, it's possible that 
the VM is resumed and
 # then starts doing IO. For VMs that are not resumed the fleecing images 
will just stay empty,
@@ -644,7 +645,8 @@ my sub check_and_prepare_fleecing {
$self->{storecfg}, $fleecing_opts->{storage});
my $format = scalar(grep { $_ eq 'qcow2' } $valid_formats->@*) ? 
'qcow2' : 'raw';
 
-   allocate_fleecing_images($self, $disks, $vmid, 
$fleecing_opts->{storage}, $format);
+   allocate_fleecing_images(
+   $self, $disks, $vmid, $fleecing_opts->{storage}, $format, 
$all_images);
attach_fleecing_images($self, $disks, $vmid, $format);
 }
 
@@ -735,7 +737,7 @@ sub archive_pbs {
my $is_template = 
PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
 
$task->{'use-fleecing'} = check_and_prepare_fleecing(
-   $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support);
+   $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support, 0);
 
my $fs_frozen = $self->qga_fs_freeze($task, $vmid);
 
@@ -919,7 +921,7 @@ sub archive_vma {
$attach_tpmstate_drive->($self, $task, $vmid);
 
$task->{'use-fleecing'} = check_and_prepare_fleecing(
-   $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support);
+   $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support, 0);
 
my $outfh;
if ($opts->{stdout}) {
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC qemu v2 08/25] PVE backup: implement backup access setup and teardown API for external providers

2024-08-13 Thread Fiona Ebner
For external backup providers, the state of the VM's disk images at
the time the backup is started is preserved via a snapshot-access
block node. Old data is moved to the fleecing image when new guest
writes come in. The snapshot-access block node, as well as the
associated bitmap in case of incremental backup, will be exported via
NBD to the external provider. The NBD export will be done by the
management layer, the missing functionality is setting up and tearing
down the snapshot-access block nodes, which this patch adds.

It is necessary to also set up fleecing for EFI and TPM disks, so that
old data can be moved out of the way when a new guest write comes in.

There can only be one regular backup or one active backup access at
a time, because both require replacing the original block node of the
drive. Thus the backup state is re-used, and checks are added to
prohibit regular backup while snapshot access is active and vice
versa.

The block nodes added by the backup-access-setup QMP call are not
tracked anywhere else (there is no job they are associated with, unlike for
regular backup). This requires adding a callback for teardown when
QEMU exits, i.e. in qemu_cleanup(). Otherwise, there will be an
assertion failure that the block graph is not empty when QEMU exits
before the backup-access-teardown QMP command is called.

The code for qmp_backup_access_setup() was based on the existing
qmp_backup() routine.

The return value for the setup QMP command contains information about
the snapshot-access block nodes that can be used by the management
layer to set up the NBD exports.
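
A rough sketch of the resulting management-layer flow (the teardown
parameters match the qemu-server patch in this series; the setup return
field names are assumptions, and the export uses QEMU's standard NBD
commands):

    my $res = mon_cmd($vmid, 'backup-access-setup', 'target-id' => 'provider0');

    mon_cmd($vmid, 'nbd-server-start',
        addr => { type => 'unix', data => { path => "/run/qemu-server/$vmid-backup.sock" } });

    for my $info ($res->@*) {
        mon_cmd($vmid, 'block-export-add',
            id => "backup-$info->{device}",
            type => 'nbd',
            'node-name' => $info->{'node-name'},
            writable => JSON::false,
        );
    }

    # ... the external provider reads from the NBD exports ...

    mon_cmd($vmid, 'backup-access-teardown',
        'target-id' => 'provider0',
        success => JSON::true,
    );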

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Also return the size of the block devices in the setup call.

 pve-backup.c | 273 +++
 pve-backup.h |  16 +++
 qapi/block-core.json |  45 +++
 system/runstate.c|   6 +
 4 files changed, 340 insertions(+)
 create mode 100644 pve-backup.h

diff --git a/pve-backup.c b/pve-backup.c
index d0593fc581..d3370d6744 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -1,4 +1,5 @@
 #include "proxmox-backup-client.h"
+#include "pve-backup.h"
 #include "vma.h"
 
 #include "qemu/osdep.h"
@@ -585,6 +586,37 @@ static int setup_snapshot_access(PVEBackupDevInfo *di, 
Error **errp)
 return 0;
 }
 
+static void setup_all_snapshot_access_bh(void *opaque)
+{
+assert(!qemu_in_coroutine());
+
+CoCtxData *data = (CoCtxData*)opaque;
+Error **errp = (Error**)data->data;
+
+Error *local_err = NULL;
+
+GList *l =  backup_state.di_list;
+while (l) {
+PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+l = g_list_next(l);
+
+bdrv_drained_begin(di->bs);
+
+if (setup_snapshot_access(di, &local_err) < 0) {
+cleanup_snapshot_access(di);
+bdrv_drained_end(di->bs);
+error_setg(errp, "%s - setting up snapshot access failed: %s", 
di->device_name,
+   local_err ? error_get_pretty(local_err) : "unknown 
error");
+break;
+}
+
+bdrv_drained_end(di->bs);
+}
+
+/* return */
+aio_co_enter(data->ctx, data->co);
+}
+
 /*
  * backup_job_create can *not* be run from a coroutine, so this can't either.
  * The caller is responsible that backup_mutex is held nonetheless.
@@ -722,6 +754,11 @@ static bool fleecing_no_efi_tpm(const char *device_id)
 return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, 
"drive-tpmstate", 14);
 }
 
+static bool fleecing_all(const char *device_id)
+{
+return true;
+}
+
 /*
  * Returns a list of device infos, which needs to be freed by the caller. In
  * case of an error, errp will be set, but the returned value might still be a
@@ -810,6 +847,242 @@ err:
 return di_list;
 }
 
+BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
+const char *target_id,
+const char *devlist,
+Error **errp)
+{
+assert(qemu_in_coroutine());
+
+qemu_co_mutex_lock(&backup_state.backup_mutex);
+
+Error *local_err = NULL;
+GList *di_list = NULL;
+GList *l;
+
+if (backup_state.di_list) {
+error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+  "previous backup by provider '%s' not finished", 
backup_state.target_id);
+qemu_co_mutex_unlock(&backup_state.backup_mutex);
+return NULL;
+}
+
+bdrv_graph_co_rdlock();
+di_list = get_device_info(devlist, fleecing_all, &local_err);
+bdrv_graph_co_rdunlock();
+if (local_err) {
+error_propagate(errp, local_err);
+goto err;
+}
+assert(di_list);
+
+size_t total = 0;
+
+l = di_list;
+while (l) {
+PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+l = g_list_next(l);
+
+ssize_t size = bdrv_getlength(di->bs);
+if (size < 0

[pve-devel] [PATCH manager v2 24/25] ui: backup: also check for backup subtype to classify archive

2024-08-13 Thread Fiona Ebner
In anticipation of future storage plugins that might not have
PBS-specific formats or adhere to the vzdump naming scheme for
backups.

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 www/manager6/Utils.js  | 10 ++
 www/manager6/grid/BackupView.js|  4 ++--
 www/manager6/storage/BackupView.js |  4 ++--
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/www/manager6/Utils.js b/www/manager6/Utils.js
index db86fa9a..a8e4e8ee 100644
--- a/www/manager6/Utils.js
+++ b/www/manager6/Utils.js
@@ -693,12 +693,14 @@ Ext.define('PVE.Utils', {
'snippets': gettext('Snippets'),
 },
 
-volume_is_qemu_backup: function(volid, format) {
-   return format === 'pbs-vm' || volid.match(':backup/vzdump-qemu-');
+volume_is_qemu_backup: function(volume) {
+   return volume.format === 'pbs-vm' || 
volume.volid.match(':backup/vzdump-qemu-') ||
+   volume.subtype === 'qemu';
 },
 
-volume_is_lxc_backup: function(volid, format) {
-   return format === 'pbs-ct' || 
volid.match(':backup/vzdump-(lxc|openvz)-');
+volume_is_lxc_backup: function(volume) {
+   return volume.format === 'pbs-ct' || 
volume.volid.match(':backup/vzdump-(lxc|openvz)-') ||
+   volume.subtype === 'lxc';
 },
 
 authSchema: {
diff --git a/www/manager6/grid/BackupView.js b/www/manager6/grid/BackupView.js
index e71d1c88..ef3649c6 100644
--- a/www/manager6/grid/BackupView.js
+++ b/www/manager6/grid/BackupView.js
@@ -29,11 +29,11 @@ Ext.define('PVE.grid.BackupView', {
var vmtypeFilter;
if (vmtype === 'lxc' || vmtype === 'openvz') {
vmtypeFilter = function(item) {
-   return PVE.Utils.volume_is_lxc_backup(item.data.volid, 
item.data.format);
+   return PVE.Utils.volume_is_lxc_backup(item.data);
};
} else if (vmtype === 'qemu') {
vmtypeFilter = function(item) {
-   return PVE.Utils.volume_is_qemu_backup(item.data.volid, 
item.data.format);
+   return PVE.Utils.volume_is_qemu_backup(item.data);
};
} else {
throw "unsupported VM type '" + vmtype + "'";
diff --git a/www/manager6/storage/BackupView.js 
b/www/manager6/storage/BackupView.js
index 878e1c8f..ad6e6a01 100644
--- a/www/manager6/storage/BackupView.js
+++ b/www/manager6/storage/BackupView.js
@@ -84,9 +84,9 @@ Ext.define('PVE.storage.BackupView', {
disabled: true,
handler: function(b, e, rec) {
let vmtype;
-   if (PVE.Utils.volume_is_qemu_backup(rec.data.volid, 
rec.data.format)) {
+   if (PVE.Utils.volume_is_qemu_backup(rec.data)) {
vmtype = 'qemu';
-   } else if (PVE.Utils.volume_is_lxc_backup(rec.data.volid, 
rec.data.format)) {
+   } else if (PVE.Utils.volume_is_lxc_backup(rec.data)) {
vmtype = 'lxc';
} else {
return;
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu-server v2 15/25] backup: move cleanup of fleecing images to cleanup method

2024-08-13 Thread Fiona Ebner
TPM drives are already detached there and it's better to group
these things together.

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 PVE/VZDump/QemuServer.pm | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 012c9210..b2ced154 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -690,7 +690,6 @@ sub archive_pbs {
 
 # get list early so we die on unkown drive types before doing anything
 my $devlist = _get_task_devlist($task);
-my $use_fleecing;
 
 $self->enforce_vm_running_for_backup($vmid);
 $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
@@ -721,7 +720,7 @@ sub archive_pbs {
 
my $is_template = 
PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
 
-   $use_fleecing = check_and_prepare_fleecing(
+   $task->{'use-fleecing'} = check_and_prepare_fleecing(
$self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support);
 
my $fs_frozen = $self->qga_fs_freeze($task, $vmid);
@@ -735,7 +734,7 @@ sub archive_pbs {
devlist => $devlist,
'config-file' => $conffile,
};
-   $params->{fleecing} = JSON::true if $use_fleecing;
+   $params->{fleecing} = JSON::true if $task->{'use-fleecing'};
 
if (defined(my $ns = $scfg->{namespace})) {
$params->{'backup-ns'} = $ns;
@@ -784,11 +783,6 @@ sub archive_pbs {
 }
 $self->restore_vm_power_state($vmid);
 
-if ($use_fleecing) {
-   detach_fleecing_images($task->{disks}, $vmid);
-   cleanup_fleecing_images($self, $task->{disks});
-}
-
 die $err if $err;
 }
 
@@ -891,7 +885,6 @@ sub archive_vma {
 }
 
 my $devlist = _get_task_devlist($task);
-my $use_fleecing;
 
 $self->enforce_vm_running_for_backup($vmid);
 $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
@@ -911,7 +904,7 @@ sub archive_vma {
 
$attach_tpmstate_drive->($self, $task, $vmid);
 
-   $use_fleecing = check_and_prepare_fleecing(
+   $task->{'use-fleecing'} = check_and_prepare_fleecing(
$self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support);
 
my $outfh;
@@ -942,7 +935,7 @@ sub archive_vma {
devlist => $devlist
};
$params->{'firewall-file'} = $firewall if -e $firewall;
-   $params->{fleecing} = JSON::true if $use_fleecing;
+   $params->{fleecing} = JSON::true if $task->{'use-fleecing'};
add_backup_performance_options($params, $opts->{performance}, 
$qemu_support);
 
$qmpclient->queue_cmd($vmid, $backup_cb, 'backup', %$params);
@@ -984,11 +977,6 @@ sub archive_vma {
 
 $self->restore_vm_power_state($vmid);
 
-if ($use_fleecing) {
-   detach_fleecing_images($task->{disks}, $vmid);
-   cleanup_fleecing_images($self, $task->{disks});
-}
-
 if ($err) {
if ($cpid) {
kill(9, $cpid);
@@ -1132,6 +1120,11 @@ sub cleanup {
 
 $detach_tpmstate_drive->($task, $vmid);
 
+if ($task->{'use-fleecing'}) {
+   detach_fleecing_images($task->{disks}, $vmid);
+   cleanup_fleecing_images($self, $task->{disks});
+}
+
 if ($self->{qmeventd_fh}) {
close($self->{qmeventd_fh});
 }
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu-server v2 16/25] backup: cleanup: check if VM is running before issuing QMP commands

2024-08-13 Thread Fiona Ebner
If the VM was only started for the backup, it will already be stopped
again at this point. While the detach helpers do not warn about errors
currently, that might change in the future. This is also in
preparation for other cleanup QMP helpers that are more verbose about
failure.

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 PVE/VZDump/QemuServer.pm | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index b2ced154..c46e607c 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -1118,13 +1118,14 @@ sub snapshot {
 sub cleanup {
 my ($self, $task, $vmid) = @_;
 
-$detach_tpmstate_drive->($task, $vmid);
-
-if ($task->{'use-fleecing'}) {
-   detach_fleecing_images($task->{disks}, $vmid);
-   cleanup_fleecing_images($self, $task->{disks});
+# If VM was started only for backup, it is already stopped now.
+if (PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
+   $detach_tpmstate_drive->($task, $vmid);
+   detach_fleecing_images($task->{disks}, $vmid) if 
$task->{'use-fleecing'};
 }
 
+cleanup_fleecing_images($self, $task->{disks}) if $task->{'use-fleecing'};
+
 if ($self->{qmeventd_fh}) {
close($self->{qmeventd_fh});
 }
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu-server v2 20/25] restore: die early when there is no size for a device

2024-08-13 Thread Fiona Ebner
Makes it a clean error for buggy (external) backup providers where the
size might not be set at all.

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 PVE/QemuServer.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index e5ff5efb..37f56f69 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6755,6 +6755,7 @@ my $restore_allocate_devices = sub {
 my $map = {};
 foreach my $virtdev (sort keys %$virtdev_hash) {
my $d = $virtdev_hash->{$virtdev};
+   die "got no size for '$virtdev'\n" if !defined($d->{size});
my $alloc_size = int(($d->{size} + 1024 - 1)/1024);
my $storeid = $d->{storeid};
my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC qemu v2 07/25] PVE backup: get device info: allow caller to specify filter for which devices use fleecing

2024-08-13 Thread Fiona Ebner
For providing snapshot-access to external backup providers, EFI and
TPM also need an associated fleecing image. The new caller will thus
need a different filter.

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 pve-backup.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index e8031bb89c..d0593fc581 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -717,7 +717,7 @@ static void create_backup_jobs_bh(void *opaque) {
 /*
  * EFI disk and TPM state are small and it's just not worth setting up 
fleecing for them.
  */
-static bool device_uses_fleecing(const char *device_id)
+static bool fleecing_no_efi_tpm(const char *device_id)
 {
 return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, 
"drive-tpmstate", 14);
 }
@@ -729,7 +729,7 @@ static bool device_uses_fleecing(const char *device_id)
  */
 static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
 const char *devlist,
-bool fleecing,
+bool (*device_uses_fleecing)(const char*),
 Error **errp)
 {
 gchar **devs = NULL;
@@ -755,7 +755,7 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
 di->bs = bs;
 di->device_name = g_strdup(bdrv_get_device_name(bs));
 
-if (fleecing && device_uses_fleecing(*d)) {
+if (device_uses_fleecing && device_uses_fleecing(*d)) {
 g_autofree gchar *fleecing_devid = g_strconcat(*d, 
"-fleecing", NULL);
 BlockBackend *fleecing_blk = blk_by_name(fleecing_devid);
 if (!fleecing_blk) {
@@ -858,7 +858,8 @@ UuidInfo coroutine_fn *qmp_backup(
 format = has_format ? format : BACKUP_FORMAT_VMA;
 
 bdrv_graph_co_rdlock();
-di_list = get_device_info(devlist, has_fleecing && fleecing, &local_err);
+di_list = get_device_info(devlist, (has_fleecing && fleecing) ? 
fleecing_no_efi_tpm : NULL,
+  &local_err);
 bdrv_graph_co_rdunlock();
 if (local_err) {
 error_propagate(errp, local_err);
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC qemu v2 06/25] PVE backup: add target ID in backup state

2024-08-13 Thread Fiona Ebner
In preparation for allowing multiple backup providers. Each backup
target can then have its own dirty bitmap and there can be additional
checks that the current backup state is actually associated with the
expected target.

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 pve-backup.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/pve-backup.c b/pve-backup.c
index d931746453..e8031bb89c 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -70,6 +70,7 @@ static struct PVEBackupState {
 JobTxn *txn;
 CoMutex backup_mutex;
 CoMutex dump_callback_mutex;
+char *target_id;
 } backup_state;
 
 static void pvebackup_init(void)
@@ -848,7 +849,7 @@ UuidInfo coroutine_fn *qmp_backup(
 
 if (backup_state.di_list) {
 error_set(errp, ERROR_CLASS_GENERIC_ERROR,
-  "previous backup not finished");
+  "previous backup by provider '%s' not finished", 
backup_state.target_id);
 qemu_co_mutex_unlock(&backup_state.backup_mutex);
 return NULL;
 }
@@ -1100,6 +1101,11 @@ UuidInfo coroutine_fn *qmp_backup(
 backup_state.vmaw = vmaw;
 backup_state.pbs = pbs;
 
+if (backup_state.target_id) {
+g_free(backup_state.target_id);
+}
+backup_state.target_id = g_strdup("Proxmox");
+
 backup_state.di_list = di_list;
 
 uuid_info = g_malloc0(sizeof(*uuid_info));
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu v2 05/25] PVE backup: include device name in error when setting up snapshot access fails

2024-08-13 Thread Fiona Ebner
Signed-off-by: Fiona Ebner 
---

No changes in v2.

 pve-backup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/pve-backup.c b/pve-backup.c
index 33c23e53c2..d931746453 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -626,7 +626,8 @@ static void create_backup_jobs_bh(void *opaque) {
 bool discard_source = false;
 if (di->fleecing.bs) {
 if (setup_snapshot_access(di, &local_err) < 0) {
-error_setg(errp, "setting up snapshot access for fleecing 
failed: %s",
+error_setg(errp, "%s - setting up snapshot access for fleecing 
failed: %s",
+   di->device_name,
local_err ? error_get_pretty(local_err) : "unknown 
error");
 cleanup_snapshot_access(di);
 bdrv_drained_end(di->bs);
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu v2 01/25] block/reqlist: allow adding overlapping requests

2024-08-13 Thread Fiona Ebner
Allow overlapping request by removing the assert that made it
impossible. There are only two callers:

1. block_copy_task_create()

It already asserts the very same condition before calling
reqlist_init_req().

2. cbw_snapshot_read_lock()

There is no need to have read requests be non-overlapping in
copy-before-write when used for snapshot-access. In fact, there was no
protection against two callers of cbw_snapshot_read_lock() calling
reqlist_init_req() with overlapping ranges and this could lead to an
assertion failure [1].

In particular, with the reproducer script below [0], two
cbw_co_snapshot_block_status() callers could race, with the second
calling reqlist_init_req() before the first one finishes and removes
its conflicting request.

[0]:

> #!/bin/bash -e
> dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024
> ./qemu-img create /tmp/fleecing.raw -f raw 1G
> (
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \
> --blockdev 
> raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \
> < {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", 
> "file": "node0", "target": "node1", "node-name": "node3" } }
> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", 
> "file": "node3", "node-name": "snap0" } }
> {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", 
> "data": { "path": "/tmp/nbd.socket" } } } }
> {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": 
> "snap0", "type": "nbd", "name": "exp0"}}
> EOF
> ) &
> sleep 5
> while true; do
> ./qemu-nbd -d /dev/nbd0
> ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r
> nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket'
> done

[1]:

> #5  0x71e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101
> #6  0x615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23
> #7  0x6152853e2d98 in cbw_snapshot_read_lock (...) at 
> ../block/copy-before-write.c:237
> #8  0x6152853e3068 in cbw_co_snapshot_block_status (...) at 
> ../block/copy-before-write.c:304
> #9  0x6152853f4d22 in bdrv_co_snapshot_block_status (...) at 
> ../block/io.c:3726
> #10 0x61528543a63e in snapshot_access_co_block_status (...) at 
> ../block/snapshot-access.c:48
> #11 0x6152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474
> #12 0x6152853f2016 in bdrv_co_common_block_status_above (...) at 
> ../block/io.c:2652
> #13 0x6152853f22cf in bdrv_co_block_status_above (...) at 
> ../block/io.c:2732
> #14 0x6152853d9a86 in blk_co_block_status_above (...) at 
> ../block/block-backend.c:1473
> #15 0x61528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374
> #16 0x61528538deb1 in nbd_co_send_block_status (...) at 
> ../nbd/server.c:2481
> #17 0x61528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978
> #18 0x61528538f906 in nbd_trip (...) at ../nbd/server.c:3121
> #19 0x6152855a7caf in coroutine_trampoline (...) at 
> ../util/coroutine-ucontext.c:175

Cc: qemu-sta...@nongnu.org
Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Fiona Ebner 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

No changes in v2.

 block/copy-before-write.c | 3 ++-
 block/reqlist.c   | 2 --
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 50cc4c7aae..a5bb4d14f6 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -67,7 +67,8 @@ typedef struct BDRVCopyBeforeWriteState {
 
 /*
  * @frozen_read_reqs: current read requests for fleecing user in bs->file
- * node. These areas must not be rewritten by guest.
+ * node. These areas must not be rewritten by guest. There can be multiple
+ * overlapping read requests.
  */
 BlockReqList frozen_read_reqs;
 
diff --git a/block/reqlist.c b/block/reqlist.c
index 08cb57cfa4..098e807378 100644
--- a/block/reqlist.c
+++ b/block/reqlist.c
@@ -20,8 +20,6 @@
 void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
   int64_t bytes)
 {
-assert(!reqlist_find_conflict(reqs, offset, bytes));
-
 *req = (BlockReq) {
 .offset = offset,
 .bytes = bytes,
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC storage v2 10/25] plugin: introduce new_backup_provider() method

2024-08-13 Thread Fiona Ebner
The new_backup_provider() method can be used by storage plugins for
external backup providers. If the method returns a provider, Proxmox
VE will use callbacks to that provider for backups and restore instead
of using its usual backup/restore mechanisms.

API age and version are both bumped.

The backup provider API is split into two parts, both of which again
need different implementations for VM and LXC guests:

1. Backup API

There are two hook callback functions, namely:
1. job_hook() is called during the start/end/abort phases of the
   whole backup job.
2. backup_hook() is called during the start/end/abort phases of the
   backup of an individual guest.
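
To give an idea of the shape of these hooks, a provider could implement
them roughly as follows (the backup_hook() signature matches the call
sites later in this series; the job_hook() parameters are an assumption
here, see the base plugin for the authoritative documentation):

    sub job_hook {
        my ($self, $phase, $info) = @_;
        # 'start': e.g. check connectivity to the backup server once per job
        # 'end'/'abort': release job-wide resources
        return;
    }

    sub backup_hook {
        my ($self, $phase, $vmid, $vmtype, $info) = @_;
        # per-guest setup/teardown; on 'start', $info carries the backup
        # start time as $info->{'start-time'}
        return;
    }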

The backup_get_mechanism() method is used to decide on the backup
mechanism. Currently, 'block-device' or 'nbd' is possible for VMs, and
'directory' for containers. The method also lets the plugin indicate
whether to use a bitmap for incremental VM backup or not. It is enough
to implement one mechanism for VMs and one mechanism for containers.
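
As a sketch, for a provider backing up VMs via NBD with a dirty bitmap
and containers via a directory, this could look as follows (returning
the bitmap ID alongside the mechanism is an assumption about the return
convention here; see the base plugin documentation):

    sub backup_get_mechanism {
        my ($self, $vmid, $vmtype) = @_;

        return ('directory', undef) if $vmtype eq 'lxc';
        # returning a bitmap ID requests incremental backup for VMs
        return ('nbd', $self->{$vmid}->{'bitmap-id'});
    }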

Next, there are methods for backing up the guest's configuration and
data, backup_vm() for VM backup and backup_container() for container
backup.

Finally, some helpers like getting the provider name or volume ID for
the backup target, as well as for handling the backup log.

1.1 Backup Mechanisms

VM:

Access to the data on the VM's disk from the time the backup started
is made available via a so-called "snapshot access". This is either
the full image, or in case a bitmap is used, the dirty parts of the
image since the last time the bitmap was used for a successful backup.
Reading outside of the dirty parts will result in an error. After
backing up each part of the disk, it should be discarded in the export
to avoid unnecessary space usage on the Proxmox VE side (there is an
associated fleecing image).
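
For the 'block-device' mechanism, discarding an already backed-up range
could look like this (taken in spirit from the example plugin later in
this series; error handling kept minimal):

    # range is encoded as two unsigned 64-bit integers: offset, length
    my $range = pack('QQ', $offset, $length);
    ioctl($block_device_fh, 0x1277, $range) # BLKDISCARD, see linux/fs.h
        or warn "discarding range failed - $!\n";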

VM mechanism 'block-device':

The snapshot access is exposed as a block device. If used, a bitmap is
passed along.

VM mechanism 'nbd':

The snapshot access and, if used, bitmap are exported via NBD.

Container mechanism 'directory':

A copy or snapshot of the container's filesystem state is made
available as a directory.

2. Restore API

The restore_get_mechanism() method is used to decide on the restore
mechanism. Currently, 'qemu-img' for VMs, and 'directory' or 'tar' for
containers are possible. It is enough to implement one mechanism for
VMs and one mechanism for containers.

Next, there are methods for extracting the guest and firewall
configuration, and the restore mechanism is implemented via a pair of
methods: an init method for making the data available to Proxmox VE,
and a cleanup method that is called after restore.

For VMs, there also is a restore_vm_get_device_info() helper required,
to get the disks included in the backup and their sizes.

2.1. Restore Mechanisms

VM mechanism 'qemu-img':

The backup provider gives a path to the disk image that will be
restored. The path needs to be something 'qemu-img' can deal with,
e.g. can also be an NBD URI or similar.
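
For instance, the copy step on the Proxmox VE side can then boil down
to a plain 'qemu-img convert' invocation like the following (paths and
the NBD URI are purely illustrative):

    # copy the provider-supplied image onto the freshly allocated volume
    run_command(['qemu-img', 'convert', '-p', '-O', 'raw',
        'nbd://backup-host:10809/vm-112-disk-0', $target_path]);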

Container mechanism 'directory':

The backup provider gives the path to a directory with the full
filesystem structure of the container.

Container mechanism 'tar':

The backup provider gives the path to a (potentially compressed) tar
archive with the full filesystem structure of the container.

See the PVE::BackupProvider::Plugin module for the full API
documentation.

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Merge hook API into a single function for backup and for jobs.
* Add restore_vm_init() and restore_vm_cleanup() for better
  flexibility to allow preparing the whole restore. The question is
  whether restore_vm_volume_init() and restore_vm_volume_cleanup() should
  be dropped (but certain providers might prefer using only those).
  Having both is more flexible, but makes the API longer of course.
* Switch to backup_vm() (was per-volume backup_vm_volume() before) and
  backup_container(), passing along the configuration files, rather
  than having dedicated methods for the configuration files, for
  giving the backup provider more flexibility.
* Pass backup time to backup 'start' hook and use that in the
  directory example rather than the job start time.
* Use POD for base plugin documentation and flesh out documentation.
* Use 'BackupProvider::Plugin::' namespace.
* Rename $exclude_paths to $exclude_patterns and clarify what should
  be supported.
* Rename backup_get_target() method to backup_get_archive_name() for
  clarity.
* Rename extract_guest_config() to restore_get_guest_config() for
  better consistency with other (restore) API methods.

 src/PVE/BackupProvider/Makefile|3 +
 src/PVE/BackupProvider/Plugin/Base.pm  | 1149 
 src/PVE/BackupProvider/Plugin/Makefile |5 +
 src/PVE/Makefile   

[pve-devel] [POC storage v2 12/25] add backup provider example

2024-08-13 Thread Fiona Ebner
The example uses a simple directory structure to save the backups,
grouped by guest ID. VM backups are saved as configuration files and
qcow2 images, with backing files when doing incremental backups.
Container backups are saved as configuration files and a tar file or
squashfs image (added to test the 'directory' restore mechanism).

Whether to use incremental VM backups and which backup mechanisms to
use can be configured in the storage configuration.

The 'nbdinfo' binary from the 'libnbd-bin' package is required for
backup mechanism 'nbd' for VM backups; the 'mksquashfs' binary from
the 'squashfs-tools' package is required for backup mechanism
'squashfs' for containers.

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Adapt to API changes.
* Add log function helpers.
* Use make_path and remove_tree instead of deprecated variants.
* Add support for 'block-device' backup mechanism for VMs.
* Use backup time for guest rather than the job start time in the
  archive name.

 .../BackupProvider/Plugin/DirectoryExample.pm | 694 ++
 src/PVE/BackupProvider/Plugin/Makefile|   2 +-
 .../Custom/BackupProviderDirExamplePlugin.pm  | 306 
 src/PVE/Storage/Custom/Makefile   |   5 +
 src/PVE/Storage/Makefile  |   1 +
 5 files changed, 1007 insertions(+), 1 deletion(-)
 create mode 100644 src/PVE/BackupProvider/Plugin/DirectoryExample.pm
 create mode 100644 src/PVE/Storage/Custom/BackupProviderDirExamplePlugin.pm
 create mode 100644 src/PVE/Storage/Custom/Makefile

diff --git a/src/PVE/BackupProvider/Plugin/DirectoryExample.pm 
b/src/PVE/BackupProvider/Plugin/DirectoryExample.pm
new file mode 100644
index 000..b01ad02
--- /dev/null
+++ b/src/PVE/BackupProvider/Plugin/DirectoryExample.pm
@@ -0,0 +1,694 @@
+package PVE::BackupProvider::Plugin::DirectoryExample;
+
+use strict;
+use warnings;
+
+use Fcntl qw(SEEK_SET);
+use File::Path qw(make_path remove_tree);
+use IO::File;
+use IPC::Open3;
+
+use PVE::Storage::Plugin;
+use PVE::Tools qw(file_get_contents file_read_firstline file_set_contents 
run_command);
+
+use base qw(PVE::BackupProvider::Plugin::Base);
+
+use constant {
+BLKDISCARD => 0x1277, # see linux/fs.h
+};
+
+# Private helpers
+
+my sub log_info {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('info', $message);
+}
+
+my sub log_warning {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('warn', $message);
+}
+
+my sub log_error {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('err', $message);
+}
+
+# Try to use the same bitmap ID as last time for incremental backup if the 
storage is configured for
+# incremental VM backup. Need to start fresh if there is no previous ID or the 
associated backup
+# doesn't exist.
+my sub get_bitmap_id {
+my ($self, $vmid, $vmtype) = @_;
+
+return if $self->{'storage-plugin'}->get_vm_backup_mode($self->{scfg}) ne 
'incremental';
+
+my $previous_info_dir = "$self->{scfg}->{path}/$vmid/";
+
+my $previous_info_file = "$previous_info_dir/previous-info";
+my $info = file_read_firstline($previous_info_file) // '';
+$self->{$vmid}->{'old-previous-info'} = $info;
+my ($bitmap_id, $previous_backup_id) = $info =~ m/^(\d+)\s+(\d+)$/;
+my $previous_backup_dir =
+   $previous_backup_id ? 
"$self->{scfg}->{path}/$vmid/$vmtype-$previous_backup_id" : undef;
+
+if ($bitmap_id && -d $previous_backup_dir) {
+   $self->{$vmid}->{'previous-backup-dir'} = $previous_backup_dir;
+} else {
+   # need to start fresh if there is no previous ID or the associated 
backup doesn't exist
+   $bitmap_id = $self->{$vmid}->{'backup-time'};
+}
+
+$self->{$vmid}->{'bitmap-id'} = $bitmap_id;
+make_path($previous_info_dir);
+die "unable to create directory $previous_info_dir\n" if !-d 
$previous_info_dir;
+file_set_contents($previous_info_file, "$bitmap_id 
$self->{$vmid}->{'backup-time'}");
+
+return $bitmap_id;
+}
+
+# Backup Provider API
+
+sub new {
+my ($class, $storage_plugin, $scfg, $storeid, $log_function) = @_;
+
+my $self = bless {
+   scfg => $scfg,
+   storeid => $storeid,
+   'storage-plugin' => $storage_plugin,
+   'log-function' => $log_function,
+}, $class;
+
+return $self;
+}
+
+sub provider_name {
+my ($self) = @_;
+
+return 'dir provider example';
+}
+
+# Hooks
+
+my sub job_start {
+my ($self, $start_time) = @_;
+
+log_info($self, "job start hook called");
+
+run_command(["modprobe", "nbd"]);
+
+log_info($self, &

[pve-devel] [RFC manager v2 25/25] backup: implement backup for external providers

2024-08-13 Thread Fiona Ebner
Hooks from the backup provider are called during start/end/abort for
both the job and the backup of each guest. It is also necessary to
adapt some log messages and to special-case some things as is already
done for PBS, e.g. log file handling.

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Adapt to API changes.

 PVE/VZDump.pm   | 62 -
 test/vzdump_new_test.pl |  3 ++
 2 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
index f1a6b220..2b3e7b1c 100644
--- a/PVE/VZDump.pm
+++ b/PVE/VZDump.pm
@@ -206,7 +206,15 @@ sub storage_info {
 $info->{'prune-backups'} = 
PVE::JSONSchema::parse_property_string('prune-backups', 
$scfg->{'prune-backups'})
if defined($scfg->{'prune-backups'});
 
-if ($type eq 'pbs') {
+my $backup_provider = PVE::Storage::new_backup_provider(
+   $cfg,
+   $storage,
+   sub { debugmsg($_[0], $_[1]); },
+);
+
+if ($backup_provider) {
+   $info->{'backup-provider'} = $backup_provider
+} elsif ($type eq 'pbs') {
$info->{pbs} = 1;
 } else {
$info->{dumpdir} = PVE::Storage::get_backup_dir($cfg, $storage);
@@ -706,6 +714,7 @@ sub new {
$opts->{scfg} = $info->{scfg};
$opts->{pbs} = $info->{pbs};
$opts->{'prune-backups'} //= $info->{'prune-backups'};
+   $self->{'backup-provider'} = $info->{'backup-provider'} if 
$info->{'backup-provider'};
}
 } elsif ($opts->{dumpdir}) {
$add_error->("dumpdir '$opts->{dumpdir}' does not exist")
@@ -990,7 +999,7 @@ sub exec_backup_task {
}
}
 
-   if (!$self->{opts}->{pbs}) {
+   if (!$self->{opts}->{pbs} && !$self->{'backup-provider'}) {
$task->{logfile} = "$opts->{dumpdir}/$basename.log";
}
 
@@ -1000,7 +1009,11 @@ sub exec_backup_task {
$ext .= ".${comp_ext}";
}
 
-   if ($self->{opts}->{pbs}) {
+   if ($self->{'backup-provider'}) {
+   die "unable to pipe backup to stdout\n" if $opts->{stdout};
+   $task->{target} = 
$self->{'backup-provider'}->backup_get_archive_name(
+   $vmid, $vmtype, $task->{backup_time});
+   } elsif ($self->{opts}->{pbs}) {
die "unable to pipe backup to stdout\n" if $opts->{stdout};
$task->{target} = $pbs_snapshot_name;
} else {
@@ -1018,7 +1031,7 @@ sub exec_backup_task {
my $pid = $$;
if ($opts->{tmpdir}) {
$task->{tmpdir} = "$opts->{tmpdir}/vzdumptmp${pid}_$vmid/";
-   } elsif ($self->{opts}->{pbs}) {
+   } elsif ($self->{opts}->{pbs} || $self->{'backup-provider'}) {
$task->{tmpdir} = "/var/tmp/vzdumptmp${pid}_$vmid";
} else {
# dumpdir is posix? then use it as temporary dir
@@ -1090,6 +1103,10 @@ sub exec_backup_task {
if ($mode eq 'stop') {
$plugin->prepare ($task, $vmid, $mode);
 
+   if ($self->{'backup-provider'}) {
+   $self->{'backup-provider'}->backup_hook(
+   'start', $vmid, $vmtype, { 'start-time' => 
$task->{backup_time} });
+   }
$self->run_hook_script ('backup-start', $task, $logfd);
 
if ($running) {
@@ -1104,6 +1121,10 @@ sub exec_backup_task {
} elsif ($mode eq 'suspend') {
$plugin->prepare ($task, $vmid, $mode);
 
+   if ($self->{'backup-provider'}) {
+   $self->{'backup-provider'}->backup_hook(
+   'start', $vmid, $vmtype, { 'start-time' => 
$task->{backup_time} });
+   }
$self->run_hook_script ('backup-start', $task, $logfd);
 
if ($vmtype eq 'lxc') {
@@ -1130,6 +1151,10 @@ sub exec_backup_task {
}
 
} elsif ($mode eq 'snapshot') {
+   if ($self->{'backup-provider'}) {
+   $self->{'backup-provider'}->backup_hook(
+   'start', $vmid, $vmtype, { 'start-time' => 
$task->{backup_time} });
+   }
$self->run_hook_script ('backup-start', $task, $logfd);
 
my $snapshot_count = $task->{snapshot_count} || 0;
@@ -1172,11 +1197,13 @@ sub exec_backup_task {
return;
}
 
-   my $archive_txt = $self->{opts}->{pbs} ? 'Proxmox Backup Server' : 
'vzdump';
+   my $archive_txt = 'vzdump

[pve-devel] [RFC qemu-server v2 21/25] backup: implement restore for external providers

2024-08-13 Thread Fiona Ebner
First, the provider is asked about what restore mechanism to use.
Currently, only 'qemu-img' is possible. Then the configuration files
are restored, the provider gives information about the volumes
contained in the backup, and finally the volumes are restored via
'qemu-img convert'.
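
In pseudo-Perl, the flow implemented below is roughly the following
(method names from the backup provider API; the exact parameter lists
are assumptions here, see the base plugin documentation):

    my ($mechanism) = $backup_provider->restore_get_mechanism($volname, $storeid);
    die "restore mechanism '$mechanism' is not supported\n"
        if $mechanism ne 'qemu-img';
    # query the disks contained in the backup together with their sizes
    my $device_info = $backup_provider->restore_vm_get_device_info($volname, $storeid);
    # then, per disk: allocate a target volume of the reported size and
    # copy the data with 'qemu-img convert' from the provider-supplied path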

The code for the restore_external_archive() function was copied and
adapted from the restore_proxmox_backup_archive() function. Together
with restore_vma_archive() it seems sensible to extract the common
parts and use a dedicated module for restore code.

The parse_restore_archive() helper was renamed, because it's not just
parsing.

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Adapt to API changes.

 PVE/API2/Qemu.pm  |  29 +-
 PVE/QemuServer.pm | 139 ++
 2 files changed, 165 insertions(+), 3 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 212cfc1b..8917905e 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -845,7 +845,7 @@ __PACKAGE__->register_method({
return $res;
 }});
 
-my $parse_restore_archive = sub {
+my $classify_restore_archive = sub {
 my ($storecfg, $archive) = @_;
 
 my ($archive_storeid, $archive_volname) = 
PVE::Storage::parse_volume_id($archive, 1);
@@ -859,6 +859,21 @@ my $parse_restore_archive = sub {
$res->{type} = 'pbs';
return $res;
}
+   my $log_function = sub {
+   my ($log_level, $message) = @_;
+   my $prefix = $log_level eq 'err' ? 'ERROR' : uc($log_level);
+   print "$prefix: $message\n";
+   };
+   my $backup_provider = PVE::Storage::new_backup_provider(
+   $storecfg,
+   $archive_storeid,
+   $log_function,
+   );
+   if ($backup_provider) {
+   $res->{type} = 'external';
+   $res->{'backup-provider'} = $backup_provider;
+   return $res;
+   }
 }
 my $path = PVE::Storage::abs_filesystem_path($storecfg, $archive);
 $res->{type} = 'file';
@@ -1011,7 +1026,7 @@ __PACKAGE__->register_method({
'backup',
);
 
-   $archive = $parse_restore_archive->($storecfg, $archive);
+   $archive = $classify_restore_archive->($storecfg, $archive);
}
}
 
@@ -1069,7 +1084,15 @@ __PACKAGE__->register_method({
PVE::QemuServer::check_restore_permissions($rpcenv, 
$authuser, $merged);
}
}
-   if ($archive->{type} eq 'file' || $archive->{type} eq 'pipe') {
+   if (my $backup_provider = $archive->{'backup-provider'}) {
+   PVE::QemuServer::restore_external_archive(
+   $backup_provider,
+   $archive->{volid},
+   $vmid,
+   $authuser,
+   $restore_options,
+   );
+   } elsif ($archive->{type} eq 'file' || $archive->{type} eq 
'pipe') {
die "live-restore is only compatible with backup images 
from a Proxmox Backup Server\n"
if $live_restore;
PVE::QemuServer::restore_file_archive($archive->{path} // 
'-', $vmid, $authuser, $restore_options);
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 37f56f69..6cd21b7d 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7245,6 +7245,145 @@ sub restore_proxmox_backup_archive {
 }
 }
 
+sub restore_external_archive {
+my ($backup_provider, $archive, $vmid, $user, $options) = @_;
+
+die "live restore from backup provider is not implemented\n" if 
$options->{live};
+
+my $storecfg = PVE::Storage::config();
+
+my ($storeid, $volname) = PVE::Storage::parse_volume_id($archive);
+my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
+
+my $tmpdir = "/var/tmp/vzdumptmp$$";
+rmtree $tmpdir;
+mkpath $tmpdir;
+
+my $conffile = PVE::QemuConfig->config_file($vmid);
+# disable interrupts (always do cleanups)
+local $SIG{INT} =
+   local $SIG{TERM} =
+   local $SIG{QUIT} =
+   local $SIG{HUP} = sub { print STDERR "got interrupt - ignored\n"; };
+
+# Note: $oldconf is undef if VM does not exists
+my $cfs_path = PVE::QemuConfig->cfs_config_path($vmid);
+my $oldconf = PVE::Cluster::cfs_read_file($cfs_path);
+my $new_conf_raw = '';
+
+my $rpcenv = PVE::RPCEnvironment::get();
+my $devinfo = {}; # info about drives included in backup
+my $virtdev_hash = {}; # info about allocated drives
+
+eval {
+   # enable interrupts
+   local $SIG{INT} =
+   local $SIG{TERM} =
+   local $SIG{QUI

[pve-devel] [RFC qemu/storage/qemu-server/container/manager v2 00/25] backup provider API

2024-08-13 Thread Fiona Ebner
filesystem structure of the container.

Container mechanism 'tar':

The backup provider gives the path to a (potentially compressed) tar
archive with the full filesystem structure of the container.

See the PVE::BackupProvider::Plugin module for the full API
documentation.

==

This series adapts the backup stack in Proxmox VE to allow using the
above API. For QEMU, backup access setup and teardown QMP commands are
implemented to be able to provide access to a consistent disk state to
the backup provider.

The series also provides an example implementation for a backup
provider as a proof-of-concept, exposing the different features.

==

Open questions:

Should the backup provider plugin system also follow the same API
age+version schema with a Custom/ directory for external plugins
derived from the base plugin?

Should the bitmap action be passed directly to the backup provider?
I.e. have 'not-used', 'not-used-removed', 'new', 'used', 'invalid',
instead of only 'none', 'new' and 'reuse'. It makes the API slightly
more complicated. Is there any situation where a backup provider could
care whether a bitmap is new because it was the first one, or new
because the previous one was invalid? Both cases require the backup
provider to do a full backup.

==

The patches marked as PATCH rather than RFC can make sense
independently, with QEMU patches 02 and 03 having been sent already
before (touching same code, so included here):

https://lore.proxmox.com/pve-devel/20240625133551.210636-1-f.eb...@proxmox.com/#r

==

Feedback is very welcome, especially from people wishing to implement
such a backup provider plugin! Please tell me what issues you see with
the proposed API, what would and wouldn't work from your perspective?

==

Dependencies: pve-manager, pve-container and qemu-server all depend on
new libpve-storage-perl. pve-manager also build-depends on the new
libpve-storage-perl for its tests. To keep things clean, pve-manager
should also depend on new pve-container and qemu-server.

In qemu-server, there is no version guard added yet, as that depends
on the QEMU version the feature will land in.

==

qemu:

Fiona Ebner (9):
  block/reqlist: allow adding overlapping requests
  PVE backup: fixup error handling for fleecing
  PVE backup: factor out setting up snapshot access for fleecing
  PVE backup: save device name in device info structure
  PVE backup: include device name in error when setting up snapshot
access fails
  PVE backup: add target ID in backup state
  PVE backup: get device info: allow caller to specify filter for which
devices use fleecing
  PVE backup: implement backup access setup and teardown API for
external providers
  PVE backup: implement bitmap support for external backup access

 block/copy-before-write.c |   3 +-
 block/reqlist.c   |   2 -
 pve-backup.c  | 620 +-
 pve-backup.h  |  16 +
 qapi/block-core.json  |  61 
 system/runstate.c |   6 +
 6 files changed, 637 insertions(+), 71 deletions(-)
 create mode 100644 pve-backup.h


storage:

Fiona Ebner (4):
  plugin: introduce new_backup_provider() method
  extract backup config: delegate to backup provider if there is one
  add backup provider example
  WIP Borg plugin

 src/PVE/BackupProvider/Makefile   |3 +
 src/PVE/BackupProvider/Plugin/Base.pm | 1149 +
 src/PVE/BackupProvider/Plugin/Borg.pm |  373 ++
 .../BackupProvider/Plugin/DirectoryExample.pm |  694 ++
 src/PVE/BackupProvider/Plugin/Makefile|5 +
 src/PVE/Makefile  |1 +
 src/PVE/Storage.pm|   24 +-
 src/PVE/Storage/BorgBackupPlugin.pm   |  506 
 .../Custom/BackupProviderDirExamplePlugin.pm  |  306 +
 src/PVE/Storage/Custom/Makefile   |5 +
 src/PVE/Storage/Makefile  |2 +
 src/PVE/Storage/Plugin.pm |   15 +
 12 files changed, 3081 insertions(+), 2 deletions(-)
 create mode 100644 src/PVE/BackupProvider/Makefile
 create mode 100644 src/PVE/BackupProvider/Plugin/Base.pm
 create mode 100644 src/PVE/BackupProvider/Plugin/Borg.pm
 create mode 100644 src/PVE/BackupProvider/Plugin/DirectoryExample.pm
 create mode 100644 src/PVE/BackupProvider/Plugin/Makefile
 create mode 100644 src/PVE/Storage/BorgBackupPlugin.pm
 create mode 100644 src/PVE/Storage/Custom/BackupProviderDirExamplePlugin.pm
 create mode 100644 src/PVE/Storage/Custom/Makefile


qemu-server:

Fiona Ebner (8):
  move nbd_stop helper to QMPHelpers module
  backup: move cleanup of fleecing images to cleanup method
  backup: cleanup: check if VM is running before issuing QMP commands
  backup: keep track of block-node size instead of volume size
  backup: allow adding fleecing images also for EFI and TPM
  backup: implement backup for external providers
  

[pve-devel] [RFC container v2 22/25] backup: implement backup for external providers

2024-08-13 Thread Fiona Ebner
The filesystem structure is made available as a directory in a
consistent manner (with details depending on the vzdump backup mode)
just like for regular backup via tar.

The backup provider needs to back up the guest and firewall
configuration and then the filesystem structure, honoring the ID maps
(for unprivileged containers) as well as file exclusions and the
bandwidth limit.
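
Schematically, a provider's implementation then has the following shape
(the signature matches the call added below; the body is only a sketch):

    sub backup_container {
        my ($self, $vmid, $config_file, $id_map, $exclude_patterns, $info) = @_;

        # 1. store $config_file and, if present, $info->{'firewall-config'}
        # 2. archive $info->{directory} (per-mountpoint sources are listed
        #    in $info->{sources}), honoring $exclude_patterns, the ID map
        #    of unprivileged containers and $info->{'bandwidth-limit'}
        #    (in bytes per second)
        die "implement me\n";
    }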

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Adapt to API changes.

 src/PVE/VZDump/LXC.pm | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/PVE/VZDump/LXC.pm b/src/PVE/VZDump/LXC.pm
index 67d13db..0fc2a94 100644
--- a/src/PVE/VZDump/LXC.pm
+++ b/src/PVE/VZDump/LXC.pm
@@ -373,7 +373,27 @@ sub archive {
 my $userns_cmd = $task->{userns_cmd};
 my $findexcl = $self->{vzdump}->{findexcl};
 
-if ($self->{vzdump}->{opts}->{pbs}) {
+if (my $backup_provider = $self->{vzdump}->{'backup-provider'}) {
+   $self->loginfo("starting external backup via " . 
$backup_provider->provider_name());
+
+   my ($mechanism) = $backup_provider->backup_get_mechanism($vmid, 'lxc');
+   die "mechanism '$mechanism' requested by backup provider is not 
supported for containers\n"
+   if $mechanism ne 'directory';
+
+   my $config_file = "$tmpdir/etc/vzdump/pct.conf";
+   my $firewall_file = "$tmpdir/etc/vzdump/pct.fw";
+
+
+   my $conf = PVE::LXC::Config->load_config($vmid);
+   my ($id_map, undef, undef) = PVE::LXC::parse_id_maps($conf);
+   my $info = {
+   directory => $snapdir,
+   sources => [@sources],
+   };
+   $info->{'firewall-config'} = $firewall_file if -e $firewall_file;
+   $info->{'bandwidth-limit'} = $opts->{bwlimit} * 1024 if 
$opts->{bwlimit};
+   $backup_provider->backup_container($vmid, $config_file, $id_map, 
$findexcl, $info);
+} elsif ($self->{vzdump}->{opts}->{pbs}) {
 
my $param = [];
push @$param, "pct.conf:$tmpdir/etc/vzdump/pct.conf";
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu-server v2 17/25] backup: keep track of block-node size instead of volume size

2024-08-13 Thread Fiona Ebner
For fleecing, the size needs to match exactly with what QEMU sees. In
particular, EFI disks might be attached with a 'size=' option, meaning
that size can be different from the volume's size. Commit 36377acf
("backup: disk info: also keep track of size") introduced size
tracking and it was only used for fleecing since then, so drop the
existing 'size' key from the device info hash and replace it with an
explicit 'block-node-size' for clarity.
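
The value in question can be read via QMP; distilled from the hunk
below (key names as returned by QEMU's query-block command):

    my $block_info = mon_cmd($vmid, "query-block");
    $block_info = { map { $_->{device} => $_ } $block_info->@* };
    my $size = $block_info->{"drive-$ds"}->{inserted}->{image}->{'virtual-size'};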

Should also help with the following issue reported in the community
forum:
https://forum.proxmox.com/threads/152202

Fixes: 36377acf ("backup: disk info: also keep track of size")
Signed-off-by: Fiona Ebner 
---

New in v2.

 PVE/VZDump/QemuServer.pm | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index c46e607c..98685127 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -106,6 +106,9 @@ sub prepare {
 
 PVE::Storage::activate_volumes($self->{storecfg}, $vollist);
 
+my $block_info = mon_cmd($vmid, "query-block");
+$block_info = { map { $_->{device} => $_ } $block_info->@* };
+
 foreach my $ds (sort keys %$drivehash) {
my $drive = $drivehash->{$ds};
 
@@ -133,11 +136,22 @@ sub prepare {
die "cannot determine size and format of volume '$volid' - $@\n" if 
$@;
}
 
+   # The size for fleecing images needs to be exactly the same size as QEMU sees. E.g. EFI disk
+   # can be attached with a smaller size than the underlying image on the storage.
+   my $block_node_size =
+   eval { 
$block_info->{"drive-$ds"}->{inserted}->{image}->{'virtual-size'}; };
+   if (!$block_node_size) {
+   # TPM state is not attached yet and will be attached with same 
size, so don't warn then.
+   $self->loginfo("could not determine block node size of drive '$ds' 
- using fallback")
+   if $ds !~ m/^tpmstate\d+/;
+   $block_node_size = $size;
+   }
+
my $diskinfo = {
path => $path,
volid => $volid,
storeid => $storeid,
-   size => $size,
+   'block-node-size' => $block_node_size,
format => $format,
virtdev => $ds,
qmdevice => "drive-$ds",
@@ -551,7 +565,7 @@ my sub allocate_fleecing_images {
my $name = "vm-$vmid-fleece-$n";
$name .= ".$format" if $scfg->{path};
 
-   my $size = PVE::Tools::convert_size($di->{size}, 'b' => 'kb');
+   my $size = PVE::Tools::convert_size($di->{'block-node-size'}, 
'b' => 'kb');
 
$di->{'fleece-volid'} = PVE::Storage::vdisk_alloc(
$self->{storecfg}, $fleecing_storeid, $vmid, $format, 
$name, $size);
@@ -600,7 +614,7 @@ my sub attach_fleecing_images {
my $drive = 
"file=$path,if=none,id=$devid,format=$format,discard=unmap";
# Specify size explicitly, to make it work if storage backend 
rounded up size for
# fleecing image when allocating.
-   $drive .= ",size=$di->{size}" if $format eq 'raw';
+   $drive .= ",size=$di->{'block-node-size'}" if $format eq 'raw';
$drive =~ s/\\//g;
my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_add auto 
\"$drive\"", 60);
die "attaching fleecing image $volid failed - $ret\n" if $ret !~ 
m/OK/s;
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu-server v2 14/25] move nbd_stop helper to QMPHelpers module

2024-08-13 Thread Fiona Ebner
Like this nbd_stop() can be called from a module that cannot include
QemuServer.pm.

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 PVE/API2/Qemu.pm | 3 ++-
 PVE/CLI/qm.pm| 3 ++-
 PVE/QemuServer.pm| 6 --
 PVE/QemuServer/QMPHelpers.pm | 6 ++
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index d25a79fe..212cfc1b 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -35,6 +35,7 @@ use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::Machine;
 use PVE::QemuServer::Memory qw(get_current_memory);
 use PVE::QemuServer::PCI;
+use PVE::QemuServer::QMPHelpers;
 use PVE::QemuServer::USB;
 use PVE::QemuMigrate;
 use PVE::RPCEnvironment;
@@ -5910,7 +5911,7 @@ __PACKAGE__->register_method({
return;
},
'nbdstop' => sub {
-   PVE::QemuServer::nbd_stop($state->{vmid});
+   PVE::QemuServer::QMPHelpers::nbd_stop($state->{vmid});
return;
},
'resume' => sub {
diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
index d3dbf7b4..8349997e 100755
--- a/PVE/CLI/qm.pm
+++ b/PVE/CLI/qm.pm
@@ -35,6 +35,7 @@ use PVE::QemuServer::Agent qw(agent_available);
 use PVE::QemuServer::ImportDisk;
 use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::OVF;
+use PVE::QemuServer::QMPHelpers;
 use PVE::QemuServer;
 
 use PVE::CLIHandler;
@@ -385,7 +386,7 @@ __PACKAGE__->register_method ({
 
my $vmid = $param->{vmid};
 
-   eval { PVE::QemuServer::nbd_stop($vmid) };
+   eval { PVE::QemuServer::QMPHelpers::nbd_stop($vmid) };
warn $@ if $@;
 
return;
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index b26da505..e5ff5efb 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -8548,12 +8548,6 @@ sub generate_smbios1_uuid {
 return "uuid=".generate_uuid();
 }
 
-sub nbd_stop {
-my ($vmid) = @_;
-
-mon_cmd($vmid, 'nbd-server-stop', timeout => 25);
-}
-
 sub create_reboot_request {
 my ($vmid) = @_;
 open(my $fh, '>', "/run/qemu-server/$vmid.reboot")
diff --git a/PVE/QemuServer/QMPHelpers.pm b/PVE/QemuServer/QMPHelpers.pm
index 0269ea46..826938de 100644
--- a/PVE/QemuServer/QMPHelpers.pm
+++ b/PVE/QemuServer/QMPHelpers.pm
@@ -15,6 +15,12 @@ qemu_objectadd
 qemu_objectdel
 );
 
+sub nbd_stop {
+my ($vmid) = @_;
+
+mon_cmd($vmid, 'nbd-server-stop', timeout => 25);
+}
+
 sub qemu_deviceadd {
 my ($vmid, $devicefull) = @_;
 
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu v2 04/25] PVE backup: save device name in device info structure

2024-08-13 Thread Fiona Ebner
The device name needs to be queried while holding the graph read lock,
and since it doesn't change during the whole operation, just get it
once during setup and avoid the need to query it again in different
places.

Also in preparation to use it more often in error messages and for the
upcoming external backup access API.

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 pve-backup.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 051ebffe48..33c23e53c2 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -94,6 +94,7 @@ typedef struct PVEBackupDevInfo {
 size_t size;
 uint64_t block_size;
 uint8_t dev_id;
+char* device_name;
 int completed_ret; // INT_MAX if not completed
 BdrvDirtyBitmap *bitmap;
 BlockDriverState *target;
@@ -327,6 +328,8 @@ static void coroutine_fn pvebackup_co_complete_stream(void 
*opaque)
 }
 
 di->bs = NULL;
+g_free(di->device_name);
+di->device_name = NULL;
 
 assert(di->target == NULL);
 
@@ -621,9 +624,6 @@ static void create_backup_jobs_bh(void *opaque) {
 
 BlockDriverState *source_bs = di->bs;
 bool discard_source = false;
-bdrv_graph_co_rdlock();
-const char *job_id = bdrv_get_device_name(di->bs);
-bdrv_graph_co_rdunlock();
 if (di->fleecing.bs) {
 if (setup_snapshot_access(di, &local_err) < 0) {
 error_setg(errp, "setting up snapshot access for fleecing 
failed: %s",
@@ -654,7 +654,7 @@ static void create_backup_jobs_bh(void *opaque) {
 }
 
 BlockJob *job = backup_job_create(
-job_id, source_bs, di->target, backup_state.speed, sync_mode, 
di->bitmap,
+di->device_name, source_bs, di->target, backup_state.speed, 
sync_mode, di->bitmap,
 bitmap_mode, false, discard_source, NULL, &perf, 
BLOCKDEV_ON_ERROR_REPORT,
 BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, 
backup_state.txn,
 &local_err);
@@ -751,6 +751,7 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
 }
 PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
 di->bs = bs;
+di->device_name = g_strdup(bdrv_get_device_name(bs));
 
 if (fleecing && device_uses_fleecing(*d)) {
 g_autofree gchar *fleecing_devid = g_strconcat(*d, 
"-fleecing", NULL);
@@ -789,6 +790,7 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
 
 PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
 di->bs = bs;
+di->device_name = g_strdup(bdrv_get_device_name(bs));
 di_list = g_list_append(di_list, di);
 }
 }
@@ -956,9 +958,6 @@ UuidInfo coroutine_fn *qmp_backup(
 
 di->block_size = dump_cb_block_size;
 
-bdrv_graph_co_rdlock();
-const char *devname = bdrv_get_device_name(di->bs);
-bdrv_graph_co_rdunlock();
 PBSBitmapAction action = PBS_BITMAP_ACTION_NOT_USED;
 size_t dirty = di->size;
 
@@ -973,7 +972,8 @@ UuidInfo coroutine_fn *qmp_backup(
 }
 action = PBS_BITMAP_ACTION_NEW;
 } else {
-expect_only_dirty = proxmox_backup_check_incremental(pbs, 
devname, di->size) != 0;
+expect_only_dirty =
+proxmox_backup_check_incremental(pbs, di->device_name, 
di->size) != 0;
 }
 
 if (expect_only_dirty) {
@@ -997,7 +997,8 @@ UuidInfo coroutine_fn *qmp_backup(
 }
 }
 
-int dev_id = proxmox_backup_co_register_image(pbs, devname, 
di->size, expect_only_dirty, errp);
+int dev_id = proxmox_backup_co_register_image(pbs, 
di->device_name, di->size,
+  expect_only_dirty, 
errp);
 if (dev_id < 0) {
 goto err_mutex;
 }
@@ -1009,7 +1010,7 @@ UuidInfo coroutine_fn *qmp_backup(
 di->dev_id = dev_id;
 
 PBSBitmapInfo *info = g_malloc(sizeof(*info));
-info->drive = g_strdup(devname);
+info->drive = g_strdup(di->device_name);
 info->action = action;
 info->size = di->size;
 info->dirty = dirty;
@@ -1034,10 +1035,7 @@ UuidInfo coroutine_fn *qmp_backup(
 goto err_mutex;
 }
 
-bdrv_graph_co_rdlock();
-const char *devname = bdrv_get_device_name(di->bs);
-bdrv_graph_co_rdunlock();
-di->dev_id = vma_writer_register_stream(vmaw, devname, di->size);
+di->dev_id = vma_writer_register_stream(vmaw, di->device_name, 
di->size);
   

[pve-devel] [RFC storage v2 11/25] extract backup config: delegate to backup provider if there is one

2024-08-13 Thread Fiona Ebner
Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Adapt to method rename.

 src/PVE/Storage.pm | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
index aea57ab..8993ba7 100755
--- a/src/PVE/Storage.pm
+++ b/src/PVE/Storage.pm
@@ -1726,6 +1726,16 @@ sub extract_vzdump_config {
storage_check_enabled($cfg, $storeid);
return PVE::Storage::PBSPlugin->extract_vzdump_config($scfg, 
$volname, $storeid);
}
+
+   my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
+   my $log_function = sub {
+   my ($log_level, $message) = @_;
+   my $prefix = $log_level eq 'err' ? 'ERROR' : uc($log_level);
+   print "$prefix: $message\n";
+   };
+   if (my $backup_provider = $plugin->new_backup_provider($scfg, $storeid, 
$log_function)) {
+   return $backup_provider->restore_get_guest_config($volname, 
$storeid);
+   }
 }
 
 my $archive = abs_filesystem_path($cfg, $volid);
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu v2 03/25] PVE backup: factor out setting up snapshot access for fleecing

2024-08-13 Thread Fiona Ebner
Avoids some line bloat in the create_backup_jobs_bh() function and is
in preparation for setting up the snapshot access independently of
fleecing; in particular, that will be useful for providing access to
the snapshot via NBD.

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 pve-backup.c | 95 
 1 file changed, 58 insertions(+), 37 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index c4178758b3..051ebffe48 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -525,6 +525,62 @@ static int coroutine_fn pvebackup_co_add_config(
 goto out;
 }
 
+/*
+ * Setup a snapshot-access block node for a device with associated fleecing 
image.
+ */
+static int setup_snapshot_access(PVEBackupDevInfo *di, Error **errp)
+{
+Error *local_err = NULL;
+
+if (!di->fleecing.bs) {
+error_setg(errp, "no associated fleecing image");
+return -1;
+}
+
+QDict *cbw_opts = qdict_new();
+qdict_put_str(cbw_opts, "driver", "copy-before-write");
+qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs));
+qdict_put_str(cbw_opts, "target", bdrv_get_node_name(di->fleecing.bs));
+
+if (di->bitmap) {
+/*
+ * Only guest writes to parts relevant for the backup need to be 
intercepted with
+ * old data being copied to the fleecing image.
+ */
+qdict_put_str(cbw_opts, "bitmap.node", bdrv_get_node_name(di->bs));
+qdict_put_str(cbw_opts, "bitmap.name", 
bdrv_dirty_bitmap_name(di->bitmap));
+}
+/*
+ * Fleecing storage is supposed to be fast and it's better to break backup 
than guest
+ * writes. Certain guest drivers like VirtIO-win have 60 seconds timeout 
by default, so
+ * abort a bit before that.
+ */
+qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot");
+qdict_put_int(cbw_opts, "cbw-timeout", 45);
+
+di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, 
&local_err);
+
+if (!di->fleecing.cbw) {
+error_setg(errp, "appending cbw node for fleecing failed: %s",
+   local_err ? error_get_pretty(local_err) : "unknown error");
+return -1;
+}
+
+QDict *snapshot_access_opts = qdict_new();
+qdict_put_str(snapshot_access_opts, "driver", "snapshot-access");
+qdict_put_str(snapshot_access_opts, "file", 
bdrv_get_node_name(di->fleecing.cbw));
+
+di->fleecing.snapshot_access =
+bdrv_open(NULL, NULL, snapshot_access_opts, BDRV_O_RDWR | 
BDRV_O_UNMAP, &local_err);
+if (!di->fleecing.snapshot_access) {
+error_setg(errp, "setting up snapshot access for fleecing failed: %s",
+   local_err ? error_get_pretty(local_err) : "unknown error");
+return -1;
+}
+
+return 0;
+}
+
 /*
  * backup_job_create can *not* be run from a coroutine, so this can't either.
  * The caller is responsible that backup_mutex is held nonetheless.
@@ -569,49 +625,14 @@ static void create_backup_jobs_bh(void *opaque) {
 const char *job_id = bdrv_get_device_name(di->bs);
 bdrv_graph_co_rdunlock();
 if (di->fleecing.bs) {
-QDict *cbw_opts = qdict_new();
-qdict_put_str(cbw_opts, "driver", "copy-before-write");
-qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs));
-qdict_put_str(cbw_opts, "target", 
bdrv_get_node_name(di->fleecing.bs));
-
-if (di->bitmap) {
-/*
- * Only guest writes to parts relevant for the backup need to 
be intercepted with
- * old data being copied to the fleecing image.
- */
-qdict_put_str(cbw_opts, "bitmap.node", 
bdrv_get_node_name(di->bs));
-qdict_put_str(cbw_opts, "bitmap.name", 
bdrv_dirty_bitmap_name(di->bitmap));
-}
-/*
- * Fleecing storage is supposed to be fast and it's better to 
break backup than guest
- * writes. Certain guest drivers like VirtIO-win have 60 seconds 
timeout by default, so
- * abort a bit before that.
- */
-qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot");
-qdict_put_int(cbw_opts, "cbw-timeout", 45);
-
-di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, 
&local_err);
-
-if (!di->fleecing.cbw) {
-error_setg(errp, "appending cbw node for fleecing failed: %s",
-   local_err ? error_get_pretty(local_err) : "unknown 
error");
-bdrv_drained_end(di->bs);
-break;
-  

[pve-devel] [PATCH qemu v2 02/25] PVE backup: fixup error handling for fleecing

2024-08-13 Thread Fiona Ebner
The drained section needs to be terminated before breaking out of the
loop in the error scenarios. Otherwise, guest IO on the drive would
become stuck.

If the job is created successfully, the job completion callback will
clean up the snapshot access block nodes. If a failure happened before
the job was created, there was no cleanup for the snapshot access
block nodes yet. Add it.

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 pve-backup.c | 38 +-
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 4e730aa3da..c4178758b3 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -357,22 +357,23 @@ static void coroutine_fn 
pvebackup_co_complete_stream(void *opaque)
 qemu_co_mutex_unlock(&backup_state.backup_mutex);
 }
 
+static void cleanup_snapshot_access(PVEBackupDevInfo *di)
+{
+if (di->fleecing.snapshot_access) {
+bdrv_unref(di->fleecing.snapshot_access);
+di->fleecing.snapshot_access = NULL;
+}
+if (di->fleecing.cbw) {
+bdrv_cbw_drop(di->fleecing.cbw);
+di->fleecing.cbw = NULL;
+}
+}
+
 static void pvebackup_complete_cb(void *opaque, int ret)
 {
 PVEBackupDevInfo *di = opaque;
 di->completed_ret = ret;
 
-/*
- * Handle block-graph specific cleanup (for fleecing) outside of the 
coroutine, because the work
- * won't be done as a coroutine anyways:
- * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via 
bdrv_co_unref() would
- *   just spawn a BH calling bdrv_unref().
- * - For cbw, draining would need to spawn a BH.
- */
-if (di->fleecing.snapshot_access) {
-bdrv_unref(di->fleecing.snapshot_access);
-di->fleecing.snapshot_access = NULL;
-}
 if (di->fleecing.cbw) {
 /*
  * With fleecing, failure for cbw does not fail the guest write, but 
only sets the snapshot
@@ -383,10 +384,17 @@ static void pvebackup_complete_cb(void *opaque, int ret)
 if (di->completed_ret == -EACCES && snapshot_error) {
 di->completed_ret = snapshot_error;
 }
-bdrv_cbw_drop(di->fleecing.cbw);
-di->fleecing.cbw = NULL;
 }
 
+/*
+ * Handle block-graph specific cleanup (for fleecing) outside of the 
coroutine, because the work
+ * won't be done as a coroutine anyways:
+ * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via 
bdrv_co_unref() would
+ *   just spawn a BH calling bdrv_unref().
+ * - For cbw, draining would need to spawn a BH.
+ */
+cleanup_snapshot_access(di);
+
 /*
  * Needs to happen outside of coroutine, because it takes the graph write 
lock.
  */
@@ -587,6 +595,7 @@ static void create_backup_jobs_bh(void *opaque) {
 if (!di->fleecing.cbw) {
 error_setg(errp, "appending cbw node for fleecing failed: %s",
local_err ? error_get_pretty(local_err) : "unknown 
error");
+bdrv_drained_end(di->bs);
 break;
 }
 
@@ -599,6 +608,8 @@ static void create_backup_jobs_bh(void *opaque) {
 if (!di->fleecing.snapshot_access) {
 error_setg(errp, "setting up snapshot access for fleecing 
failed: %s",
local_err ? error_get_pretty(local_err) : "unknown 
error");
+cleanup_snapshot_access(di);
+bdrv_drained_end(di->bs);
 break;
 }
 source_bs = di->fleecing.snapshot_access;
@@ -637,6 +648,7 @@ static void create_backup_jobs_bh(void *opaque) {
 }
 
 if (!job || local_err) {
+cleanup_snapshot_access(di);
 error_setg(errp, "backup_job_create failed: %s",
local_err ? error_get_pretty(local_err) : "null");
 break;
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] issues with Virtio-SCSI devicde on Proxmox...

2024-08-13 Thread Fiona Ebner
Hi,

On 12.08.24 at 12:40, Christian Moser wrote:
> Hello,
> 
> I work for VSI (VMS Software Inc) which is porting the OpenVMS operating 
> system to x86. At this point we successfully on various
> hypervisors, but have some issues on the KVM running on Proxmox.
> 
> The OpenVMS VM works just fine with SATA disks and it also works with for 
> example virtio-network device etc., but trying to use
> virtio-scsi hangs  the driver. I have debugged this and I can successfully 
> configure the port/controller, send the IO request to the
> device. It then gets processed by the device, which posts the results and 
> sets the interrupt bit in the ISR register, but it never
> asserts the interrupt hence the driver never gets notified and the I/O hangs.
> 
> I have tried both “virtio-scsi-pci” and “virtio-scsi-single”, but no luck. 
> The emulated virtio-scsi device is a legacy device. But
> then again, the virtio-network device is also a legacy device and here we are 
> getting interrupts. One thing which bothers me is the
> fact that the “legacy interrupt disable” bit is set in the PCI config space 
> of the virtio-scsi device (i.e. bit 10 at offset 4)
> 
> Questions:
> * is there a way to configure a modern virtio-scsi device (i.e. 
> disable_legacy=on) ?

I've already answered this question when you asked in a mail addressed
directly to me:

On 12.08.24 at 11:58, Fiona Ebner wrote:
> Hi,
> 
> It seems that you either need to attach the "virtio-scsi-pci" device to
> a pcie bus or explicitly set the "disable_legacy=on" option for the
> device, neither of which Proxmox VE currently does or allows
> configuring. The only way right now seems to be to attach the disk
> yourself via custom arguments (the 'args' in the Proxmox VE VM
> configuration), but then the disk will be invisible to Proxmox VE
> operations which look at specific disk keys in the configuration!
> 
> Feel free to open a feature request on our bug tracker to make this
> configurable: https://bugzilla.proxmox.com/
> 
> P.S. Please write to the developer list rather than individual
> developers for such questions in the future:
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> Best Regards,
> Fiona

> * why is the legacy interrupt bit set in the PCI config space ?

Most likely because the virtio-scsi-pci is configured without the
"disable_legacy=on" option. If not explicitly set, the option will be
"disable_legacy=auto" and when not attached to PCIe (which is the case
for Proxmox VE currently), then legacy mode will be enabled.
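
For completeness, the 'args' workaround quoted above would amount to
something along these lines in the VM configuration (untested sketch
with made-up IDs and paths; such a disk is invisible to Proxmox VE's
disk handling):

    args: -device virtio-scsi-pci,id=extrascsi,disable-legacy=on -drive file=/path/to/disk.raw,if=none,id=extradrive,format=raw -device scsi-hd,drive=extradrive,bus=extrascsi.0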

> * Are there any working driver for virtio-scsi on this KVM using Q35 
> machine? i.e. any other OS

The virtio drivers for Windows and the ones in Linux work just fine with
our configuration.


> Any thoughts why these interrupts are not getting delivered on the PCIE bus?

We do not configure the virtio-scsi-pci on a PCIe bus currently, see my
initial answer.

Best Regards,
Fiona


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [PATCH qemu-server] remote migration: fix online migration via API clients

2024-08-13 Thread Fiona Ebner
On 09.08.24 at 12:17, Fiona Ebner wrote:
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index e71face4..7a7183e0 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -1095,7 +1095,8 @@ sub phase2 {
>   die "only UNIX sockets are supported for remote migration\n"
>   if $tunnel_info->{proto} ne 'unix';
>  
> - my $remote_socket = $tunnel_info->{addr};
> + my ($remote_socket) = $tunnel_info->{addr} =~ 
> m!^(/run/qemu-server/.*\.migrate)$! #untaint

I sent a v2 to rule out "../" in the regex:
https://lists.proxmox.com/pipermail/pve-devel/2024-August/065075.html


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH v2 qemu-server] remote migration: fix online migration via API clients

2024-08-13 Thread Fiona Ebner
As reported in the community forum [0], when a remote migration
request comes in via an API client, the -T flag for Perl is set, so an
insecure dependency in a call like unlink() in forward_unix_socket()
will fail with:

> failed to write forwarding command - Insecure dependency in unlink while 
> running with -T switch

To fix it, untaint the problematic socket addresses coming from the
remote side. Require that all sockets are below '/run/qemu-server/'
and end with '.migrate'. This allows extensions in the future while
still being quite strict.

[0]: https://forum.proxmox.com/threads/123048/post-691958

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* rule out '../' in path
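
To illustrate the untaint pattern in isolation (a minimal standalone
sketch, not part of the patch):

    #!/usr/bin/perl -T
    use strict;
    use warnings;

    my $addr = shift // die "usage: $0 <socket-path>\n"; # @ARGV is tainted
    # a regex capture is the canonical way to untaint data under -T
    my ($safe) = $addr =~ m|^(/run/qemu-server/(?:(?!\.\./).)+\.migrate)$|
        or die "unexpected socket address '$addr'\n";
    unlink($safe); # with the tainted $addr this would die: 'Insecure dependency'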

 PVE/QemuMigrate.pm | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index e71face4..d31589e7 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -1095,7 +1095,10 @@ sub phase2 {
die "only UNIX sockets are supported for remote migration\n"
if $tunnel_info->{proto} ne 'unix';
 
-   my $remote_socket = $tunnel_info->{addr};
+   # untaint
+   my ($remote_socket) =
+   $tunnel_info->{addr} =~ 
m|^(/run/qemu-server/(?:(?!\.\./).)+\.migrate)$|;
+   die "unexpected socket address '$tunnel_info->{addr}'\n" if 
!$remote_socket;
my $local_socket = $remote_socket;
$local_socket =~ s/$remote_vmid/$vmid/g;
$tunnel_info->{addr} = $local_socket;
@@ -1104,6 +1107,9 @@ sub phase2 {
PVE::Tunnel::forward_unix_socket($self->{tunnel}, $local_socket, 
$remote_socket);
 
foreach my $remote_socket (@{$tunnel_info->{unix_sockets}}) {
+   # untaint
+   ($remote_socket) = $remote_socket =~ 
m|^(/run/qemu-server/(?:(?!\.\./).)+\.migrate)$|
+   or die "unexpected socket address '$remote_socket'\n";
my $local_socket = $remote_socket;
$local_socket =~ s/$remote_vmid/$vmid/g;
next if $self->{tunnel}->{forwarded}->{$local_socket};
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu-server] remote migration: fix online migration via API clients

2024-08-09 Thread Fiona Ebner
As reported in the community forum [0], when a remote migration
request comes in via an API client, the -T flag for Perl is set, so an
insecure dependency in a call like unlink() in forward_unix_socket()
will fail with:

> failed to write forwarding command - Insecure dependency in unlink while 
> running with -T switch

To fix it, untaint the problematic socket addresses coming from the
remote side. Require that all sockets are below '/run/qemu-server/'
and end with '.migrate'. This allows extensions in the future while
still being quite strict.

[0]: https://forum.proxmox.com/threads/123048/post-691958

Signed-off-by: Fiona Ebner 
---

There is no direct call to forward_unix_socket() in pve-container and
the call in PVE::StorageTunnel::storage_migrate() has no insecure
dependency for the local socket path.

 PVE/QemuMigrate.pm | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index e71face4..7a7183e0 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -1095,7 +1095,8 @@ sub phase2 {
die "only UNIX sockets are supported for remote migration\n"
if $tunnel_info->{proto} ne 'unix';
 
-   my $remote_socket = $tunnel_info->{addr};
+   my ($remote_socket) = $tunnel_info->{addr} =~ 
m!^(/run/qemu-server/.*\.migrate)$! #untaint
+   or die "unexpected socket address '$tunnel_info->{addr}'\n";
my $local_socket = $remote_socket;
$local_socket =~ s/$remote_vmid/$vmid/g;
$tunnel_info->{addr} = $local_socket;
@@ -1104,6 +1105,8 @@ sub phase2 {
PVE::Tunnel::forward_unix_socket($self->{tunnel}, $local_socket, 
$remote_socket);
 
foreach my $remote_socket (@{$tunnel_info->{unix_sockets}}) {
+   ($remote_socket) = $remote_socket =~ 
m!^(/run/qemu-server/.*\.migrate)$! # untaint
+   or die "unexpected socket address '$remote_socket'\n";
my $local_socket = $remote_socket;
$local_socket =~ s/$remote_vmid/$vmid/g;
next if $self->{tunnel}->{forwarded}->{$local_socket};
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] applied: [PATCH manager] ui: backup job detail: fix wrong gettext

2024-08-07 Thread Fiona Ebner
Am 06.08.24 um 15:09 schrieb Dominik Csapak:
> 'suspend' mode should not be shown as `gettext('Snapshot')`
> 
> reported in the forum:
> https://forum.proxmox.com/threads/152365/
> 
> Signed-off-by: Dominik Csapak 

applied, thanks!


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] applied: [PATCH qemu] actually bump submodule to v9.0.2

2024-08-07 Thread Fiona Ebner
Am 07.08.24 um 09:45 schrieb Fiona Ebner:
> Fixes: cf40e92 ("update submodule and patches to QEMU 9.0.2")
> Signed-off-by: Fiona Ebner 

applied now with an off-list ACK from Fabian


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu] actually bump submodule to v9.0.2

2024-08-07 Thread Fiona Ebner
Fixes: cf40e92 ("update submodule and patches to QEMU 9.0.2")
Signed-off-by: Fiona Ebner 
---
 qemu | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu b/qemu
index c25df57..5ebde3b 160000
--- a/qemu
+++ b/qemu
@@ -1 +1 @@
-Subproject commit c25df57ae8f9fe1c72eee2dab37d76d904ac382e
+Subproject commit 5ebde3b5c00e15f560f73055fac4ab31c0cac6d2
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC qemu-server] backup: always die early when volume size or format cannot be determined

2024-08-05 Thread Fiona Ebner
There are cases where volume_size_info() will return undef, and not
set $@. In particular, the default implementation will do so when stat
on the file fails or the output of 'qemu-img info' cannot be parsed as
JSON.

While the size is only strictly needed for fleecing, the
volume_size_info() call serves as an early sanity check otherwise.

This can break backup without fleecing in certain scenarios. Using
definedness checks would slightly reduce potential for breakage. To
minimize that potential, doing the check only for fleecing would be
the way to go. However, having a stricter check seems desirable for
future-proofing to abort early when something is amiss.

Signed-off-by: Fiona Ebner 
---
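
To illustrate the failure mode being guarded against (sketch, assumed
behavior of the default implementation):

    # volume_size_info() can return undef *without* setting $@, e.g. when
    # stat() fails or the 'qemu-img info' output is not valid JSON
    my $size = eval { PVE::Storage::volume_size_info($storecfg, $volid, 5) };
    die "cannot determine size of volume '$volid' - $@\n" if $@; # misses undef
    die "cannot determine size of volume '$volid'\n" if !$size;  # catches it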
 PVE/VZDump/QemuServer.pm | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 012c9210..9291a232 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -122,7 +122,11 @@ sub prepare {
if ($storeid) {
# The call in list context can be expensive for certain plugins 
like RBD, just get size
$size = eval { PVE::Storage::volume_size_info($self->{storecfg}, 
$volid, 5) };
-   die "cannot determine size of volume '$volid' - $@\n" if $@;
+   if ($@ || !$size) {
+   my $err = "cannot determine size of volume '$volid'";
+   $err .= " - $@" if $@;
+   die "$err\n";
+   }
 
my $scfg = PVE::Storage::storage_config($self->{storecfg}, 
$storeid);
$format = PVE::QemuServer::qemu_img_format($scfg, $volname);
@@ -130,7 +134,11 @@ sub prepare {
($size, $format) = eval {
PVE::Storage::volume_size_info($self->{storecfg}, $volid, 5);
};
-   die "cannot determine size and format of volume '$volid' - $@\n" if 
$@;
+   if ($@ || !$size || !$format) {
+   my $err = "cannot determine size and format of volume '$volid'";
+   $err .= " - $@" if $@;
+   die "$err\n";
+   }
}
 
my $diskinfo = {
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH kernel 2/2] cherry-pick fix for bnxt_re driver

2024-08-05 Thread Fiona Ebner
Am 05.08.24 um 12:31 schrieb Fiona Ebner:
> Reported in the community forum:
> https://forum.proxmox.com/threads/144557/post-689148
> 
> Signed-off-by: Fiona Ebner 
> ---
> 
> Cherry-picked from the Ubuntu-6.8.0-43.43 tag, so not required if
> updating to that.
> 

Saw it only after already sending the patches, but in bug #5103 a user
reports that reverting, AFAICT commit 565736048bd5 ("ixgbe: Manual AN-37
for troublesome link partners for X550 SFI"), fixes that issue. That is
also part of the Ubuntu-6.8.0-43.43 tag as eb5551d4a4b7 ("Revert "ixgbe:
Manual AN-37 for troublesome link partners for X550 SFI"") and would
also be worth picking up.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH kernel 2/2] cherry-pick fix for bnxt_re driver

2024-08-05 Thread Fiona Ebner
Reported in the community forum:
https://forum.proxmox.com/threads/144557/post-689148

Signed-off-by: Fiona Ebner 
---

Cherry-picked from the Ubuntu-6.8.0-43.43 tag, so not required if
updating to that.

 ...ift-undefined-behavior-in-bnxt_qplib.patch | 124 ++
 1 file changed, 124 insertions(+)
 create mode 100644 
patches/kernel/0024-bnxt_re-avoid-shift-undefined-behavior-in-bnxt_qplib.patch

diff --git 
a/patches/kernel/0024-bnxt_re-avoid-shift-undefined-behavior-in-bnxt_qplib.patch
 
b/patches/kernel/0024-bnxt_re-avoid-shift-undefined-behavior-in-bnxt_qplib.patch
new file mode 100644
index 000..75bc01a
--- /dev/null
+++ 
b/patches/kernel/0024-bnxt_re-avoid-shift-undefined-behavior-in-bnxt_qplib.patch
@@ -0,0 +1,124 @@
+From  Mon Sep 17 00:00:00 2001
+From: Michal Schmidt 
+Date: Tue, 7 May 2024 12:39:28 +0200
+Subject: [PATCH] bnxt_re: avoid shift undefined behavior in
+ bnxt_qplib_alloc_init_hwq
+
+BugLink: https://bugs.launchpad.net/bugs/2071621
+
+[ Upstream commit 78cfd17142ef70599d6409cbd709d94b3da58659 ]
+
+Undefined behavior is triggered when bnxt_qplib_alloc_init_hwq is called
+with hwq_attr->aux_depth != 0 and hwq_attr->aux_stride == 0.
+In that case, "roundup_pow_of_two(hwq_attr->aux_stride)" gets called.
+roundup_pow_of_two is documented as undefined for 0.
+
+Fix it in the one caller that had this combination.
+
+The undefined behavior was detected by UBSAN:
+  UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
+  shift exponent 64 is too large for 64-bit type 'long unsigned int'
+  CPU: 24 PID: 1075 Comm: (udev-worker) Not tainted 6.9.0-rc6+ #4
+  Hardware name: Abacus electric, s.r.o. - ser...@abacus.cz Super 
Server/H12SSW-iN, BIOS 2.7 10/25/2023
+  Call Trace:
+   
+   dump_stack_lvl+0x5d/0x80
+   ubsan_epilogue+0x5/0x30
+   __ubsan_handle_shift_out_of_bounds.cold+0x61/0xec
+   __roundup_pow_of_two+0x25/0x35 [bnxt_re]
+   bnxt_qplib_alloc_init_hwq+0xa1/0x470 [bnxt_re]
+   bnxt_qplib_create_qp+0x19e/0x840 [bnxt_re]
+   bnxt_re_create_qp+0x9b1/0xcd0 [bnxt_re]
+   ? srso_alias_return_thunk+0x5/0xfbef5
+   ? srso_alias_return_thunk+0x5/0xfbef5
+   ? __kmalloc+0x1b6/0x4f0
+   ? create_qp.part.0+0x128/0x1c0 [ib_core]
+   ? __pfx_bnxt_re_create_qp+0x10/0x10 [bnxt_re]
+   create_qp.part.0+0x128/0x1c0 [ib_core]
+   ib_create_qp_kernel+0x50/0xd0 [ib_core]
+   create_mad_qp+0x8e/0xe0 [ib_core]
+   ? __pfx_qp_event_handler+0x10/0x10 [ib_core]
+   ib_mad_init_device+0x2be/0x680 [ib_core]
+   add_client_context+0x10d/0x1a0 [ib_core]
+   enable_device_and_get+0xe0/0x1d0 [ib_core]
+   ib_register_device+0x53c/0x630 [ib_core]
+   ? srso_alias_return_thunk+0x5/0xfbef5
+   bnxt_re_probe+0xbd8/0xe50 [bnxt_re]
+   ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re]
+   auxiliary_bus_probe+0x49/0x80
+   ? driver_sysfs_add+0x57/0xc0
+   really_probe+0xde/0x340
+   ? pm_runtime_barrier+0x54/0x90
+   ? __pfx___driver_attach+0x10/0x10
+   __driver_probe_device+0x78/0x110
+   driver_probe_device+0x1f/0xa0
+   __driver_attach+0xba/0x1c0
+   bus_for_each_dev+0x8f/0xe0
+   bus_add_driver+0x146/0x220
+   driver_register+0x72/0xd0
+   __auxiliary_driver_register+0x6e/0xd0
+   ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
+   bnxt_re_mod_init+0x3e/0xff0 [bnxt_re]
+   ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
+   do_one_initcall+0x5b/0x310
+   do_init_module+0x90/0x250
+   init_module_from_file+0x86/0xc0
+   idempotent_init_module+0x121/0x2b0
+   __x64_sys_finit_module+0x5e/0xb0
+   do_syscall_64+0x82/0x160
+   ? srso_alias_return_thunk+0x5/0xfbef5
+   ? syscall_exit_to_user_mode_prepare+0x149/0x170
+   ? srso_alias_return_thunk+0x5/0xfbef5
+   ? syscall_exit_to_user_mode+0x75/0x230
+   ? srso_alias_return_thunk+0x5/0xfbef5
+   ? do_syscall_64+0x8e/0x160
+   ? srso_alias_return_thunk+0x5/0xfbef5
+   ? __count_memcg_events+0x69/0x100
+   ? srso_alias_return_thunk+0x5/0xfbef5
+   ? count_memcg_events.constprop.0+0x1a/0x30
+   ? srso_alias_return_thunk+0x5/0xfbef5
+   ? handle_mm_fault+0x1f0/0x300
+   ? srso_alias_return_thunk+0x5/0xfbef5
+   ? do_user_addr_fault+0x34e/0x640
+   ? srso_alias_return_thunk+0x5/0xfbef5
+   ? srso_alias_return_thunk+0x5/0xfbef5
+   entry_SYSCALL_64_after_hwframe+0x76/0x7e
+  RIP: 0033:0x7f4e5132821d
+  Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 
01 c3 48 8b 0d e3 db 0c 00 f7 d8 64 89 01 48
+  RSP: 002b:7ffca9c906a8 EFLAGS: 0246 ORIG_RAX: 0139
+  RAX: ffda RBX: 563ec8a8f130 RCX: 7f4e5132821d
+  RDX:  RSI: 7f4e518fa07d RDI: 003b
+  RBP: 7ffca9c90760 R08: 7f4e513f6b20 R09: 7ffca9c906f0
+  R10: 563ec8a8faa0 R11: 0246 R12: 7f4e518fa07d
+  R13: 0002 R14: 563ec8409e90 R15: 563ec8a8fa60
+   
+  ---[ end trace ]---
+
+Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware 

[pve-devel] [PATCH kernel 1/2] cherry-pick fix for NULL pointer dereference in apparmorfs

2024-08-05 Thread Fiona Ebner
Reported in the community forum:
https://forum.proxmox.com/threads/145760/post-690328

Signed-off-by: Fiona Ebner 
---
 ...ix-possible-NULL-pointer-dereference.patch | 101 ++
 1 file changed, 101 insertions(+)
 create mode 100644 
patches/kernel/0023-apparmor-fix-possible-NULL-pointer-dereference.patch

diff --git 
a/patches/kernel/0023-apparmor-fix-possible-NULL-pointer-dereference.patch 
b/patches/kernel/0023-apparmor-fix-possible-NULL-pointer-dereference.patch
new file mode 100644
index 000..36d4297
--- /dev/null
+++ b/patches/kernel/0023-apparmor-fix-possible-NULL-pointer-dereference.patch
@@ -0,0 +1,101 @@
+From  Mon Sep 17 00:00:00 2001
+From: Leesoo Ahn 
+Date: Wed, 8 May 2024 01:12:29 +0900
+Subject: [PATCH] apparmor: fix possible NULL pointer dereference
+
+profile->parent->dents[AAFS_PROF_DIR] could be NULL only if its parent is made
+from __create_missing_ancestors(..) and 'ent->old' is NULL in
+aa_replace_profiles(..).
+In that case, it must return an error code and the code, -ENOENT represents
+its state that the path of its parent is not existed yet.
+
+BUG: kernel NULL pointer dereference, address: 0030
+PGD 0 P4D 0
+PREEMPT SMP PTI
+CPU: 4 PID: 3362 Comm: apparmor_parser Not tainted 6.8.0-24-generic #24
+Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
+RIP: 0010:aafs_create.constprop.0+0x7f/0x130
+Code: 4c 63 e0 48 83 c4 18 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 
31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 c3 cc cc cc cc <4d> 8b 55 30 4d 8d ba a0 
00 00 00 4c 89 55 c0 4c 89 ff e8 7a 6a ae
+RSP: 0018:c9000b2c7c98 EFLAGS: 00010246
+RAX:  RBX: 41ed RCX: 
+RDX:  RSI:  RDI: 
+RBP: c9000b2c7cd8 R08:  R09: 
+R10:  R11:  R12: 82baac10
+R13:  R14:  R15: 
+FS:  7be9f22cf740() GS:88817bc0() knlGS:
+CS:  0010 DS:  ES:  CR0: 80050033
+CR2: 0030 CR3: 000134b08000 CR4: 06f0
+Call Trace:
+ 
+ ? show_regs+0x6d/0x80
+ ? __die+0x24/0x80
+ ? page_fault_oops+0x99/0x1b0
+ ? kernelmode_fixup_or_oops+0xb2/0x140
+ ? __bad_area_nosemaphore+0x1a5/0x2c0
+ ? find_vma+0x34/0x60
+ ? bad_area_nosemaphore+0x16/0x30
+ ? do_user_addr_fault+0x2a2/0x6b0
+ ? exc_page_fault+0x83/0x1b0
+ ? asm_exc_page_fault+0x27/0x30
+ ? aafs_create.constprop.0+0x7f/0x130
+ ? aafs_create.constprop.0+0x51/0x130
+ __aafs_profile_mkdir+0x3d6/0x480
+ aa_replace_profiles+0x83f/0x1270
+ policy_update+0xe3/0x180
+ profile_load+0xbc/0x150
+ ? rw_verify_area+0x47/0x140
+ vfs_write+0x100/0x480
+ ? __x64_sys_openat+0x55/0xa0
+ ? syscall_exit_to_user_mode+0x86/0x260
+ ksys_write+0x73/0x100
+ __x64_sys_write+0x19/0x30
+ x64_sys_call+0x7e/0x25c0
+ do_syscall_64+0x7f/0x180
+ entry_SYSCALL_64_after_hwframe+0x78/0x80
+RIP: 0033:0x7be9f211c574
+Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 
1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 
c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
+RSP: 002b:7ffd26f2b8c8 EFLAGS: 0202 ORIG_RAX: 0001
+RAX: ffda RBX: 5d504415e200 RCX: 7be9f211c574
+RDX: 1fc1 RSI: 5d504418bc80 RDI: 0004
+RBP: 1fc1 R08: 1fc1 R09: 8000
+R10:  R11: 0202 R12: 5d504418bc80
+R13: 0004 R14: 7ffd26f2b9b0 R15: 7ffd26f2ba30
+ 
+Modules linked in: snd_seq_dummy snd_hrtimer qrtr snd_hda_codec_generic 
snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core 
snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq 
snd_seq_device i2c_i801 snd_timer i2c_smbus qxl snd soundcore drm_ttm_helper 
lpc_ich ttm joydev input_leds serio_raw mac_hid binfmt_misc msr parport_pc 
ppdev lp parport efi_pstore nfnetlink dmi_sysfs qemu_fw_cfg ip_tables x_tables 
autofs4 hid_generic usbhid hid ahci libahci psmouse virtio_rng xhci_pci 
xhci_pci_renesas
+CR2: 0030
+---[ end trace  ]---
+RIP: 0010:aafs_create.constprop.0+0x7f/0x130
+Code: 4c 63 e0 48 83 c4 18 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 
31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 c3 cc cc cc cc <4d> 8b 55 30 4d 8d ba a0 
00 00 00 4c 89 55 c0 4c 89 ff e8 7a 6a ae
+RSP: 0018:c9000b2c7c98 EFLAGS: 00010246
+RAX:  RBX: 41ed RCX: 
+RDX:  RSI:  RDI: 
+RBP: c9000b2c7cd8 R08:  R09: 
+R10:  R11:  R12: 82baac10
+R13:  R14:  R15: 
+FS:  7be9f22cf740() GS:88817bc0() knlGS:

[pve-devel] [PATCH common] inotify: avoid cyclic use statement

2024-08-02 Thread Fiona Ebner
Commit e68ebda ("fix #545: interfaces: allow arbitrary bridge names in
network config") introduced a cyclic usage between
PVE::RESTEnvironment and PVE::INotify, making code like the following
fail:

> perl -e "use PVE::RESTEnvironment qw(log_warn);"

Note, including the PVE::INotify module first would still work, i.e.:

> perl -e "use PVE::INotify; use PVE::RESTEnvironment qw(log_warn);"

The rest of the PVE::INotify module already uses syslog(), which could
be used here as well to get rid of the cyclic usage. Wolfgang argued
that the whole point of commit e68ebda was to remove coupling between
the name and the type of the interface. If there still is some code
about a name starting with 'vmbr' being classified wrong, that should
rather be fixed. Because of the very commit, the frontend already
doesn't show e.g. a non-bridge with name 'vmbr7' in bridge selectors.

Suggested-by: Wolfgang Bumiller 
Fixes: e68ebda ("fix #545: interfaces: allow arbitrary bridge names in network 
config")
Signed-off-by: Fiona Ebner 
---

@Stefan: Do you have any good rationale to rather keep the check?
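
A minimal reproduction of such a 'use' cycle (hypothetical module
names, sketch only):

    # Foo.pm -- stands in for PVE::RESTEnvironment
    package Foo;
    use Bar;               # starts compiling Bar before Foo is finished
    use base 'Exporter';
    our @EXPORT_OK = qw(log_warn);
    sub log_warn { warn "@_\n" }
    1;

    # Bar.pm -- stands in for PVE::INotify
    package Bar;
    use Foo qw(log_warn);  # if Foo is still mid-compile, its Exporter setup
                           # has not run yet, so the import fails or is a no-op
    1;

Loading Bar first works, while loading Foo first breaks - matching the
behavior described above.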

 src/PVE/INotify.pm | 4 
 1 file changed, 4 deletions(-)

diff --git a/src/PVE/INotify.pm b/src/PVE/INotify.pm
index 8a4a810..e7c55f6 100644
--- a/src/PVE/INotify.pm
+++ b/src/PVE/INotify.pm
@@ -22,7 +22,6 @@ use PVE::Network;
 use PVE::ProcFSTools;
 use PVE::SafeSyslog;
 use PVE::Tools;
-use PVE::RESTEnvironment qw(log_warn);
 
 use base 'Exporter';
 
@@ -1145,9 +1144,6 @@ sub __read_etc_network_interfaces {
}
}
 
-   log_warn("detected a interface $iface that is not a bridge!")
-   if !($d->{type} eq 'OVSBridge' || $d->{type} eq 'bridge') && $iface 
=~ m/^vmbr\d+$/;
-
# map address and netmask to cidr
if (my $addr = $d->{address}) {
if (_address_is_cidr($addr)) {
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH manager] ui: dc summary: fix calculation of shared storage size

2024-07-31 Thread Fiona Ebner
Am 30.07.24 um 15:44 schrieb Igor Thaller via pve-devel:
> The issue is related to the 'Summary' tab under 'Datacenter' inside a
> cluster. To get a steady reading of the storage size data, the frontend
> requests the '/api2/json/cluster/resources' every three seconds to
> retrieve the necessary data to calculate the used and total storage
> size.
> 
> The problem occurs when a shared storage is defined and a node goes
> offline. As the node is not online, it cannot report the shared storage
> size (both used and total) back to the other nodes. The order of the
> JSON response is not always the same, so it is possible that the offline
> node will appear first. Consequently, the front-end will display the
> wrong total and used storage. This is because the shared storage data
> has both the maximum disk size and the used disk set to zero when the
> node is offline. This causes the total and used space data to be
> calculated and displayed incorrectly, leading to fluctuations in the
> displayed percentage of used disk space.
> 
> To fix this, a conditional check to skip the storage report if its
> status is 'unknown' and it is shared was added. This prevents the
> unreliable data from being processed.
> 

Nit: "and it is shared was added" could be improved language-wise

While the issue is seen with shared storages, would there be a downside
of skipping non-shared storages with status "unknown" too? From a quick
look, it seems that disk/maxdisk not being set does not depend on the
storage being shared, so I'd rather skip the addition with unset values too.

> To test these changes, adjust the 'max_requests' variable in the Perl
> script located at '/usr/share/perl5/PVE/Service/pveproxy.pm' to increase
> the likelihood of the error to occur. This makes the storage size
> fluctuations more frequent. Then compare the storage results (both used
> and total sizes) before and after implementing the fix.
> 
> Note: Be aware that it takes around one minute for the spike to happen.
> 

IMHO this part should not go into the commit message, but be mentioned
below the "---"

> Reported-by: Friedrich Weber 
> Signed-off-by: Igor Thaller 
> ---
>  www/manager6/dc/Summary.js | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/www/manager6/dc/Summary.js b/www/manager6/dc/Summary.js
> index efb44dae..9b8f3280 100644
> --- a/www/manager6/dc/Summary.js
> +++ b/www/manager6/dc/Summary.js
> @@ -170,6 +170,11 @@ Ext.define('PVE.dc.Summary', {
>   } else if (countedStorage[sid]) {
>   break;
>   }
> +
> + if (data.status === "unknown" && data.shared) {
> + break;
> + }
> +
>   used += data.disk;
>   total += data.maxdisk;
>   countedStorage[sid] = true;
> -- 
> 2.39.2
> 
> 


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH storage] api: upload: correctly test for result of unlink

2024-07-30 Thread Fiona Ebner
Am 29.07.24 um 18:50 schrieb Thomas Lamprecht:
> Am 29/07/2024 um 16:29 schrieb Fiona Ebner:
>> It's not enough to check whether $! is set. From "perldoc perlvar":
>>
>>> Many system or library calls set "errno" if they fail, to
>>> indicate the cause of failure. They usually do not set "errno"
>>> to zero if they succeed and may set "errno" to a non-zero value
>>> on success. This means "errno", hence $!, is meaningful only
>>> *immediately* after a failure:
>>
>> To protect against potential issues, check the return value of unlink
>> and only check $! if it failed.
> 
> fine by me, but out of interest: any reason for why this is done now,
> i.e., did this really cause trouble or is this done out of precaution?

Sorry, while the commit message hints at it with "protect against
potential issues", I could've been more explicit. I stumbled upon an old
branch with this patch. Most likely I wrote it after needing the same
logic with unlink for something else. I don't know about any concrete
issues.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH storage] api: upload: correctly test for result of unlink

2024-07-29 Thread Fiona Ebner
It's not enough to check whether $! is set. From "perldoc perlvar":

> Many system or library calls set "errno" if they fail, to
> indicate the cause of failure. They usually do not set "errno"
> to zero if they succeed and may set "errno" to a non-zero value
> on success. This means "errno", hence $!, is meaningful only
> *immediately* after a failure:

To protect against potential issues, check the return value of unlink
and only check $! if it failed.

Signed-off-by: Fiona Ebner 
---
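
A small standalone illustration (sketch) of how the old check could
misfire:

    use Errno qw(ENOENT);

    open(my $fh, '<', '/root/.secret');  # fails as non-root, $! = EACCES
    unlink('/tmp/some-existing-file');   # succeeds, but $! keeps its old value
    # old pattern: possible spurious warning, $! is stale and != ENOENT
    warn "cleanup failed: $!\n" if $! && $! != ENOENT;
    # new pattern: only consult $! when unlink itself reported failure
    unlink('/tmp/some-existing-file') or $! == ENOENT
        or warn "cleanup failed: $!\n";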
 src/PVE/API2/Storage/Status.pm | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/PVE/API2/Storage/Status.pm b/src/PVE/API2/Storage/Status.pm
index dc6cc69..f86e5d3 100644
--- a/src/PVE/API2/Storage/Status.pm
+++ b/src/PVE/API2/Storage/Status.pm
@@ -461,7 +461,7 @@ __PACKAGE__->register_method ({
# best effort to match apl_download behaviour
chmod 0644, $tmpfilename;
 
-   my $err_cleanup = sub { unlink $dest; die "cleanup failed: $!\n" if $! 
&& $! != ENOENT };
+   my $err_cleanup = sub { unlink $dest or $! == ENOENT or die "cleanup 
failed: $!\n" };
 
my $cmd;
if ($node ne 'localhost' && $node ne PVE::INotify::nodename()) {
@@ -513,9 +513,8 @@ __PACKAGE__->register_method ({
};
if (my $err = $@) {
# unlinks only the temporary file from the http server
-   unlink $tmpfilename;
-   warn "unable to clean up temporory file '$tmpfilename' - $!\n"
-   if $! && $! != ENOENT;
+   unlink $tmpfilename or $! == ENOENT
+   or warn "unable to clean up temporory file '$tmpfilename' - 
$!\n";
die $err;
}
 
@@ -526,8 +525,9 @@ __PACKAGE__->register_method ({
 
eval { run_command($cmd, errmsg => 'import failed'); };
 
-   unlink $tmpfilename; # the temporary file got only uploaded 
locally, no need to rm remote
-   warn "unable to clean up temporary file '$tmpfilename' - $!\n" if 
$! && $! != ENOENT;
+   # the temporary file got only uploaded locally, no need to rm remote
+   unlink $tmpfilename or $! == ENOENT
+   or warn "unable to clean up temporary file '$tmpfilename' - 
$!\n";
 
if (my $err = $@) {
eval { $err_cleanup->() };
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [RFC qemu/storage/qemu-server/container/manager 00/23] backup provider API

2024-07-29 Thread Fiona Ebner
Hi,

Am 26.07.24 um 21:47 schrieb Jonathan Nicklin via pve-devel:
> 
> Hi Fiona,
> 
> Would adding support for offloading incremental difference detection
> to the underlying storage be feasible with the API updates? The QEMU
> bitmap strategy works for all storage devices but is far from
> optimal. If backup coordinated a storage snapshot, the underlying
> storage could enumerate the differences (or generate a bitmap).
> 
> This would allow PBS to connect directly to storage and retrieve
> incremental differences, which could remove the PVE hosts from the
> equation. This "storage-direct" approach for backup would improve
> performance, reduce resources, and support incremental backups in all
> cases (i.e., power failures, shutdowns, etc.). It would also eliminate
> the dependency on QEMU bitmaps and the overhead of fleecing.
> 
> Theoretically, this should be possible with any shared storage that
> can enumerate incremental differences between snapshots: Ceph,
> Blockbridge, iSCSi/ZFS?
> 
> Thoughts?
> 

The two big advantages of the current mechanism are:

1. it's completely storage-agnostic, so you can even use it with raw
files on a directory storage. It follows in the same spirit as existing
backup. Prohibiting backup for users when they use certain kinds of
storages for VMs is not nice.
2. it's been battle-tested with PBS and works nicely.

I don't see why your suggestion can't be implemented in principle.
Feature requests for (non-incremental) "storage-snapshot" mode backup
have been around since a while. It was not a priority for development
yet and is totally different from the current "snapshot" backup mode, so
will need to be developed from the ground up.

That said, AFAICS, it's orthogonal to the series here. When an
implementation like you outlined exists, it can just be added as a new
backup mechanism for external providers (and PBS).

See also the related discussion over at:
https://bugzilla.proxmox.com/show_bug.cgi?id=3233#c19

Best Regards,
Fiona


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [RFC storage 10/23] plugin: introduce new_backup_provider() method

2024-07-26 Thread Fiona Ebner
Am 26.07.24 um 14:02 schrieb Max Carrara:
> On Fri Jul 26, 2024 at 11:52 AM CEST, Fiona Ebner wrote:
>> Am 25.07.24 um 17:32 schrieb Max Carrara:
>>> On Thu Jul 25, 2024 at 3:11 PM CEST, Fiona Ebner wrote:
>>
>> I don't quite get your point about not needing to update the call sites.
>> If you change the structure of the passed-in hash you still need to do that.
> 
> Pardon me, I was a bit imprecise there. You're completely right that the
> passed hash has to be changed as well, of course.
> 
> What I meant in particular was that because the signature (most likely)
> won't change, we won't have to take special care to update the
> individual arguments that are passed to a method when e.g. a parameter
> is deprecated (or removed) or added.
> 
> For example, optional arguments are often just left out (because that's
> possible in Perl), so if we introduced another parameter ...
> 
> * after the optional one, we'd have to pass something like `0` or
>   `undef` for the optional one first before passing the new one:
> 
> # before
> $plugin->foo($first, $second);
> 
> # after
> $plugin->foo($first, $second, undef, $new); 
> 
> * before the optional one, we'd have to make sure the order of arguments
>   is correct:
> 
> # before
> $plugin->foo($first, $second, 1);
> 
> # after
> $plugin->foo($first, $second, $new, 1);
> 
> If we were to use a hash representing the parameters instead, the cases
> would look like this respectively:
> 
> $plugin->foo({ first => $first, second => $second, new => $new });
> 
> $plugin->foo({ first => $first, second => $second, optional => 1, new = 
> $new });
> 
> More examples of that pattern would be `PVE::Tools::run_command` and
> `PVE::RADOS::mon_command`.
> 
> These changes can (and IMO should) still be guarded by an API age +
> version mechanism, it's just that the *surrounding maintenance work*
> becomes easier IMO, even if the initial cost for us and for implementors
> is a bit higher.
> 
> Of course, this is all a suggestion and I really don't want to seem like
> I'm telling you what to do here! If you decide not to have a parameter
> hash etc. then that's also completely fine by me, of course. :)
> 

Okay, I see what you mean. I agree that having a hash for optional
arguments is cleaner API-wise, but I do think fixed arguments (e.g. vmid
and drive ID for VM backup will always be present) should go directly
into the signature, not into a hash. I'll probably go with something
like what I wrote before about having mechanism-agnostic signatures, one
for VMs and one for containers.
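
Roughly along these lines (sketch, names not final):

    # one method per guest type; always-present data stays positional,
    # mechanism-specific and optional data goes into the $info hash
    sub backup_vm {
        my ($self, $vmid, $guest_config, $volumes, $info) = @_;
        die "not implemented\n";
    }

    sub backup_container {
        my ($self, $vmid, $guest_config, $exclude_patterns, $info) = @_;
        die "not implemented\n";
    }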

> 
> Also, thanks for going through this discussion here with me - even if
> you choose to not incorporate my suggestions, I'm glad we have these
> little exchanges, because they periodically update my perspective(s) on
> Perl code as well :P
> 

No worries, your feedback is very welcome :)


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [RFC storage 10/23] plugin: introduce new_backup_provider() method

2024-07-26 Thread Fiona Ebner
Am 25.07.24 um 17:32 schrieb Max Carrara:
> On Thu Jul 25, 2024 at 3:11 PM CEST, Fiona Ebner wrote:
>> Am 25.07.24 um 11:48 schrieb Max Carrara:
>>> The same goes for backup provider plugins - IMO namespacing them
>>> like e.g. `PVE::Backup::Provider::Plugin::Foo` where `Foo` is a
>>> (concrete) plugin.
>>>
>>
>> The BackupProvider namespace is already intended for the plugins, adding
>> an extra level with "Plugin" would just bloat the module names,
>> especially if we decide to go the same route as for storage plugins and
>> have a "Custom" sub-namespace.
> 
> I understand what you mean, yeah. Would perhaps something like
> `PVE::BackupProvider::Plugin::*` be better?
> 
> The reason why I'm suggesting this is because in `PVE::Storage::*`,
> every plugin lives alongside `Plugin.pm`, even though the extra
> directory wouldn't really hurt IMO. E.g. `PVE::Storage::DirPlugin` would
> then be `PVE::Storage::Plugin::Dir`.
> 

I think it's fine to live alongside the base plugin (I'd prefer
Plugin::Base if going for a dedicated directory). I agree, if we ever
want to add something other than plugins to the top namespace, it is
much nicer to have the dedicated directory. And it is made more explicit
that things in there are plugins (and not having to name each one
FooPlugin). However, I still feel like
PVE::BackupProvider::Plugin::Custom::Bar is rather lengthy (should we go
with the Custom directory again), but I'm not really opposed to doing it
like this.

>>
>>> The above two methods - `backup_nbd` and `backup_directory` - is there
>>> perhaps a way to merge them? I'm not sure if what I'm having in mind
>>> here is actually feasible, but what I mean is "making the method
>>> agnostic to the type of backup". As in, perhaps pass a hash that
>>> contains a `type` key for the type of backup being made, and instead of
>>> having long method signatures, include the remaining parameters as the
>>> remaining keys. For example:
>>>
>>> {
>>> 'type' => 'lxc-dir',  # type names are just examples here
>>> 'directory' => '/foo/bar/baz',
>>> 'bandwidth_limit' => 42,
>>> ...
>>> }
>>>
>>> {
>>> 'type' => 'vm-nbd',
>>> 'device_name' => '...',
>>> 'nbd_path' => '...',
>>> ...
>>> }
>>>
>>> You get the point :P
>>>
>>> IMO it would make it easier to extend later, and also make it more
>>> straightforward to introduce new parameters / deprecate old ones, while
>>> the method signature stays stable otherwise.
>>>
>>> The same goes for the different cleanup methods further down below;
>>> instead of having a separate method for each "type of cleanup being
>>> performed", let the implementor handle it according to the data the
>>> method receives.
>>>
>>> IMHO I think it's best to be completely agnostic over VM / LXC backups
>>> (and their specific types) wherever possible and let the data describe
>>> what's going on instead.
>>>
>>
>> The point about extensibility is a good one. The API wouldn't need to
>> change even if we implement new mechanisms. But thinking about it some
>> more, is there anything really gained? Because we will not force plugins
>> to implement the methods for new mechanisms of course, they can just
>> continue supporting what they support. Each mechanism will have its own
>> specific set of parameters, so throwing everything into a catch-all
>> method and hash might make it too generic.
> 
> The main point is indeed extensibility, but it also makes maintaining
> the API a bit easier. Should we (in the future) decide to add or remove
> any parameters, we don't need to touch the signature - and in turn, we
> don't need to tediously `grep` for every call site to ensure that
> they're updated accordingly.
> 
> With hashes one could instead always just check if the required
> arguments have been provided.
> 

I'm all for going with a similar API age + version mechanism like for
the storage plugins, so removing parameters should not be done except
for major releases and adding will be backwards-compatible.
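
I.e., something similar to the storage plugin versioning (sketch,
hypothetical names for the backup provider case):

    our $API_VERSION = 1; # bumped on every interface change
    our $API_AGE = 0;     # how many versions back stay compatible

    my $provider_version = $plugin->api_version();
    die "backup provider plugin is incompatible\n"
        if $provider_version > $API_VERSION
        || $provider_version < ($API_VERSION - $API_AGE);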

I don't quite get your point about not needing to update the call sites.
If you change the structure of the passed-in hash you still need to do that.

I do see some benefit in not needing to add new methods for every

[pve-devel] applied: [PATCH manager v2] ui: resource mappings: fix editing of mapping for non first node

2024-07-26 Thread Fiona Ebner
Am 26.07.24 um 09:40 schrieb Dominik Csapak:
> when editing the pci mapping, we set the nodename of the pci/usbselector
> to the selected node. At the same time we disable and hide the node
> selector, but it still changes its value to the 'first' node
> (alphabetically sorted) and that triggers a change event.
> 
> To prevent that we accidentally set the node of the pci/usbselector
> too, we need to check here if the field is disabled.
> 
> There seems to be a race when loading the nodes for the nodeselector
> which leads to inconsistent behaviour, so this was only encountered for
> the pciselector, but theoretically it could also happen for the
> usbselector so adding that condition to both.
> 
> Signed-off-by: Dominik Csapak 

applied, thanks!


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH kernel] add fix for regression causing broken power indicators/LEDs

2024-07-26 Thread Fiona Ebner
The issue was reported in the enterprise support. The customer
contacted the ledmon maintainer, who found that it is not an issue
with ledmon, bisected the kernel and came up with this fix.

Signed-off-by: Fiona Ebner 
---
 ...n-Power-Indicator-bits-for-userspace.patch | 56 +++
 1 file changed, 56 insertions(+)
 create mode 100644 
patches/kernel/0021-PCI-pciehp-Retain-Power-Indicator-bits-for-userspace.patch

diff --git 
a/patches/kernel/0021-PCI-pciehp-Retain-Power-Indicator-bits-for-userspace.patch
 
b/patches/kernel/0021-PCI-pciehp-Retain-Power-Indicator-bits-for-userspace.patch
new file mode 100644
index 000..7b94bcf
--- /dev/null
+++ 
b/patches/kernel/0021-PCI-pciehp-Retain-Power-Indicator-bits-for-userspace.patch
@@ -0,0 +1,56 @@
+From  Mon Sep 17 00:00:00 2001
+From: Blazej Kucman 
+Date: Mon, 22 Jul 2024 16:14:40 +0200
+Subject: [PATCH] PCI: pciehp: Retain Power Indicator bits for userspace
+ indicators
+
+The sysfs "attention" file normally controls the Slot Control Attention
+Indicator with 0 (off), 1 (on), 2 (blink) settings.
+
+576243b3f9ea ("PCI: pciehp: Allow exclusive userspace control of
+indicators") added pciehp_set_raw_indicator_status() to allow userspace to
+directly control all four bits in both the Attention Indicator and the
+Power Indicator fields via the "attention" file.
+
+This is used on Intel VMD bridges so utilities like "ledmon" can use sysfs
+"attention" to control up to 16 indicators for NVMe device RAID status.
+
+abaaac4845a0 ("PCI: hotplug: Use FIELD_GET/PREP()") broke this by masking
+the sysfs data with PCI_EXP_SLTCTL_AIC, which discards the upper two bits
+intended for the Power Indicator Control field (PCI_EXP_SLTCTL_PIC).
+
+For NVMe devices behind an Intel VMD, ledmon settings that use the
+PCI_EXP_SLTCTL_PIC bits, i.e., ATTENTION_REBUILD (0x5), ATTENTION_LOCATE
+(0x7), ATTENTION_FAILURE (0xD), ATTENTION_OFF (0xF), no longer worked
+correctly.
+
+Mask with PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC to retain both the
+Attention Indicator and the Power Indicator bits.
+
+Fixes: abaaac4845a0 ("PCI: hotplug: Use FIELD_GET/PREP()")
+Link: https://lore.kernel.org/r/20240722141440.7210-1-blazej.kuc...@intel.com
+Signed-off-by: Blazej Kucman 
+[bhelgaas: commit log]
+Signed-off-by: Bjorn Helgaas 
+Cc: sta...@vger.kernel.org # v6.7+
+(picked from 
https://lore.kernel.org/linux-pci/20240722141440.7210-1-blazej.kuc...@intel.com/T/#u)
+Signed-off-by: Fiona Ebner 
+---
+ drivers/pci/hotplug/pciehp_hpc.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+diff --git a/drivers/pci/hotplug/pciehp_hpc.c 
b/drivers/pci/hotplug/pciehp_hpc.c
+index b1d0a1b3917d..9d3c249207c4 100644
+--- a/drivers/pci/hotplug/pciehp_hpc.c
 b/drivers/pci/hotplug/pciehp_hpc.c
+@@ -485,7 +485,9 @@ int pciehp_set_raw_indicator_status(struct hotplug_slot 
*hotplug_slot,
+   struct pci_dev *pdev = ctrl_dev(ctrl);
+ 
+   pci_config_pm_runtime_get(pdev);
+-  pcie_write_cmd_nowait(ctrl, FIELD_PREP(PCI_EXP_SLTCTL_AIC, status),
++
++  /* Attention and Power Indicator Control bits are supported */
++  pcie_write_cmd_nowait(ctrl, FIELD_PREP(PCI_EXP_SLTCTL_AIC | 
PCI_EXP_SLTCTL_PIC, status),
+ PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC);
+   pci_config_pm_runtime_put(pdev);
+   return 0;
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [RFC storage 10/23] plugin: introduce new_backup_provider() method

2024-07-25 Thread Fiona Ebner
Am 25.07.24 um 15:11 schrieb Fiona Ebner:
> Am 25.07.24 um 11:48 schrieb Max Carrara:
>> For the specific types we can always then provide helper functions that
>> handle common cases that implementors can use.
>>
>> Extending on my namespace idea above, those helpers could then land in
>> e.g. `PVE::Backup::Provider::Common`, `PVE::Backup::Provider::Common::LXC`,
>> etc.
>>
> 
> Could you give an example for such a helper? I rather feel like things
> like libnbd will be that, i.e. for accessing the NBD export, not sure if
> we can create common helpers that would benefit multiple providers, each
> has their own backend they'll need to talk to after all and might have
> quite different needs.

Actually, this just gave me an idea (not for a helper method though :P).
We can provide a simpler mechanism than NBD, by just exposing the block
device with qemu-nbd ourselves first and also reading the bitmap
ourselves first, then passing the path to the block device and the
bitmap info to the provider.
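
Rough idea of what that could look like (sketch only; the helper for
reading the bitmap is made up, $source would be the image or NBD URI):

    # map the export to a plain block device for the provider
    PVE::Tools::run_command(
        ['qemu-nbd', '--connect=/dev/nbd0', '--read-only', $source]);
    my $bitmap_info = query_dirty_bitmap($vmid, $drive_id); # hypothetical
    my $info = { device => '/dev/nbd0', 'bitmap-info' => $bitmap_info };
    # hand $info to the provider instead of an NBD URI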


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [RFC storage 10/23] plugin: introduce new_backup_provider() method

2024-07-25 Thread Fiona Ebner
Am 25.07.24 um 11:48 schrieb Max Carrara:
> On Tue Jul 23, 2024 at 11:56 AM CEST, Fiona Ebner wrote:
>> Signed-off-by: Fiona Ebner 
> 
> Some overall thoughts:
> 
> 1.  I'm really, really happy to see documentation in this module here,
> that's fantastic! :)
> 
> While the contents of the docs seem fine, I would suggest you used
> POD instead. You can find an example in one of my recent series. [1]
> I mainly prefer POD solely because it's what Perl uses; it also
> indirectly makes sure we all use the same kind of format for
> documenting our Perl code.
> 
> Of course, we've currently not decided on any particular format, but
> because the opportunity arose, I wanted to pitch POD here
> nevertheless. ;)
> 

I'll look into it for v2. Agreed, following a standard for documenting
an API module has its merits.

> 2.  I would personally prefer a namespace like `PVE::Backup::Provider`
> instead of `PVE::BackupProvider`, simply because it leaves room for
> further packages and reduces churn in the long term, IMO.
> 

There's a risk though that PVE::Backup::Provider and PVE::Backup::Foo
are unrelated things that have no real business sharing a namespace.

> The same goes for backup provider plugins - IMO namespacing them
> like e.g. `PVE::Backup::Provider::Plugin::Foo` where `Foo` is a
> (concrete) plugin.
> 

The BackupProvider namespace is already intended for the plugins, adding
an extra level with "Plugin" would just bloat the module names,
especially if we decide to go the same route as for storage plugins and
have a "Custom" sub-namespace.

> While this seems long or somewhat excessive, I think it enforces a
> clear package / module hierarchy and keeps things tidier in the long
> term, and those couple extra keystrokes don't really hurt anyone.
> 

I get where you're coming from, I just feel like BackupProvider might be
better as its own separate thing, containing the plugins for the
specific purpose. But I don't have a strong opinion about it, and am
fine making such changes if other developers prefer it too :)

> The above two methods - `backup_nbd` and `backup_directory` - is there
> perhaps a way to merge them? I'm not sure if what I'm having in mind
> here is actually feasible, but what I mean is "making the method
> agnostic to the type of backup". As in, perhaps pass a hash that
> contains a `type` key for the type of backup being made, and instead of
> having long method signatures, include the remaining parameters as the
> remaining keys. For example:
> 
> {
> 'type' => 'lxc-dir',  # type names are just examples here
> 'directory' => '/foo/bar/baz',
> 'bandwidth_limit' => 42,
> ...
> }
> 
> {
> 'type' => 'vm-nbd',
> 'device_name' => '...',
> 'nbd_path' => '...',
> ...
> }
> 
> You get the point :P
> 
> IMO it would make it easier to extend later, and also make it more
> straightforward to introduce new parameters / deprecate old ones, while
> the method signature stays stable otherwise.
> 
> The same goes for the different cleanup methods further down below;
> instead of having a separate method for each "type of cleanup being
> performed", let the implementor handle it according to the data the
> method receives.
> 
> IMHO I think it's best to be completely agnostic over VM / LXC backups
> (and their specific types) wherever possible and let the data describe
> what's going on instead.
> 

The point about extensibility is a good one. The API wouldn't need to
change even if we implement new mechanisms. But thinking about it some
more, is there anything really gained? Because we will not force plugins
to implement the methods for new mechanisms of course, they can just
continue supporting what they support. Each mechanism will have its own
specific set of parameters, so throwing everything into a catch-all
method and hash might make it too generic.

Or think about the documentation for the single backup method: it would
become super lengthy and describe all backup mechanisms, while a plugin
most likely only cares about a single one and would have an easier time
with a method that captures that mechanism's parameters explicitly.
Won't the end result be making the implementor's life slightly harder,
because they first need to extract the parameters for the specific mechanism?
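
What I picture implementors ending up with for a catch-all method
(sketch, type/key names invented):

    sub backup {
        my ($self, $params) = @_;
        my $type = $params->{type} or die "missing backup type\n";
        if ($type eq 'vm-nbd') {
            my ($vmid, $nbd_path) = $params->@{qw(vmid nbd_path)};
            # ... NBD-based VM backup
        } elsif ($type eq 'lxc-directory') {
            my ($vmid, $directory) = $params->@{qw(vmid directory)};
            # ... directory-based container backup
        } else {
            die "unsupported backup type '$type'\n";
        }
    }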

> For the specific types we can always then provide helper functions that
> handle common cases that implementors can use.
> 
> Exten

Re: [pve-devel] [PATCH qemu-server] resume: bump timeout for query-status

2024-07-25 Thread Fiona Ebner
Sent a v2 improving the commit message with new findings:
https://lists.proxmox.com/pipermail/pve-devel/2024-July/064908.html


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH v2 qemu-server] resume: bump timeout for query-status

2024-07-25 Thread Fiona Ebner
As reported in the community forum [0], after migration, the VM might
not immediately be able to respond to QMP commands, which means the VM
could fail to resume and stay in paused state on the target.

The reason is that activating the block drives in QEMU can take a bit
of time. For example, it might be necessary to invalidate the caches
(where for raw devices a flush might be needed) and the request
alignment and size of the block device needs to be queried.

In [0], an external Ceph cluster with krbd is used, and the initial
read to the block device after migration, for probing the request
alignment, takes a bit over 10 seconds[1]. Use 60 seconds as the new
timeout to be on the safe side for the future.

All callers are inside workers or via the 'qm' CLI command, so bumping
beyond 30 seconds is fine.

[0]: https://forum.proxmox.com/threads/149610/

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* improve commit message with new findings from the forum thread

 PVE/QemuServer.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index bf59b091..9e840912 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6461,7 +6461,9 @@ sub vm_resume {
 my ($vmid, $skiplock, $nocheck) = @_;
 
 PVE::QemuConfig->lock_config($vmid, sub {
-   my $res = mon_cmd($vmid, 'query-status');
+   # After migration, the VM might not immediately be able to respond to 
QMP commands, because
+   # activating the block devices might take a bit of time.
+   my $res = mon_cmd($vmid, 'query-status', timeout => 60);
my $resume_cmd = 'cont';
my $reset = 0;
my $conf;
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu 1/2] update submodule and patches to QEMU 9.0.2

2024-07-25 Thread Fiona Ebner
Most relevant are some fixes for VirtIO and for ARM and i386
emulation. There also is a fix for VGA display to fix screen blanking,
which fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=4786

Signed-off-by: Fiona Ebner 
---
 ...d-support-for-sync-bitmap-mode-never.patch |  10 +-
 ...race-with-clients-disconnecting-earl.patch |   4 +-
 ...io-pci-fix-use-of-a-released-vector.patch} |   8 +-
 .../0006-virtio-gpu-fix-v2-migration.patch|  98 ---
 ...0007-hw-pflash-fix-block-write-start.patch |  59 -
 ...operand-size-for-DATA16-REX.W-POPCNT.patch |  51 
 ...ru-wrpkru-are-no-prefix-instructions.patch |  40 ---
 ...6-fix-feature-dependency-for-WAITPKG.patch |  33 ---
 ...move-compatibility-flags-for-VirtIO-.patch |  57 -
 ...t-monitor-use-aio_co_reschedule_self.patch |  53 
 ...ict-translation-disabled-alignment-c.patch |  51 
 ...-IRQs-a-chance-when-resetting-HF_INH.patch |  80 --
 ...r-v-Correct-kvm_hv_handle_exit-retur.patch |  60 -
 ...86-disable-jmp_opt-if-EFLAGS.RF-is-1.patch |  31 ---
 ...ingle-step-exception-after-MOV-or-PO.patch |  30 ---
 ...n-t-open-data_file-with-BDRV_O_NO_IO.patch | 107 
 ...names-only-when-explicitly-requested.patch | 241 --
 ...le-posix-make-locking-optiono-on-cre.patch |   6 +-
 ...ckup-Proxmox-backup-patches-for-QEMU.patch |   2 +-
 ...k-driver-to-map-backup-archives-into.patch |   8 +-
 ...igrate-dirty-bitmap-state-via-savevm.patch |   2 +-
 ...-backup-add-discard-source-parameter.patch |   2 +-
 ...e-allow-specifying-minimum-cluster-s.patch |   4 +-
 ...um-cluster-size-to-performance-optio.patch |   2 +-
 .../0050-PVE-backup-add-fleecing-option.patch |   2 +-
 debian/patches/series |  16 +-
 26 files changed, 26 insertions(+), 1031 deletions(-)
 rename 
debian/patches/extra/{0011-Revert-virtio-pci-fix-use-of-a-released-vector.patch 
=> 0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch} (93%)
 delete mode 100644 debian/patches/extra/0006-virtio-gpu-fix-v2-migration.patch
 delete mode 100644 
debian/patches/extra/0007-hw-pflash-fix-block-write-start.patch
 delete mode 100644 
debian/patches/extra/0008-target-i386-fix-operand-size-for-DATA16-REX.W-POPCNT.patch
 delete mode 100644 
debian/patches/extra/0009-target-i386-rdpkru-wrpkru-are-no-prefix-instructions.patch
 delete mode 100644 
debian/patches/extra/0010-target-i386-fix-feature-dependency-for-WAITPKG.patch
 delete mode 100644 
debian/patches/extra/0012-hw-core-machine-move-compatibility-flags-for-VirtIO-.patch
 delete mode 100644 
debian/patches/extra/0013-Revert-monitor-use-aio_co_reschedule_self.patch
 delete mode 100644 
debian/patches/extra/0014-target-arm-Restrict-translation-disabled-alignment-c.patch
 delete mode 100644 
debian/patches/extra/0015-target-i386-Give-IRQs-a-chance-when-resetting-HF_INH.patch
 delete mode 100644 
debian/patches/extra/0016-target-i386-hyper-v-Correct-kvm_hv_handle_exit-retur.patch
 delete mode 100644 
debian/patches/extra/0017-target-i386-disable-jmp_opt-if-EFLAGS.RF-is-1.patch
 delete mode 100644 
debian/patches/extra/0018-target-i386-no-single-step-exception-after-MOV-or-PO.patch
 delete mode 100644 
debian/patches/extra/0019-qcow2-Don-t-open-data_file-with-BDRV_O_NO_IO.patch
 delete mode 100644 
debian/patches/extra/0020-block-Parse-filenames-only-when-explicitly-requested.patch

diff --git 
a/debian/patches/bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
 
b/debian/patches/bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
index 6789ac5..392b8a2 100644
--- 
a/debian/patches/bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
+++ 
b/debian/patches/bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
@@ -364,10 +364,10 @@ index d2201e27f4..cc1387ae02 100644
BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
 diff --git a/qapi/block-core.json b/qapi/block-core.json
-index 746d1694c2..45ab548dfe 100644
+index 4b18e01b85..0902b0a024 100644
 --- a/qapi/block-core.json
 +++ b/qapi/block-core.json
-@@ -2174,6 +2174,15 @@
+@@ -2170,6 +2170,15 @@
  # destination (all the disk, only the sectors allocated in the
  # topmost image, or only new I/O).
  #
@@ -383,7 +383,7 @@ index 746d1694c2..45ab548dfe 100644
  # @granularity: granularity of the dirty bitmap, default is 64K if the
  # image format doesn't have clusters, 4K if the clusters are
  # smaller than that, else the cluster size.  Must be a power of 2
-@@ -2216,7 +2225,9 @@
+@@ -2212,7 +2221,9 @@
  { 'struct': 'DriveMirror',
'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
  '*format': 'str', '*node-name': 'str', '*replaces': 'str',
@@ -394,7 +394,7 @@ index 746d1694c2..45ab548dfe 100644
   

[pve-devel] [PATCH qemu 2/2] some more stable fixes for QEMU 9.0.2

2024-07-25 Thread Fiona Ebner
Fix the two issues reported in the community forum[0][1], i.e.
regression in LSI-53c895a controller and ignored boot order for USB
storage (only possible via custom arguments in Proxmox VE), both
causing boot failures, and pick up fixes for VirtIO, ARM emulation,
char IO device and a graph lock fix for the block layer.

The block-copy patches that serve as a preparation for fleecing are
moved to the extra folder, because the graph lock fix requires them
to be present first. They have been applied upstream in the meantime
and should drop out with the rebase on 9.1.

[0]: https://forum.proxmox.com/threads/149772/post-679433
[1]: https://forum.proxmox.com/threads/149772/post-683459

Signed-off-by: Fiona Ebner 
---
 ...d-support-for-sync-bitmap-mode-never.patch | 14 +--
 ...check-for-bitmap-mode-without-bitmap.patch |  2 +-
 .../0006-mirror-move-some-checks-to-qmp.patch |  2 +-
 ...ck-copy-before-write-fix-permission.patch} |  0
 ...-write-support-unligned-snapshot-di.patch} |  0
 ...-write-create-block_copy-bitmap-in-.patch} |  0
 ...backup-add-discard-source-parameter.patch} | 20 ++--
 ...e-de-initialization-of-vhost-user-de.patch | 92 ++
 ...Use-float_status-copy-in-sme_fmopa_s.patch | 43 +
 ...-Use-FPST_F16-for-SME-FMOPA-widening.patch | 62 +
 ...ion-and-honor-bootindex-again-for-le.patch | 60 
 ...5a-bump-instruction-limit-in-scripts.patch | 48 ++
 ...16-block-copy-Fix-missing-graph-lock.patch | 38 
 ...-do-not-operate-on-sources-from-fina.patch | 93 +++
 ...le-posix-make-locking-optiono-on-cre.patch |  6 +-
 ...e-bcs-bitmap-initialization-to-job-c.patch |  4 +-
 ...-Backup-add-backup-dump-block-driver.patch |  4 +-
 ...ckup-Proxmox-backup-patches-for-QEMU.patch |  4 +-
 ...k-driver-to-map-backup-archives-into.patch |  8 +-
 ...igrate-dirty-bitmap-state-via-savevm.patch |  2 +-
 ...-allow-specifying-minimum-cluster-s.patch} |  2 +-
 ...m-cluster-size-to-performance-optio.patch} |  0
 ...0046-PVE-backup-add-fleecing-option.patch} |  0
 ...e-error-when-copy-before-write-fail.patch} |  0
 debian/patches/series | 23 +++--
 25 files changed, 485 insertions(+), 42 deletions(-)
 rename debian/patches/{pve/0044-block-copy-before-write-fix-permission.patch => extra/0007-block-copy-before-write-fix-permission.patch} (100%)
 rename debian/patches/{pve/0045-block-copy-before-write-support-unligned-snapshot-di.patch => extra/0008-block-copy-before-write-support-unligned-snapshot-di.patch} (100%)
 rename debian/patches/{pve/0046-block-copy-before-write-create-block_copy-bitmap-in-.patch => extra/0009-block-copy-before-write-create-block_copy-bitmap-in-.patch} (100%)
 rename debian/patches/{pve/0047-qapi-blockdev-backup-add-discard-source-parameter.patch => extra/0010-qapi-blockdev-backup-add-discard-source-parameter.patch} (96%)
 create mode 100644 debian/patches/extra/0011-hw-virtio-Fix-the-de-initialization-of-vhost-user-de.patch
 create mode 100644 debian/patches/extra/0012-target-arm-Use-float_status-copy-in-sme_fmopa_s.patch
 create mode 100644 debian/patches/extra/0013-target-arm-Use-FPST_F16-for-SME-FMOPA-widening.patch
 create mode 100644 debian/patches/extra/0014-scsi-fix-regression-and-honor-bootindex-again-for-le.patch
 create mode 100644 debian/patches/extra/0015-hw-scsi-lsi53c895a-bump-instruction-limit-in-scripts.patch
 create mode 100644 debian/patches/extra/0016-block-copy-Fix-missing-graph-lock.patch
 create mode 100644 debian/patches/extra/0017-Revert-qemu-char-do-not-operate-on-sources-from-fina.patch
 rename debian/patches/pve/{0048-copy-before-write-allow-specifying-minimum-cluster-s.patch => 0044-copy-before-write-allow-specifying-minimum-cluster-s.patch} (99%)
 rename debian/patches/pve/{0049-backup-add-minimum-cluster-size-to-performance-optio.patch => 0045-backup-add-minimum-cluster-size-to-performance-optio.patch} (100%)
 rename debian/patches/pve/{0050-PVE-backup-add-fleecing-option.patch => 0046-PVE-backup-add-fleecing-option.patch} (100%)
 rename debian/patches/pve/{0051-PVE-backup-improve-error-when-copy-before-write-fail.patch => 0047-PVE-backup-improve-error-when-copy-before-write-fail.patch} (100%)

diff --git a/debian/patches/bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch b/debian/patches/bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
index 392b8a2..0532896 100644
--- a/debian/patches/bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
+++ b/debian/patches/bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
@@ -258,7 +258,7 @@ index 1bdce3b657..0c5c72df2e 100644
   errp);
  if (!job) {
 diff --git a/blockdev.c b/blockdev.c
-index 057601dcf0..8682814a7a 100644
+index 4c33c3f5f0..f3e508a6a7 100644
 --- a/blockdev.c
 +++ b/blockdev.c
 @@ -2776,6 +2776,9 @@ static void blockdev_mirror_c

[pve-devel] applied: [PATCH docs] fix #5525: storage: pbs: improve master-pubkey docs

2024-07-24 Thread Fiona Ebner
On 11.06.24 at 14:17, Fabian Grünbichler wrote:
> add the information that the parameter is special like other secret ones, and
> add the resulting config to the example to make it even more obvious.
> 
> Signed-off-by: Fabian Grünbichler 
> ---
>  pve-storage-pbs.adoc | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/pve-storage-pbs.adoc b/pve-storage-pbs.adoc
> index 84d598f..3140135 100644
> --- a/pve-storage-pbs.adoc
> +++ b/pve-storage-pbs.adoc
> @@ -64,8 +64,11 @@ Optional.
>  master-pubkey::
>  
>  A public RSA key used to encrypt the backup encryption key as part of the
> -backup task. The encrypted copy will be appended to the backup and stored on
> -the Proxmox Backup Server instance for recovery purposes.
> +backup task. Will be saved in a file under
> +`/etc/pve/priv/storage/.master.pem` with access restricted to the
> +root user.
> +The encrypted copy of the backup encryption key will be appended to each backup
> +and stored on the Proxmox Backup Server instance for recovery purposes.
>  Optional, requires `encryption-key`.
>  
>  .Configuration Example (`/etc/pve/storage.cfg`)
> @@ -77,6 +80,8 @@ pbs: backup
>  fingerprint 09:54:ef:..snip..:88:af:47:fe:4c:3b:cf:8b:26:88:0b:4e:3c:b2
>  prune-backups keep-all=1
>  username archiver@pbs
> +encryption-key a9:ee:c8:02:13:..snip..:2d:53:2c:98
> +master-pubkey = 1

applied, thanks! While removing the equals sign that slipped in here ;)

>  
>  
>  Storage Features


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [PATCH qemu-server v11 3/5] migration: add check_non_migratable_resources function

2024-07-24 Thread Fiona Ebner
On 29.05.24 at 14:23, Markus Frank wrote:
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index 2a1d4d7..8f36cf8 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -4504,7 +4504,7 @@ __PACKAGE__->register_method({
>   $res->{running} = PVE::QemuServer::check_running($vmid) ? 1:0;
>  
>   my ($local_resources, $mapped_resources, $missing_mappings_by_node) =
> - PVE::QemuServer::check_local_resources($vmconf, 1);
> + PVE::QemuServer::check_local_resources($vmconf, $res->{running}, 1);
>   delete $missing_mappings_by_node->{$localnode};
>  
>   my $vga = PVE::QemuServer::parse_vga($vmconf->{vga});
> @@ -5192,6 +5192,9 @@ __PACKAGE__->register_method({
>   die "unable to use snapshot name 'pending' (reserved name)\n"
>   if lc($snapname) eq 'pending';
>  
> + my $vmconf = PVE::QemuConfig->load_config($vmid);
> + PVE::QemuServer::check_non_migratable_resources($vmconf, $param->{vmstate}, 0);

This can be fine as an early check. But there should be another check
after the 'snapshot' lock was set in the config (or right before while
still holding the config file lock). Otherwise, there is no guarantee
that the config is the same at the time of the check and during the
actual snapshot operation.

My suggestion is to introduce a new abstract method (named something
like __snapshot_assert_no_blockers()) in AbstractConfig that is called
during __snapshot_prepare() and that QemuConfig can override. That way, we
can do the check before writing the modified config for the snapshot.
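
A rough sketch of that suggestion (the method name is from above; the
parameters and call site are my assumptions, not a final API):

    # PVE/AbstractConfig.pm - default implementation, no blockers:
    sub __snapshot_assert_no_blockers {
        my ($class, $vmconf, $save_vmstate) = @_;
        return;
    }

    # PVE/QemuConfig.pm - override re-using the helper from this series:
    sub __snapshot_assert_no_blockers {
        my ($class, $vmconf, $save_vmstate) = @_;
        PVE::QemuServer::check_non_migratable_resources($vmconf, $save_vmstate, 0);
    }

with a call to $class->__snapshot_assert_no_blockers($conf, $save_vmstate)
in __snapshot_prepare() while the config file lock is still held.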

> +
>   my $realcmd = sub {
> + PVE::Cluster::log_msg('info', $authuser, "snapshot VM $vmid: $snapname");
> + PVE::QemuConfig->snapshot_create($vmid, $snapname, $param->{vmstate},
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 31a7ee9..9f342bf 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -2563,13 +2563,32 @@ sub config_list {
>  return $res;
>  }
>  
> +sub check_non_migratable_resources {
> +my ($conf, $state, $noerr) = @_;
> +
> +my @non_migr_res = ();

Style nit: I'd prefer writing out the name or use something like
"blockers" (inside the function where it's clear what is meant) if it
really is too long

> +if ($state && $conf->{amd_sev}) {
> + push @non_migr_res, "amd_sev";
> +}
> +
> +if (scalar @non_migr_res && !$noerr) {

Style nit: I'd prefer parentheses for scalar(@stuff)

> + die "Cannot live-migrate, snapshot (with RAM), or hibernate a VM with:"
> + ." @non_migr_res\n";
> +}
> +
> +return @non_migr_res;
> +}
> +


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH qemu-server v11 1/5] add C program to get hardware capabilities from CPUID

2024-07-24 Thread Fiona Ebner
On 29.05.24 at 14:23, Markus Frank wrote:
> Implement a C program that extracts AMD SEV hardware information such
> as reduced-phys-bits and cbitpos from CPUID at boot time, looks if
> SEV, SEV-ES & SEV-SNP are enabled, and outputs these details as JSON
> to /run/qemu-server/host-hw-capabilities.json
> 
> This programm can also be used to read and save other hardware

Typo: should be "program"

> information at boot time.
> 
> Signed-off-by: Markus Frank 
> Co-authored-by: Thomas Lamprecht 
> Tested-by: Filip Schauer 

Since there were changes, the Tested-by trailer should be removed.

With the suggested changes below:

Reviewed-by: Fiona Ebner 

> ---
> changes v11:
> * removed systemd service

* improved error handling

> +ret = fprintf(file,
> + "{"
> + " \"amd-sev\": {"
> + " \"cbitpos\": %u,"
> + " \"reduced-phys-bits\": %u,"
> + " \"sev-support\": %s,"
> + " \"sev-support-es\": %s,"
> + " \"sev-support-snp\": %s"
> + " }"
> + " }\n",
> + cbitpos,
> + reduced_phys_bits,
> + sev_support ? "true" : "false",
> + sev_es_support ? "true" : "false",
> + sev_snp_support ? "true" : "false"
> +);
> +if (ret == -1) {

"man 3 fprintf" states:

> If an output error is encountered, a negative value is returned.

and while a quick glance at the source code shows that -1 is used in
many places, I'm not sure that's true for all error paths. Let's check
for < 0 instead.

> + printf("Error writing to file %s: %s\n", path, strerror(errno));
> +}
> +
> +ret = fclose(file);

Nit:

"man 3 fclose" states:

> Upon successful completion, 0 is returned. Otherwise, EOF is returned
> and errno is set to indicate the error.

While EOF is defined to be -1 in stdio.h, it's better to use the
constant explicitly (or could also use ret != 0).

> +if (ret == -1) {
> + printf("Error closing file %s: %s\n", path, strerror(errno));
> +}
> +
> +return 0;
> +}
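
A minimal sketch of both suggested checks (keeping the printf-style error
reporting from the patch; surrounding code assumed):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    static void write_and_close(FILE *file, const char *path)
    {
        // fprintf() only guarantees a negative value on error, not exactly -1:
        if (fprintf(file, "{ }\n") < 0) {
            printf("Error writing to file %s: %s\n", path, strerror(errno));
        }
        // fclose() is documented to return EOF, so compare against the constant:
        if (fclose(file) == EOF) {
            printf("Error closing file %s: %s\n", path, strerror(errno));
        }
    }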


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH qemu-server v11 2/5] config: add AMD SEV support

2024-07-24 Thread Fiona Ebner
On 29.05.24 at 14:23, Markus Frank wrote:
> Signed-off-by: Markus Frank 

Reviewed-by: Fiona Ebner 

with some (style) nits that can still be addressed:

> diff --git a/PVE/QemuServer/CPUConfig.pm b/PVE/QemuServer/CPUConfig.pm
> index 33f7524..2542aa2 100644
> --- a/PVE/QemuServer/CPUConfig.pm
> +++ b/PVE/QemuServer/CPUConfig.pm
> @@ -3,9 +3,10 @@ package PVE::QemuServer::CPUConfig;
>  use strict;
>  use warnings;
>  
> +use JSON;

Style nit: please add a blank line between non-PVE and PVE modules

> @@ -773,6 +806,54 @@ sub get_cpu_bitness {
>  die "unsupported architecture '$arch'\n";
>  }
>  
> +sub get_hw_capabilities {
> +# Get reduced-phys-bits & cbitpos from host-hw-capabilities.json
> +my $filename = '/run/qemu-server/host-hw-capabilities.json';
> +if (! -e $filename) {
> +   run_command("/usr/libexec/qemu-server/query-machine-capabilities");
> +}
> +my $json_text = PVE::Tools::file_get_contents($filename);
> +($json_text) = $json_text =~ /(.*)/; # untaint json text
> +my $hw_capabilities = decode_json($json_text);

Nit: could use eval and add context to errors from decode_json()

> +return $hw_capabilities;
> +}
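
Roughly, for the eval nit above (untested sketch):

    my $hw_capabilities = eval { decode_json($json_text) };
    die "failed to parse '$filename' - $@" if $@;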
> +
> +sub get_amd_sev_object {
> +my ($conf) = @_;

Nit: could also pass the three properties from the config it actually
needs instead of passing the whole config for a more explicit function
signature.
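
E.g. a call site like this (parameter order just for illustration):

    my $sev_obj = get_amd_sev_object($conf->{amd_sev}, $conf->{bios}, $conf->{efidisk0});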

> +
> +if (!$conf->{bios} || ($conf->{bios} && $conf->{bios} ne 'ovmf')) {
> + die "To use SEV, you need to change the BIOS to OVMF.\n";
> +}
> +die "To use SEV, you need to add an efidisk.\n" if (!$conf->{efidisk0});

I'd move the checks to below the ones for the hardware capability.
Otherwise a user might see the error about BIOS first, change it, try
again and then fail with the error about lacking hardware support. I'd
also use "AMD SEV" instead of just "SEV".

What happens if there is no EFI disk (i.e. the temporary efivars disk is
used)?

Style nit: superfluous parentheses in post-if

> +
> +my $amd_sev_conf = PVE::JSONSchema::parse_property_string($sev_fmt, $conf->{amd_sev});
> +my $sev_hw_caps = get_hw_capabilities()->{'amd-sev'};
> +
> +if (!$sev_hw_caps->{'sev-support'}) {
> + die "Your CPU does not support AMD SEV!\n";> +}
> +if ($amd_sev_conf->{type} eq 'es' && !$sev_hw_caps->{'sev-support-es'}) {
> + die "Your CPU does not support AMD SEV-ES!\n";
> +}

No need for the exclamation marks for those two error messages IMHO.

> +
> +my $sev_mem_object = 'sev-guest,id=sev0'
> +.',cbitpos='.$sev_hw_caps->{cbitpos}
> +.',reduced-phys-bits='.$sev_hw_caps->{'reduced-phys-bits'};

Style nit: usually this is written as

my $a = "foo";
$a .= "bar";

in our code base.

> +
> +# guest policy bit calculation as described here:
> +# https://documentation.suse.com/sles/15-SP5/html/SLES-amd-sev/article-amd-sev.html#table-guestpolicy
> +my $policy = 0b0000;
> +$policy += 0b0001 if ($amd_sev_conf->{'no-debug'});
> +$policy += 0b0010 if ($amd_sev_conf->{'no-key-sharing'});
> +$policy += 0b0100 if ($amd_sev_conf->{type} eq 'es');

Style nit: superfluous parentheses in post-if, also below

> +# disable migration with bit 3 nosend to prevent amd-sev-migration-attack
> +$policy += 0b1000;
> +
> +$sev_mem_object .= ',policy='.sprintf("%#x", $policy);
> +$sev_mem_object .= ',kernel-hashes=on' if ($amd_sev_conf->{'kernel-hashes'});
> +return $sev_mem_object;
> +}
> +
>  __PACKAGE__->register();
>  __PACKAGE__->init();
>  
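
To make the bit arithmetic concrete: with type=es and no-debug set, this
yields 0b0001 + 0b0100 + 0b1000 = 0b1101, so ',policy=0xd' gets appended.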


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] applied: [PATCH v1 pve-common 0/3] Section Config: Documentation & Code Cleanup

2024-07-24 Thread Fiona Ebner
On 04.06.24 at 11:28, Max Carrara wrote:
> 
> Max Carrara (3):
>   section config: document package and its methods with POD
>   section config: update code style
>   section config: clean up parser logic
> 
>  src/PVE/SectionConfig.pm | 982 +++
>  1 file changed, 798 insertions(+), 184 deletions(-)
> 

For the record, it seems like this already got applied:
https://git.proxmox.com/?p=pve-common.git;a=commit;h=6cba8d7660b62002ffdc81655b8e8668952614a9

Could you re-send the changes for v2 as follow-ups? Or v1 could also be
reverted and v2 applied?


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH container/manager 0/2] add deny read/write options for device passthrough

2024-07-24 Thread Fiona Ebner
On 29.04.24 at 15:15, Filip Schauer wrote:
> Add the deny_read and deny_write options for device passthrough, to
> restrict container access to devices.
> 

In the UI, I think it's enough to expose a checkbox for read-only. Use
cases that deny reads seem a bit esoteric, so I'm not even sure we
should add deny_read in the back-end before somebody complains. But no
strong opinion there.

> This allows for passing through a device in read-only mode without
> giving the container full access to it.
> 
> Up until now a container with a device passed through to it was granted
> full access to that device without an option to restrict that access as
> pointed out by @Fiona.
>


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH container 1/2] add deny read/write options for device passthrough

2024-07-24 Thread Fiona Ebner
On 29.04.24 at 15:15, Filip Schauer wrote:
> Add the deny_read and deny_write options for device passthrough, to
> restrict container access to devices.
> 
> Signed-off-by: Filip Schauer 

Reviewed-by: Fiona Ebner 

> diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
> index 1664a35..5db9181 100644
> --- a/src/PVE/LXC/Config.pm
> +++ b/src/PVE/LXC/Config.pm
> @@ -962,6 +962,16 @@ my $dev_desc = {
>   minimum => 0,
>   description => 'Group ID to be assigned to the device node',
>  },
> +deny_read => {
> + optional => 1,
> + type => 'boolean',
> + description => 'Deny the container to read from the device',
> +},
> +deny_write => {
> + optional => 1,
> + type => 'boolean',
> + description => 'Deny the container to write to the device',

Nit: missing default for both

> +},
>  };
>  
>  for (my $i = 0; $i < $MAX_DEVICES; $i++) {
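
For the missing-default nit, i.e. something like (sketch):

    deny_read => {
        optional => 1,
        type => 'boolean',
        description => 'Deny the container to read from the device',
        default => 0,
    },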


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH docs] passthrough: viommu: remove 'amd_iommu=on' from the docs

2024-07-24 Thread Fiona Ebner
On 26.04.24 at 09:58, Dominik Csapak wrote:
> this is wrong and does nothing, see previous commit:
> 0c54d61 (remove 'amd_iommu=on' from the passthrough docs)
> and
> https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html?highlight=amd_iommu
> 
> Signed-off-by: Dominik Csapak 
> ---
>  qm-pci-passthrough.adoc | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/qm-pci-passthrough.adoc b/qm-pci-passthrough.adoc
> index bbd6b85..14f7fa5 100644
> --- a/qm-pci-passthrough.adoc
> +++ b/qm-pci-passthrough.adoc
> @@ -511,8 +511,7 @@ There are currently two vIOMMU implementations available: Intel and VirtIO.
>  
>  Host requirement:
>  
> -* Add `intel_iommu=on` or `amd_iommu=on` depending on your CPU to your kernel
> -command line.
> +* Add `intel_iommu=on` depending on your CPU to your kernel command line.
>  

IMHO saying "For Intel CPUs, add ..." sounds nicer than "add ...
depending on your CPU" together with your change. And we could also
mention that it's only required for kernels before 6.8 and nothing for
AMD CPUs?

>  Intel vIOMMU
>  


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


