Re: [pve-devel] [PATCH qemu] async snapshot: stop vCPU throttling after finishing

2024-11-08 Thread Fiona Ebner
Ping, as there are still new affected users showing up in the forum thread.

Am 30.10.24 um 10:52 schrieb Fiona Ebner:
> In the community forum, users reported issues with RCU stalls and
> sluggish VMs after taking a snapshot with RAM in Proxmox VE [0]. Mario
> was also experiencing similar issues from time to time and recently
> obtained a GDB stacktrace. The stacktrace showed that, in his case,
> the vCPU threads were waiting in cpu_throttle_thread(). It is a
> reasonable guess that the issues reported in the forum have the same
> cause.
> 
> From searching in the source code, it seems that migration is the only
> user of the vCPU throttling functions in QEMU relevant for Proxmox VE
> (the only other place where it is used is the Cocoa UI). In
> particular, RAM migration will begin throttling vCPUs for
> auto-converge.
> 
> In migration_iteration_finish() there is an unconditional call to
> cpu_throttle_stop(), so do the same in the async snapshot code
> specific to Proxmox VE.
> 
> It's not clear why the issue only began to surface more prominently
> now, since the vCPU throttling has been present since commit 070afca258
> ("migration: Dynamic cpu throttling for auto-converge") in QEMU
> v2.10.0. However, there were a lot of changes in the migration code
> between v8.1.5 and v9.0.2 and a few of them might have affected the
> likelihood of cpu_throttle_set() being called, for example 4e1871c450
> ("migration: Don't serialize devices in qemu_savevm_state_iterate()").
> 
> [0]: https://forum.proxmox.com/threads/153483
> 
> Reported-by: Mario Loderer 
> Signed-off-by: Fiona Ebner 
> Tested-by: Mario Loderer 


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] applied: [PATCH docs] fix #5665: add note about short-lived cert renewal

2024-11-08 Thread Fiona Ebner
Am 09.09.24 um 14:39 schrieb Fabian Grünbichler:
> not that obvious behaviour on the systemd side, and missing cert renewal can
> have wide-reaching consequences.
> 
> Signed-off-by: Fabian Grünbichler 
> ---
>  certificate-management.adoc | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/certificate-management.adoc b/certificate-management.adoc
> index 71c6d71..3bb9bb9 100644
> --- a/certificate-management.adoc
> +++ b/certificate-management.adoc
> @@ -223,6 +223,9 @@ If a node has been successfully configured with an 
> ACME-provided certificate
>  renewed by the `pve-daily-update.service`. Currently, renewal will be 
> attempted
>  if the certificate has expired already, or will expire in the next 30 days.
>  
> +NOTE: If you are using a custom directory that issues short-lived 
> certificates,
> +disabling the random delay for the `pve-daily-update.timer` unit might be
> +advisable to avoid missing a certificate renewal after a reboot.
>  
>  ACME Examples with `pvenode`
>  

applied, thanks!


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


[pve-devel] [RFC container v3 30/34] create: factor out compression option helper

2024-11-07 Thread Fiona Ebner
In preparation to re-use it for checking potentially untrusted
archives.

Signed-off-by: Fiona Ebner 
---

New in v3.

 src/PVE/LXC/Create.pm | 51 +--
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm
index 719f372..d2f675e 100644
--- a/src/PVE/LXC/Create.pm
+++ b/src/PVE/LXC/Create.pm
@@ -78,15 +78,38 @@ sub restore_proxmox_backup_archive {
$scfg, $storeid, $cmd, $param, userns_cmd => $userns_cmd);
 }
 
+my sub tar_compression_option {
+my ($archive) = @_;
+
+my %compression_map = (
+   '.gz'  => '-z',
+   '.bz2' => '-j',
+   '.xz'  => '-J',
+   '.lzo'  => '--lzop',
+   '.zst'  => '--zstd',
+);
+if ($archive =~ /\.tar(\.[^.]+)?$/) {
+   if (defined($1)) {
+   die "unrecognized compression format: $1\n" if 
!defined($compression_map{$1});
+   return $compression_map{$1};
+   }
+   return;
+} else {
+   die "file does not look like a template archive: $archive\n";
+}
+}
+
 my sub restore_tar_archive_command {
-my ($conf, $opts, $rootdir, $bwlimit) = @_;
+my ($conf, $compression_opt, $rootdir, $bwlimit) = @_;
 
 my ($id_map, $root_uid, $root_gid) = PVE::LXC::parse_id_maps($conf);
 my $userns_cmd = PVE::LXC::userns_command($id_map);
 
-my $cmd = [@$userns_cmd, 'tar', 'xpf', '-', $opts->@*, '--totals',
-   @PVE::Storage::Plugin::COMMON_TAR_FLAGS,
-   '-C', $rootdir];
+my $cmd = [@$userns_cmd, 'tar', 'xpf', '-'];
+push $cmd->@*, $compression_opt if $compression_opt;
+push $cmd->@*, '--totals';
+push $cmd->@*, @PVE::Storage::Plugin::COMMON_TAR_FLAGS;
+push $cmd->@*, '-C', $rootdir;
 
 # skip-old-files doesn't have anything to do with time (old/new), but is
 # simply -k (annoyingly also called --keep-old-files) without the 'treat
@@ -108,24 +131,10 @@ sub restore_tar_archive {
 
 my $archive_fh;
 my $tar_input = '<&STDIN';
-my @compression_opt;
+my $compression_opt;
 if ($archive ne '-') {
# GNU tar refuses to autodetect this... *sigh*
-   my %compression_map = (
-   '.gz'  => '-z',
-   '.bz2' => '-j',
-   '.xz'  => '-J',
-   '.lzo'  => '--lzop',
-   '.zst'  => '--zstd',
-   );
-   if ($archive =~ /\.tar(\.[^.]+)?$/) {
-   if (defined($1)) {
-   die "unrecognized compression format: $1\n" if 
!defined($compression_map{$1});
-   @compression_opt = $compression_map{$1};
-   }
-   } else {
-   die "file does not look like a template archive: $archive\n";
-   }
+   $compression_opt = tar_compression_option($archive);
sysopen($archive_fh, $archive, O_RDONLY)
or die "failed to open '$archive': $!\n";
my $flags = $archive_fh->fcntl(Fcntl::F_GETFD(), 0);
@@ -133,7 +142,7 @@ sub restore_tar_archive {
$tar_input = '<&'.fileno($archive_fh);
 }
 
-my $cmd = restore_tar_archive_command($conf, [@compression_opt], $rootdir, 
$bwlimit);
+my $cmd = restore_tar_archive_command($conf, $compression_opt, $rootdir, 
$bwlimit);
 
 if ($archive eq '-') {
print "extracting archive from STDIN\n";
-- 
2.39.5



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC qemu/common/storage/qemu-server/container/manager v3 00/34] backup provider API

2024-11-07 Thread Fiona Ebner
ss and, if used, bitmap are exported via NBD.

Container mechanism 'directory':

A copy or snapshot of the container's filesystem state is made
available as a directory. The method is executed inside the user
namespace associated to the container.

2. Restore API

The restore_get_mechanism() method is used to decide on the restore
mechanism. Currently, 'qemu-img' for VMs, and 'directory' or 'tar' for
containers are possible. It is enough to implement one mechanism for
VMs and one mechanism for containers.

Next, there are methods for extracting the guest and firewall
configuration, and the restore mechanisms are implemented via a pair
of methods: an init method for making the data available to Proxmox VE
and a cleanup method that is called after the restore.

For VMs, a restore_vm_get_device_info() helper is additionally
required to get the disks included in the backup and their sizes.
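
As a rough sketch of how the management layer is expected to drive
these methods for a container restore (method names as used later in
this series; the contents of the info hash and the error handling are
simplified for illustration):

    my ($mechanism, $vmtype) = $backup_provider->restore_get_mechanism($volname, $storeid);
    die "cannot restore guest of type '$vmtype'\n" if $vmtype ne 'lxc';

    # the init method makes the data available, e.g. as a tar file or directory
    my $info = $backup_provider->restore_container_init($volname, $storeid, {});
    eval {
        # extract from $info->{'tar-path'} or $info->{'archive-directory'} here
    };
    my $err = $@;
    # the cleanup method is always called after the restore attempt
    eval { $backup_provider->restore_container_cleanup($volname, $storeid, {}); };
    warn $@ if $@;
    die $err if $err;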

2.1. Restore Mechanisms

VM mechanism 'qemu-img':

The backup provider gives a path to the disk image that will be
restored. The path needs to be something 'qemu-img' can deal with,
e.g. can also be an NBD URI or similar.

Container mechanism 'directory':

The backup provider gives the path to a directory with the full
filesystem structure of the container.

Container mechanism 'tar':

The backup provider gives the path to a (potentially compressed) tar
archive with the full filesystem structure of the container.

See the PVE::BackupProvider::Plugin module for the full API
documentation.

==

This series adapts the backup stack in Proxmox VE to allow using the
above API. For QEMU, backup access setup and teardown QMP commands are
implemented to be able to provide access to a consistent disk state to
the backup provider.

The series also provides an example implementation for a backup
provider as a proof-of-concept, exposing the different features.

==

Open questions:

Should the backup provider plugin system also follow the same API
age+version schema with a Custom/ directory for external plugins
derived from the base plugin?

Should the bitmap action be passed directly to the backup provider?
I.e. have 'not-used', 'not-used-removed', 'new', 'used', 'invalid',
instead of only 'none', 'new' and 'reuse'. It makes the API slightly
more complicated. Is there any situation where a backup provider could
care whether a bitmap is new because it is the first one, or new
because the previous one was invalid? Both cases require the backup
provider to do a full backup.

==

The patches marked as PATCH rather than RFC can make sense
independently, with QEMU patches 02 and 03 having been sent already
before (touching same code, so included here):

https://lore.proxmox.com/pve-devel/20240625133551.210636-1-f.eb...@proxmox.com/#r

==

Feedback is very welcome, especially from people wishing to implement
such a backup provider plugin! Please tell me what issues you see with
the proposed API and what would or wouldn't work from your perspective.

==

Dependencies: pve-manager, pve-container and qemu-server all depend on
new libpve-storage-perl. pve-manager also build-depends on the new
libpve-storage-perl for its tests. pve-container depends on new
pve-common. To keep things clean, pve-manager should also depend on
new pve-container and qemu-server.

In qemu-server, there is no version guard added yet, as that depends
on the QEMU version the feature will land in.

==

qemu:

Fiona Ebner (9):
  block/reqlist: allow adding overlapping requests
  PVE backup: fixup error handling for fleecing
  PVE backup: factor out setting up snapshot access for fleecing
  PVE backup: save device name in device info structure
  PVE backup: include device name in error when setting up snapshot
access fails
  PVE backup: add target ID in backup state
  PVE backup: get device info: allow caller to specify filter for which
devices use fleecing
  PVE backup: implement backup access setup and teardown API for
external providers
  PVE backup: implement bitmap support for external backup access

 block/copy-before-write.c |   3 +-
 block/reqlist.c   |   2 -
 pve-backup.c  | 620 +-
 pve-backup.h  |  16 +
 qapi/block-core.json  |  61 
 system/runstate.c |   6 +
 6 files changed, 637 insertions(+), 71 deletions(-)
 create mode 100644 pve-backup.h


common:

Fiona Ebner (1):
  env: add module with helpers to run a Perl subroutine in a user
namespace

 src/Makefile   |   1 +
 src/PVE/Env.pm | 136 +
 2 files changed, 137 insertions(+)
 create mode 100644 src/PVE/Env.pm


storage:

Fiona Ebner (5):
  add storage_has_feature() helper function
  plugin: introduce new_backup_provider() method
  extract backup config: delegate to backup provider for s

[pve-devel] [RFC container v3 32/34] api: add early check against restoring privileged container from external source

2024-11-07 Thread Fiona Ebner
While restore_external_archive() already has a check, that check only
happens after an existing container has already been destroyed.

Signed-off-by: Fiona Ebner 
---

New in v3.

 src/PVE/API2/LXC.pm | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/PVE/API2/LXC.pm b/src/PVE/API2/LXC.pm
index 213e518..dca0e35 100644
--- a/src/PVE/API2/LXC.pm
+++ b/src/PVE/API2/LXC.pm
@@ -39,6 +39,17 @@ BEGIN {
 }
 }
 
+my sub assert_not_restore_from_external {
+my ($archive, $storage_cfg) = @_;
+
+my ($storeid, undef) = PVE::Storage::parse_volume_id($archive, 1);
+
+return if !defined($storeid);
+return if !PVE::Storage::storage_has_feature($storage_cfg, $storeid, 
'backup-provider');
+
+die "refusing to restore privileged container backup from external 
source\n";
+}
+
 my $check_storage_access_migrate = sub {
 my ($rpcenv, $authuser, $storecfg, $storage, $node) = @_;
 
@@ -408,6 +419,9 @@ __PACKAGE__->register_method({
$conf->{unprivileged} = $orig_conf->{unprivileged}
if !defined($unprivileged) && 
defined($orig_conf->{unprivileged});
 
+   assert_not_restore_from_external($archive, $storage_cfg)
+   if !$conf->{unprivileged};
+
# implicit privileged change is checked here
if ($old_conf->{unprivileged} && 
!$conf->{unprivileged}) {
$rpcenv->check_vm_perm($authuser, $vmid, $pool, 
['VM.Allocate']);
-- 
2.39.5



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC qemu v3 08/34] PVE backup: implement backup access setup and teardown API for external providers

2024-11-07 Thread Fiona Ebner
For external backup providers, the state of the VM's disk images at
the time the backup is started is preserved via a snapshot-access
block node. Old data is moved to the fleecing image when new guest
writes come in. The snapshot-access block node, as well as the
associated bitmap in case of incremental backup, will be exported via
NBD to the external provider. The NBD export will be done by the
management layer; the missing functionality is setting up and tearing
down the snapshot-access block nodes, which this patch adds.

It is necessary to also set up fleecing for EFI and TPM disks, so that
old data can be moved out of the way when a new guest write comes in.

There can only be one regular backup or one active backup access at
a time, because both require replacing the original block node of the
drive. Thus the backup state is re-used, and checks are added to
prohibit regular backup while snapshot access is active and vice
versa.

The block nodes added by the backup-access-setup QMP call are not
tracked anywhere else (there is no job they are associated to like for
regular backup). This requires adding a callback for teardown when
QEMU exits, i.e. in qemu_cleanup(). Otherwise, there will be an
assertion failure that the block graph is not empty when QEMU exits
before the backup-access-teardown QMP command is called.

The code for qmp_backup_access_setup() was based on the existing
qmp_backup() routine.

The return value for the setup QMP command contains information about
the snapshot-access block nodes that can be used by the management
layer to set up the NBD exports.

Signed-off-by: Fiona Ebner 
---

No changes in v3.

 pve-backup.c | 273 +++
 pve-backup.h |  16 +++
 qapi/block-core.json |  45 +++
 system/runstate.c|   6 +
 4 files changed, 340 insertions(+)
 create mode 100644 pve-backup.h

diff --git a/pve-backup.c b/pve-backup.c
index d0593fc581..d3370d6744 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -1,4 +1,5 @@
 #include "proxmox-backup-client.h"
+#include "pve-backup.h"
 #include "vma.h"
 
 #include "qemu/osdep.h"
@@ -585,6 +586,37 @@ static int setup_snapshot_access(PVEBackupDevInfo *di, 
Error **errp)
 return 0;
 }
 
+static void setup_all_snapshot_access_bh(void *opaque)
+{
+assert(!qemu_in_coroutine());
+
+CoCtxData *data = (CoCtxData*)opaque;
+Error **errp = (Error**)data->data;
+
+Error *local_err = NULL;
+
+GList *l =  backup_state.di_list;
+while (l) {
+PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+l = g_list_next(l);
+
+bdrv_drained_begin(di->bs);
+
+if (setup_snapshot_access(di, &local_err) < 0) {
+cleanup_snapshot_access(di);
+bdrv_drained_end(di->bs);
+error_setg(errp, "%s - setting up snapshot access failed: %s", 
di->device_name,
+   local_err ? error_get_pretty(local_err) : "unknown 
error");
+break;
+}
+
+bdrv_drained_end(di->bs);
+}
+
+/* return */
+aio_co_enter(data->ctx, data->co);
+}
+
 /*
  * backup_job_create can *not* be run from a coroutine, so this can't either.
  * The caller is responsible that backup_mutex is held nonetheless.
@@ -722,6 +754,11 @@ static bool fleecing_no_efi_tpm(const char *device_id)
 return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, 
"drive-tpmstate", 14);
 }
 
+static bool fleecing_all(const char *device_id)
+{
+return true;
+}
+
 /*
  * Returns a list of device infos, which needs to be freed by the caller. In
  * case of an error, errp will be set, but the returned value might still be a
@@ -810,6 +847,242 @@ err:
 return di_list;
 }
 
+BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
+const char *target_id,
+const char *devlist,
+Error **errp)
+{
+assert(qemu_in_coroutine());
+
+qemu_co_mutex_lock(&backup_state.backup_mutex);
+
+Error *local_err = NULL;
+GList *di_list = NULL;
+GList *l;
+
+if (backup_state.di_list) {
+error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+  "previous backup by provider '%s' not finished", 
backup_state.target_id);
+qemu_co_mutex_unlock(&backup_state.backup_mutex);
+return NULL;
+}
+
+bdrv_graph_co_rdlock();
+di_list = get_device_info(devlist, fleecing_all, &local_err);
+bdrv_graph_co_rdunlock();
+if (local_err) {
+error_propagate(errp, local_err);
+goto err;
+}
+assert(di_list);
+
+size_t total = 0;
+
+l = di_list;
+while (l) {
+PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+l = g_list_next(l);
+
+ssize_t size = bdrv_getlength(di->bs);
+if (size < 0) {
+error_setg_errno(errp, -size, "bdr

[pve-devel] [RFC container v3 28/34] backup: implement restore for external providers

2024-11-07 Thread Fiona Ebner
First, the provider is asked which restore mechanism to use.
Currently, 'directory' and 'tar' are possible, for restoring either
from a directory containing the full filesystem structure (for which
rsync is used) or a potentially compressed tar file containing the
same.

The new functions are copied and adapted from the existing ones for
PBS or tar and it might be worth factoring out the common parts.

Restoring containers as privileged is prohibited, because archives
from an external provider are considered less trusted than those from
Proxmox VE storages. Should that ever be allowed in the future, it
would at least be worth extracting the tar archive in a restricted
context (e.g. a user namespace with ID-mapped mounts or seccomp).

Signed-off-by: Fiona Ebner 
---

Changes in v3:
* Use user namespace when restoring directory (and use tar instead of
  rsync, because it is easier to split in privileged and unprivileged
  half)

 src/PVE/LXC/Create.pm | 141 ++
 1 file changed, 141 insertions(+)

diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm
index 8c8cb9a..8657ac1 100644
--- a/src/PVE/LXC/Create.pm
+++ b/src/PVE/LXC/Create.pm
@@ -7,6 +7,7 @@ use File::Path;
 use Fcntl;
 
 use PVE::RPCEnvironment;
+use PVE::RESTEnvironment qw(log_warn);
 use PVE::Storage::PBSPlugin;
 use PVE::Storage::Plugin;
 use PVE::Storage;
@@ -26,6 +27,24 @@ sub restore_archive {
if ($scfg->{type} eq 'pbs') {
return restore_proxmox_backup_archive($storage_cfg, $archive, 
$rootdir, $conf, $no_unpack_error, $bwlimit);
}
+   if (PVE::Storage::storage_has_feature($storage_cfg, $storeid, 
'backup-provider')) {
+   my $log_function = sub {
+   my ($log_level, $message) = @_;
+   my $prefix = $log_level eq 'err' ? 'ERROR' : uc($log_level);
+   print "$prefix: $message\n";
+   };
+   my $backup_provider =
+   PVE::Storage::new_backup_provider($storage_cfg, $storeid, 
$log_function);
+   return restore_external_archive(
+   $backup_provider,
+   $storeid,
+   $volname,
+   $rootdir,
+   $conf,
+   $no_unpack_error,
+   $bwlimit,
+   );
+   }
 }
 
 $archive = PVE::Storage::abs_filesystem_path($storage_cfg, $archive) if 
$archive ne '-';
@@ -127,6 +146,54 @@ sub restore_tar_archive {
 die $err if $err && !$no_unpack_error;
 }
 
+sub restore_external_archive {
+my ($backup_provider, $storeid, $volname, $rootdir, $conf, 
$no_unpack_error, $bwlimit) = @_;
+
+die "refusing to restore privileged container backup from external 
source\n"
+   if !$conf->{unprivileged};
+
+my ($mechanism, $vmtype) = 
$backup_provider->restore_get_mechanism($volname, $storeid);
+die "cannot restore non-LXC guest of type '$vmtype'\n" if $vmtype ne 'lxc';
+
+my $info = $backup_provider->restore_container_init($volname, $storeid, 
{});
+eval {
+   if ($mechanism eq 'tar') {
+   my $tar_path = $info->{'tar-path'}
+   or die "did not get path to tar file from backup provider\n";
+   die "not a regular file '$tar_path'" if !-f $tar_path;
+   restore_tar_archive($tar_path, $rootdir, $conf, $no_unpack_error, 
$bwlimit);
+   } elsif ($mechanism eq 'directory') {
+   my $directory = $info->{'archive-directory'}
+   or die "did not get path to archive directory from backup 
provider\n";
+   die "not a directory '$directory'" if !-d $directory;
+
+   my $create_cmd = [
+   'tar',
+   'cpf',
+   '-',
+   @PVE::Storage::Plugin::COMMON_TAR_FLAGS,
+   "--directory=$directory",
+   '.',
+   ];
+
+   my $extract_cmd = restore_tar_archive_command($conf, undef, 
$rootdir, $bwlimit);
+
+   eval { PVE::Tools::run_command([$create_cmd, $extract_cmd]); };
+   die $@ if $@ && !$no_unpack_error;
+   } else {
+   die "mechanism '$mechanism' requested by backup provider is not 
supported for LXCs\n";
+   }
+};
+my $err = $@;
+eval { $backup_provider->restore_container_cleanup($volname, $storeid, 
{}); };
+if (my $cleanup_err = $@) {
+   die $cleanup_err if !$err;
+   warn $cleanup_err;
+}
+die $err if $err;
+
+}
+
 sub recover_config {
 my ($storage_cfg, $volid, $vmid) = @_;
 
@@ -135,6 +202,8 @@ sub recover_config {
my $scfg = PVE::Storage::storage_check_enabled($storage_cfg, $storeid);
if ($scfg->{type} eq 'pbs') {
  

[pve-devel] [PATCH qemu-server v3 18/34] backup: cleanup: check if VM is running before issuing QMP commands

2024-11-07 Thread Fiona Ebner
When the VM was only started for the backup, it will already have been
stopped again at that point. While the detach helpers currently do not
warn about errors, that might change in the future. This is also in
preparation for other cleanup QMP helpers that are more verbose about
failure.

Signed-off-by: Fiona Ebner 
---

No changes in v3.

 PVE/VZDump/QemuServer.pm | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index b2ced154..c46e607c 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -1118,13 +1118,14 @@ sub snapshot {
 sub cleanup {
 my ($self, $task, $vmid) = @_;
 
-$detach_tpmstate_drive->($task, $vmid);
-
-if ($task->{'use-fleecing'}) {
-   detach_fleecing_images($task->{disks}, $vmid);
-   cleanup_fleecing_images($self, $task->{disks});
+# If VM was started only for backup, it is already stopped now.
+if (PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
+   $detach_tpmstate_drive->($task, $vmid);
+   detach_fleecing_images($task->{disks}, $vmid) if 
$task->{'use-fleecing'};
 }
 
+cleanup_fleecing_images($self, $task->{disks}) if $task->{'use-fleecing'};
+
 if ($self->{qmeventd_fh}) {
close($self->{qmeventd_fh});
 }
-- 
2.39.5



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC storage v3 11/34] add storage_has_feature() helper function

2024-11-07 Thread Fiona Ebner
Which looks up whether a storage supports a given feature in its
'plugindata'. This is intentionally kept simple and not implemented
as a plugin method for now. Should it ever become more complex,
requiring plugins to override the default implementation, it can
later be changed to a method.
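
Callers can then gate backup-provider-specific code paths on the new
helper, for example (matching how it is used in later patches of this
series; the feature itself is declared in the plugin's 'plugindata'):

    my ($storeid) = PVE::Storage::parse_volume_id($archive, 1);
    if (defined($storeid)
        && PVE::Storage::storage_has_feature($storage_cfg, $storeid, 'backup-provider')
    ) {
        my $backup_provider =
            PVE::Storage::new_backup_provider($storage_cfg, $storeid, $log_function);
        # take the external backup provider code path
    }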

Suggested-by: Fabian Grünbichler 
Signed-off-by: Fiona Ebner 
---

New in v3.

 src/PVE/Storage.pm|  8 
 src/PVE/Storage/Plugin.pm | 10 ++
 2 files changed, 18 insertions(+)

diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
index 57b2038..e251056 100755
--- a/src/PVE/Storage.pm
+++ b/src/PVE/Storage.pm
@@ -204,6 +204,14 @@ sub storage_check_enabled {
 return storage_check_node($cfg, $storeid, $node, $noerr);
 }
 
+sub storage_has_feature {
+my ($cfg, $storeid, $feature) = @_;
+
+my $scfg = storage_config($cfg, $storeid);
+
+return PVE::Storage::Plugin::storage_has_feature($scfg->{type}, $feature);
+}
+
 # storage_can_replicate:
 # return true if storage supports replication
 # (volumes allocated with vdisk_alloc() has replication feature)
diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
index 8cc693c..6071e45 100644
--- a/src/PVE/Storage/Plugin.pm
+++ b/src/PVE/Storage/Plugin.pm
@@ -244,6 +244,16 @@ sub dirs_hash_to_string {
 return join(',', map { "$_=$hash->{$_}" } sort keys %$hash);
 }
 
+sub storage_has_feature {
+my ($type, $feature) = @_;
+
+my $data = $defaultData->{plugindata}->{$type};
+if (my $features = $data->{features}) {
+   return $features->{$feature};
+}
+return;
+}
+
 sub default_format {
 my ($scfg) = @_;
 
-- 
2.39.5



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


[pve-devel] [RFC container v3 26/34] backup: implement backup for external providers

2024-11-07 Thread Fiona Ebner
The filesystem structure is made available as a directory in a
consistent manner (with details depending on the vzdump backup mode)
just like for regular backup via tar.

The backup_container() method of the backup provider is executed in
a user namespace with the container's ID mapping applied. This allows
the backup provider to see the container's filesystem from the
container's perspective.

The 'prepare' phase of the backup hook is executed right before that
and allows the backup provider to prepare for the (usually)
unprivileged execution context in the user namespace.

The backup provider needs to back up the guest and firewall
configuration and the filesystem structure of the container, honoring
file exclusions and the bandwidth limit.
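
A minimal sketch of what a provider's backup_container() method could
do with the passed-in information (the info keys match the ones set up
below; treating each exclude entry as a tar pattern, the target
location and the bandwidth handling are assumptions of this example):

    sub backup_container {
        my ($self, $vmid, $guest_config, $exclude_patterns, $info) = @_;

        my $target = "$self->{scfg}->{path}/$vmid"; # example target location
        # $guest_config and $info->{'firewall-config'} are raw contents, not file names
        PVE::Tools::file_set_contents("$target/guest.config", $guest_config);
        PVE::Tools::file_set_contents("$target/firewall.config", $info->{'firewall-config'})
            if $info->{'firewall-config'};

        # archive the filesystem structure, honoring the file exclusions
        my $tar = ['tar', 'cpf', "$target/filesystem.tar", "--directory=$info->{directory}"];
        push $tar->@*, map { ('--exclude', $_) } $exclude_patterns->@*;
        push $tar->@*, '.';
        # TODO honor $info->{'bandwidth-limit'} (bytes/s), e.g. by piping through cstream
        PVE::Tools::run_command($tar);
    }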

Signed-off-by: Fiona Ebner 
---

Changes in v3:
* pass in config as raw data instead of file name
* run backup_container() method in user namespace associated to the
  container
* warn if backing up privileged container to external provider

 src/PVE/VZDump/LXC.pm | 38 +-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/src/PVE/VZDump/LXC.pm b/src/PVE/VZDump/LXC.pm
index 1928548..d201d8a 100644
--- a/src/PVE/VZDump/LXC.pm
+++ b/src/PVE/VZDump/LXC.pm
@@ -14,6 +14,7 @@ use PVE::LXC::Config;
 use PVE::LXC;
 use PVE::Storage;
 use PVE::Tools;
+use PVE::Env;
 use PVE::VZDump;
 
 use base qw (PVE::VZDump::Plugin);
@@ -124,6 +125,7 @@ sub prepare {
 
 my ($id_map, $root_uid, $root_gid) = PVE::LXC::parse_id_maps($conf);
 $task->{userns_cmd} = PVE::LXC::userns_command($id_map);
+$task->{id_map} = $id_map;
 $task->{root_uid} = $root_uid;
 $task->{root_gid} = $root_gid;
 
@@ -373,7 +375,41 @@ sub archive {
 my $userns_cmd = $task->{userns_cmd};
 my $findexcl = $self->{vzdump}->{findexcl};
 
-if ($self->{vzdump}->{opts}->{pbs}) {
+if (my $backup_provider = $self->{vzdump}->{'backup-provider'}) {
+   $self->loginfo("starting external backup via " . 
$backup_provider->provider_name());
+
+   if (!scalar($task->{id_map}->@*) || $task->{root_uid} == 0 || 
$task->{root_gid} == 0) {
+   $self->log("warn", "external backup of privileged container can 
only be restored as"
+   ." unprivileged which might not work in all cases");
+   }
+
+   my ($mechanism) = $backup_provider->backup_get_mechanism($vmid, 'lxc');
+   die "mechanism '$mechanism' requested by backup provider is not 
supported for containers\n"
+   if $mechanism ne 'directory';
+
+   my $guest_config = 
PVE::Tools::file_get_contents("$tmpdir/etc/vzdump/pct.conf");
+   my $firewall_file = "$tmpdir/etc/vzdump/pct.fw";
+
+   my $info = {
+   directory => $snapdir,
+   sources => [@sources],
+   'backup-user-id' => $task->{root_uid},
+   };
+   $info->{'firewall-config'} = 
PVE::Tools::file_get_contents($firewall_file)
+   if -e $firewall_file;
+   $info->{'bandwidth-limit'} = $opts->{bwlimit} * 1024 if 
$opts->{bwlimit};
+
+   $backup_provider->backup_hook('prepare', $vmid, 'lxc', $info);
+
+   if (scalar($task->{id_map}->@*)) {
+   PVE::Env::run_in_userns(
+   sub { $backup_provider->backup_container($vmid, $guest_config, 
$findexcl, $info); },
+   $task->{id_map},
+   );
+   } else {
+   $backup_provider->backup_container($vmid, $guest_config, $findexcl, 
$info);
+   }
+} elsif ($self->{vzdump}->{opts}->{pbs}) {
 
my $param = [];
push @$param, "pct.conf:$tmpdir/etc/vzdump/pct.conf";
-- 
2.39.5



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC container v3 31/34] restore tar archive: check potentially untrusted archive

2024-11-07 Thread Fiona Ebner
'tar' itself already protects against '..' in component names and
strips absolute member names when extracting (if not used with the
--absolute-names option) and in general seems sane for extracting.
Additionally, the extraction already happens in the user namespace
associated with the container. So for now, start out with some basic
sanity checks. The helper can still be extended with more checks.
Checks:

* list files in archive - will already catch many corrupted/bogus
  archives.

* check that there are at least 10 members - should also catch
  archives not actually containing a container root filesystem or
  structural issues early.

* check that /sbin directory or link exists in archive - ideally the
  check would be for /sbin/init, but this cannot be done efficiently
  before extraction, because it would require to keep track of the
  whole archive to be able to follow symlinks.

* abort if there is a multi-volume member in the archive - cheap and
  is never expected.

Checks that were considered, but not (yet) added:

* abort when a file has unrealistically large size - while this could
  help to detect certain kinds of bogus archives, there can be valid
  use cases for extremely large sparse files, so it's not clear what
  a good limit would be (1 EiB maybe?). Also, an attacker could just
  adapt to such a limit by creating multiple files, and the actual
  extraction is already limited by the size of the allocated container
  volume.

* check that /sbin/init exists after extracting - cannot be done
  efficiently before extraction, because it would require to keep
  track of the whole archive to be able to follow symlinks. During
  setup there already is detection of /etc/os-release, so issues with
  the structure will already be noticed. Adding a hard fail for
  untrusted archives would require either passing that information to
  the setup phase or extracting the protected_call method from there
  into a helper.

* adding 'restrict' to the (common) tar flags - the tar manual (not
  the man page) documents: "Disable use of some potentially harmful
  'tar' options.  Currently this option disables shell invocation from
  multi-volume menu.". The flag was introduced in 2005 and this is
  still the only thing it is used for. Trying to restore a
  multi-volume archive already fails without giving multiple '--file'
  arguments and '--multi-volume', so don't bother adding the flag.

* check format of tar file - would require yet another invocation of
  the decompressor and there seems to be no built-in way to just
  display the format with 'tar'. The 'file' program could be used, but
  it seems to not make a distinction between old GNU and GNU and old
  POSIX and POSIX formats, with the old ones being candidates to
  prohibit. So that would leave just detecting the old 'v7' format.
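
As a rough illustration, the listing-based checks above can be driven
with a single 'tar' invocation along these lines (a sketch, not the
exact code from the diff below; the die messages are illustrative):

    my $compression_opt = tar_compression_option($archive);
    my $cmd = ['tar', '-tvf', $archive];
    push $cmd->@*, $compression_opt if $compression_opt;
    PVE::Tools::run_command($cmd, outfunc => $check_file_list);

    die "archive contains suspiciously few members\n" if $member_count < $required_members;
    die "expected 'sbin' not found in archive\n" if !$found_sbin;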

Suggested-by: Fabian Grünbichler 
Signed-off-by: Fiona Ebner 
---

New in v3.

 src/PVE/LXC/Create.pm | 67 ---
 1 file changed, 63 insertions(+), 4 deletions(-)

diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm
index d2f675e..bf424f6 100644
--- a/src/PVE/LXC/Create.pm
+++ b/src/PVE/LXC/Create.pm
@@ -99,12 +99,65 @@ my sub tar_compression_option {
 }
 }
 
+# Basic checks trying to detect issues with a potentially untrusted or bogus 
tar archive.
+# Just listing the files is already a good check against corruption.
+# 'tar' itself already protects against '..' in component names and strips 
absolute member names
+# when extracting, so no need to check for those here.
+my sub check_tar_archive {
+my ($archive) = @_;
+
+print "checking archive..\n";
+
+# To resolve links to get to 'sbin/init' would mean keeping track of 
everything in the archive,
+# because the target might be ordered first. Check only that 'sbin' exists 
here.
+my $found_sbin;
+
+# Just to detect bogus archives, any valid container filesystem should 
have more than this.
+my $required_members = 10;
+my $member_count = 0;
+
+my $check_file_list = sub {
+   my ($line) = @_;
+
+   # The date is in ISO 8601 format. The last part contains the 
potentially quoted file name,
+   # potentially followed by some additional info (e.g. where a link 
points to).
+   my ($type, $perms, $uid, $gid, $size, $date, $time, $file_info) =
+   $line =~ 
m!^([a-zA-Z\-])(\S+)\s+(\d+)/(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(.*)$!;
+
+   die "found multi-volume member in archive\n" if $type eq 'M';
+
+   if (!$found_sbin && (
+   ($file_info =~ m!^(?:\./)?sbin/$! && $type eq 'd')
+   || ($file_info =~ m!^(?:\./)?sbin ->! && $type eq 'l')
+   || ($file_info =~ m!^(?:\./)?sbin link to! && $type eq 'h')
+

[pve-devel] [PATCH manager v3 33/34] ui: backup: also check for backup subtype to classify archive

2024-11-07 Thread Fiona Ebner
In anticipation of future storage plugins that might not have
PBS-specific formats or adhere to the vzdump naming scheme for
backups.

Signed-off-by: Fiona Ebner 
---

No changes in v3.

 www/manager6/Utils.js  | 10 ++
 www/manager6/grid/BackupView.js|  4 ++--
 www/manager6/storage/BackupView.js |  4 ++--
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/www/manager6/Utils.js b/www/manager6/Utils.js
index db86fa9a..a8e4e8ee 100644
--- a/www/manager6/Utils.js
+++ b/www/manager6/Utils.js
@@ -693,12 +693,14 @@ Ext.define('PVE.Utils', {
'snippets': gettext('Snippets'),
 },
 
-volume_is_qemu_backup: function(volid, format) {
-   return format === 'pbs-vm' || volid.match(':backup/vzdump-qemu-');
+volume_is_qemu_backup: function(volume) {
+   return volume.format === 'pbs-vm' || 
volume.volid.match(':backup/vzdump-qemu-') ||
+   volume.subtype === 'qemu';
 },
 
-volume_is_lxc_backup: function(volid, format) {
-   return format === 'pbs-ct' || 
volid.match(':backup/vzdump-(lxc|openvz)-');
+volume_is_lxc_backup: function(volume) {
+   return volume.format === 'pbs-ct' || 
volume.volid.match(':backup/vzdump-(lxc|openvz)-') ||
+   volume.subtype === 'lxc';
 },
 
 authSchema: {
diff --git a/www/manager6/grid/BackupView.js b/www/manager6/grid/BackupView.js
index e71d1c88..ef3649c6 100644
--- a/www/manager6/grid/BackupView.js
+++ b/www/manager6/grid/BackupView.js
@@ -29,11 +29,11 @@ Ext.define('PVE.grid.BackupView', {
var vmtypeFilter;
if (vmtype === 'lxc' || vmtype === 'openvz') {
vmtypeFilter = function(item) {
-   return PVE.Utils.volume_is_lxc_backup(item.data.volid, 
item.data.format);
+   return PVE.Utils.volume_is_lxc_backup(item.data);
};
} else if (vmtype === 'qemu') {
vmtypeFilter = function(item) {
-   return PVE.Utils.volume_is_qemu_backup(item.data.volid, 
item.data.format);
+   return PVE.Utils.volume_is_qemu_backup(item.data);
};
} else {
throw "unsupported VM type '" + vmtype + "'";
diff --git a/www/manager6/storage/BackupView.js 
b/www/manager6/storage/BackupView.js
index db184def..749c2136 100644
--- a/www/manager6/storage/BackupView.js
+++ b/www/manager6/storage/BackupView.js
@@ -86,9 +86,9 @@ Ext.define('PVE.storage.BackupView', {
disabled: true,
handler: function(b, e, rec) {
let vmtype;
-   if (PVE.Utils.volume_is_qemu_backup(rec.data.volid, 
rec.data.format)) {
+   if (PVE.Utils.volume_is_qemu_backup(rec.data)) {
vmtype = 'qemu';
-   } else if (PVE.Utils.volume_is_lxc_backup(rec.data.volid, 
rec.data.format)) {
+   } else if (PVE.Utils.volume_is_lxc_backup(rec.data)) {
vmtype = 'lxc';
} else {
return;
-- 
2.39.5



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC container v3 27/34] create: factor out tar restore command helper

2024-11-07 Thread Fiona Ebner
In preparation to re-use it for restore from backup providers.

Signed-off-by: Fiona Ebner 
---

New in v3.

 src/PVE/LXC/Create.pm | 42 +-
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm
index 7c5bf0a..8c8cb9a 100644
--- a/src/PVE/LXC/Create.pm
+++ b/src/PVE/LXC/Create.pm
@@ -59,12 +59,34 @@ sub restore_proxmox_backup_archive {
$scfg, $storeid, $cmd, $param, userns_cmd => $userns_cmd);
 }
 
-sub restore_tar_archive {
-my ($archive, $rootdir, $conf, $no_unpack_error, $bwlimit) = @_;
+my sub restore_tar_archive_command {
+my ($conf, $opts, $rootdir, $bwlimit) = @_;
 
 my ($id_map, $root_uid, $root_gid) = PVE::LXC::parse_id_maps($conf);
 my $userns_cmd = PVE::LXC::userns_command($id_map);
 
+my $cmd = [@$userns_cmd, 'tar', 'xpf', '-', $opts->@*, '--totals',
+   @PVE::Storage::Plugin::COMMON_TAR_FLAGS,
+   '-C', $rootdir];
+
+# skip-old-files doesn't have anything to do with time (old/new), but is
+# simply -k (annoyingly also called --keep-old-files) without the 'treat
+# existing files as errors' part... iow. it's bsdtar's interpretation of -k
+# *sigh*, gnu...
+push @$cmd, '--skip-old-files';
+push @$cmd, '--anchored';
+push @$cmd, '--exclude' , './dev/*';
+
+if (defined($bwlimit)) {
+   $cmd = [ ['cstream', '-t', $bwlimit*1024], $cmd ];
+}
+
+return $cmd;
+}
+
+sub restore_tar_archive {
+my ($archive, $rootdir, $conf, $no_unpack_error, $bwlimit) = @_;
+
 my $archive_fh;
 my $tar_input = '<&STDIN';
 my @compression_opt;
@@ -92,21 +114,7 @@ sub restore_tar_archive {
$tar_input = '<&'.fileno($archive_fh);
 }
 
-my $cmd = [@$userns_cmd, 'tar', 'xpf', '-', @compression_opt, '--totals',
-   @PVE::Storage::Plugin::COMMON_TAR_FLAGS,
-   '-C', $rootdir];
-
-# skip-old-files doesn't have anything to do with time (old/new), but is
-# simply -k (annoyingly also called --keep-old-files) without the 'treat
-# existing files as errors' part... iow. it's bsdtar's interpretation of -k
-# *sigh*, gnu...
-push @$cmd, '--skip-old-files';
-push @$cmd, '--anchored';
-push @$cmd, '--exclude' , './dev/*';
-
-if (defined($bwlimit)) {
-   $cmd = [ ['cstream', '-t', $bwlimit*1024], $cmd ];
-}
+my $cmd = restore_tar_archive_command($conf, [@compression_opt], $rootdir, 
$bwlimit);
 
 if ($archive eq '-') {
print "extracting archive from STDIN\n";
-- 
2.39.5



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [POC storage v3 14/34] add backup provider example

2024-11-07 Thread Fiona Ebner
The example uses a simple directory structure to save the backups,
grouped by guest ID. VM backups are saved as configuration files and
qcow2 images, with backing files when doing incremental backups.
Container backups are saved as configuration files and a tar file or
squashfs image (added to test the 'directory' restore mechanism).

Whether to use incremental VM backups and which backup mechanisms to
use can be configured in the storage configuration.

The 'nbdinfo' binary from the 'libnbd-bin' package is required for the
'nbd' backup mechanism for VM backups, and the 'mksquashfs' binary from
the 'squashfs-tools' package is required for the 'squashfs' backup
mechanism for containers.

Signed-off-by: Fiona Ebner 
---

Changes in v3:
* adapt to API changes
* use NBD export when restoring VM image, to make incremental backups
  using qcow2 chains work again

 .../BackupProvider/Plugin/DirectoryExample.pm | 697 ++
 src/PVE/BackupProvider/Plugin/Makefile|   2 +-
 .../Custom/BackupProviderDirExamplePlugin.pm  | 307 
 src/PVE/Storage/Custom/Makefile   |   5 +
 src/PVE/Storage/Makefile  |   1 +
 5 files changed, 1011 insertions(+), 1 deletion(-)
 create mode 100644 src/PVE/BackupProvider/Plugin/DirectoryExample.pm
 create mode 100644 src/PVE/Storage/Custom/BackupProviderDirExamplePlugin.pm
 create mode 100644 src/PVE/Storage/Custom/Makefile

diff --git a/src/PVE/BackupProvider/Plugin/DirectoryExample.pm 
b/src/PVE/BackupProvider/Plugin/DirectoryExample.pm
new file mode 100644
index 000..99825ef
--- /dev/null
+++ b/src/PVE/BackupProvider/Plugin/DirectoryExample.pm
@@ -0,0 +1,697 @@
+package PVE::BackupProvider::Plugin::DirectoryExample;
+
+use strict;
+use warnings;
+
+use Fcntl qw(SEEK_SET);
+use File::Path qw(make_path remove_tree);
+use IO::File;
+use IPC::Open3;
+
+use PVE::Storage::Plugin;
+use PVE::Tools qw(file_get_contents file_read_firstline file_set_contents 
run_command);
+
+use base qw(PVE::BackupProvider::Plugin::Base);
+
+use constant {
+BLKDISCARD => 0x1277, # see linux/fs.h
+};
+
+# Private helpers
+
+my sub log_info {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('info', $message);
+}
+
+my sub log_warning {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('warn', $message);
+}
+
+my sub log_error {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('err', $message);
+}
+
+# Try to use the same bitmap ID as last time for incremental backup if the 
storage is configured for
+# incremental VM backup. Need to start fresh if there is no previous ID or the 
associated backup
+# doesn't exist.
+my sub get_bitmap_id {
+my ($self, $vmid, $vmtype) = @_;
+
+return if $self->{'storage-plugin'}->get_vm_backup_mode($self->{scfg}) ne 
'incremental';
+
+my $previous_info_dir = "$self->{scfg}->{path}/$vmid/";
+
+my $previous_info_file = "$previous_info_dir/previous-info";
+my $info = file_read_firstline($previous_info_file) // '';
+$self->{$vmid}->{'old-previous-info'} = $info;
+my ($bitmap_id, $previous_backup_id) = $info =~ m/^(\d+)\s+(\d+)$/;
+my $previous_backup_dir =
+   $previous_backup_id ? 
"$self->{scfg}->{path}/$vmid/$vmtype-$previous_backup_id" : undef;
+
+if ($bitmap_id && -d $previous_backup_dir) {
+   $self->{$vmid}->{'previous-backup-dir'} = $previous_backup_dir;
+} else {
+   # need to start fresh if there is no previous ID or the associated 
backup doesn't exist
+   $bitmap_id = $self->{$vmid}->{'backup-time'};
+}
+
+$self->{$vmid}->{'bitmap-id'} = $bitmap_id;
+make_path($previous_info_dir);
+die "unable to create directory $previous_info_dir\n" if !-d 
$previous_info_dir;
+file_set_contents($previous_info_file, "$bitmap_id 
$self->{$vmid}->{'backup-time'}");
+
+return $bitmap_id;
+}
+
+# Backup Provider API
+
+sub new {
+my ($class, $storage_plugin, $scfg, $storeid, $log_function) = @_;
+
+my $self = bless {
+   scfg => $scfg,
+   storeid => $storeid,
+   'storage-plugin' => $storage_plugin,
+   'log-function' => $log_function,
+}, $class;
+
+return $self;
+}
+
+sub provider_name {
+my ($self) = @_;
+
+return 'dir provider example';
+}
+
+# Hooks
+
+my sub job_start {
+my ($self, $start_time) = @_;
+
+log_info($self, "job start hook called");
+
+run_command(["modprobe", "nbd"]);
+
+log_info($self, "backup provider initialized successfully for new job 
$start_time");
+}
+
+sub job_hook {
+my ($self, $phase, $info) = @_;
+
+  

[pve-devel] [RFC qemu v3 09/34] PVE backup: implement bitmap support for external backup access

2024-11-07 Thread Fiona Ebner
There can be one dirty bitmap for each backup target ID (which are
tracked in the backup_access_bitmaps hash table). The QMP user can
specify the ID of the bitmap it likes to use. This ID is then compared
to the current one for the given target. If they match, the bitmap is
re-used (should it still exist on the drive, otherwise re-created). If
there is a mismatch, the old bitmap is removed and a new one is
created.

The return value of the QMP command includes information about what
bitmap action was taken, similar to what the query-backup QMP command
returns for regular backup. It also includes the bitmap name and
associated block node, so the management layer can then set up an NBD
export with the bitmap.

While the backup access is active, a background bitmap is also
required. This is necessary to implement bitmap handling according to
the original reference [0]. In particular:

- in the error case, new writes since the backup access was set up are
  in the background bitmap. Because of failure, the previously tracked
  writes from the backup access bitmap are still required too. Thus,
  the bitmap is merged with the background bitmap to get all new
  writes since the last backup.

- in the success case, continue tracking for the next incremental
  backup in the backup access bitmap. New writes since the backup
  access was set up are in the background bitmap. Because the backup
  was successful, clear the backup access bitmap and merge back the
  background bitmap to get only the new writes.

Since QEMU cannot know if the backup was successful or not (except if
failure already happens during the setup QMP command), the management
layer needs to tell it via the teardown QMP command.

The bitmap action is also recorded in the device info now.

[0]: 
https://lore.kernel.org/qemu-devel/b68833dd-8864-4d72-7c61-c134a9835...@ya.ru/

Signed-off-by: Fiona Ebner 
---

No changes in v3.

 pve-backup.c | 175 ++-
 pve-backup.h |   2 +-
 qapi/block-core.json |  22 +-
 system/runstate.c|   2 +-
 4 files changed, 193 insertions(+), 8 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index d3370d6744..5f8dd396d5 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -15,6 +15,7 @@
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/cutils.h"
+#include "qemu/error-report.h"
 
 #if defined(CONFIG_MALLOC_TRIM)
 #include 
@@ -41,6 +42,7 @@
  */
 
 const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
+const char *BACKGROUND_BITMAP_NAME = "backup-access-background-bitmap";
 
 static struct PVEBackupState {
 struct {
@@ -72,6 +74,7 @@ static struct PVEBackupState {
 CoMutex backup_mutex;
 CoMutex dump_callback_mutex;
 char *target_id;
+GHashTable *backup_access_bitmaps; // key=target_id, value=bitmap_name
 } backup_state;
 
 static void pvebackup_init(void)
@@ -99,6 +102,8 @@ typedef struct PVEBackupDevInfo {
 char* device_name;
 int completed_ret; // INT_MAX if not completed
 BdrvDirtyBitmap *bitmap;
+BdrvDirtyBitmap *background_bitmap; // used for external backup access
+PBSBitmapAction bitmap_action;
 BlockDriverState *target;
 BlockJob *job;
 } PVEBackupDevInfo;
@@ -362,6 +367,67 @@ static void coroutine_fn pvebackup_co_complete_stream(void 
*opaque)
 qemu_co_mutex_unlock(&backup_state.backup_mutex);
 }
 
+/*
+ * New writes since the backup access was set up are in the background bitmap. 
Because of failure,
+ * the previously tracked writes in di->bitmap are still required too. Thus, 
merge with the
+ * background bitmap to get all new writes since the last backup.
+ */
+static void handle_backup_access_bitmaps_in_error_case(PVEBackupDevInfo *di)
+{
+Error *local_err = NULL;
+
+if (di->bs && di->background_bitmap) {
+bdrv_drained_begin(di->bs);
+if (di->bitmap) {
+bdrv_enable_dirty_bitmap(di->bitmap);
+if (!bdrv_merge_dirty_bitmap(di->bitmap, di->background_bitmap, 
NULL, &local_err)) {
+warn_report("backup access: %s - could not merge bitmaps in 
error path - %s",
+di->device_name,
+local_err ? error_get_pretty(local_err) : "unknown 
error");
+/*
+ * Could not merge, drop original bitmap too.
+ */
+bdrv_release_dirty_bitmap(di->bitmap);
+}
+} else {
+warn_report("backup access: %s - expected bitmap not present", 
di->device_name);
+}
+bdrv_release_dirty_bitmap(di->background_bitmap);
+bdrv_drained_end(di->bs);
+}
+}
+
+/*
+ * Continue tracking for next incremental backup in di->bitmap. New writes 
since the backup access
+ * was set up are in the background

[pve-devel] [POC storage v3 15/34] WIP Borg plugin

2024-11-07 Thread Fiona Ebner
Archive names start with the guest type and ID and then the same
timestamp format as PBS.

Container archives have the following structure:
guest.config
firewall.config
filesystem/ # containing the whole filesystem structure

VM archives have the following structure
guest.config
firewall.config
volumes/ # containing a raw file for each device

A bind mount (respectively symlinks) is used to achieve this
structure, because Borg doesn't seem to support renaming on-the-fly.
(Prefix stripping via the "slashdot hack" would have helped slightly,
but is only available in Borg >= 1.4
https://github.com/borgbackup/borg/actions/runs/7967940995)

NOTE: The bandwidth limit is not yet honored and the task size is not
calculated yet. Discard for VM backups would also be nice to have, but
it's not entirely clear how (parsing the progress and discarding
accordingly is one idea). There is no dirty bitmap support; it is not
clear whether that is feasible to add.

Signed-off-by: Fiona Ebner 
---

Changes in v3:
* make SSH work.
* adapt to API changes, i.e. config as raw data and user namespace
  execution context for containers.

 src/PVE/API2/Storage/Config.pm |   2 +-
 src/PVE/BackupProvider/Plugin/Borg.pm  | 439 ++
 src/PVE/BackupProvider/Plugin/Makefile |   2 +-
 src/PVE/Storage.pm |   2 +
 src/PVE/Storage/BorgBackupPlugin.pm| 595 +
 src/PVE/Storage/Makefile   |   1 +
 6 files changed, 1039 insertions(+), 2 deletions(-)
 create mode 100644 src/PVE/BackupProvider/Plugin/Borg.pm
 create mode 100644 src/PVE/Storage/BorgBackupPlugin.pm

diff --git a/src/PVE/API2/Storage/Config.pm b/src/PVE/API2/Storage/Config.pm
index e04b6ab..1cbf09d 100755
--- a/src/PVE/API2/Storage/Config.pm
+++ b/src/PVE/API2/Storage/Config.pm
@@ -190,7 +190,7 @@ __PACKAGE__->register_method ({
return &$api_storage_config($cfg, $param->{storage});
 }});
 
-my $sensitive_params = [qw(password encryption-key master-pubkey keyring)];
+my $sensitive_params = [qw(password encryption-key master-pubkey keyring 
ssh-key)];
 
 __PACKAGE__->register_method ({
 name => 'create',
diff --git a/src/PVE/BackupProvider/Plugin/Borg.pm 
b/src/PVE/BackupProvider/Plugin/Borg.pm
new file mode 100644
index 000..7bb3ae3
--- /dev/null
+++ b/src/PVE/BackupProvider/Plugin/Borg.pm
@@ -0,0 +1,439 @@
+package PVE::BackupProvider::Plugin::Borg;
+
+use strict;
+use warnings;
+
+use File::chdir;
+use File::Basename qw(basename);
+use File::Path qw(make_path remove_tree);
+use Net::IP;
+use POSIX qw(strftime);
+
+use PVE::Tools;
+
+# ($vmtype, $vmid, $time_string)
+our $ARCHIVE_RE_3 = 
qr!^pve-(lxc|qemu)-([0-9]+)-([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z)$!;
+
+sub archive_name {
+my ($vmtype, $vmid, $backup_time) = @_;
+
+return "pve-${vmtype}-${vmid}-" . strftime("%FT%TZ", gmtime($backup_time));
+}
+
+# remove_tree can be very verbose by default, do explicit error handling and 
limit to one message
+my sub _remove_tree {
+my ($path) = @_;
+
+remove_tree($path, { error => \my $err });
+if ($err && @$err) { # empty array if no error
+   for my $diag (@$err) {
+   my ($file, $message) = %$diag;
+   die "cannot remove_tree '$path': $message\n" if $file eq '';
+   die "cannot remove_tree '$path': unlinking $file failed - 
$message\n";
+   }
+}
+}
+
+my sub prepare_run_dir {
+my ($archive, $operation) = @_;
+
+my $run_dir = "/run/pve-storage-borg-plugin/${archive}.${operation}";
+_remove_tree($run_dir);
+make_path($run_dir);
+die "unable to create directory $run_dir\n" if !-d $run_dir;
+
+return $run_dir;
+}
+
+my sub log_info {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('info', $message);
+}
+
+my sub log_warning {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('warn', $message);
+}
+
+my sub log_error {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('err', $message);
+}
+
+my sub file_contents_from_archive {
+my ($self, $archive, $file) = @_;
+
+my $run_dir = prepare_run_dir($archive, "file-contents");
+
+my $raw;
+
+eval {
+   local $CWD = $run_dir;
+
+   $self->{'storage-plugin'}->borg_cmd_extract(
+   $self->{scfg},
+   $self->{storeid},
+   $archive,
+   [$file],
+   );
+
+   $raw = PVE::Tools::file_get_contents("${run_dir}/${file}");
+};
+my $err = $@;
+eval { _remove_tree($run_dir); };
+log_warning($self, $@) if $@;
+die $err if $err;
+
+return $raw;
+}
+
+# Plugin implementation
+
+sub new {
+my ($class, $storage_plugin, $scfg, $storeid, $log_function) = @_;
+
+my $self = bless {
+   sc

[pve-devel] [RFC qemu-server v3 21/34] backup: implement backup for external providers

2024-11-07 Thread Fiona Ebner
The state of the VM's disk images at the time the backup is started is
preserved via a snapshot-access block node. Old data is moved to the
fleecing image when new guest writes come in. The snapshot-access
block node, as well as the associated bitmap in case of incremental
backup, will be made available to the external provider. They are
exported via NBD: for the 'nbd' mechanism, the NBD socket path is
passed to the provider, while for the 'block-device' mechanism, the NBD
export is first made accessible as a regular block device and the
bitmap information is made available via a $next_dirty_region->()
function. For 'block-device', the 'nbdinfo' binary is required.

The provider can indicate that it wants to do an incremental backup by
returning the bitmap ID that was used for a previous backup and it
will then be told if the bitmap was newly created (either first backup
or old bitmap was invalid) or if the bitmap can be reused.

The provider then reads the parts of the NBD or block device it needs,
either the full disk for full backup, or the dirty parts according to
the bitmap for incremental backup. The bitmap has to be respected;
reads to other parts of the image will return an error. After backing
up each part of the disk, it should be discarded in the export to
avoid unnecessary space usage in the fleecing image (requires the
storage underlying the fleecing image to support discard too).
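
From the provider's side, the 'block-device' mechanism can then be
consumed roughly like this (a sketch, assuming $next_dirty_region->()
returns an (offset, length) pair and an empty list once done;
$device_path and $target_path are placeholders, short reads are not
handled):

    open(my $in_fh, '<', $device_path) or die "open '$device_path': $!\n";
    open(my $out_fh, '>', $target_path) or die "open '$target_path': $!\n";
    while (my ($offset, $length) = $next_dirty_region->()) {
        sysseek($in_fh, $offset, 0) or die "seek failed: $!\n";
        sysread($in_fh, my $data, $length) // die "read failed: $!\n";
        sysseek($out_fh, $offset, 0) or die "seek failed: $!\n";
        syswrite($out_fh, $data) // die "write failed: $!\n";
        # afterwards, discard the region in the export to free space in the fleecing image
    }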

Signed-off-by: Fiona Ebner 
---

Changes in v3:
* adapt to API changes, config files are now passed as raw

 PVE/VZDump/QemuServer.pm | 309 ++-
 1 file changed, 308 insertions(+), 1 deletion(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index b6dcd6cc..d0218c9b 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -20,7 +20,7 @@ use PVE::QMPClient;
 use PVE::Storage::Plugin;
 use PVE::Storage::PBSPlugin;
 use PVE::Storage;
-use PVE::Tools;
+use PVE::Tools qw(run_command);
 use PVE::VZDump;
 use PVE::Format qw(render_duration render_bytes);
 
@@ -277,6 +277,8 @@ sub archive {
 
 if ($self->{vzdump}->{opts}->{pbs}) {
$self->archive_pbs($task, $vmid);
+} elsif ($self->{vzdump}->{'backup-provider'}) {
+   $self->archive_external($task, $vmid);
 } else {
$self->archive_vma($task, $vmid, $filename, $comp);
 }
@@ -1149,6 +1151,23 @@ sub cleanup {
 
 # If VM was started only for backup, it is already stopped now.
 if (PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
+   if ($task->{cleanup}->{'nbd-stop'}) {
+   eval { PVE::QemuServer::QMPHelpers::nbd_stop($vmid); };
+   $self->logerr($@) if $@;
+   }
+
+   if (my $info = $task->{cleanup}->{'backup-access-teardown'}) {
+   my $params = {
+   'target-id' => $info->{'target-id'},
+   timeout => 60,
+   success => $info->{success} ? JSON::true : JSON::false,
+   };
+
+   $self->loginfo("tearing down backup-access");
+   eval { mon_cmd($vmid, "backup-access-teardown", $params->%*) };
+   $self->logerr($@) if $@;
+   }
+
$detach_tpmstate_drive->($task, $vmid);
detach_fleecing_images($task->{disks}, $vmid) if 
$task->{'use-fleecing'};
 }
@@ -1160,4 +1179,292 @@ sub cleanup {
 }
 }
 
+my sub block_device_backup_cleanup {
+my ($self, $paths, $cpids) = @_;
+
+for my $path ($paths->@*) {
+   eval { run_command(["qemu-nbd", "-d", $path ]); };
+   $self->log('warn', "unable to disconnect NBD backup source '$path' - 
$@") if $@;
+}
+
+my $waited;
+my $wait_limit = 5;
+for ($waited = 0; $waited < $wait_limit && scalar(keys $cpids->%*); 
$waited++) {
+   while ((my $cpid = waitpid(-1, POSIX::WNOHANG)) > 0) {
+   delete($cpids->{$cpid});
+   }
+   if ($waited == 0) {
+   kill 15, $_ for keys $cpids->%*;
+   }
+   sleep 1;
+}
+if ($waited == $wait_limit && scalar(keys $cpids->%*)) {
+   kill 9, $_ for keys $cpids->%*;
+   sleep 1;
+   while ((my $cpid = waitpid(-1, POSIX::WNOHANG)) > 0) {
+   delete($cpids->{$cpid});
+   }
+   $self->log('warn', "unable to collect nbdinfo child process '$_'") for 
keys $cpids->%*;
+}
+}
+
+my sub block_device_backup_prepare {
+my ($self, $devicename, $size, $nbd_path, $bitmap_name, $count) = @_;
+
+my $nbd_info_uri = "nbd+unix:///${devicename}?socket=${nbd_path}";
+my $qemu_nbd_uri = "nbd:unix:${nbd_path}:exportname=${devicename}";
+
+my $cpid;
+my $error_fh;
+my $next_dirty_region;
+
+# If there is no dirty bitmap, it c

[pve-devel] [RFC container v3 29/34] external restore: don't use 'one-file-system' tar flag when restoring from a directory

2024-11-07 Thread Fiona Ebner
This gives backup providers more freedom, e.g. to mount backed-up
mount point volumes individually.

Suggested-by: Fabian Grünbichler 
Signed-off-by: Fiona Ebner 
---

New in v3.

 src/PVE/LXC/Create.pm | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm
index 8657ac1..719f372 100644
--- a/src/PVE/LXC/Create.pm
+++ b/src/PVE/LXC/Create.pm
@@ -167,11 +167,15 @@ sub restore_external_archive {
or die "did not get path to archive directory from backup 
provider\n";
die "not a directory '$directory'" if !-d $directory;
 
+   # Give backup provider more freedom, e.g. mount backed-up mount 
point volumes
+   # individually.
+   my @flags = grep { $_ ne '--one-file-system' } 
@PVE::Storage::Plugin::COMMON_TAR_FLAGS;
+
my $create_cmd = [
'tar',
'cpf',
'-',
-   @PVE::Storage::Plugin::COMMON_TAR_FLAGS,
+   @flags,
"--directory=$directory",
'.',
];
-- 
2.39.5





[pve-devel] [RFC qemu-server v3 23/34] backup: implement restore for external providers

2024-11-07 Thread Fiona Ebner
First, the provider is asked about what restore mechanism to use.
Currently, only 'qemu-img' is possible. Then the configuration files
are restored, the provider gives information about volumes contained
in the backup and finally the volumes are restored via
'qemu-img convert'.

The code for the restore_external_archive() function was copied and
adapted from the restore_proxmox_backup_archive() function. Together
with restore_vma_archive() it seems sensible to extract the common
parts and use a dedicated module for restore code.

The parse_restore_archive() helper was renamed, because it's not just
parsing.
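
For illustration, the per-volume part of that flow boils down to roughly
the following (simplified sketch, error handling omitted; the calls and
the 'qemu-img-path' key are the ones used by the new
restore_external_archive() code, $d is the per-device info hash):

    my $info = $backup_provider->restore_vm_volume_init(
        $volname, $storeid, $d->{devname}, {});
    my $source_path = $info->{'qemu-img-path'}
        or die "did not get source image path from backup provider\n";
    qemu_img_convert(
        $source_path, $d->{volid}, $d->{size}, undef, 0, $options->{bwlimit});
    # afterwards the provider's matching volume cleanup method is called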

Signed-off-by: Fiona Ebner 
---

Changes in v3:
* use new storage_has_feature() helper

 PVE/API2/Qemu.pm  |  30 +-
 PVE/QemuServer.pm | 139 ++
 2 files changed, 166 insertions(+), 3 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 1c3cb271..319518e8 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -845,7 +845,7 @@ __PACKAGE__->register_method({
return $res;
 }});
 
-my $parse_restore_archive = sub {
+my $classify_restore_archive = sub {
 my ($storecfg, $archive) = @_;
 
 my ($archive_storeid, $archive_volname) = 
PVE::Storage::parse_volume_id($archive, 1);
@@ -859,6 +859,22 @@ my $parse_restore_archive = sub {
$res->{type} = 'pbs';
return $res;
}
+   if (PVE::Storage::storage_has_feature($storecfg, $archive_storeid, 
'backup-provider')) {
+   my $log_function = sub {
+   my ($log_level, $message) = @_;
+   my $prefix = $log_level eq 'err' ? 'ERROR' : uc($log_level);
+   print "$prefix: $message\n";
+   };
+   my $backup_provider = PVE::Storage::new_backup_provider(
+   $storecfg,
+   $archive_storeid,
+   $log_function,
+   );
+
+   $res->{type} = 'external';
+   $res->{'backup-provider'} = $backup_provider;
+   return $res;
+   }
 }
 my $path = PVE::Storage::abs_filesystem_path($storecfg, $archive);
 $res->{type} = 'file';
@@ -1011,7 +1027,7 @@ __PACKAGE__->register_method({
'backup',
);
 
-   $archive = $parse_restore_archive->($storecfg, $archive);
+   $archive = $classify_restore_archive->($storecfg, $archive);
}
}
 
@@ -1069,7 +1085,15 @@ __PACKAGE__->register_method({
PVE::QemuServer::check_restore_permissions($rpcenv, 
$authuser, $merged);
}
}
-   if ($archive->{type} eq 'file' || $archive->{type} eq 'pipe') {
+   if (my $backup_provider = $archive->{'backup-provider'}) {
+   PVE::QemuServer::restore_external_archive(
+   $backup_provider,
+   $archive->{volid},
+   $vmid,
+   $authuser,
+   $restore_options,
+   );
+   } elsif ($archive->{type} eq 'file' || $archive->{type} eq 
'pipe') {
die "live-restore is only compatible with backup images 
from a Proxmox Backup Server\n"
if $live_restore;
PVE::QemuServer::restore_file_archive($archive->{path} // 
'-', $vmid, $authuser, $restore_options);
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 30e51a8c..f484d048 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7303,6 +7303,145 @@ sub restore_proxmox_backup_archive {
 }
 }
 
+sub restore_external_archive {
+my ($backup_provider, $archive, $vmid, $user, $options) = @_;
+
+die "live restore from backup provider is not implemented\n" if 
$options->{live};
+
+my $storecfg = PVE::Storage::config();
+
+my ($storeid, $volname) = PVE::Storage::parse_volume_id($archive);
+my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
+
+my $tmpdir = "/var/tmp/vzdumptmp$$";
+rmtree $tmpdir;
+mkpath $tmpdir;
+
+my $conffile = PVE::QemuConfig->config_file($vmid);
+# disable interrupts (always do cleanups)
+local $SIG{INT} =
+   local $SIG{TERM} =
+   local $SIG{QUIT} =
+   local $SIG{HUP} = sub { print STDERR "got interrupt - ignored\n"; };
+
+# Note: $oldconf is undef if VM does not exists
+my $cfs_path = PVE::QemuConfig->cfs_config_path($vmid);
+my $oldconf = PVE::Cluster::cfs_read_file($cfs_path);
+my $new_conf_raw = '';
+
+my $rpcenv = PVE::RPCEnvironment::get();
+my $devinfo = {}; # info about drives included in backup
+my $virtdev_hash = {}; # info about allocated drives
+
+ev

[pve-devel] [RFC qemu-server v3 24/34] backup restore: external: hardening check for untrusted source image

2024-11-07 Thread Fiona Ebner
Suggested-by: Fabian Grünbichler 
Signed-off-by: Fiona Ebner 
---

New in v3.

Actual checking being done depends on Fabian's hardening patches:
https://lore.proxmox.com/pve-devel/20241104104221.228730-1-f.gruenbich...@proxmox.com/

 PVE/QemuServer.pm | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index f484d048..c2e7b4a5 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7385,6 +7385,12 @@ sub restore_external_archive {
$backup_provider->restore_vm_volume_init($volname, $storeid, 
$d->{devname}, {});
my $source_path = $info->{'qemu-img-path'}
or die "did not get source image path from backup provider\n";
+
+   print "importing drive '$d->{devname}' from '$source_path'\n";
+
+   # safety check for untrusted source image
+   PVE::Storage::file_size_info($source_path, undef, 1);
+
eval {
qemu_img_convert(
$source_path, $d->{volid}, $d->{size}, undef, 0, 
$options->{bwlimit});
-- 
2.39.5





[pve-devel] [PATCH container v3 25/34] create: add missing include of PVE::Storage::Plugin

2024-11-07 Thread Fiona Ebner
used for the shared 'COMMON_TAR_FLAGS' variable.

Signed-off-by: Fiona Ebner 
---

New in v3.

 src/PVE/LXC/Create.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm
index 117103c..7c5bf0a 100644
--- a/src/PVE/LXC/Create.pm
+++ b/src/PVE/LXC/Create.pm
@@ -8,6 +8,7 @@ use Fcntl;
 
 use PVE::RPCEnvironment;
 use PVE::Storage::PBSPlugin;
+use PVE::Storage::Plugin;
 use PVE::Storage;
 use PVE::DataCenterConfig;
 use PVE::LXC;
-- 
2.39.5






[pve-devel] [PATCH qemu-server v3 17/34] backup: move cleanup of fleecing images to cleanup method

2024-11-07 Thread Fiona Ebner
TPM drives are already detached there and it's better to group
these things together.

Signed-off-by: Fiona Ebner 
---

No changes in v3.

 PVE/VZDump/QemuServer.pm | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 012c9210..b2ced154 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -690,7 +690,6 @@ sub archive_pbs {
 
 # get list early so we die on unkown drive types before doing anything
 my $devlist = _get_task_devlist($task);
-my $use_fleecing;
 
 $self->enforce_vm_running_for_backup($vmid);
 $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
@@ -721,7 +720,7 @@ sub archive_pbs {
 
my $is_template = 
PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
 
-   $use_fleecing = check_and_prepare_fleecing(
+   $task->{'use-fleecing'} = check_and_prepare_fleecing(
$self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support);
 
my $fs_frozen = $self->qga_fs_freeze($task, $vmid);
@@ -735,7 +734,7 @@ sub archive_pbs {
devlist => $devlist,
'config-file' => $conffile,
};
-   $params->{fleecing} = JSON::true if $use_fleecing;
+   $params->{fleecing} = JSON::true if $task->{'use-fleecing'};
 
if (defined(my $ns = $scfg->{namespace})) {
$params->{'backup-ns'} = $ns;
@@ -784,11 +783,6 @@ sub archive_pbs {
 }
 $self->restore_vm_power_state($vmid);
 
-if ($use_fleecing) {
-   detach_fleecing_images($task->{disks}, $vmid);
-   cleanup_fleecing_images($self, $task->{disks});
-}
-
 die $err if $err;
 }
 
@@ -891,7 +885,6 @@ sub archive_vma {
 }
 
 my $devlist = _get_task_devlist($task);
-my $use_fleecing;
 
 $self->enforce_vm_running_for_backup($vmid);
 $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
@@ -911,7 +904,7 @@ sub archive_vma {
 
$attach_tpmstate_drive->($self, $task, $vmid);
 
-   $use_fleecing = check_and_prepare_fleecing(
+   $task->{'use-fleecing'} = check_and_prepare_fleecing(
$self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support);
 
my $outfh;
@@ -942,7 +935,7 @@ sub archive_vma {
devlist => $devlist
};
$params->{'firewall-file'} = $firewall if -e $firewall;
-   $params->{fleecing} = JSON::true if $use_fleecing;
+   $params->{fleecing} = JSON::true if $task->{'use-fleecing'};
add_backup_performance_options($params, $opts->{performance}, 
$qemu_support);
 
$qmpclient->queue_cmd($vmid, $backup_cb, 'backup', %$params);
@@ -984,11 +977,6 @@ sub archive_vma {
 
 $self->restore_vm_power_state($vmid);
 
-if ($use_fleecing) {
-   detach_fleecing_images($task->{disks}, $vmid);
-   cleanup_fleecing_images($self, $task->{disks});
-}
-
 if ($err) {
if ($cpid) {
kill(9, $cpid);
@@ -1132,6 +1120,11 @@ sub cleanup {
 
 $detach_tpmstate_drive->($task, $vmid);
 
+if ($task->{'use-fleecing'}) {
+   detach_fleecing_images($task->{disks}, $vmid);
+   cleanup_fleecing_images($self, $task->{disks});
+}
+
 if ($self->{qmeventd_fh}) {
close($self->{qmeventd_fh});
 }
-- 
2.39.5






[pve-devel] [PATCH qemu-server v3 16/34] move nbd_stop helper to QMPHelpers module

2024-11-07 Thread Fiona Ebner
Like this, nbd_stop() can be called from a module that cannot include
QemuServer.pm.

Signed-off-by: Fiona Ebner 
---

No changes in v3.

 PVE/API2/Qemu.pm | 3 ++-
 PVE/CLI/qm.pm| 3 ++-
 PVE/QemuServer.pm| 6 --
 PVE/QemuServer/QMPHelpers.pm | 6 ++
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 848001b6..1c3cb271 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -35,6 +35,7 @@ use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::Machine;
 use PVE::QemuServer::Memory qw(get_current_memory);
 use PVE::QemuServer::PCI;
+use PVE::QemuServer::QMPHelpers;
 use PVE::QemuServer::USB;
 use PVE::QemuMigrate;
 use PVE::RPCEnvironment;
@@ -5910,7 +5911,7 @@ __PACKAGE__->register_method({
return;
},
'nbdstop' => sub {
-   PVE::QemuServer::nbd_stop($state->{vmid});
+   PVE::QemuServer::QMPHelpers::nbd_stop($state->{vmid});
return;
},
'resume' => sub {
diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
index 8d8ce10a..47b87782 100755
--- a/PVE/CLI/qm.pm
+++ b/PVE/CLI/qm.pm
@@ -35,6 +35,7 @@ use PVE::QemuServer::Agent qw(agent_available);
 use PVE::QemuServer::ImportDisk;
 use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::OVF;
+use PVE::QemuServer::QMPHelpers;
 use PVE::QemuServer;
 
 use PVE::CLIHandler;
@@ -385,7 +386,7 @@ __PACKAGE__->register_method ({
 
my $vmid = $param->{vmid};
 
-   eval { PVE::QemuServer::nbd_stop($vmid) };
+   eval { PVE::QemuServer::QMPHelpers::nbd_stop($vmid) };
warn $@ if $@;
 
return;
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 0df3bda0..49b6ca17 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -8606,12 +8606,6 @@ sub generate_smbios1_uuid {
 return "uuid=".generate_uuid();
 }
 
-sub nbd_stop {
-my ($vmid) = @_;
-
-mon_cmd($vmid, 'nbd-server-stop', timeout => 25);
-}
-
 sub create_reboot_request {
 my ($vmid) = @_;
 open(my $fh, '>', "/run/qemu-server/$vmid.reboot")
diff --git a/PVE/QemuServer/QMPHelpers.pm b/PVE/QemuServer/QMPHelpers.pm
index 0269ea46..826938de 100644
--- a/PVE/QemuServer/QMPHelpers.pm
+++ b/PVE/QemuServer/QMPHelpers.pm
@@ -15,6 +15,12 @@ qemu_objectadd
 qemu_objectdel
 );
 
+sub nbd_stop {
+my ($vmid) = @_;
+
+mon_cmd($vmid, 'nbd-server-stop', timeout => 25);
+}
+
 sub qemu_deviceadd {
 my ($vmid, $devicefull) = @_;
 
-- 
2.39.5






[pve-devel] [RFC common v3 10/34] env: add module with helpers to run a Perl subroutine in a user namespace

2024-11-07 Thread Fiona Ebner
The first use case is running the container backup subroutine for
external providers inside a user namespace. That allows them to see
the filesystem to back up from the container's perspective and also
improves security because of isolation.

Copied and adapted the relevant parts from the pve-buildpkg
repository.
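
For reference, a minimal usage sketch (hypothetical id map, assuming the
caller is privileged enough to set it; entries follow the
[type, container-id, host-id, length] layout expected by set_id_map()):

    my $id_map = [
        ['u', 0, 100000, 65536], # container uid 0..65535 -> host 100000..165535
        ['g', 0, 100000, 65536],
    ];
    PVE::Env::run_in_userns(sub {
        # runs as (mapped) uid/gid 0 inside the new user+mount namespace
        print "effective uid inside the namespace: $>\n";
    }, $id_map);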

Originally-by: Wolfgang Bumiller 
[FE: add $idmap parameter, drop $aux_groups parameter]
Signed-off-by: Fiona Ebner 
---

New in v3.

 src/Makefile   |   1 +
 src/PVE/Env.pm | 136 +
 2 files changed, 137 insertions(+)
 create mode 100644 src/PVE/Env.pm

diff --git a/src/Makefile b/src/Makefile
index 2d8bdc4..dba26e3 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -15,6 +15,7 @@ LIB_SOURCES = \
Certificate.pm \
CpuSet.pm \
Daemon.pm \
+   Env.pm \
Exception.pm \
Format.pm \
INotify.pm \
diff --git a/src/PVE/Env.pm b/src/PVE/Env.pm
new file mode 100644
index 000..e11bec0
--- /dev/null
+++ b/src/PVE/Env.pm
@@ -0,0 +1,136 @@
+package PVE::Env;
+
+use strict;
+use warnings;
+
+use Fcntl qw(O_WRONLY);
+use POSIX qw(EINTR);
+use Socket;
+
+require qw(syscall.ph);
+
+use constant {CLONE_NEWNS   => 0x0002,
+  CLONE_NEWUSER => 0x1000};
+
+sub unshare($) {
+my ($flags) = @_;
+return 0 == syscall(272, $flags);
+}
+
+sub __set_id_map($$$) {
+my ($pid, $what, $value) = @_;
+sysopen(my $fd, "/proc/$pid/${what}_map", O_WRONLY)
+   or die "failed to open child process' ${what}_map\n";
+my $rc = syswrite($fd, $value);
+if (!$rc || $rc != length($value)) {
+   die "failed to set sub$what: $!\n";
+}
+close($fd);
+}
+
+sub set_id_map($$) {
+my ($pid, $id_map) = @_;
+
+my $gid_map = '';
+my $uid_map = '';
+
+for my $map ($id_map->@*) {
+   my ($type, $ct, $host, $length) = $map->@*;
+
+   $gid_map .= "$ct $host $length\n" if $type eq 'g';
+   $uid_map .= "$ct $host $length\n" if $type eq 'u';
+}
+
+__set_id_map($pid, 'gid', $gid_map) if $gid_map;
+__set_id_map($pid, 'uid', $uid_map) if $uid_map;
+}
+
+sub wait_for_child($;$) {
+my ($pid, $noerr) = @_;
+my $interrupts = 0;
+while (waitpid($pid, 0) != $pid) {
+   if ($! == EINTR) {
+   warn "interrupted...\n";
+   kill(($interrupts > 3 ? 9 : 15), $pid);
+   $interrupts++;
+   }
+}
+my $status = POSIX::WEXITSTATUS($?);
+return $status if $noerr;
+
+if ($? == -1) {
+   die "failed to execute\n";
+} elsif (POSIX::WIFSIGNALED($?)) {
+   my $sig = POSIX::WTERMSIG($?);
+   die "got signal $sig\n";
+} elsif ($status != 0) {
+   warn "exit code $status\n";
+}
+return $status;
+}
+
+sub forked(&%) {
+my ($code, %opts) = @_;
+
+pipe(my $except_r, my $except_w) or die "pipe: $!\n";
+
+my $pid = fork();
+die "fork failed: $!\n" if !defined($pid);
+
+if ($pid == 0) {
+   close($except_r);
+   eval { $code->() };
+   if ($@) {
+   print {$except_w} $@;
+   $except_w->flush();
+   POSIX::_exit(1);
+   }
+   POSIX::_exit(0);
+}
+close($except_w);
+
+my $err;
+if (my $afterfork = $opts{afterfork}) {
+   eval { $afterfork->($pid); };
+   if ($err = $@) {
+   kill(15, $pid);
+   $opts{noerr} = 1;
+   }
+}
+if (!$err) {
+   $err = do { local $/ = undef; <$except_r> };
+}
+my $rv = wait_for_child($pid, $opts{noerr});
+die $err if $err;
+die "an unknown error occurred\n" if $rv != 0;
+return $rv;
+}
+
+sub run_in_userns(&;$) {
+my ($code, $id_map) = @_;
+socketpair(my $sp, my $sc, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
+   or die "socketpair: $!\n";
+forked(sub {
+   close($sp);
+   unshare(CLONE_NEWUSER|CLONE_NEWNS) or die "unshare(NEWUSER|NEWNS): 
$!\n";
+   syswrite($sc, "1\n") == 2 or die "write: $!\n";
+   shutdown($sc, 1);
+   my $two = <$sc>;
+   die "failed to sync with parent process\n" if $two ne "2\n";
+   close($sc);
+   $! = undef;
+   ($(, $)) = (0, 0); die "$!\n" if $!;
+   ($<, $>) = (0, 0); die "$!\n" if $!;
+   $code->();
+}, afterfork => sub {
+   my ($pid) = @_;
+   close($sc);
+   my $one = <$sp>;
+   die "failed to sync with userprocess\n" if $one ne "1\n";
+   set_id_map($pid, $id_map);
+   syswrite($sp, "2\n") == 2 or die "write: $!\n";
+   close($sp);
+});
+}
+
+1;
-- 
2.39.5






[pve-devel] [RFC storage v3 13/34] extract backup config: delegate to backup provider for storages that support it

2024-11-07 Thread Fiona Ebner
Signed-off-by: Fiona Ebner 
---

Changes in v3:
* use new storage_has_feature() helper

 src/PVE/Storage.pm | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
index 69500bf..9f9a86b 100755
--- a/src/PVE/Storage.pm
+++ b/src/PVE/Storage.pm
@@ -1734,6 +1734,17 @@ sub extract_vzdump_config {
storage_check_enabled($cfg, $storeid);
return PVE::Storage::PBSPlugin->extract_vzdump_config($scfg, 
$volname, $storeid);
}
+
+   if (storage_has_feature($cfg, $storeid, 'backup-provider')) {
+   my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
+   my $log_function = sub {
+   my ($log_level, $message) = @_;
+   my $prefix = $log_level eq 'err' ? 'ERROR' : uc($log_level);
+   print "$prefix: $message\n";
+   };
+   my $backup_provider = $plugin->new_backup_provider($scfg, $storeid, 
$log_function);
+   return $backup_provider->restore_get_guest_config($volname, 
$storeid);
+   }
 }
 
 my $archive = abs_filesystem_path($cfg, $volid);
-- 
2.39.5






[pve-devel] [RFC storage v3 12/34] plugin: introduce new_backup_provider() method

2024-11-07 Thread Fiona Ebner
The new_backup_provider() method can be used by storage plugins for
external backup providers. If the method returns a provider, Proxmox
VE will use callbacks to that provider for backups and restore instead
of using its usual backup/restore mechanisms.

API age and version are both bumped.
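
For a storage plugin, opting in boils down to overriding the method and
returning a provider instance, along these lines (the Example provider
class and its constructor arguments are made up for illustration):

    sub new_backup_provider {
        my ($class, $scfg, $storeid, $log_function) = @_;
        return PVE::BackupProvider::Plugin::Example->new(
            $class, $scfg, $storeid, $log_function);
    }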

The backup provider API is split into two parts, both of which again
need different implementations for VM and LXC guests:

1. Backup API

There are two hook callback functions, namely:
1. job_hook() is called during the start/end/abort phases of the whole
   backup job.
2. backup_hook() is called during the start/end/abort phases of the
   backup of an individual guest. There also is a 'prepare' phase
   useful for container backups, because the backup method for
   containers itself is executed in the user namespace context
   associated to the container.

The backup_get_mechanism() method is used to decide on the backup
mechanism. Currently, 'block-device' or 'nbd' for VMs, and 'directory'
for containers are possible. The method also lets the plugin indicate
whether to use a bitmap for incremental VM backup or not. It is enough
to implement one mechanism for VMs and one mechanism for containers.
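
A hypothetical implementation might look like the following - note that
the method signature and the exact return convention (mechanism plus
optional bitmap name) are assumptions here, see the Base plugin for the
actual API:

    sub backup_get_mechanism {
        my ($self, $vmid, $vmtype) = @_;
        return ('nbd', 'example-provider-bitmap') if $vmtype eq 'qemu';
        return ('directory', undef) if $vmtype eq 'lxc';
        die "unsupported guest type '$vmtype'\n";
    }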

Next, there are methods for backing up the guest's configuration and
data, backup_vm() for VM backup and backup_container() for container
backup, with the latter running in the user namespace context associated
to the container.

Finally, some helpers like getting the provider name or volume ID for
the backup target, as well as for handling the backup log.

1.1 Backup Mechanisms

VM:

Access to the data on the VM's disk from the time the backup started
is made available via a so-called "snapshot access". This is either
the full image, or in case a bitmap is used, the dirty parts of the
image since the last time the bitmap was used for a successful backup.
Reading outside of the dirty parts will result in an error. After
backing up each part of the disk, it should be discarded in the export
to avoid unnecessary space usage on the Proxmox VE side (there is an
associated fleecing image).

VM mechanism 'block-device':

The snapshot access is exposed as a block device. If used, a bitmap is
passed along.

VM mechanism 'nbd':

The snapshot access and, if used, bitmap are exported via NBD.

Container mechanism 'directory':

A copy or snapshot of the container's filesystem state is made
available as a directory. The method is executed inside the user
namespace associated to the container.

2. Restore API

The restore_get_mechanism() method is used to decide on the restore
mechanism. Currently, 'qemu-img' for VMs, and 'directory' or 'tar' for
containers are possible. It is enough to implement one mechanism for
VMs and one mechanism for containers.

Next, methods for extracting the guest and firewall configuration and
the implementations of the restore mechanism via a pair of methods: an
init method, for making the data available to Proxmox VE and a cleanup
method that is called after restore.

For VMs, there also is a restore_vm_get_device_info() helper required,
to get the disks included in the backup and their sizes.

2.1. Restore Mechanisms

VM mechanism 'qemu-img':

The backup provider gives a path to the disk image that will be
restored. The path needs to be something 'qemu-img' can deal with,
e.g. can also be an NBD URI or similar.

Container mechanism 'directory':

The backup provider gives the path to a directory with the full
filesystem structure of the container.

Container mechanism 'tar':

The backup provider gives the path to a (potentially compressed) tar
archive with the full filesystem structure of the container.

See the PVE::BackupProvider::Plugin module for the full API
documentation.
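
As a small restore-side sketch for the 'qemu-img' mechanism (the example
path is made up; the method signature and the 'qemu-img-path' key match
what the qemu-server restore code expects):

    sub restore_vm_volume_init {
        my ($self, $volname, $storeid, $devname, $info) = @_;
        # make the backed-up disk image available and tell Proxmox VE where
        # to read it from - anything qemu-img can consume works, e.g. an NBD URI
        return { 'qemu-img-path' => "/mnt/backup-$storeid/$devname.qcow2" };
    }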

Signed-off-by: Fiona Ebner 
---

Changes in v3:
* update docs regarding API changes:
  - prepare phase for backup hook
  - pass in configs as data instead of filenames

 src/PVE/BackupProvider/Makefile|3 +
 src/PVE/BackupProvider/Plugin/Base.pm  | 1158 
 src/PVE/BackupProvider/Plugin/Makefile |5 +
 src/PVE/Makefile   |1 +
 src/PVE/Storage.pm |   12 +-
 src/PVE/Storage/Plugin.pm  |   15 +
 6 files changed, 1192 insertions(+), 2 deletions(-)
 create mode 100644 src/PVE/BackupProvider/Makefile
 create mode 100644 src/PVE/BackupProvider/Plugin/Base.pm
 create mode 100644 src/PVE/BackupProvider/Plugin/Makefile

diff --git a/src/PVE/BackupProvider/Makefile b/src/PVE/BackupProvider/Makefile
new file mode 100644
index 000..f018cef
--- /dev/null
+++ b/src/PVE/BackupProvider/Makefile
@@ -0,0 +1,3 @@
+.PHONY: install
+install:
+   make -C Plugin install
diff --git a/src/PVE/BackupProvider/Plugin/Base.pm 
b/src/PVE/BackupProvider/Plugin/Base.pm
new file mode 100644
index 000..a8d0a88
--- /dev/null
+++ b/src/PVE/BackupProvid

[pve-devel] [RFC manager v3 34/34] backup: implement backup for external providers

2024-11-07 Thread Fiona Ebner
Hooks from the backup provider are called during start/end/abort for
both job and backup. It is also necessary to adapt some log messages
and special-case some things, as is already done for PBS, e.g. log
file handling.
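
The provider side only needs to react to the phases it cares about; a
minimal (hypothetical) counterpart to the backup_hook() calls added below
could look like this:

    sub backup_hook {
        my ($self, $phase, $vmid, $vmtype, $info) = @_;
        if ($phase eq 'start') {
            # e.g. remember $info->{'start-time'} and set up per-guest state
        } elsif ($phase eq 'end' || $phase eq 'abort') {
            # e.g. finalize or clean up the backup of this guest
        }
        return;
    }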

Signed-off-by: Fiona Ebner 
---

Changes in v3:
* use new storage_has_feature() helper

 PVE/VZDump.pm   | 57 -
 test/vzdump_new_test.pl |  3 +++
 2 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
index fd89945e..0b1b5dc1 100644
--- a/PVE/VZDump.pm
+++ b/PVE/VZDump.pm
@@ -217,7 +217,10 @@ sub storage_info {
 $info->{'prune-backups'} = 
PVE::JSONSchema::parse_property_string('prune-backups', 
$scfg->{'prune-backups'})
if defined($scfg->{'prune-backups'});
 
-if ($type eq 'pbs') {
+if (PVE::Storage::storage_has_feature($cfg, $storage, 'backup-provider')) {
+   $info->{'backup-provider'} =
+   PVE::Storage::new_backup_provider($cfg, $storage, sub { 
debugmsg($_[0], $_[1]); });
+} elsif ($type eq 'pbs') {
$info->{pbs} = 1;
 } else {
$info->{dumpdir} = PVE::Storage::get_backup_dir($cfg, $storage);
@@ -717,6 +720,7 @@ sub new {
$opts->{scfg} = $info->{scfg};
$opts->{pbs} = $info->{pbs};
$opts->{'prune-backups'} //= $info->{'prune-backups'};
+   $self->{'backup-provider'} = $info->{'backup-provider'} if 
$info->{'backup-provider'};
}
 } elsif ($opts->{dumpdir}) {
$add_error->("dumpdir '$opts->{dumpdir}' does not exist")
@@ -1001,7 +1005,7 @@ sub exec_backup_task {
}
}
 
-   if (!$self->{opts}->{pbs}) {
+   if (!$self->{opts}->{pbs} && !$self->{'backup-provider'}) {
$task->{logfile} = "$opts->{dumpdir}/$basename.log";
}
 
@@ -1011,7 +1015,11 @@ sub exec_backup_task {
$ext .= ".${comp_ext}";
}
 
-   if ($self->{opts}->{pbs}) {
+   if ($self->{'backup-provider'}) {
+   die "unable to pipe backup to stdout\n" if $opts->{stdout};
+   $task->{target} = 
$self->{'backup-provider'}->backup_get_archive_name(
+   $vmid, $vmtype, $task->{backup_time});
+   } elsif ($self->{opts}->{pbs}) {
die "unable to pipe backup to stdout\n" if $opts->{stdout};
$task->{target} = $pbs_snapshot_name;
} else {
@@ -1029,7 +1037,7 @@ sub exec_backup_task {
my $pid = $$;
if ($opts->{tmpdir}) {
$task->{tmpdir} = "$opts->{tmpdir}/vzdumptmp${pid}_$vmid/";
-   } elsif ($self->{opts}->{pbs}) {
+   } elsif ($self->{opts}->{pbs} || $self->{'backup-provider'}) {
$task->{tmpdir} = "/var/tmp/vzdumptmp${pid}_$vmid";
} else {
# dumpdir is posix? then use it as temporary dir
@@ -1101,6 +1109,10 @@ sub exec_backup_task {
if ($mode eq 'stop') {
$plugin->prepare ($task, $vmid, $mode);
 
+   if ($self->{'backup-provider'}) {
+   $self->{'backup-provider'}->backup_hook(
+   'start', $vmid, $vmtype, { 'start-time' => 
$task->{backup_time} });
+   }
$self->run_hook_script ('backup-start', $task, $logfd);
 
if ($running) {
@@ -1115,6 +1127,10 @@ sub exec_backup_task {
} elsif ($mode eq 'suspend') {
$plugin->prepare ($task, $vmid, $mode);
 
+   if ($self->{'backup-provider'}) {
+   $self->{'backup-provider'}->backup_hook(
+   'start', $vmid, $vmtype, { 'start-time' => 
$task->{backup_time} });
+   }
$self->run_hook_script ('backup-start', $task, $logfd);
 
if ($vmtype eq 'lxc') {
@@ -1141,6 +1157,10 @@ sub exec_backup_task {
}
 
} elsif ($mode eq 'snapshot') {
+   if ($self->{'backup-provider'}) {
+   $self->{'backup-provider'}->backup_hook(
+   'start', $vmid, $vmtype, { 'start-time' => 
$task->{backup_time} });
+   }
$self->run_hook_script ('backup-start', $task, $logfd);
 
my $snapshot_count = $task->{snapshot_count} || 0;
@@ -1183,11 +1203,13 @@ sub exec_backup_task {
return;
}
 
-   my $archive_txt = $self->{opts}->{pbs} ? 'Proxmox Backup Server' : 
'vzdump';
+   my $arch

[pve-devel] [PATCH qemu v3 05/34] PVE backup: include device name in error when setting up snapshot access fails

2024-11-07 Thread Fiona Ebner
Signed-off-by: Fiona Ebner 
---

No changes in v3.

 pve-backup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/pve-backup.c b/pve-backup.c
index 33c23e53c2..d931746453 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -626,7 +626,8 @@ static void create_backup_jobs_bh(void *opaque) {
 bool discard_source = false;
 if (di->fleecing.bs) {
 if (setup_snapshot_access(di, &local_err) < 0) {
-error_setg(errp, "setting up snapshot access for fleecing 
failed: %s",
+error_setg(errp, "%s - setting up snapshot access for fleecing 
failed: %s",
+   di->device_name,
local_err ? error_get_pretty(local_err) : "unknown 
error");
 cleanup_snapshot_access(di);
 bdrv_drained_end(di->bs);
-- 
2.39.5



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu-server v3 19/34] backup: keep track of block-node size for fleecing

2024-11-07 Thread Fiona Ebner
For fleecing, the size needs to match exactly what QEMU sees. In
particular, EFI disks might be attached with a 'size=' option, meaning
that size can be different from the volume's size. Commit 36377acf
("backup: disk info: also keep track of size") introduced size
tracking and it was used for fleecing since then, but the accurate
size information needs to be queried via QMP.

Should also help with the following issue reported in the community
forum:
https://forum.proxmox.com/threads/152202

Signed-off-by: Fiona Ebner 
---

Changes in v3:
* only use query-block QMP command after the VM is enforced running

 PVE/VZDump/QemuServer.pm | 37 -
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index c46e607c..1ebafe6d 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -551,7 +551,7 @@ my sub allocate_fleecing_images {
my $name = "vm-$vmid-fleece-$n";
$name .= ".$format" if $scfg->{path};
 
-   my $size = PVE::Tools::convert_size($di->{size}, 'b' => 'kb');
+   my $size = PVE::Tools::convert_size($di->{'block-node-size'}, 
'b' => 'kb');
 
$di->{'fleece-volid'} = PVE::Storage::vdisk_alloc(
$self->{storecfg}, $fleecing_storeid, $vmid, $format, 
$name, $size);
@@ -600,7 +600,7 @@ my sub attach_fleecing_images {
my $drive = 
"file=$path,if=none,id=$devid,format=$format,discard=unmap";
# Specify size explicitly, to make it work if storage backend 
rounded up size for
# fleecing image when allocating.
-   $drive .= ",size=$di->{size}" if $format eq 'raw';
+   $drive .= ",size=$di->{'block-node-size'}" if $format eq 'raw';
$drive =~ s/\\//g;
my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_add auto 
\"$drive\"", 60);
die "attaching fleecing image $volid failed - $ret\n" if $ret !~ 
m/OK/s;
@@ -609,7 +609,7 @@ my sub attach_fleecing_images {
 }
 
 my sub check_and_prepare_fleecing {
-my ($self, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support) = 
@_;
+my ($self, $task, $vmid, $fleecing_opts, $disks, $is_template, 
$qemu_support) = @_;
 
 # Even if the VM was started specifically for fleecing, it's possible that 
the VM is resumed and
 # then starts doing IO. For VMs that are not resumed the fleecing images 
will just stay empty,
@@ -626,6 +626,8 @@ my sub check_and_prepare_fleecing {
 }
 
 if ($use_fleecing) {
+   $self->query_block_node_sizes($vmid, $task);
+
my ($default_format, $valid_formats) = 
PVE::Storage::storage_default_format(
$self->{storecfg}, $fleecing_opts->{storage});
my $format = scalar(grep { $_ eq 'qcow2' } $valid_formats->@*) ? 
'qcow2' : 'raw';
@@ -721,7 +723,7 @@ sub archive_pbs {
my $is_template = 
PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
 
$task->{'use-fleecing'} = check_and_prepare_fleecing(
-   $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support);
+   $self, $task, $vmid, $opts->{fleecing}, $task->{disks}, 
$is_template, $qemu_support);
 
my $fs_frozen = $self->qga_fs_freeze($task, $vmid);
 
@@ -905,7 +907,7 @@ sub archive_vma {
$attach_tpmstate_drive->($self, $task, $vmid);
 
$task->{'use-fleecing'} = check_and_prepare_fleecing(
-   $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support);
+   $self, $task, $vmid, $opts->{fleecing}, $task->{disks}, 
$is_template, $qemu_support);
 
my $outfh;
if ($opts->{stdout}) {
@@ -1042,6 +1044,31 @@ sub qga_fs_thaw {
 $self->logerr($@) if $@;
 }
 
+# The size for fleecing images needs to be exactly the same size as QEMU sees. 
E.g. EFI disk can be
+# attached with a smaller size than the underlying image on the storage.
+sub query_block_node_sizes {
+my ($self, $vmid, $task) = @_;
+
+my $block_info = mon_cmd($vmid, "query-block");
+$block_info = { map { $_->{device} => $_ } $block_info->@* };
+
+for my $diskinfo ($task->{disks}->@*) {
+   my $drive_key = $diskinfo->{virtdev};
+   $drive_key .= "-backup" if $drive_key eq 'tpmstate0';
+   my $block_node_size =
+   eval { 
$block_info->{"drive-$drive_key"}->{inserted}->{image}->{'virtual-size'}; };
+   if (!$block_node_size) {
+   $self->loginfo(
+   "could not determine block node 

[pve-devel] [RFC qemu-server v3 20/34] backup: allow adding fleecing images also for EFI and TPM

2024-11-07 Thread Fiona Ebner
For the external backup API, it will be necessary to add a fleecing
image even for small disks like EFI and TPM, because there is no other
place the old data could be copied to when a new guest write comes in.

Signed-off-by: Fiona Ebner 
---

Changes in v3:
* adapt to context changes from previous patch

 PVE/VZDump/QemuServer.pm | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 1ebafe6d..b6dcd6cc 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -534,7 +534,7 @@ my sub cleanup_fleecing_images {
 }
 
 my sub allocate_fleecing_images {
-my ($self, $disks, $vmid, $fleecing_storeid, $format) = @_;
+my ($self, $disks, $vmid, $fleecing_storeid, $format, $all_images) = @_;
 
 die "internal error - no fleecing storage specified\n" if 
!$fleecing_storeid;
 
@@ -545,7 +545,8 @@ my sub allocate_fleecing_images {
my $n = 0; # counter for fleecing image names
 
for my $di ($disks->@*) {
-   next if $di->{virtdev} =~ m/^(?:tpmstate|efidisk)\d$/; # too small 
to be worth it
+   # EFI/TPM are usually too small to be worth it, but it's required 
for external providers
+   next if !$all_images && $di->{virtdev} =~ 
m/^(?:tpmstate|efidisk)\d$/;
if ($di->{type} eq 'block' || $di->{type} eq 'file') {
my $scfg = PVE::Storage::storage_config($self->{storecfg}, 
$fleecing_storeid);
my $name = "vm-$vmid-fleece-$n";
@@ -609,7 +610,7 @@ my sub attach_fleecing_images {
 }
 
 my sub check_and_prepare_fleecing {
-my ($self, $task, $vmid, $fleecing_opts, $disks, $is_template, 
$qemu_support) = @_;
+my ($self, $task, $vmid, $fleecing_opts, $disks, $is_template, 
$qemu_support, $all_images) = @_;
 
 # Even if the VM was started specifically for fleecing, it's possible that 
the VM is resumed and
 # then starts doing IO. For VMs that are not resumed the fleecing images 
will just stay empty,
@@ -632,7 +633,8 @@ my sub check_and_prepare_fleecing {
$self->{storecfg}, $fleecing_opts->{storage});
my $format = scalar(grep { $_ eq 'qcow2' } $valid_formats->@*) ? 
'qcow2' : 'raw';
 
-   allocate_fleecing_images($self, $disks, $vmid, 
$fleecing_opts->{storage}, $format);
+   allocate_fleecing_images(
+   $self, $disks, $vmid, $fleecing_opts->{storage}, $format, 
$all_images);
attach_fleecing_images($self, $disks, $vmid, $format);
 }
 
@@ -723,7 +725,7 @@ sub archive_pbs {
my $is_template = 
PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
 
$task->{'use-fleecing'} = check_and_prepare_fleecing(
-   $self, $task, $vmid, $opts->{fleecing}, $task->{disks}, 
$is_template, $qemu_support);
+   $self, $task, $vmid, $opts->{fleecing}, $task->{disks}, 
$is_template, $qemu_support, 0);
 
my $fs_frozen = $self->qga_fs_freeze($task, $vmid);
 
@@ -907,7 +909,7 @@ sub archive_vma {
$attach_tpmstate_drive->($self, $task, $vmid);
 
$task->{'use-fleecing'} = check_and_prepare_fleecing(
-   $self, $task, $vmid, $opts->{fleecing}, $task->{disks}, 
$is_template, $qemu_support);
+   $self, $task, $vmid, $opts->{fleecing}, $task->{disks}, 
$is_template, $qemu_support, 0);
 
my $outfh;
if ($opts->{stdout}) {
-- 
2.39.5






[pve-devel] [PATCH qemu v3 01/34] block/reqlist: allow adding overlapping requests

2024-11-07 Thread Fiona Ebner
Allow overlapping requests by removing the assert that made it
impossible. There are only two callers:

1. block_copy_task_create()

It already asserts the very same condition before calling
reqlist_init_req().

2. cbw_snapshot_read_lock()

There is no need to have read requests be non-overlapping in
copy-before-write when used for snapshot-access. In fact, there was no
protection against two callers of cbw_snapshot_read_lock() calling
reqlist_init_req() with overlapping ranges and this could lead to an
assertion failure [1].

In particular, with the reproducer script below [0], two
cbw_co_snapshot_block_status() callers could race, with the second
calling reqlist_init_req() before the first one finishes and removes
its conflicting request.

[0]:

> #!/bin/bash -e
> dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024
> ./qemu-img create /tmp/fleecing.raw -f raw 1G
> (
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \
> --blockdev 
> raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \
> < {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", 
> "file": "node0", "target": "node1", "node-name": "node3" } }
> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", 
> "file": "node3", "node-name": "snap0" } }
> {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", 
> "data": { "path": "/tmp/nbd.socket" } } } }
> {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": 
> "snap0", "type": "nbd", "name": "exp0"}}
> EOF
> ) &
> sleep 5
> while true; do
> ./qemu-nbd -d /dev/nbd0
> ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r
> nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket'
> done

[1]:

> #5  0x71e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101
> #6  0x615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23
> #7  0x6152853e2d98 in cbw_snapshot_read_lock (...) at 
> ../block/copy-before-write.c:237
> #8  0x6152853e3068 in cbw_co_snapshot_block_status (...) at 
> ../block/copy-before-write.c:304
> #9  0x6152853f4d22 in bdrv_co_snapshot_block_status (...) at 
> ../block/io.c:3726
> #10 0x61528543a63e in snapshot_access_co_block_status (...) at 
> ../block/snapshot-access.c:48
> #11 0x6152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474
> #12 0x6152853f2016 in bdrv_co_common_block_status_above (...) at 
> ../block/io.c:2652
> #13 0x6152853f22cf in bdrv_co_block_status_above (...) at 
> ../block/io.c:2732
> #14 0x6152853d9a86 in blk_co_block_status_above (...) at 
> ../block/block-backend.c:1473
> #15 0x61528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374
> #16 0x61528538deb1 in nbd_co_send_block_status (...) at 
> ../nbd/server.c:2481
> #17 0x61528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978
> #18 0x61528538f906 in nbd_trip (...) at ../nbd/server.c:3121
> #19 0x6152855a7caf in coroutine_trampoline (...) at 
> ../util/coroutine-ucontext.c:175

Cc: qemu-sta...@nongnu.org
Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Fiona Ebner 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

No changes in v3.

 block/copy-before-write.c | 3 ++-
 block/reqlist.c   | 2 --
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 50cc4c7aae..a5bb4d14f6 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -67,7 +67,8 @@ typedef struct BDRVCopyBeforeWriteState {
 
 /*
  * @frozen_read_reqs: current read requests for fleecing user in bs->file
- * node. These areas must not be rewritten by guest.
+ * node. These areas must not be rewritten by guest. There can be multiple
+ * overlapping read requests.
  */
 BlockReqList frozen_read_reqs;
 
diff --git a/block/reqlist.c b/block/reqlist.c
index 08cb57cfa4..098e807378 100644
--- a/block/reqlist.c
+++ b/block/reqlist.c
@@ -20,8 +20,6 @@
 void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
   int64_t bytes)
 {
-assert(!reqlist_find_conflict(reqs, offset, bytes));
-
 *req = (BlockReq) {
 .offset = offset,
 .bytes = bytes,
-- 
2.39.5






[pve-devel] [PATCH qemu-server v3 22/34] restore: die early when there is no size for a device

2024-11-07 Thread Fiona Ebner
Makes it a clean error for buggy (external) backup providers where the
size might not be set at all.

Signed-off-by: Fiona Ebner 
---

No changes in v3.

 PVE/QemuServer.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 49b6ca17..30e51a8c 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6813,6 +6813,7 @@ my $restore_allocate_devices = sub {
 my $map = {};
 foreach my $virtdev (sort keys %$virtdev_hash) {
my $d = $virtdev_hash->{$virtdev};
+   die "got no size for '$virtdev'\n" if !defined($d->{size});
my $alloc_size = int(($d->{size} + 1024 - 1)/1024);
my $storeid = $d->{storeid};
my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
-- 
2.39.5






[pve-devel] [PATCH qemu v3 03/34] PVE backup: factor out setting up snapshot access for fleecing

2024-11-07 Thread Fiona Ebner
Avoids some line bloat in the create_backup_jobs_bh() function and is
in preparation for setting up the snapshot access independently of
fleecing, in particular that will be useful for providing access to
the snapshot via NBD.

Signed-off-by: Fiona Ebner 
---

No changes in v3.

 pve-backup.c | 95 
 1 file changed, 58 insertions(+), 37 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index c4178758b3..051ebffe48 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -525,6 +525,62 @@ static int coroutine_fn pvebackup_co_add_config(
 goto out;
 }
 
+/*
+ * Setup a snapshot-access block node for a device with associated fleecing 
image.
+ */
+static int setup_snapshot_access(PVEBackupDevInfo *di, Error **errp)
+{
+Error *local_err = NULL;
+
+if (!di->fleecing.bs) {
+error_setg(errp, "no associated fleecing image");
+return -1;
+}
+
+QDict *cbw_opts = qdict_new();
+qdict_put_str(cbw_opts, "driver", "copy-before-write");
+qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs));
+qdict_put_str(cbw_opts, "target", bdrv_get_node_name(di->fleecing.bs));
+
+if (di->bitmap) {
+/*
+ * Only guest writes to parts relevant for the backup need to be 
intercepted with
+ * old data being copied to the fleecing image.
+ */
+qdict_put_str(cbw_opts, "bitmap.node", bdrv_get_node_name(di->bs));
+qdict_put_str(cbw_opts, "bitmap.name", 
bdrv_dirty_bitmap_name(di->bitmap));
+}
+/*
+ * Fleecing storage is supposed to be fast and it's better to break backup 
than guest
+ * writes. Certain guest drivers like VirtIO-win have 60 seconds timeout 
by default, so
+ * abort a bit before that.
+ */
+qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot");
+qdict_put_int(cbw_opts, "cbw-timeout", 45);
+
+di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, 
&local_err);
+
+if (!di->fleecing.cbw) {
+error_setg(errp, "appending cbw node for fleecing failed: %s",
+   local_err ? error_get_pretty(local_err) : "unknown error");
+return -1;
+}
+
+QDict *snapshot_access_opts = qdict_new();
+qdict_put_str(snapshot_access_opts, "driver", "snapshot-access");
+qdict_put_str(snapshot_access_opts, "file", 
bdrv_get_node_name(di->fleecing.cbw));
+
+di->fleecing.snapshot_access =
+bdrv_open(NULL, NULL, snapshot_access_opts, BDRV_O_RDWR | 
BDRV_O_UNMAP, &local_err);
+if (!di->fleecing.snapshot_access) {
+error_setg(errp, "setting up snapshot access for fleecing failed: %s",
+   local_err ? error_get_pretty(local_err) : "unknown error");
+return -1;
+}
+
+return 0;
+}
+
 /*
  * backup_job_create can *not* be run from a coroutine, so this can't either.
  * The caller is responsible that backup_mutex is held nonetheless.
@@ -569,49 +625,14 @@ static void create_backup_jobs_bh(void *opaque) {
 const char *job_id = bdrv_get_device_name(di->bs);
 bdrv_graph_co_rdunlock();
 if (di->fleecing.bs) {
-QDict *cbw_opts = qdict_new();
-qdict_put_str(cbw_opts, "driver", "copy-before-write");
-qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs));
-qdict_put_str(cbw_opts, "target", 
bdrv_get_node_name(di->fleecing.bs));
-
-if (di->bitmap) {
-/*
- * Only guest writes to parts relevant for the backup need to 
be intercepted with
- * old data being copied to the fleecing image.
- */
-qdict_put_str(cbw_opts, "bitmap.node", 
bdrv_get_node_name(di->bs));
-qdict_put_str(cbw_opts, "bitmap.name", 
bdrv_dirty_bitmap_name(di->bitmap));
-}
-/*
- * Fleecing storage is supposed to be fast and it's better to 
break backup than guest
- * writes. Certain guest drivers like VirtIO-win have 60 seconds 
timeout by default, so
- * abort a bit before that.
- */
-qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot");
-qdict_put_int(cbw_opts, "cbw-timeout", 45);
-
-di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, 
&local_err);
-
-if (!di->fleecing.cbw) {
-error_setg(errp, "appending cbw node for fleecing failed: %s",
-   local_err ? error_get_pretty(local_err) : "unknown 
error");
-bdrv_drained_end(di->bs);
-break;
-  

[pve-devel] [RFC qemu v3 06/34] PVE backup: add target ID in backup state

2024-11-07 Thread Fiona Ebner
In preparation for allowing multiple backup providers. Each backup
target can then have its own dirty bitmap and there can be additional
checks that the current backup state is actually associated to the
expected target.

Signed-off-by: Fiona Ebner 
---

No changes in v3.

 pve-backup.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/pve-backup.c b/pve-backup.c
index d931746453..e8031bb89c 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -70,6 +70,7 @@ static struct PVEBackupState {
 JobTxn *txn;
 CoMutex backup_mutex;
 CoMutex dump_callback_mutex;
+char *target_id;
 } backup_state;
 
 static void pvebackup_init(void)
@@ -848,7 +849,7 @@ UuidInfo coroutine_fn *qmp_backup(
 
 if (backup_state.di_list) {
 error_set(errp, ERROR_CLASS_GENERIC_ERROR,
-  "previous backup not finished");
+  "previous backup by provider '%s' not finished", 
backup_state.target_id);
 qemu_co_mutex_unlock(&backup_state.backup_mutex);
 return NULL;
 }
@@ -1100,6 +1101,11 @@ UuidInfo coroutine_fn *qmp_backup(
 backup_state.vmaw = vmaw;
 backup_state.pbs = pbs;
 
+if (backup_state.target_id) {
+g_free(backup_state.target_id);
+}
+backup_state.target_id = g_strdup("Proxmox");
+
 backup_state.di_list = di_list;
 
 uuid_info = g_malloc0(sizeof(*uuid_info));
-- 
2.39.5






[pve-devel] [RFC qemu v3 07/34] PVE backup: get device info: allow caller to specify filter for which devices use fleecing

2024-11-07 Thread Fiona Ebner
For providing snapshot-access to external backup providers, EFI and
TPM also need an associated fleecing image. The new caller will thus
need a different filter.

Signed-off-by: Fiona Ebner 
---

No changes in v3.

 pve-backup.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index e8031bb89c..d0593fc581 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -717,7 +717,7 @@ static void create_backup_jobs_bh(void *opaque) {
 /*
  * EFI disk and TPM state are small and it's just not worth setting up 
fleecing for them.
  */
-static bool device_uses_fleecing(const char *device_id)
+static bool fleecing_no_efi_tpm(const char *device_id)
 {
 return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, 
"drive-tpmstate", 14);
 }
@@ -729,7 +729,7 @@ static bool device_uses_fleecing(const char *device_id)
  */
 static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
 const char *devlist,
-bool fleecing,
+bool (*device_uses_fleecing)(const char*),
 Error **errp)
 {
 gchar **devs = NULL;
@@ -755,7 +755,7 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
 di->bs = bs;
 di->device_name = g_strdup(bdrv_get_device_name(bs));
 
-if (fleecing && device_uses_fleecing(*d)) {
+if (device_uses_fleecing && device_uses_fleecing(*d)) {
 g_autofree gchar *fleecing_devid = g_strconcat(*d, 
"-fleecing", NULL);
 BlockBackend *fleecing_blk = blk_by_name(fleecing_devid);
 if (!fleecing_blk) {
@@ -858,7 +858,8 @@ UuidInfo coroutine_fn *qmp_backup(
 format = has_format ? format : BACKUP_FORMAT_VMA;
 
 bdrv_graph_co_rdlock();
-di_list = get_device_info(devlist, has_fleecing && fleecing, &local_err);
+di_list = get_device_info(devlist, (has_fleecing && fleecing) ? 
fleecing_no_efi_tpm : NULL,
+  &local_err);
 bdrv_graph_co_rdunlock();
 if (local_err) {
 error_propagate(errp, local_err);
-- 
2.39.5






[pve-devel] [PATCH qemu v3 04/34] PVE backup: save device name in device info structure

2024-11-07 Thread Fiona Ebner
The device name needs to be queried while holding the graph read lock
and since it doesn't change during the whole operation, just get it
once during setup and avoid the need to query it again in different
places.

Also in preparation to use it more often in error messages and for the
upcoming external backup access API.

Signed-off-by: Fiona Ebner 
---

No changes in v3.

 pve-backup.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 051ebffe48..33c23e53c2 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -94,6 +94,7 @@ typedef struct PVEBackupDevInfo {
 size_t size;
 uint64_t block_size;
 uint8_t dev_id;
+char* device_name;
 int completed_ret; // INT_MAX if not completed
 BdrvDirtyBitmap *bitmap;
 BlockDriverState *target;
@@ -327,6 +328,8 @@ static void coroutine_fn pvebackup_co_complete_stream(void 
*opaque)
 }
 
 di->bs = NULL;
+g_free(di->device_name);
+di->device_name = NULL;
 
 assert(di->target == NULL);
 
@@ -621,9 +624,6 @@ static void create_backup_jobs_bh(void *opaque) {
 
 BlockDriverState *source_bs = di->bs;
 bool discard_source = false;
-bdrv_graph_co_rdlock();
-const char *job_id = bdrv_get_device_name(di->bs);
-bdrv_graph_co_rdunlock();
 if (di->fleecing.bs) {
 if (setup_snapshot_access(di, &local_err) < 0) {
 error_setg(errp, "setting up snapshot access for fleecing 
failed: %s",
@@ -654,7 +654,7 @@ static void create_backup_jobs_bh(void *opaque) {
 }
 
 BlockJob *job = backup_job_create(
-job_id, source_bs, di->target, backup_state.speed, sync_mode, 
di->bitmap,
+di->device_name, source_bs, di->target, backup_state.speed, 
sync_mode, di->bitmap,
 bitmap_mode, false, discard_source, NULL, &perf, 
BLOCKDEV_ON_ERROR_REPORT,
 BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, 
backup_state.txn,
 &local_err);
@@ -751,6 +751,7 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
 }
 PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
 di->bs = bs;
+di->device_name = g_strdup(bdrv_get_device_name(bs));
 
 if (fleecing && device_uses_fleecing(*d)) {
 g_autofree gchar *fleecing_devid = g_strconcat(*d, 
"-fleecing", NULL);
@@ -789,6 +790,7 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
 
 PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
 di->bs = bs;
+di->device_name = g_strdup(bdrv_get_device_name(bs));
 di_list = g_list_append(di_list, di);
 }
 }
@@ -956,9 +958,6 @@ UuidInfo coroutine_fn *qmp_backup(
 
 di->block_size = dump_cb_block_size;
 
-bdrv_graph_co_rdlock();
-const char *devname = bdrv_get_device_name(di->bs);
-bdrv_graph_co_rdunlock();
 PBSBitmapAction action = PBS_BITMAP_ACTION_NOT_USED;
 size_t dirty = di->size;
 
@@ -973,7 +972,8 @@ UuidInfo coroutine_fn *qmp_backup(
 }
 action = PBS_BITMAP_ACTION_NEW;
 } else {
-expect_only_dirty = proxmox_backup_check_incremental(pbs, 
devname, di->size) != 0;
+expect_only_dirty =
+proxmox_backup_check_incremental(pbs, di->device_name, 
di->size) != 0;
 }
 
 if (expect_only_dirty) {
@@ -997,7 +997,8 @@ UuidInfo coroutine_fn *qmp_backup(
 }
 }
 
-int dev_id = proxmox_backup_co_register_image(pbs, devname, 
di->size, expect_only_dirty, errp);
+int dev_id = proxmox_backup_co_register_image(pbs, 
di->device_name, di->size,
+  expect_only_dirty, 
errp);
 if (dev_id < 0) {
 goto err_mutex;
 }
@@ -1009,7 +1010,7 @@ UuidInfo coroutine_fn *qmp_backup(
 di->dev_id = dev_id;
 
 PBSBitmapInfo *info = g_malloc(sizeof(*info));
-info->drive = g_strdup(devname);
+info->drive = g_strdup(di->device_name);
 info->action = action;
 info->size = di->size;
 info->dirty = dirty;
@@ -1034,10 +1035,7 @@ UuidInfo coroutine_fn *qmp_backup(
 goto err_mutex;
 }
 
-bdrv_graph_co_rdlock();
-const char *devname = bdrv_get_device_name(di->bs);
-bdrv_graph_co_rdunlock();
-di->dev_id = vma_writer_register_stream(vmaw, devname, di->size);
+di->dev_id = vma_writer_register_stream(vmaw, di->device_name, 
di->size);
   

[pve-devel] [PATCH qemu v3 02/34] PVE backup: fixup error handling for fleecing

2024-11-07 Thread Fiona Ebner
The drained section needs to be terminated before breaking out of the
loop in the error scenarios. Otherwise, guest IO on the drive would
become stuck.

If the job is created successfully, then the job completion callback
will clean up the snapshot access block nodes. In case failure
happened before the job is created, there was no cleanup for the
snapshot access block nodes yet. Add it.

Signed-off-by: Fiona Ebner 
---

No changes in v3.

 pve-backup.c | 38 +-
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 4e730aa3da..c4178758b3 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -357,22 +357,23 @@ static void coroutine_fn 
pvebackup_co_complete_stream(void *opaque)
 qemu_co_mutex_unlock(&backup_state.backup_mutex);
 }
 
+static void cleanup_snapshot_access(PVEBackupDevInfo *di)
+{
+if (di->fleecing.snapshot_access) {
+bdrv_unref(di->fleecing.snapshot_access);
+di->fleecing.snapshot_access = NULL;
+}
+if (di->fleecing.cbw) {
+bdrv_cbw_drop(di->fleecing.cbw);
+di->fleecing.cbw = NULL;
+}
+}
+
 static void pvebackup_complete_cb(void *opaque, int ret)
 {
 PVEBackupDevInfo *di = opaque;
 di->completed_ret = ret;
 
-/*
- * Handle block-graph specific cleanup (for fleecing) outside of the 
coroutine, because the work
- * won't be done as a coroutine anyways:
- * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via 
bdrv_co_unref() would
- *   just spawn a BH calling bdrv_unref().
- * - For cbw, draining would need to spawn a BH.
- */
-if (di->fleecing.snapshot_access) {
-bdrv_unref(di->fleecing.snapshot_access);
-di->fleecing.snapshot_access = NULL;
-}
 if (di->fleecing.cbw) {
 /*
  * With fleecing, failure for cbw does not fail the guest write, but 
only sets the snapshot
@@ -383,10 +384,17 @@ static void pvebackup_complete_cb(void *opaque, int ret)
 if (di->completed_ret == -EACCES && snapshot_error) {
 di->completed_ret = snapshot_error;
 }
-bdrv_cbw_drop(di->fleecing.cbw);
-di->fleecing.cbw = NULL;
 }
 
+/*
+ * Handle block-graph specific cleanup (for fleecing) outside of the 
coroutine, because the work
+ * won't be done as a coroutine anyways:
+ * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via 
bdrv_co_unref() would
+ *   just spawn a BH calling bdrv_unref().
+ * - For cbw, draining would need to spawn a BH.
+ */
+cleanup_snapshot_access(di);
+
 /*
  * Needs to happen outside of coroutine, because it takes the graph write 
lock.
  */
@@ -587,6 +595,7 @@ static void create_backup_jobs_bh(void *opaque) {
 if (!di->fleecing.cbw) {
 error_setg(errp, "appending cbw node for fleecing failed: %s",
local_err ? error_get_pretty(local_err) : "unknown 
error");
+bdrv_drained_end(di->bs);
 break;
 }
 
@@ -599,6 +608,8 @@ static void create_backup_jobs_bh(void *opaque) {
 if (!di->fleecing.snapshot_access) {
 error_setg(errp, "setting up snapshot access for fleecing 
failed: %s",
local_err ? error_get_pretty(local_err) : "unknown 
error");
+cleanup_snapshot_access(di);
+bdrv_drained_end(di->bs);
 break;
 }
 source_bs = di->fleecing.snapshot_access;
@@ -637,6 +648,7 @@ static void create_backup_jobs_bh(void *opaque) {
 }
 
 if (!job || local_err) {
+cleanup_snapshot_access(di);
 error_setg(errp, "backup_job_create failed: %s",
local_err ? error_get_pretty(local_err) : "null");
 break;
-- 
2.39.5






Re: [pve-devel] [PATCH v2 storage 1/1] file_size_info: implement untrusted mode

2024-11-07 Thread Fiona Ebner
Am 04.11.24 um 11:42 schrieb Fabian Grünbichler:
> this allows checking some extra attributes for images which come from a
> potentially malicious source.
> 
> since file_size_info is not part of the plugin API, no API bump is needed. if
> desired, a similar check could also be implemented in volume_size_info, which
> would entail bumping both APIVER and APIAGE (since the additional parameter
> would make checking untrusted volumes opt-in for external plugins).
> 
> Signed-off-by: Fabian Grünbichler 
> Reviewed-by: Fiona Ebner 

Tested-by: Fiona Ebner 

(FWIW it breaks my directory-based backup provider example in case of
incremental backups, because that relied on qcow2 backing files O:P)

> @@ -975,18 +985,34 @@ sub file_size_info {
>   warn $err_output;
>  }
>  if (!$json) {
> + die "failed to query file information with qemu-img\n" if $untrusted;

git complains about "space before tab" here




[pve-devel] [PATCH master+bookworm-6.8 kernel] cherry-pick fix mitigating host reboot issue affecting certain AMD Zen4 CPU models

2024-11-06 Thread Fiona Ebner
Reported in the community forum:
Issue: https://forum.proxmox.com/threads/139500/
Fix: https://forum.proxmox.com/threads/139500/post-717968

Signed-off-by: Fiona Ebner 
---

Should also be applied to the 6.8 kernel while renumbering the newly
added patch.

 ...r-virtualized-VMLOAD-VMSAVE-on-Zen4-.patch | 44 +++
 1 file changed, 44 insertions(+)
 create mode 100644 
patches/kernel/0015-x86-CPU-AMD-Clear-virtualized-VMLOAD-VMSAVE-on-Zen4-.patch

diff --git 
a/patches/kernel/0015-x86-CPU-AMD-Clear-virtualized-VMLOAD-VMSAVE-on-Zen4-.patch
 
b/patches/kernel/0015-x86-CPU-AMD-Clear-virtualized-VMLOAD-VMSAVE-on-Zen4-.patch
new file mode 100644
index 000..76957b6
--- /dev/null
+++ 
b/patches/kernel/0015-x86-CPU-AMD-Clear-virtualized-VMLOAD-VMSAVE-on-Zen4-.patch
@@ -0,0 +1,44 @@
+From  Mon Sep 17 00:00:00 2001
+From: Mario Limonciello 
+Date: Tue, 5 Nov 2024 10:02:34 -0600
+Subject: [PATCH] x86/CPU/AMD: Clear virtualized VMLOAD/VMSAVE on Zen4 client
+
+A number of Zen4 client SoCs advertise the ability to use virtualized
+VMLOAD/VMSAVE, but using these instructions is reported to be a cause
+of a random host reboot.
+
+These instructions aren't intended to be advertised on Zen4 client
+so clear the capability.
+
+Signed-off-by: Mario Limonciello 
+Signed-off-by: Borislav Petkov (AMD) 
+Cc: sta...@vger.kernel.org
+Link: https://bugzilla.kernel.org/show_bug.cgi?id=219009
+(cherry picked from commit a5ca1dc46a6b610dd4627d8b633d6c84f9724ef0)
+Signed-off-by: Fiona Ebner 
+---
+ arch/x86/kernel/cpu/amd.c | 11 +++
+ 1 file changed, 11 insertions(+)
+
+diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
+index 1e0fe5f8ab84..ee87f997d31f 100644
+--- a/arch/x86/kernel/cpu/amd.c
 b/arch/x86/kernel/cpu/amd.c
+@@ -924,6 +924,17 @@ static void init_amd_zen4(struct cpuinfo_x86 *c)
+ {
+   if (!cpu_has(c, X86_FEATURE_HYPERVISOR))
+   msr_set_bit(MSR_ZEN4_BP_CFG, 
MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
++
++  /*
++   * These Zen4 SoCs advertise support for virtualized VMLOAD/VMSAVE
++   * in some BIOS versions but they can lead to random host reboots.
++   */
++  switch (c->x86_model) {
++  case 0x18 ... 0x1f:
++  case 0x60 ... 0x7f:
++  clear_cpu_cap(c, X86_FEATURE_V_VMSAVE_VMLOAD);
++  break;
++  }
+ }
+ 
+ static void init_amd_zen5(struct cpuinfo_x86 *c)
-- 
2.39.5



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH kernel] d/rules: enable CONFIG_MEMCG_V1 needed for legacy containers

2024-11-05 Thread Fiona Ebner
As reported in the community forum [0], containers requiring cgroup v1
would not start anymore, even when systemd.unified_cgroup_hierarchy=0
was set on the kernel commandline. The error message would be:

> cgfsng_setup_limits_legacy: 3442 No such file or directory - Failed to set 
> "memory.limit_in_bytes" to "536870912"

Kernel commit e93d4166b40a ("mm: memcg: put cgroup v1-specific code
under a config option") made it necessary to explicitly enable the new
associated Kconfig flag.

[0]: https://forum.proxmox.com/threads/156830/

Signed-off-by: Fiona Ebner 
---
 debian/rules | 1 +
 1 file changed, 1 insertion(+)

diff --git a/debian/rules b/debian/rules
index f19a5b8..e1444b7 100755
--- a/debian/rules
+++ b/debian/rules
@@ -65,6 +65,7 @@ PMX_CONFIG_OPTS= \
 -e CONFIG_MODULE_SIG_SHA512 \
 -d CONFIG_MEMCG_DISABLED \
 -e CONFIG_MEMCG_SWAP_ENABLED \
+-e CONFIG_MEMCG_V1 \
 -e CONFIG_HYPERV \
 -m CONFIG_VFIO_IOMMU_TYPE1 \
 -e CONFIG_VFIO_VIRQFD \
-- 
2.39.5



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH manager] ui: cloudinit: disallow unprivileged users to regenerate images

2024-11-04 Thread Fiona Ebner
Am 31.10.24 um 15:06 schrieb Daniel Kral:
> Disables the "Regenerate image" button in the VM CloudInit tab for
> users, which lack the necessary permissions to do so, which are
> "VM.Config.CloudInit" and "VM.Config.CDROM". This is checked by the VM
> config update API endpoint in qemu-server, when ejecting and re-adding
> CloudInit drive images.
> 
> This is consistent with the permissions restricting access to adding
> CloudInit drives in the Hardware tab.
> 
> This is a cosmetic change as the VM config update API endpoint would
> fail because of insufficient permissions anyway.
> 

Since those users can already use the cloudinit_update endpoint (just
not via UI) and thus effectively don't need "VM.Config.CDROM", I think
it's better to change the UI to use that endpoint as well.

The endpoint was added here [0] and the switch in the UI was already
proposed [1] as part of that, but it seems it was never applied.

[0]:
https://lore.proxmox.com/pve-devel/20220622115206.3295425-1-aderum...@odiso.com/

[1]:
https://lore.proxmox.com/pve-devel/20220622141335.3616129-2-aderum...@odiso.com/

> Signed-off-by: Daniel Kral 
> ---
> I stumbled upon this while checking on BugZilla #3105 and trying to use
> the "Regenerate image" button with a user, that had only VM.Audit and
> VM.Config.CloudInit permissions. where the user missed the
> "VM.Config.CDROM" permission and the action couldn't be completed.
> 
>  www/manager6/qemu/CloudInit.js | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/www/manager6/qemu/CloudInit.js b/www/manager6/qemu/CloudInit.js
> index 49519726..aeef8c15 100644
> --- a/www/manager6/qemu/CloudInit.js
> +++ b/www/manager6/qemu/CloudInit.js
> @@ -142,7 +142,10 @@ Ext.define('PVE.qemu.CloudInit', {
>   }
>   });
>  
> - me.down('#savebtn').setDisabled(!found);
> + let caps = Ext.state.Manager.get('GuiCap');
> + let canSaveImage = !!caps.vms['VM.Config.CDROM'] && 
> !!caps.vms['VM.Config.Cloudinit'];
> + me.down('#savebtn').setDisabled(!found || !canSaveImage);
> +
>   me.setDisabled(!found);
>   if (!found) {
>   me.getView().mask(gettext('No CloudInit Drive found'), 
> ['pve-static-mask']);


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu] async snapshot: stop vCPU throttling after finishing

2024-10-30 Thread Fiona Ebner
In the community forum, users reported issues about RCU stalls and
sluggish VMs after taking a snapshot with RAM in Proxmox VE [0]. Mario
was also experiencing similar issues from time to time and recently,
obtained a GDB stacktrace. The stacktrace showed that, in his case,
the vCPU threads were waiting in cpu_throttle_thread(). It is a good
guess that the issues in the forum could also be because of that.

From searching in the source code, it seems that migration is the only
user of the vCPU throttling functions in QEMU relevant for Proxmox VE
(the only other place where it is used is the Cocoa UI). In
particular, RAM migration will begin throttling vCPUs for
auto-converge.

In migration_iteration_finish() there is an unconditional call to
cpu_throttle_stop(), so do the same in the async snapshot code
specific to Proxmox VE.

It's not clear why the issue began to surface more prominently only
now, since the vCPU throttling was there since commit 070afca258
("migration: Dynamic cpu throttling for auto-converge") in QEMU
v2.10.0. However, there were a lot of changes in the migration code
between v8.1.5 and v9.0.2 and a few of them might have affected the
likelihood of cpu_throttle_set() being called, for example, 4e1871c450
("migration: Don't serialize devices in qemu_savevm_state_iterate()")

[0]: https://forum.proxmox.com/threads/153483

Reported-by: Mario Loderer 
Signed-off-by: Fiona Ebner 
Tested-by: Mario Loderer 
---
 ...-async-for-background-state-snapshots.patch | 18 +-
 ...-add-optional-buffer-size-to-QEMUFile.patch |  6 +++---
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git 
a/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch
 
b/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch
index 964caf3..b0e75e9 100644
--- 
a/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch
+++ 
b/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch
@@ -28,7 +28,8 @@ Signed-off-by: Stefan Reiter 
  adapt to removal of QEMUFileOps
  improve condition for entering final stage
  adapt to QAPI and other changes for 8.2
- make sure to not call vm_start() from coroutine]
+ make sure to not call vm_start() from coroutine
+ stop CPU throttling after finishing]
 Signed-off-by: Fiona Ebner 
 ---
  hmp-commands-info.hx |  13 +
@@ -36,13 +37,13 @@ Signed-off-by: Fiona Ebner 
  include/migration/snapshot.h |   2 +
  include/monitor/hmp.h|   3 +
  migration/meson.build|   1 +
- migration/savevm-async.c | 538 +++
+ migration/savevm-async.c | 545 +++
  monitor/hmp-cmds.c   |  38 +++
  qapi/migration.json  |  34 +++
  qapi/misc.json   |  18 ++
  qemu-options.hx  |  12 +
  system/vl.c  |  10 +
- 11 files changed, 686 insertions(+)
+ 11 files changed, 693 insertions(+)
  create mode 100644 migration/savevm-async.c
 
 diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
@@ -140,10 +141,10 @@ index 95d1cf2250..800f12a60d 100644
'threadinfo.c',
 diff --git a/migration/savevm-async.c b/migration/savevm-async.c
 new file mode 100644
-index 00..72cf6588c2
+index 00..1af32604c7
 --- /dev/null
 +++ b/migration/savevm-async.c
-@@ -0,0 +1,538 @@
+@@ -0,0 +1,545 @@
 +#include "qemu/osdep.h"
 +#include "migration/channel-savevm-async.h"
 +#include "migration/migration.h"
@@ -154,6 +155,7 @@ index 00..72cf6588c2
 +#include "migration/global_state.h"
 +#include "migration/ram.h"
 +#include "migration/qemu-file.h"
++#include "sysemu/cpu-throttle.h"
 +#include "sysemu/sysemu.h"
 +#include "sysemu/runstate.h"
 +#include "block/block.h"
@@ -342,6 +344,12 @@ index 00..72cf6588c2
 +ret || aborted ? MIGRATION_STATUS_FAILED : 
MIGRATION_STATUS_COMPLETED);
 +ms->to_dst_file = NULL;
 +
++/*
++ * Same as in migration_iteration_finish(): saving RAM might've turned on 
CPU throttling for
++ * auto-converge, make sure to disable it.
++ */
++cpu_throttle_stop();
++
 +qemu_savevm_state_cleanup();
 +
 +ret = save_snapshot_cleanup();
diff --git 
a/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch 
b/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch
index 043b249..92bc9f2 100644
--- a/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch
+++ b/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch
@@ -193,10 +193,10 @@ index 32fd4a34fd..36a0cd8cc8 100644
  
  /*
 diff --git a/migration/savevm-async.c b/migration/savevm-async.c
-index 72cf6588c2..fb4e8ea689 100644
+index 1af32604c7..be2035cd2e 100644
 --- a/migration/savevm-async.c
 +++ b/migrati

Re: [pve-devel] [PATCH container] fix #5761: add the "discard" mount option

2024-10-25 Thread Fiona Ebner
Am 09.10.24 um 16:22 schrieb Filip Schauer:
> Introduce the "discard" mount option for rootfs and mount points. This
> ensures that unused container volume blocks are discarded from the
> underlying storage backend when deleting files within the container.
> 
> Signed-off-by: Filip Schauer 

Reviewed-by: Fiona Ebner 
Tested-by: Fiona Ebner 

However, this misses the UI part or you can get non-editable volumes there.

> ---
>  src/PVE/LXC/Config.pm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
> index ce64c4c..e980f8a 100644
> --- a/src/PVE/LXC/Config.pm
> +++ b/src/PVE/LXC/Config.pm
> @@ -311,7 +311,7 @@ sub __snapshot_rollback_get_unused {
>  cfs_register_file('/lxc/', \&parse_pct_config, \&write_pct_config);
>  
>  
> -my $valid_mount_option_re = qr/(noatime|lazytime|nodev|nosuid|noexec)/;
> +my $valid_mount_option_re = 
> qr/(discard|lazytime|noatime|nodev|noexec|nosuid)/;
>  
>  sub is_valid_mount_option {
>  my ($option) = @_;


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH qemu 1/2] vm-network-scripts: move scripts to /usr/libexec

2024-10-24 Thread Fiona Ebner
It's the "qemu-server" repository, not "qemu".

Am 09.10.24 um 14:55 schrieb Maximiliano Sandoval:
> Moves the network scripts from /var/lib/qemu-server into
> /usr/libexec/qemu-server.
> 
> /usr/libexec is described as binaries run by programs which are not
> intended to be directly executed by the user on [FHS 4.7]. On the other
> hand /var/lib corresponds to variable state information, which does not
> fit the use case here, see [FHS 5.8].
> 
> [FHS 4.7]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s07.html
> [FHS 5.8]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05s08.html
> 

Can this cause issues for a brief time during upgrade/unpacking, i.e.

A) if the old QemuServer.pm is still loaded while the scripts get
removed from their old location (or to be more precise, a QEMU process
with the old parameter is started at that time)
B) if the new QemuServer.pm gets loaded before the scripts are extracted
to their new location (or to be more precise, a QEMU process with the
new parameter is started at that time)

?

Is there something preventing those from happening? Otherwise, we might
need to ship the scripts to both locations until the next major release
and drop them from the old location only then.

> Signed-off-by: Maximiliano Sandoval 
> ---
>  PVE/QemuServer.pm |  4 ++--
>  vm-network-scripts/Makefile   | 10 +-
>  vm-network-scripts/pve-bridge-hotplug |  2 +-
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index b26da505..88100638 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -1759,8 +1759,8 @@ sub print_netdev_full {
>  my $script = $hotplug ? "pve-bridge-hotplug" : "pve-bridge";
>  
>  if ($net->{bridge}) {
> - $netdev = 
> "type=tap,id=$netid,ifname=${ifname},script=/var/lib/qemu-server/$script"
> - .",downscript=/var/lib/qemu-server/pve-bridgedown$vhostparam";
> + $netdev = 
> "type=tap,id=$netid,ifname=${ifname},script=/usr/libexec/qemu-server/$script"
> + .",downscript=/usr/libexec/qemu-server/pve-bridgedown$vhostparam";
>  } else {
>  $netdev = "type=user,id=$netid,hostname=$vmname";
>  }
> diff --git a/vm-network-scripts/Makefile b/vm-network-scripts/Makefile
> index 5ba537d0..dc2c85ff 100644
> --- a/vm-network-scripts/Makefile
> +++ b/vm-network-scripts/Makefile
> @@ -1,12 +1,12 @@
>  DESTDIR=
> -VARLIBDIR=$(DESTDIR)/var/lib/qemu-server
> +LIBEXECDIR=$(DESTDIR)/usr/libexec/qemu-server
>  
>  .PHONY: install
>  install: pve-bridge pve-bridge-hotplug pve-bridgedown
> - install -d ${VARLIBDIR}
> - install -m 0755 pve-bridge ${VARLIBDIR}/pve-bridge
> - install -m 0755 pve-bridge-hotplug ${VARLIBDIR}/pve-bridge-hotplug
> - install -m 0755 pve-bridgedown ${VARLIBDIR}/pve-bridgedown
> + install -d ${LIBEXECDIR}
> + install -m 0755 pve-bridge ${LIBEXECDIR}/pve-bridge
> + install -m 0755 pve-bridge-hotplug ${LIBEXECDIR}/pve-bridge-hotplug
> + install -m 0755 pve-bridgedown ${LIBEXECDIR}/pve-bridgedown
>  
>  .PHONY: clean
>  clean:
> diff --git a/vm-network-scripts/pve-bridge-hotplug 
> b/vm-network-scripts/pve-bridge-hotplug
> index f36ed408..3ae01ea5 100755
> --- a/vm-network-scripts/pve-bridge-hotplug
> +++ b/vm-network-scripts/pve-bridge-hotplug
> @@ -1,3 +1,3 @@
>  #!/bin/sh
>  
> -exec /var/lib/qemu-server/pve-bridge --hotplug "$@"
> +exec /usr/libexec/qemu-server/pve-bridge --hotplug "$@"


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] applied: [PATCH qemu] fix typos in user-visible strings

2024-10-24 Thread Fiona Ebner
Am 08.10.24 um 17:14 schrieb Maximiliano Sandoval:
> This includes docs, and strings printed to stderr or stdout.
> 
> These were caught with:
> 
> typos --exclude test --exclude changelog
> 
> Signed-off-by: Maximiliano Sandoval 

applied, thanks!

But please note that it's the "qemu-server" repository, not "qemu" ;)


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH qemu] api: qemu: create: default cpu to x86-64-v2-AES

2024-10-23 Thread Fiona Ebner
Am 22.10.24 um 15:35 schrieb Thomas Lamprecht:
> Am 22/10/2024 um 14:15 schrieb Maximiliano Sandoval:
>> Thomas Lamprecht  writes:
>>
>>> And even then, would this still break restoring backups made before that
>>> change?
>>> If, this would fall under the changes that need versioning the guest
>>> configs. Which naturally is possible but is IMO also not something I'd
>>> do lightly, as that's something that might have wide-reaching consequences.
>>
>> Would it make more sense to do this only for the CLI tool, and only when 
>> creating new VMs?
>>
> 
> Hmm, it would feel a bit odd to me to single that specific "mismatch" between
> UI default and CLI/schema out but ignore the others, like e.g., the memory
> default of 2048, or some dynamic defaults like the OS-type dependent ones.
> 
> And while it would have reduced impact, it would be still a breaking change
> that might affect peoples automation/scripts.
> But if more than just the CPU type would be looked at, and adapted such that
> UI and CLI behave more the same, it might be indeed a feasible way to improve
> UX slightly on the next major release next year.
> Conveying the reason for different defaults and how one can look into which
> one will be applied would warrant a short paragraph in the docs then too.
> And taking the chance and looking at the CT story on this would be great too,
> e.g., like is 512 MB really still a good default for UI/CLI, or unprivileged
> vs. privileged.

Maybe we could have (datacenter-wide?) default profiles (one for
containers, one for VMs) whose values are used for new guest creation?
Users could modify them if desired and they could also be used to
pre-fill values in the UI. Could then be shipped for PVE 9 with sensible
values.
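
Just to make that idea a bit more concrete (totally made-up sketch, names and
values are only for illustration): guest creation could merge such a profile
with the caller's parameters, with explicitly passed values winning, e.g.

    # made-up sketch: the defaults would come from a datacenter-wide profile
    my $vm_create_defaults = {
        cpu => 'x86-64-v2-AES',
        memory => 2048,
        ostype => 'l26',
    };

    my $param = { cpu => 'host', memory => 4096 }; # what the API caller passed

    my $conf = { %$vm_create_defaults, %$param }; # explicit parameters win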


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [[PATCH kernel]] fix 5683: netfs: reset subreq iov iter before tail clean

2024-10-22 Thread Fiona Ebner
Am 02.10.24 um 16:36 schrieb Christian Ebner:
> Fixes rare read corruption issues using the in kernel ceph client.
> 
> On incomplete read requests, the clean tail flag should make sure to
> zero fill the remaining bytes for the subrequest.
> If the iov iterator is not at the correct position, this can however
> zero fill downloaded data, corrupting the read content.
> 
> Link to issue:
> https://bugzilla.proxmox.com/show_bug.cgi?id=5683
> 
> Link to upstream issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=219237
> 
> Signed-off-by: Christian Ebner 
> ---
> This fixes the read corruption issue with my local reproducer.
> 
> Providing a patched kernel to users affected by the issue for testing
> would be probably the best way to verify the fix.
> 
> Also, I reached out once again to the kernel developers asking if
> this fix is a valid approach, hoping this can be included in current
> stable (as the patch does fix the issue also when applied on 6.11.1).
> 
>  ...et-subreq-iov-iter-before-tail-clean.patch | 31 +++
>  1 file changed, 31 insertions(+)
>  create mode 100644 
> patches/kernel/0021-netfs-reset-subreq-iov-iter-before-tail-clean.patch
> 
> diff --git 
> a/patches/kernel/0021-netfs-reset-subreq-iov-iter-before-tail-clean.patch 
> b/patches/kernel/0021-netfs-reset-subreq-iov-iter-before-tail-clean.patch
> new file mode 100644
> index 000..a87e722
> --- /dev/null
> +++ b/patches/kernel/0021-netfs-reset-subreq-iov-iter-before-tail-clean.patch
> @@ -0,0 +1,31 @@
> +From cd27abf0c555f39b12c05f9f6a8cb59ff25dfe45 Mon Sep 17 00:00:00 2001
> +From: Christian Ebner 
> +Date: Wed, 2 Oct 2024 15:24:31 +0200
> +Subject: [PATCH] netfs: reset subreq iov iter before tail clean
> +
> +Make sure the iter is at the correct location when cleaning up tail
> +bytes for incomplete read subrequests.
> +

Disclaimer that I'm not familiar at all with the code.

So AFAIU, after short IO, the iov_iter_count() and subreq->len -
subreq->transferred might disagree. That is why before resubmission,
netfs_reset_subreq_iter() is called. That function aligns the iterator
position, so it will match the information from 'subreq'.

In your edge case, there is no resubmission though, because the
NETFS_SREQ_CLEAR_TAIL flag is set. But it still was short IO, so the
mentioned mismatch happened.

Now netfs_clear_unread() relies on the information from
iov_iter_count(), which does not match the actual 'subreq'. To fix it,
you call netfs_reset_subreq_iter() (as is done before resubmission) to
align that information.

Before commit 92b6cc5d1e7c ("netfs: Add iov_iters to (sub)requests to
describe various buffers"), the information from the 'subreq' was used
to set up the iterator:

> diff --git a/fs/netfs/io.c b/fs/netfs/io.c
> index 7f753380e047..e9d408e211b8 100644
> --- a/fs/netfs/io.c
> +++ b/fs/netfs/io.c
> @@ -21,12 +21,7 @@
>   */
>  static void netfs_clear_unread(struct netfs_io_subrequest *subreq)
>  {
> -   struct iov_iter iter;
> -
> -   iov_iter_xarray(&iter, ITER_DEST, &subreq->rreq->mapping->i_pages,
> -   subreq->start + subreq->transferred,
> -   subreq->len   - subreq->transferred);
> -   iov_iter_zero(iov_iter_count(&iter), &iter);
> +   iov_iter_zero(iov_iter_count(&subreq->io_iter), &subreq->io_iter);
>  }

so that sounds good :)

So with and without your change, after the netfs_clear_unread() call,
the iterator will be in the final position, i.e. iov_iter_count() == 0?
Then the information in 'subreq' is updated manually in the same branch
and it moves on to completion.

How far off from reality am I ;)? FWIW, the change looks okay to me, but
again, I'm not familiar with the code and I haven't done any testing
(and have no reproducer).

Of course it would be much nicer to have some confirmation from upstream
and/or users about this.

> +Fixes: 92b6cc5d ("netfs: Add iov_iters to (sub)requests to describe various 
> buffers")
> +Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219237
> +
> +Signed-off-by: Christian Ebner 
> +---
> + fs/netfs/io.c | 1 +
> + 1 file changed, 1 insertion(+)
> +
> +diff --git a/fs/netfs/io.c b/fs/netfs/io.c
> +index d6ada4eba744..500119285346 100644
> +--- a/fs/netfs/io.c
>  b/fs/netfs/io.c
> +@@ -528,6 +528,7 @@ void netfs_subreq_terminated(struct netfs_io_subrequest 
> *subreq,
> + 
> + incomplete:
> + if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) {
> ++netfs_reset_subreq_iter(rreq, subreq);
> + netfs_clear_unread(subreq);
> + subreq->transferred = subreq->len;
> + goto complete;
> +-- 
> +2.39.5
> +


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH qemu] api: qemu: create: default cpu to x86-64-v2-AES

2024-10-22 Thread Fiona Ebner
Am 01.10.24 um 16:08 schrieb Maximiliano Sandoval:
> This makes it so newly created VMs, e.g. with `qm create` will have the
> same default value as VMs created via the web UI.
> 
> Signed-off-by: Maximiliano Sandoval 
> ---
> I am not entirely sure if there is side-effect that I am not taking into 
> account.
> 

This is a breaking change, because existing API callers now suddenly get
a new default. Even if it were just CLI and not API, it would be
breaking for the same reason (there are scripts using the CLI tools out
there).

If we do this, then in a major release and prominently communicate it to
all users in the release notes. And it should also be documented it in
the API schema, that creation uses another default than the schema default.

>  PVE/API2/Qemu.pm | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index d25a79fe..383218fd 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -1152,6 +1152,10 @@ __PACKAGE__->register_method({
>   $conf->{vmgenid} = PVE::QemuServer::generate_uuid();
>   }
>  
> + if (!defined($param->{cpu})) {
> + $conf->{cpu} = 'x86-64-v2-AES';
> + }
> +
>   my $machine_conf = 
> PVE::QemuServer::Machine::parse_machine($conf->{machine});
>   my $machine = $machine_conf->{type};
>   if (!$machine || $machine =~ m/^(?:pc|q35|virt)$/) {



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH qemu-server] vm_start: add syslog info with which PID a VM was started

2024-10-22 Thread Fiona Ebner
Am 26.09.24 um 10:36 schrieb Daniel Kral:
> Adds a syslog entry to log the process id that has been given to the
> QEMU VM process at start. This is helpful debugging information if the
> pid shows up at other places, like a kernel stack trace, while the VM
> has been running, but cannot be retrieved anymore (e.g. the pidfile has
> been deleted or only the syslog is available).
> 
> The syslog has been put in the `PVE::QemuServer::vm_start_nolock`
> subroutine to make sure that the PID is logged not only when the VM has
> been started by the API endpoint `vm_start`, but also when the VM is
> started by a remote migration.
> 
> Signed-off-by: Daniel Kral 
> Suggested-by: Hannes Dürr 
> Suggested-by: Friedrich Weber 

Nit: the Signed-off-by trailer should come after the Suggested-by
trailers, because it was the last thing that happened chronologically
(the change was first suggested, then you wrote it ;)).

With the nits addressed:

Reviewed-by: Fiona Ebner 

> ---
>  PVE/QemuServer.pm | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index b26da505..d998b360 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -43,6 +43,7 @@ use PVE::ProcFSTools;
>  use PVE::PBSClient;
>  use PVE::RESTEnvironment qw(log_warn);
>  use PVE::RPCEnvironment;
> +use PVE::SafeSyslog;
>  use PVE::Storage;
>  use PVE::SysFSTools;
>  use PVE::Systemd;
> @@ -5971,6 +5972,8 @@ sub vm_start_nolock {
>  eval { PVE::QemuServer::PCI::reserve_pci_usage($pci_reserve_list, $vmid, 
> undef, $pid) };
>  warn $@ if $@;
>  
> +syslog("info", "VM $vmid started with pid $pid.");

Nit: I'd capitalize "PID"

> +
>  if (defined(my $migrate = $res->{migrate})) {
>   if ($migrate->{proto} eq 'tcp') {
>   my $nodename = nodename();


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


[pve-devel] applied: [PATCH pve-storage] lvmplugin: fix: activate|deactivate volume, add missing storeid param in path sub

2024-10-22 Thread Fiona Ebner
Am 25.09.24 um 11:00 schrieb Alexandre Derumier via pve-devel:
> $storeid param is missing and $snapname is used as third param.
> 
> This seem to works actually because $snapname is always empty in lvm
> 
> Signed-off-by: Alexandre Derumier 

applied, thanks!


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH manager] fix #5787: ui: display vm description in confirm dialogs

2024-10-22 Thread Fiona Ebner
Am 22.10.24 um 08:58 schrieb Thomas Lamprecht:
> Am 21/10/2024 um 14:17 schrieb Timothy Nicholson:
>> Signed-off-by: Timothy Nicholson 
>> ---
>>
>> This patch adds a new function to the PVE Utils that formats a task 
>> confirmation message to display in confirm dialogs, for example when 
>> shutting down a VM. As requested by the Bugzilla entry #5787, the message 
>> now includes the VM name in addition to the VM ID.
> 
> why isn't above stated in the commit message but only as patch note?
> 

Also please note (from [0]):

> Make sure the line-length (text-width, column count) of the commit's message 
> is not longer than 72 characters.

which is a good limit for readability in mails in general :)

[0]:
https://pve.proxmox.com/wiki/Developer_Documentation#Commits_and_Commit_Messages


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [RFC PATCH 1/2] config: allow hypens and dots in config keys

2024-10-21 Thread Fiona Ebner
Am 24.09.24 um 16:35 schrieb Maximiliano Sandoval:
> The first character still has to be a letter.
> 
> Signed-off-by: Maximiliano Sandoval 
> ---
>  PVE/QemuServer.pm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index b26da505..1566cf91 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -2382,7 +2382,7 @@ sub parse_vm_config {
>   } else {
>   $handle_error->("vm $vmid - property 'delete' is only allowed 
> in [PENDING]\n");
>   }
> - } elsif ($line =~ m/^([a-z][a-z_]*\d*):\s*(.+?)\s*$/) {
> + } elsif ($line =~ m/^([a-z][a-z_\.-]*\d*):\s*(.+?)\s*$/) {
>   my $key = $1;
>   my $value = $2;
>   if ($section eq 'cloudinit') {

Regarding the hyphen, see also:
https://lore.proxmox.com/pve-devel/7c911e11-d1cb-415b-9282-b40faa34d...@proxmox.com/#t

Also friendly ping for @Thomas for that mail, in particular, for where
this should be documented.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH v3 storage] fix #5191: api, cli: implement moving a volume between storages

2024-10-21 Thread Fiona Ebner
Am 18.09.24 um 16:56 schrieb Filip Schauer:
> On 05/09/2024 14:12, Fiona Ebner wrote:
>>> @@ -483,15 +485,173 @@ __PACKAGE__->register_method ({
>>>   return $upid;
>>>   }});
>>>   +sub volume_move {
>> Should this even be a new top-level method? Or can/should we extend
>> export/import instead, to not only cover guest images? Because with this
>> top-level method we block the way for external storage plugins to
>> support this functionality too if they don't adhere to our assumptions.
> 
> 
> We do not use the existing export/import because we want to use `move`
> instead of copying when `--delete` is passed. This improves performance
> when moving between storages on the same storage backend.
> 

Is that really a common enough use case? I mean, it can be fine if the
added complexity is not too much, but it feels a bit like duplication
since we could also re-use storage_migrate() and add the special casing
there.

> Other than that the feedback was incorporated in patch v4:
> https://lists.proxmox.com/pipermail/pve-devel/2024-September/065380.html
> 


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [PATCH v4 storage 2/6] api: content: implement moving a volume between storages

2024-10-21 Thread Fiona Ebner
Am 18.09.24 um 16:49 schrieb Filip Schauer:
> +sub volume_move {
> +my ($cfg, $source_volid, $target_storeid, $delete) = @_;
> +
> +my ($source_storeid, $source_volname) = 
> PVE::Storage::parse_volume_id($source_volid, 0);
> +
> +die "source and target storage cannot be the same\n" if $source_storeid 
> eq $target_storeid;
> +
> +my $source_scfg = PVE::Storage::storage_config($cfg, $source_storeid);
> +my $source_plugin = PVE::Storage::Plugin->lookup($source_scfg->{type});
> +my ($vtype) = $source_plugin->parse_volname($source_volname);
> +
> +die "source storage '$source_storeid' does not support volumes of type 
> '$vtype'\n"
> + if !$source_scfg->{content}->{$vtype};
> +
> +my $target_scfg = PVE::Storage::storage_config($cfg, $target_storeid);
> +die "target storage '$target_storeid' does not support volumes of type 
> '$vtype'\n"
> + if !$target_scfg->{content}->{$vtype};
> +
> +die "use pct move-volume or qm disk move\n" if $vtype eq 'images' || 
> $vtype eq 'rootdir';
> +die "moving volume of type '$vtype' not implemented\n"
> + if ($vtype ne 'iso' && $vtype ne 'vztmpl' && $vtype ne 'snippets');
> +
> +PVE::Storage::activate_storage($cfg, $source_storeid);
> +
> +die "expected storage '$source_storeid' to be path based\n" if 
> !$source_scfg->{path};
> +my $source_path = $source_plugin->path($source_scfg, $source_volname, 
> $source_storeid);
> +die "$source_path does not exist" if (!-e $source_path);
> +my $source_dirname = dirname($source_path);
> +
> +die "expected storage '$target_storeid' to be path based\n" if 
> !$target_scfg->{path};
> +my $target_plugin = PVE::Storage::Plugin->lookup($target_scfg->{type});
> +my $target_path = $target_plugin->path($target_scfg, $source_volname, 
> $target_storeid);
> +
> +PVE::Storage::activate_storage($cfg, $target_storeid);
> +die "$target_path already exists" if (-e $target_path);
> +

What if a volume with the same name is created after this check, but
before the move/copy? I guess you'll need to lock the storage (see e.g.
vdisk_alloc() in Storage.pm for comparison).
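
Roughly something like the following, i.e. doing the existence check and the
actual move/copy while holding the lock (untested sketch, just to illustrate
the shape - compare how vdisk_alloc() wraps the allocation):

    # untested sketch: hold the target storage's lock so a concurrent
    # allocation cannot grab the same volume name in between
    $target_plugin->cluster_lock_storage($target_storeid, $target_scfg->{shared}, undef, sub {
        my $target_path = $target_plugin->path($target_scfg, $source_volname, $target_storeid);
        die "$target_path already exists\n" if -e $target_path;

        if ($delete) {
            move($source_path, $target_path) or die "failed to move $vtype: $!\n";
        } else {
            copy($source_path, $target_path) or die "failed to copy $vtype: $!\n";
        }
    });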

> +my @created_files = ();
> +
> +eval {
> + if ($delete) {
> + move($source_path, $target_path) or die "failed to move $vtype: $!";
> + } else {
> + copy($source_path, $target_path) or die "failed to copy $vtype: $!";
> + }
> +};
> +if (my $err = $@) {
> + for my $created_file (@created_files) {
> + unlink $created_file or $!{ENOENT} or warn $!;
> + }
> + die $err;
> +}
> +
> +PVE::Storage::archive_remove($source_path) if $delete;
> +
> +return;


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH pve-firmware] d/control: add Conflicts and Replaces for firmware-realtek-rtl8723cs-bt

2024-10-21 Thread Fiona Ebner
Reported in the community forum [0].

Both firmware files that are in the package
firmware-realtek-rtl8723cs-bt [1]

> /lib/firmware/rtl_bt/rtl8723cs_xx_config.bin
> /lib/firmware/rtl_bt/rtl8723cs_xx_fw.bin

are present in pve-firmware since the linux-firmware submodule
includes commit ed9c1349 ("rtl_bt: Add firmware file for the the
RTL8723CS Bluetooth part").

[0]: https://forum.proxmox.com/threads/132671/post-706514
[1]: 
https://packages.debian.org/bookworm/all/firmware-realtek-rtl8723cs-bt/filelist

Signed-off-by: Fiona Ebner 
---
 debian/control | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/debian/control b/debian/control
index 1fc601b..57e45c6 100644
--- a/debian/control
+++ b/debian/control
@@ -31,6 +31,7 @@ Conflicts: firmware-amd-graphics,
firmware-qlogic,
firmware-ralink,
firmware-realtek,
+   firmware-realtek-rtl8723cs-bt,
firmware-siano,
firmware-ti-connectivity,
 Replaces: firmware-amd-graphics,
@@ -54,6 +55,7 @@ Replaces: firmware-amd-graphics,
   firmware-qlogic,
   firmware-ralink,
   firmware-realtek,
+  firmware-realtek-rtl8723cs-bt,
   firmware-siano,
   firmware-ti-connectivity,
 Description: Binary firmware code for the pve-kernel
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [RFC container v2 22/25] backup: implement backup for external providers

2024-09-16 Thread Fiona Ebner
Am 13.09.24 um 08:19 schrieb Fabian Grünbichler:
>> Fiona Ebner  hat am 12.09.2024 15:38 CEST geschrieben:
>>  
>> Am 12.09.24 um 14:43 schrieb Fabian Grünbichler:
>>> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>>>> +  $info->{'firewall-config'} = $firewall_file if -e $firewall_file;
>>>> +  $info->{'bandwidth-limit'} = $opts->{bwlimit} * 1024 if 
>>>> $opts->{bwlimit};
>>>> +  $backup_provider->backup_container($vmid, $config_file, $id_map, 
>>>> $findexcl, $info);
>>>
>>> it might be easier to hide the idmapping from the backup provider? e.g.,
>>> hand it a idmapped bindmount or something like that?
>>>
>>
>> Yes, that would be nicer. But could that potentially lead to permission
>> issues? A mid/long term plan is to have the backup provider code run
>> with lower privileges. I suppose to later implement that, the subroutine
>> for the provider could run within a matching user namespace too?
> 
> yeah, I think there are a few options here
> - run the provider as root-in-user-ns, give it access to the mapped FS (this 
> is how we do regular backups, but requires some glue code/forking)

Gave this a try. Issue is that the backup provider also needs access to
the backup target/etc. Can network access also be an issue (I guess it
is not for PBS)?

E.g. directory example plugin fails with
> ERROR: Backup of VM 112 failed - unable to open file 
> '/mnt/pve/sparschwein/112/lxc-1726484790/guest.conf.tmp.125275' - Permission 
> denied
and Borg plugin fails with
> ERROR: Backup of VM 112 failed - mkdir /run/pve-storage-borg-plugin: 
> Permission denied at /usr/share/perl5/PVE/BackupProvider/Plugin/Borg.pm line 
> 41
or after switching to /tmp with
> ERROR: Backup of VM 112 failed - file '/etc/pve/priv/storage/borg.pw' exists 
> but open for reading failed - Permission denied

Less coupling with the associated storage plugin or a special kind of
"unprivileged" storage plugin would help. In PBS we do the
storage-plugin-related stuff first with root privileges and only run the
final pbs-client command in user namespace. Maybe we need something like
that here too, a preparatory method run as root that prepares for the
unprivileged backup operation? But that makes life more complicated for
provider implementers (and also us).
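
Just to illustrate the idea (the method names here are invented, this is not
actual API, only a sketch):

    # invented names, sketch only: a privileged preparation step hands everything
    # over explicitly, so the actual backup can then run root-in-user-ns
    sub backup_container_prepare {    # runs as root on the host
        my ($self, $vmid, $info) = @_;
        my $key = PVE::Tools::file_get_contents("/etc/pve/priv/storage/borg.pw");
        return { 'borg-passphrase' => $key };
    }

    sub backup_container {    # runs as root inside the container's user namespace
        my ($self, $vmid, $config_file, $id_map, $exclude_patterns, $info, $prepared) = @_;
        # may only rely on what the prepare step handed over via $prepared
        ...
    }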

> - run the provider as root-on-host, give it access to a reverse-mapped FS 
> somehow (well, it would be nicer to run the backup code in the userns instead 
> of as root)

I'd try and go with this option for now if that is okay.

> - run the provider as root-on-host, give it access to the mapped FS and let 
> it handle the (un)mapping itself (if they are not familiar with namespaces, 
> this might go wrong)
> 
> so if we find a generic way to do the first variant, we are both closer to 
> how we do backups, and err on the side of caution w.r.t. context of execution.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [RFC container v2 23/25] backup: implement restore for external providers

2024-09-13 Thread Fiona Ebner
Am 13.09.24 um 08:35 schrieb Fabian Grünbichler:
> 
>> Fiona Ebner  hat am 12.09.2024 16:08 CEST geschrieben:
>>
>>  
>> Am 12.09.24 um 15:56 schrieb Fiona Ebner:
>>> Am 12.09.24 um 14:43 schrieb Fabian Grünbichler:
>>>> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>>>>> + } elsif ($mechanism eq 'directory') {
>>>>> + my $directory = $info->{'archive-directory'}
>>>>> + or die "did not get path to archive directory from backup 
>>>>> provider\n";
>>>>> + die "not a directory '$directory'" if !-d $directory;
>>>>> +
>>>>> + my $rsync = ['rsync', '--stats', '-h', '-X', '-A', '--numeric-ids', 
>>>>> '-aH', '--delete',
>>>>> + '--no-whole-file', '--sparse', '--one-file-system', 
>>>>> '--relative'];
>>>>> + push $rsync->@*, '--bwlimit', $bwlimit if $bwlimit;
>>>>> + push $rsync->@*, "${directory}/./", $rootdir;
>>>>
>>>> and this as well?
>>>>
>>>
>>> Good catch, will fix!
>>>
>>
>> Hmm, then rsync won't be able to access the source (for my Borg example)
>> anymore :/
>>
>> WARN: rsync: [sender] change_dir
>> "/run/pve-storage-borg-plugin/pve-lxc-111-2024-08-13T09:34:25Z.restore-container/filesystem"
>> failed: Permission denied (13)
>>
>> With restore_tar_archive we stream the contents via stdin, can't do that
>> here. But maybe some kind of bind mount to make it accessible?
> 
> or rsync-on-host piped to rsync-in-ns ? haven't tried though

Would that require setting up an rsync daemon process? Or how would you
achieve the split? The man page says that --server/sender should not be
used:

> INTERNAL OPTIONS
>The options --server and --sender are used internally by rsync, and 
> should never be typed by a user under normal circumstances.  Some awareness 
> of these options may be needed in certain  sce‐
>narios,  such  as when setting up a login that can only run an rsync 
> command.  For instance, the support directory of the rsync distribution has 
> an example script named rrsync (for restricted
>rsync) that can be used with a restricted ssh login.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] Bug 2582 roadmap

2024-09-13 Thread Fiona Ebner
Hi Pavel,

Am 10.09.24 um 12:18 schrieb Pavel Tide via pve-devel:
> Hi Proxmox team,
> 
> Would you provide any delivery estimates on this item?
> https://bugzilla.proxmox.com/show_bug.cgi?id=2582
> 
> As far as I understand it's been implemented already, but currently stays in 
> the development branch - our lab is up to date and yet we don't see how we 
> can create a separate non-root account to work with PVE cluster.
> 

a patch series had been proposed, but the implementation was not
finished AFAIK, see Fabian's review[0]. Since Oguz left the company,
nobody else has picked up work on the series yet. Maybe you can describe
what exactly your use case is and which privileges you'd need in
particular. Of course, proposing patches for what you need (or a rebased
version of Oguz's full series) is welcome too :)

[0]:
https://lore.proxmox.com/pve-devel/1658908014.zeyifvlr1o.astr...@nora.none/

Best Regards,
Fiona


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [RFC container v2 23/25] backup: implement restore for external providers

2024-09-12 Thread Fiona Ebner
Am 12.09.24 um 15:56 schrieb Fiona Ebner:
> Am 12.09.24 um 14:43 schrieb Fabian Grünbichler:
>> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>>> +   } elsif ($mechanism eq 'directory') {
>>> +   my $directory = $info->{'archive-directory'}
>>> +   or die "did not get path to archive directory from backup 
>>> provider\n";
>>> +   die "not a directory '$directory'" if !-d $directory;
>>> +
>>> +   my $rsync = ['rsync', '--stats', '-h', '-X', '-A', '--numeric-ids', 
>>> '-aH', '--delete',
>>> +   '--no-whole-file', '--sparse', '--one-file-system', 
>>> '--relative'];
>>> +   push $rsync->@*, '--bwlimit', $bwlimit if $bwlimit;
>>> +   push $rsync->@*, "${directory}/./", $rootdir;
>>
>> and this as well?
>>
> 
> Good catch, will fix!
> 

Hmm, then rsync won't be able to access the source (for my Borg example)
anymore :/

WARN: rsync: [sender] change_dir
"/run/pve-storage-borg-plugin/pve-lxc-111-2024-08-13T09:34:25Z.restore-container/filesystem"
failed: Permission denied (13)

With restore_tar_archive we stream the contents via stdin, can't do that
here. But maybe some kind of bind mount to make it accessible?


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [RFC container v2 23/25] backup: implement restore for external providers

2024-09-12 Thread Fiona Ebner
Am 12.09.24 um 14:43 schrieb Fabian Grünbichler:
> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>> @@ -118,6 +136,55 @@ sub restore_tar_archive {
>>  die $err if $err && !$no_unpack_error;
>>  }
>>  
>> +sub restore_external_archive {
>> +my ($backup_provider, $storeid, $volname, $rootdir, $conf, 
>> $no_unpack_error, $bwlimit) = @_;
>> +
>> +my ($mechanism, $vmtype) = 
>> $backup_provider->restore_get_mechanism($volname, $storeid);
>> +die "cannot restore non-LXC guest of type '$vmtype'\n" if $vmtype ne 
>> 'lxc';
>> +
>> +my $info = $backup_provider->restore_container_init($volname, $storeid, 
>> {});
>> +eval {
>> +if ($mechanism eq 'tar') {
>> +my $tar_path = $info->{'tar-path'}
>> +or die "did not get path to tar file from backup provider\n";
>> +die "not a regular file '$tar_path'" if !-f $tar_path;
>> +restore_tar_archive($tar_path, $rootdir, $conf, $no_unpack_error, 
>> $bwlimit);
> 
> shouldn't this be `lxc-userns-exec`-ed?
> 

The restore_tar_archive() function does that AFAICS.

>> +} elsif ($mechanism eq 'directory') {
>> +my $directory = $info->{'archive-directory'}
>> +or die "did not get path to archive directory from backup 
>> provider\n";
>> +die "not a directory '$directory'" if !-d $directory;
>> +
>> +my $rsync = ['rsync', '--stats', '-h', '-X', '-A', '--numeric-ids', 
>> '-aH', '--delete',
>> +'--no-whole-file', '--sparse', '--one-file-system', 
>> '--relative'];
>> +push $rsync->@*, '--bwlimit', $bwlimit if $bwlimit;
>> +push $rsync->@*, "${directory}/./", $rootdir;
> 
> and this as well?
> 

Good catch, will fix!

> also, for both tar and rsync we probably need to think about how to
> prevent bogus input here (which might be user-creatable if they have
> write access to the backup storage) from violating our assumptions..
> 
What assumptions do you mean exactly?


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [RFC container v2 22/25] backup: implement backup for external providers

2024-09-12 Thread Fiona Ebner
Am 12.09.24 um 14:43 schrieb Fabian Grünbichler:
> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>> +$info->{'firewall-config'} = $firewall_file if -e $firewall_file;
>> +$info->{'bandwidth-limit'} = $opts->{bwlimit} * 1024 if 
>> $opts->{bwlimit};
>> +$backup_provider->backup_container($vmid, $config_file, $id_map, 
>> $findexcl, $info);
> 
> it might be easier to hide the idmapping from the backup provider? e.g.,
> hand it a idmapped bindmount or something like that?
>

Yes, that would be nicer. But could that potentially lead to permission
issues? A mid/long term plan is to have the backup provider code run
with lower privileges. I suppose to later implement that, the subroutine
for the provider could run within a matching user namespace too?


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [RFC qemu-server v2 21/25] backup: implement restore for external providers

2024-09-12 Thread Fiona Ebner
Am 12.09.24 um 14:44 schrieb Fabian Grünbichler:
> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>> +
>> +# allocate volumes
>> +my $map = $restore_allocate_devices->($storecfg, $virtdev_hash, $vmid);
>> +
>> +for my $virtdev (sort keys $virtdev_hash->%*) {
>> +my $d = $virtdev_hash->{$virtdev};
>> +next if $d->{is_cloudinit}; # no need to restore cloudinit
>> +
>> +my $info =
>> +$backup_provider->restore_vm_volume_init($volname, $storeid, 
>> $d->{devname}, {});
>> +my $source_path = $info->{'qemu-img-path'}
>> +or die "did not get source image path from backup provider\n";
>> +eval {
>> +qemu_img_convert(
>> +$source_path, $d->{volid}, $d->{size}, undef, 0, 
>> $options->{bwlimit});
>> +};
> 
> this definitely needs to get a call to file_size_info with import
> hardening patches applied ;)
> 

Sure, added a reminder to myself for now :)


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [RFC storage v2 10/25] plugin: introduce new_backup_provider() method

2024-09-12 Thread Fiona Ebner
Am 12.09.24 um 14:43 schrieb Fabian Grünbichler:
> On August 13, 2024 3:28 pm, Fiona Ebner wrote:
>>
>> Container mechanism 'directory':
>>
>> The backup provider gives the path to a directory with the full
>> filesystem structure of the container.
>>
>> Container mechanism 'directory':
>>
>> The backup provider gives the path to a (potentially compressed) tar
>> archive with the full filesystem structure of the container.
> 
> same as in the cover letter ;) base on the code here I assume the second
> one should be tar. I wonder whether just tar wouldn't be enough (easier
> to not get ACLs/xattrs/ownership/.. right)?
> 

Yes, should be tar, will fix!

I'd guess it is more convenient for many providers to expose (a FUSE
mount of) a directory. Using tar can mean more work. E.g. with Borg,
mounting the archive seems much cheaper than creating the tar. It's also
that the archive looks like:
guest.config
firewall.config
filesystem/
and while borg has "export-tar" where one can specify specific paths,
the tar will still contain the "filesystem/" prefix. Not sure if there
is an easy way to get rid of that.
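
For comparison, with the 'directory' mechanism a provider can essentially just
mount the archive and hand back the path. A simplified, untested sketch (not
the actual example plugin code, $self->{repo} stands in for the configured
repository):

    # simplified sketch: 'directory' restore mechanism via a FUSE mount of the
    # Borg archive, returning the filesystem/ subdirectory to the restore code
    sub restore_container_init {
        my ($self, $volname, $storeid, $info) = @_;
        my $mount_point = "/run/pve-storage-borg-plugin/$volname.mount";
        File::Path::make_path($mount_point);
        PVE::Tools::run_command(['borg', 'mount', "$self->{repo}::$volname", $mount_point]);
        return { 'archive-directory' => "$mount_point/filesystem" };
    }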

>> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
>> index 6444390..d5b76ae 100644
>> --- a/src/PVE/Storage/Plugin.pm
>> +++ b/src/PVE/Storage/Plugin.pm
>> @@ -1755,6 +1755,21 @@ sub rename_volume {
>>  return "${storeid}:${base}${target_vmid}/${target_volname}";
>>  }
>>  
>> +# Used by storage plugins for external backup providers. See 
>> PVE::BackupProvider::Plugin for the API
>> +# the provider needs to implement.
>> +#
>> +# $scfg - the storage configuration
>> +# $storeid - the storage ID
>> +# $log_function($log_level, $message) - this log function can be used to 
>> write to the backup task
>> +#   log in Proxmox VE. $log_level is 'info', 'warn' or 'err', $message is 
>> the message to be printed.
>> +#
>> +# Returns a blessed reference to the backup provider class.
>> +sub new_backup_provider {
>> +my ($class, $scfg, $storeid, $log_function) = @_;
>> +
>> +return;
>> +}
> 
> would it maybe make sense to make this a "die implement me" and make the
> opt-in via the storage plugin features? it would be more in line with
> what we do in other parts and less subtle..
> 

We don't have a method for storage plugin features yet, only
volume_has_feature() and the stand-alone storage_can_replicate(). We
could generalize (and deprecate) the latter though.
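
Something along those lines could look like this (sketch only, naming up for
discussion):

    # sketch only: plugin-wide feature query in the base PVE::Storage::Plugin ...
    sub storage_has_feature {
        my ($class, $scfg, $feature) = @_;
        return 0; # base plugin: no optional features
    }

    # ... which a plugin shipping a backup provider would then override
    sub storage_has_feature {
        my ($class, $scfg, $feature) = @_;
        return 1 if $feature eq 'backup-provider';
        return 0;
    }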


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


[pve-devel] [PATCH v2 container] fix #3367: prevent adding bind and device mountpoints to template

2024-09-12 Thread Fiona Ebner
There is a check in the clone API call that prohibits cloning when a
bind or device mountpoint is present. Converting to a template also
will fail when such a mountpoint is present. In particular, that
failure will also trigger at the end of a restore operation, because
restore will also try to convert to a template again. Add a safeguard
so that users won't get into the problematic situation in the first
place.

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* different approach with safeguard instead of silently dropping
  bind mounts during conversion to template

Previous discussion:
https://lore.proxmox.com/pve-devel/20210402123636.27037-1-f.eb...@proxmox.com/T/#u

 src/PVE/LXC/Config.pm | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
index ce64c4c..09687ab 100644
--- a/src/PVE/LXC/Config.pm
+++ b/src/PVE/LXC/Config.pm
@@ -1181,7 +1181,13 @@ sub update_pct_config {
if ($opt =~ m/^mp(\d+)$/ || $opt eq 'rootfs') {
$class->check_protection($conf, "can't update CT $vmid drive 
'$opt'");
my $mp = $class->parse_volume($opt, $value);
-   $check_content_type->($mp) if ($mp->{type} eq 'volume');
+   if ($mp->{type} eq 'volume') {
+   $check_content_type->($mp);
+   } elsif ($class->is_template($conf)) {
+   # TODO allow bind mounts once admin-defined bind mount sources 
and corresponding ACL
+   # support is implemented
+   die "cannot add mountpoint '$opt' of type '$mp->{type}' to 
template\n";
+   }
} elsif ($opt eq 'hookscript') {
PVE::GuestHelpers::check_hookscript($value);
} elsif ($opt eq 'nameserver') {
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC v2 qemu] savevm-async: reuse migration blocker check for snapshots/hibernation

2024-09-11 Thread Fiona Ebner
Same rationale as with upstream QEMU commit 5aaac46793 ("migration:
savevm: consult migration blockers"), migration and (async) snapshot
are essentially the same operation and thus snapshot also needs to
check for migration blockers. For example, this catches passed-through
PCI devices, where the driver does not support migration.

However, the commit notes:

> There is really no difference between live migration and savevm, except
> that savevm does not require bdrv_invalidate_cache to be implemented
> by all disks.  However, it is unlikely that savevm is used with anything
> except qcow2 disks, so the penalty is small and worth the improvement
> in catching bad usage of savevm.

and for Proxmox VE, suspend-to-disk with VMDK does use savevm-async
and would be broken by simply using migration_is_blocked(). To keep
this working, introduce a new helper that filters blockers with the
prefix used by the VMDK migration blocker.

The function qemu_savevm_state_blocked() is called as part of
savevm_async_is_blocked() so no check is lost with this
patch. The helper is declared in migration/migration.c to be able to
access the 'migration_blockers'.

The VMDK blocker message is declared via a '#define', because using a
'const char*' led to the linker to complain about multiple
declarations. The message does not include the reference to the block
node anymore, but it would've just been a generated one like
'#block334' in any case, which didn't help users identify the VMDK
disk before the change either.

Note, this also "breaks" snapshot and hibernate with VNC clipboard by
preventing it. Previously, this would "work", because the Proxmox VE
API has no check yet, but the clipboard will be broken after rollback,
in the sense that it cannot be used anymore, not just that its contents are lost.
So some users might consider adding the check here a breaking change
even if it's technically correct to prevent snapshot and hibernate
with VNC clipboard. But other users might rightfully complain about
broken clipboard. And again, the check also prevents blockers from
passed-through PCI devices, etc. so it seems worth tolerating that
breakage.

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* hard-code blocker message to get a conflict if it changes
  upstream.
* avoid making helper more general than it needs to be (had a
  parameter for matching the prefix of blocker messages).

 block/vmdk.c|  4 +---
 include/migration/blocker.h |  2 ++
 migration/migration.c   | 24 
 migration/migration.h   |  1 +
 migration/savevm-async.c|  2 +-
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 3b82979fdf..eb15109620 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1403,9 +1403,7 @@ static int vmdk_open(BlockDriverState *bs, QDict 
*options, int flags,
 qemu_co_mutex_init(&s->lock);
 
 /* Disable migration when VMDK images are used */
-error_setg(&s->migration_blocker, "The vmdk format used by node '%s' "
-   "does not support live migration",
-   bdrv_get_device_or_node_name(bs));
+error_setg(&s->migration_blocker, "%s", MIGRATION_BLOCKER_VMDK);
 ret = migrate_add_blocker_normal(&s->migration_blocker, errp);
 if (ret < 0) {
 goto fail;
diff --git a/include/migration/blocker.h b/include/migration/blocker.h
index a687ac0efe..f36bfb2df1 100644
--- a/include/migration/blocker.h
+++ b/include/migration/blocker.h
@@ -18,6 +18,8 @@
 
 #define MIG_MODE_ALL MIG_MODE__MAX
 
+#define MIGRATION_BLOCKER_VMDK "The vmdk format used by a disk does not 
support live migration"
+
 /**
  * @migrate_add_blocker - prevent all modes of migration from proceeding
  *
diff --git a/migration/migration.c b/migration/migration.c
index b8d7e471a4..b18c3720b2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1913,6 +1913,30 @@ bool migration_is_blocked(Error **errp)
 return false;
 }
 
+bool savevm_async_is_blocked(Error **errp)
+{
+GSList *blockers = migration_blockers[migrate_mode()];
+
+if (qemu_savevm_state_blocked(errp)) {
+return true;
+}
+
+/*
+ * The limitation for VMDK images only applies to live-migration, not
+ * snapshots, see commit 5aaac46793 ("migration: savevm: consult migration
+ * blockers").
+ */
+while (blockers) {
+if (strcmp(error_get_pretty(blockers->data), MIGRATION_BLOCKER_VMDK)) {
+error_propagate(errp, error_copy(blockers->data));
+return true;
+}
+blockers = g_slist_next(blockers);
+}
+
+return false;
+}
+
 /* Returns true if continue to migrate, or false if error detected */
 static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
  

Re: [pve-devel] [PATCH v2 common] REST environment: warn helpers: also log messages to syslog

2024-09-11 Thread Fiona Ebner
Ping

Am 12.06.24 um 11:51 schrieb Fiona Ebner:
> Ping
> 
> Am 09.02.24 um 15:15 schrieb Fiona Ebner:
>> for better visibility. When not in a task, warnings from these helpers
>> are only logged to STDERR, which is particularly unhelpful in case of
>> daemons. This is the main motivation behind this change.
>>
>> For tasks, warnings from these helpers are already more visible on the
>> UI side, but when looking at the syslog, one can only see the warning
>> count from the task right now, not the actual messages. This is
>> another reason in favor of the change.
>>
>> Reported-by: Friedrich Weber 
>> Suggested-by: Thomas Lamprecht 
>> Signed-off-by: Fiona Ebner 
>> ---
>>
>> Changes in v2:
>> * Go with the approach suggested by Thomas.
>>
>>  src/PVE/RESTEnvironment.pm | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
>> index 191c6eb..bfde7e6 100644
>> --- a/src/PVE/RESTEnvironment.pm
>> +++ b/src/PVE/RESTEnvironment.pm
>> @@ -723,6 +723,7 @@ sub log_warn {
>>  $rest_env->warn($message);
>>  } else {
>>  chomp($message);
>> +syslog('warning', "%s", $message);
>>  print STDERR "WARN: $message\n";
>>  }
>>  }
>> @@ -732,6 +733,7 @@ sub warn {
>>  
>>  chomp($message);
>>  
>> +syslog('warning', "%s", $message);
>>  print STDERR "WARN: $message\n";
>>  
>>  $self->{warning_count}++;
> 
> 
> ___
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH pve-storage] qcow2 format: enable subcluster allocation by default

2024-09-11 Thread Fiona Ebner
Am 03.07.24 um 16:24 schrieb Alexandre Derumier via pve-devel:
> 
> 
> extended_l2 is an optimisation to reduce write amplification.
> Currently,without it, when a vm write 4k, a full 64k cluster

s/write/writes/

> need to be writen.

needs to be written.

> 
> When enabled, the cluster is splitted in 32 subclusters.

s/splitted/split/

> 
> We use a 128k cluster by default, to have 32 * 4k subclusters
> 
> https://blogs.igalia.com/berto/2020/12/03/subcluster-allocation-for-qcow2-images/
> https://static.sched.com/hosted_files/kvmforum2020/d9/qcow2-subcluster-allocation.pdf
> 
> some stats for 4k randwrite benchmark

Can you please share the exact command you used? What kind of underlying
disks do you have?

> 
> Cluster size   Without subclusters With subclusters
> 16 KB  5859 IOPS   8063 IOPS
> 32 KB  5674 IOPS   11107 IOPS
> 64 KB  2527 IOPS   12731 IOPS
> 128 KB 1576 IOPS   11808 IOPS
> 256 KB 976 IOPS 9195 IOPS
> 512 KB 510 IOPS 7079 IOPS
> 1 MB   448 IOPS 3306 IOPS
> 2 MB   262 IOPS 2269 IOPS
> 

How does read performance compare for you (with 128 KiB cluster size)?

I don't see any noticeable difference in my testing with an ext4
directory storage on an SSD, attaching the qcow2 images as SCSI disks to
the VM, either for reading or writing. I only tested without your
change and with your change using 4k (rand)read and (rand)write.

I'm not sure we should enable this for everybody; there's always a risk
of breaking stuff with added complexity. Maybe it's better to have a
storage configuration option that people can opt-in to, e.g.

qcow2-create-opts extended_l2=on,cluster_size=128k

If we get enough positive feedback, we can still change the default in a
future (major) release.
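
For reference/manual testing, the creation options can also be passed to
qemu-img directly (path and size here are just for illustration):

    qemu-img create -f qcow2 -o extended_l2=on,cluster_size=128k vm-100-disk-0.qcow2 32G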

> Signed-off-by: Alexandre Derumier 
> ---
>  src/PVE/Storage/Plugin.pm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
> index 6444390..31b20fe 100644
> --- a/src/PVE/Storage/Plugin.pm
> +++ b/src/PVE/Storage/Plugin.pm
> @@ -561,7 +561,7 @@ sub preallocation_cmd_option {
>   die "preallocation mode '$prealloc' not supported by format '$fmt'\n"
>   if !$QCOW2_PREALLOCATION->{$prealloc};
>  
> - return "preallocation=$prealloc";
> + return "preallocation=$prealloc,extended_l2=on,cluster_size=128k";

Also, it doesn't really fit here in the preallocation helper as the
helper is specific to that setting.

>  } elsif ($fmt eq 'raw') {
>   $prealloc = $prealloc // 'off';
>   $prealloc = 'off' if $prealloc eq 'metadata';
> -- 
> 2.39.2
> 
> 


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] qcow2 internal snapshot: 4k write speed reduce to 100~200 iops on ssd ?

2024-09-11 Thread Fiona Ebner
Hi,

Am 30.08.24 um 18:01 schrieb DERUMIER, Alexandre via pve-devel:
> Hi,
> 
> I was doing tests with gfs2 && ocfs2,
> 
> and I have notice 4k randwrite iops going from 2 iops to
> 100~200iops when a snapshot is present.
> 
> I thinked it was related to gfs2 && ocfs2 allocation, 
> but I can reproduce too with a simple qcow2 file on
> a local ssd drive.
> 

Is the performance drop also this big with the local storage?

> is it expected ?
> 
> (I don't have use qcow2 snapshot since 10year, so I really don't 
>  remember performance).
> 

A performance drop is of course expected, because AFAIU it needs to do
COW for the sectors that the snapshot references, but the drop does
sound rather dramatic in your case. The performance should be better
again once the COW is done, e.g. when running another test afterwards. I
quickly tested in a nested VM with qcow2 disk on a directory storage and
--numjobs=4:

ran test, got ~33k IOPS
made a snapshot
ran test, got ~20k IOPS (it starts out very slow and then recovers)
ran test, got ~33k IOPS again
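
(The snapshot was taken on the host with 'qm snapshot <vmid> <name>', and the
guest-side test was a 4k randwrite fio run similar to the one quoted below,
just with --numjobs=4.)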

Best Regards,
Fiona

> 
> Using external snapshot file, I'm around 1iops.
> 
> test inside the vm:
> 
> fio --filename=/dev/sdX --ioengine=libaio --direct 1 --bs=4k  --
> rw=randwrite --numjobs=64
> 
> 


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH storage 1/2] fix #3873: btrfs: check for correct subvolume taking snapshot

2024-09-10 Thread Fiona Ebner
Am 09.07.24 um 13:51 schrieb Maximiliano Sandoval:
> Suppose we are doing a snapshot of disk 0 for VM 100. The
> dir_glob_foreach runs over $path=/subvolume/images/100, lists all
> snapshot names and appends their names to the path of the disk, e.g.
> /subvolume/images/vm-100-disk-0@SNAP_NAME, but the original directory
> $path might contain a second disk `vm-100-disk-1` which is also listed
> by the dir_glib_foreach.
> 
> The patch skips images which reference other disks.
> 
> Signed-off-by: Maximiliano Sandoval 
> ---
>  src/PVE/Storage/BTRFSPlugin.pm | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/PVE/Storage/BTRFSPlugin.pm b/src/PVE/Storage/BTRFSPlugin.pm
> index 42815cb..dc0420d 100644
> --- a/src/PVE/Storage/BTRFSPlugin.pm
> +++ b/src/PVE/Storage/BTRFSPlugin.pm
> @@ -767,6 +767,9 @@ sub volume_export {
>   push @$cmd, (map { "$path\@$_" } ($with_snapshots // [])->@*), $path;
>  } else {
>   dir_glob_foreach(dirname($path), $BTRFS_VOL_REGEX, sub {

I feel like this would be a bit nicer (and slightly more robust) if it
used foreach_subvol() and would check the basename like done in
free_image(). Or even better, introduce a new helper that iterates over
all snapshots of a specific volume to not repeat ourselves here and in
free_image(). Since foreach_subvol() is only used once (i.e. in
free_image()), we could even replace that. There's another pre-existing
thing that's a bit confusing, which is that BTRFS_VOL_REGEX only matches
snapshot volumes and not the actual volume itself AFAICS.
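
Untested sketch of what I mean (helper name made up, assuming the regex
captures are the volume name and the snapshot name like above):

    sub foreach_snapshot_of_subvol {
        my ($path, $code) = @_;
        my $basename = basename($path);
        dir_glob_foreach(dirname($path), $BTRFS_VOL_REGEX, sub {
            my ($volume, $name, $snap_name) = @_;
            $code->($snap_name) if $name eq $basename;
        });
    }

Then both free_image() and volume_export() could just iterate with that.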

> + if (index($path, $_[1]) < 0) {
> + return
> + }
>   push @$cmd, "$path\@$_[2]" if !(defined($snapshot) && $_[2] eq 
> $snapshot);
>   });
>  }


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH storage v3 2/2] d/control: dependencies: add bzip2, gzip, lzop, zstd

2024-09-10 Thread Fiona Ebner
Am 10.09.24 um 14:52 schrieb Fabian Grünbichler:
>> Fiona Ebner  hat am 10.09.2024 14:24 CEST geschrieben:
>> Am 12.08.24 um 13:44 schrieb Maximiliano Sandoval:
>>> The decompressor_info method calls binaries provided by these packages
>>> so they are (alphabetically) added explicitly as dependencies.
>>>
>>> To avoid a build-time error
>>>
>>> E: libpve-storage-perl: 
>>> depends-on-essential-package-without-using-version Depends: gzip
>>>
>>> the current minor version available in bullseye was set for gzip.
>>>
>>
>> Since I didn't get that error, I'm interested: what command are you
>> building with? Why the version for Bullseye?
> 
> it's wrong in any case - gzip is Essential (installed on all Debian systems). 
> you don't ever need a dependency (build or otherwise) for it. this is what 
> lintian is trying to tell you - it's wrong to depend on gzip without a 
> version constraint (and okay to use it without a dependency). *only* if you 
> have a certain version requirement you need to add the dependency (with the 
> proper version qualifier) to be able to encode that version requirement.
>  

I see, thank you for the explanation!

>> I guess many systems already have bzip2 installed, but do we really want
>> to require it for everybody? Or should we rather keep it optional (could
>> be a Recommends or Suggests dependency) and add a clean error if not
>> installed?
> 
> I am not sure about *how* many (most dev machines yes, since it's 
> build-essential and required by a lot of build-related packages ;)). it's 
> only 124kB though, and libbz2-1.0 is always installed (it's transitively 
> Essential via dpkg), so it doesn't pull in anything that isn't installed on 
> all standard Debian-based systems anyway since it only depends on that and 
> libc6. I don't really see a reason to use Recommends or Suggests here, even 
> if bz2 compress isos are not that common.

Okay :)


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [PATCH storage v3 1/2] fix #5267: storage: add bzip2 support

2024-09-10 Thread Fiona Ebner
Am 12.08.24 um 13:44 schrieb Maximiliano Sandoval:
> A popular ISO compressed exclusively with bz2 is OPNsense [2].
> 
> Since this requires adding `bz2` to the list of known compression
> formats we add decompression methods for vmz a tar.

Typo here I presume.

> diff --git a/src/test/list_volumes_test.pm b/src/test/list_volumes_test.pm
> index d155cb9..ba959ff 100644
> --- a/src/test/list_volumes_test.pm
> +++ b/src/test/list_volumes_test.pm
> @@ -189,6 +189,7 @@ my @tests = (
>   "$storage_dir/dump/vzdump-lxc-16112-2020_03_30-21_49_30.tar.gz",
>   "$storage_dir/dump/vzdump-lxc-16112-2020_03_30-21_49_30.tar.zst",
>   "$storage_dir/dump/vzdump-lxc-16112-2020_03_30-21_59_30.tgz",
> + "$storage_dir/dump/vzdump-openvz-16112-2020_03_30-21_39_30.tar.bz2",
>   ],
>   expected => [
>   {
> @@ -237,6 +238,15 @@ my @tests = (
>   'vmid'=> '16112',
>   'volid'   => 
> 'local:backup/vzdump-lxc-16112-2020_03_30-21_59_30.tgz',
>   },
> + {
> + 'content' => 'backup',
> + 'ctime'   => 1585604370,
> + 'format'  => 'tar.bz2',
> + 'size'=> DEFAULT_SIZE,
> + 'subtype' => 'openvz',
> + 'vmid'=> '16112',
> + 'volid'   => 
> 'local:backup/vzdump-openvz-16112-2020_03_30-21_39_30.tar.bz2',
> + },
>   ],
>  },
>  {
> @@ -440,7 +450,6 @@ my @tests = (
>   
> "$storage_dir/template/cache/debian-10.0-standard_10.0-1_amd64.zip.gz",
>   
> "$storage_dir/template/cache/debian-10.0-standard_10.0-1_amd64.tar.bz2",

Why is this one still listed here in the non-matching test cases?
Shouldn't this be recognized too now?

>   "$storage_dir/private/subvol-19254-disk-0/19254",
> - "$storage_dir/dump/vzdump-openvz-16112-2020_03_30-21_39_30.tar.bz2",
>   "$storage_dir/dump/vzdump-openvz-16112-2020_03_30-21_39_30.zip.gz",
>   "$storage_dir/dump/vzdump-openvz-16112-2020_03_30-21_39_30.tgz.lzo",
>   "$storage_dir/dump/vzdump-qemu-16110-2020_03_30-21_12_40.vma.xz",


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH storage v3 2/2] d/control: dependencies: add bzip2, gzip, lzop, zstd

2024-09-10 Thread Fiona Ebner
Am 12.08.24 um 13:44 schrieb Maximiliano Sandoval:
> The decompressor_info method calls binaries provided by these packages
> so they are (alphabetically) added explicitly as dependencies.
> 
> To avoid a build-time error
> 
> E: libpve-storage-perl: 
> depends-on-essential-package-without-using-version Depends: gzip
> 
> the current minor version available in bullseye was set for gzip.
> 

Since I didn't get that error, I'm interested: what command are you
building with? Why the version for Bullseye?

I guess many systems already have bzip2 installed, but do we really want
to require it for everybody? Or should we rather keep it optional (could
be a Recommends or Suggests dependency) and add a clean error if not
installed?


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH guest-common 1/1] storage tunnel: check just-imported image files

2024-09-10 Thread Fiona Ebner
Am 09.08.24 um 13:22 schrieb Fabian Grünbichler:
> remote migration requires elevated privileges already and can thus only be
> triggered by trusted sources, but an additional safeguard of checking the 
> image
> for external references doesn't hurt.
> 
> Signed-off-by: Fabian Grünbichler 
> ---
> 
> Notes:
> requires pve-storage change to actually have an effect
> 
>  src/PVE/StorageTunnel.pm | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/src/PVE/StorageTunnel.pm b/src/PVE/StorageTunnel.pm
> index c880889..21780bd 100644
> --- a/src/PVE/StorageTunnel.pm
> +++ b/src/PVE/StorageTunnel.pm
> @@ -280,6 +280,13 @@ sub handle_query_disk_import {
>   delete $state->{sockets}->{$unix};
>   delete $state->{disk_import};
>   $state->{cleanup}->{volumes}->{$volid} = 1;
> + my $cfg = PVE::Storage::config();
> + my ($storage, $volume) = PVE::Storage::parse_volume_id($volid);
> + my $scfg = PVE::Storage::storage_config($cfg, $storage);
> + # check imported image for bad references
> + if ($scfg->{path}) {

Seems a bit like a plugin method would be nicest here. But I guess we
can still add that if/when it becomes necessary for non-path-based
plugins to do such checks.

> + PVE::Storage::file_size_info(PVE::Storage::path($cfg, $volid), 
> undef, 1);

Isn't this broken because PVE::Storage::path() uses wantarray?

At least a small test program does "break" in the same scenario:

> febner@enia ~ % cat asdf.pm
> use strict;
> use warnings;
>
> sub a {
>   return wantarray ? (1, 2) : 1;
> }
>
> sub b {
>   my ($a, $b, $c) = @_;
>
>   print "$a $b $c\n";
> }
>
> b(a(), 3, 4);
> febner@enia ~ % perl asdf.pm
> 1 2 3
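
So the call most likely needs to force scalar context, e.g.:

    PVE::Storage::file_size_info(scalar(PVE::Storage::path($cfg, $volid)), undef, 1);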


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [PATCH storage 1/1] file_size_info: implement untrusted mode

2024-09-10 Thread Fiona Ebner
Am 09.08.24 um 13:22 schrieb Fabian Grünbichler:
> this allows checking some extra attributes for images which come from a
> potentially malicious source.
> 
> since file_size_info is not part of the plugin API, no API bump is needed. if
> desired, a similar check could also be implemented in volume_size_info, which
> would entail bumping both APIVER and APIAGE (since the additional parameter
> would make checking untrusted volumes opt-in for external plugins).
> 
> Signed-off-by: Fabian Grünbichler 

Reviewed-by: Fiona Ebner 

> @@ -977,12 +987,27 @@ sub file_size_info {
>  

There's a new early return now because of commit 851cc07 ("base plugin:
do not decode the empty string") where we should also die if $untrusted.
Although it is very unlikely that we'll reach that since you do not set
a timeout at the call sites with $untrusted set.
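
E.g. something like:

    die "failed to get size of '$filename' - no output from 'qemu-img info'\n" if $untrusted;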

>  my $info = eval { decode_json($json) };
>  if (my $err = $@) {


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [PATCH qemu-server 1/1] disk import: add additional safeguards for imported image files

2024-09-10 Thread Fiona Ebner
Am 09.08.24 um 13:22 schrieb Fabian Grünbichler:
> creating non-raw disk images with arbitrary content is only possible with raw
> access to the storage, but checking for references to external files doesn't
> hurt.
> 
> Signed-off-by: Fabian Grünbichler 
> ---
> 
> Notes:
> requires pve-storage change to actually have an effect
> 
>  PVE/API2/Qemu.pm | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index d25a79fe..30839745 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -412,12 +412,14 @@ my sub create_disks : prototype($$) {
>  
>   $needs_creation = $live_import;
>  
> - if (PVE::Storage::parse_volume_id($source, 1)) { # PVE-managed 
> volume
> + if (my ($source_storage, $source_volid) = 
> PVE::Storage::parse_volume_id($source, 1)) { # PVE-managed volume

Nit: line too long and honestly, becomes a bit hard to read.

>   if ($live_import && $ds ne 'efidisk0') {
>   my $path = PVE::Storage::path($storecfg, $source)
>   or die "failed to get a path for '$source'\n";
>   $source = $path;
> - ($size, my $source_format) = 
> PVE::Storage::file_size_info($source);
> + # check potentially untrusted image file!
> + ($size, my $source_format) = 
> PVE::Storage::file_size_info($source, undef, 1);
> +
>   die "could not get file size of $source\n" if !$size;
>   $live_import_mapping->{$ds} = {
>   path => $source,
> @@ -431,6 +433,12 @@ my sub create_disks : prototype($$) {
>   format => $disk->{format},
>   };
>  
> + my $scfg = PVE::Storage::storage_config($storecfg, 
> $source_storage);
> + # check potentially untrusted image file!
> + if ($scfg->{path}) {
> + 
> PVE::Storage::file_size_info(PVE::Storage::path($storecfg, $source), undef, 
> 1);

This call is probably broken, see reply to guest-common patch.
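
The same scalar-context workaround as suggested there should work here too, i.e.:

    PVE::Storage::file_size_info(scalar(PVE::Storage::path($storecfg, $source)), undef, 1);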

Nit: I'd move this to the beginning of the branch to avoid having it
between $dest_info assignments.

> + }
> +
>   $dest_info->{efisize} = 
> PVE::QemuServer::get_efivars_size($conf, $disk)
>   if $ds eq 'efidisk0';
>  
> @@ -441,7 +449,8 @@ my sub create_disks : prototype($$) {
>   }
>   } else {
>   $source = PVE::Storage::abs_filesystem_path($storecfg, 
> $source, 1);
> - ($size, my $source_format) = 
> PVE::Storage::file_size_info($source);
> + # check potentially untrusted image file!
> + ($size, my $source_format) = 
> PVE::Storage::file_size_info($source, undef, 1);
>   die "could not get file size of $source\n" if !$size;
>  
>   if ($live_import && $ds ne 'efidisk0') {


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


[pve-devel] applied: [PATCH manager] ui: qemu: hardware: fix permission check for adding tpmstate volume

2024-09-10 Thread Fiona Ebner
Am 05.08.24 um 15:33 schrieb Friedrich Weber:
> Previously, the "Add -> TPM State" menu item in the GUI was disabled
> if the user did not have Sys.Console privileges. This deviated from
> the permission check in the backend, which does not require
> Sys.Console but (among others) VM.Config.Disk.
> 
> Fix this inconsistency by checking for VM.Config.Disk in the frontend
> instead of Sys.Console.
> 
> Reported in enterprise support.
> 
> Signed-off-by: Friedrich Weber 

applied, thanks!


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH storage] base plugin: do not decode the empty string

2024-09-10 Thread Fiona Ebner
Am 02.09.24 um 14:47 schrieb Maximiliano Sandoval:
> If the json was empty, for example if the qemu-img command times out, a
> message
> 
> warn "could not parse qemu-img info command output for '$filename' - 
> $err\n";
> 
> would have been printed.
> 
> This message could lead one to think the issue lies in the contents of
> the json, even if the previous warning said that there was a timeout.
> 

I know it's already applied, so mentioning it for the future: the commit
title (and actually also message) really should've included the context,
e.g. "when querying file size".


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH common v4 3/7] inotify: interfaces: make sure bridge_vids use space as separator

2024-09-10 Thread Fiona Ebner
Am 29.07.24 um 13:55 schrieb Aaron Lauterer:
> Because the API accepts multiple possible list separators we need to
> make sure that we write the bridge_vids with space as separator, no
> matter which separator was used when passing it to the API.
> 

Nit: The reason why it's necessary to write out with space as the
separator is missing. Who gets confused otherwise ;)?

Should we do the conversion earlier, i.e. in the API endpoint when
writing the interface parameters? Like that we don't end up carrying the
invalid value around.
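
E.g. (untested, directly where the parameters are handled):

    $param->{bridge_vids} = join(' ', PVE::Tools::split_list($param->{bridge_vids}))
        if defined($param->{bridge_vids});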

> Signed-off-by: Aaron Lauterer 
> Tested-By: Stefan Hanreich 
> Reviewed-by: Shannon Sterz 

Still:

Reviewed-by: Fiona Ebner 


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH common v4 2/7] fix #3893: network: add vlan id and range parameter definitions

2024-09-10 Thread Fiona Ebner
Am 29.07.24 um 13:55 schrieb Aaron Lauterer:
> @@ -595,6 +601,34 @@ sub pve_verify_iface {
>  return $id;
>  }
>  
> +# vlan id (vids), single or range
> +register_format('pve-vlan-id-or-range', \&pve_verify_vlan_id_or_range);
> +sub pve_verify_vlan_id_or_range {
> +my ($vlan, $noerr) = @_;
> +
> +my $check_vid = sub {
> + my $tag = shift;
> + if ( $tag < 2 || $tag > 4094) {

Style nit: extra space after parenthesis

> + return undef if $noerr;
> + die "invalid VLAN tag '$tag'\n";
> + }
> +};
> +
> +if ($vlan !~ m/^(\d+)(?:-(\d+))?$/) {
> + return undef if $noerr;
> + die "invalid VLAN configuration '$vlan'\n";
> +}
> +my $start = $1;
> +my $end = $2;
> +return undef if (!defined $check_vid->($start));

check_vid does not explicitly return anything in the success case, so
this check seems very brittle and I guess relies on implicit return?

Style nits:
parentheses for post-if
missing parentheses for defined()
perlcritic prefers 'return' over 'return undef'

> +if ($end) {
> + return undef if (!defined $check_vid->($end));
> + die "VLAN range must go from lower to higher tag '$vlan'" if $start >= 
> $end && !$noerr;

What if $noerr is set, but $start >= $end? You want to have a 'return'
then, right?
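
Untested, but maybe something along these lines would be clearer:

    my $check_vid = sub {
        my $tag = shift;
        if ($tag < 2 || $tag > 4094) {
            die "invalid VLAN tag '$tag'\n" if !$noerr;
            return 0;
        }
        return 1;
    };

    return if !$check_vid->($start);
    if (defined($end)) {
        return if !$check_vid->($end);
        if ($start >= $end) {
            die "VLAN range must go from lower to higher tag '$vlan'\n" if !$noerr;
            return;
        }
    }
    return $vlan;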

> +}
> +
> +return $vlan;
> +}
> +
>  # general addresses by name or IP
>  register_format('address', \&pve_verify_address);
>  sub pve_verify_address {


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH common v4 1/7] tools: add check_list_empty function

2024-09-09 Thread Fiona Ebner
Am 29.07.24 um 13:55 schrieb Aaron Lauterer:
> In some situations we don't want a total empty list. I opted for a
> dedicated function instead of integrating it as error in the
> `split_list` function. It is used in many places and the potential
> fallout from unintended behavior changes is too big.
> 
> Signed-off-by: Aaron Lauterer 
> Tested-By: Stefan Hanreich 
> Reviewed-by: Shannon Sterz 
> ---
> changes since: v3: none
> v2: newly added
> 
>  src/PVE/Tools.pm | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/src/PVE/Tools.pm b/src/PVE/Tools.pm
> index bd305bd..f796bd0 100644
> --- a/src/PVE/Tools.pm
> +++ b/src/PVE/Tools.pm
> @@ -718,6 +718,14 @@ sub split_list {
>  return @data;
>  }
>  
> +sub check_list_empty {
> +my ($list) = @_;
> +if (scalar(PVE::Tools::split_list($list)) < 1) {
> + return 0;
> +}
> +return 1;
> +}

This can be very confusing IMHO. Intuitively, I'd expect the expression
check_list_empty($list) to be truthy if $list is empty. I'd rather call
it list_not_empty. But looking at the caller you introduce later, it
might be better to avoid the double negative, flip the truth table and
call it list_is_empty.
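
E.g. simply:

    sub list_is_empty {
        my ($list) = @_;
        return scalar(PVE::Tools::split_list($list)) < 1;
    }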

> +
>  sub trim {
>  my $txt = shift;
>  


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH v2 proxmox-apt-hook] initial commit

2024-09-09 Thread Fiona Ebner
Many people will use 'upgrade' instead of 'full-upgrade' or
'dist-upgrade' (e.g. [0][1]) despite the documentation explicitly
mentioning 'dist-upgrade' [3]. Proxmox projects use different
packaging guarantees than Debian (necessary for a rolling release
model) and using 'upgrade' can lead to the system being stuck on
outdated versions, or in rare cases, even break the system [2].

The match is kept simple, to not accidentally catch things like
> -o 'foo=bar upgrade baz'
and trip up advanced users.

It does not catch invocations with '-y' either, making it less likely
to break automated user scripts. Although they should not use
'upgrade' either, it still would be bad to break them. If the risk is
still considered too high, this change should wait until a major or
at least point release.

To avoid false positives, it would be necessary to properly parse
options, which is likely not worth the effort.

A downside is that the hook is only invoked after the user confirms the
upgrade and the packages have been fetched, but there doesn't seem to be an
early enough hook entry that provides access to the command line.
Since this is just an additional safety warning to guide new users, it
should still be good enough.

It is intended that meta-packages for Proxmox projects recommend this
package.

The same postinst/postrm logic for the hook as in proxmox-ve and
apt-listchanges is used, to avoid having to disable/re-enable the hook
upon removal/re-install of the package.

[0]: https://forum.proxmox.com/threads/150217/post-680158
[1]: https://forum.proxmox.com/threads/140580/post-630419
[2]: 
https://www.reddit.com/r/Proxmox/comments/ujqig9/use_apt_distupgrade_or_the_gui_not_apt_upgrade/
[3]: 
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#system_software_updates

Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Mention that actual breakage is rare, being stuck on outdated
versions is much more common.
* Do not ask for confirmation, only log the warning.
* Split into own package, so it can be re-used by different products.

 .gitignore |  7 
 Makefile   | 47 +++
 debian/apt-hook/10proxmox-apt-hook |  4 +++
 debian/apt-hook/proxmox-apt-hook   | 52 ++
 debian/changelog   |  5 +++
 debian/control | 17 ++
 debian/copyright   | 14 
 debian/docs|  1 +
 debian/install |  2 ++
 debian/postrm  | 35 
 debian/preinst | 13 
 debian/rules   |  8 +
 debian/source/format   |  1 +
 13 files changed, 206 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 Makefile
 create mode 100644 debian/apt-hook/10proxmox-apt-hook
 create mode 100755 debian/apt-hook/proxmox-apt-hook
 create mode 100644 debian/changelog
 create mode 100644 debian/control
 create mode 100644 debian/copyright
 create mode 100644 debian/docs
 create mode 100644 debian/install
 create mode 100644 debian/postrm
 create mode 100644 debian/preinst
 create mode 100755 debian/rules
 create mode 100644 debian/source/format

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000..5e6053d
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,7 @@
+/*.build
+/*.buildinfo
+/*.changes
+/*.deb
+/*.dsc
+/*.tar*
+/proxmox-apt-hook-*/
diff --git a/Makefile b/Makefile
new file mode 100644
index 000..e0cd704
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,47 @@
+include /usr/share/dpkg/default.mk
+
+PACKAGE=proxmox-apt-hook
+
+GITVERSION:=$(shell git rev-parse HEAD)
+
+BUILDDIR ?= $(PACKAGE)-$(DEB_VERSION)
+DSC=$(PACKAGE)_$(DEB_VERSION).dsc
+
+DEB=$(PACKAGE)_$(DEB_VERSION_UPSTREAM_REVISION)_all.deb
+
+all: $(DEB)
+
+$(BUILDDIR): debian
+   rm -rf $@ $@.tmp
+   mkdir -p $@.tmp/debian
+   cp -a debian/ $@.tmp/
+   echo "git clone git://git.proxmox.com/git/proxmox-apt-hook.git\\ngit 
checkout $(GITVERSION)" > $@.tmp/debian/SOURCE
+   mv $@.tmp $@
+
+.PHONY: deb
+deb: $(DEB)
+$(DEB): $(BUILDDIR)
+   cd $(BUILDDIR); dpkg-buildpackage -b -us -uc
+   lintian $(DEB)
+
+.PHONY: dsc
+dsc: $(DSC)
+$(DSC): $(BUILDDIR)
+   cd $(BUILDDIR); dpkg-buildpackage -S -us -uc -d
+   lintian $(DSC)
+
+.PHONY: sbuild
+sbuild: $(DSC)
+   sbuild $(DSC)
+
+.PHONY: upload
+upload: UPLOAD_DIST ?= $(DEB_DISTRIBUTION)
+upload: $(DEB)
+   tar cf - $(DEB)|ssh repo...@repo.proxmox.com -- upload --product pve 
--dist $(UPLOAD_DIST)
+
+.PHONY: distclean
+distclean: clean
+
+.PHONY: clean
+clean:
+   rm -rf *~ $(PACKAGE)-[0-9]*/ $(PACKAGE)*.tar.* *.deb *.dsc *.changes 
*.build *.buildinfo
diff --git a/debian/apt-hook/10proxmox-apt-hook 
b/debian/apt-hook/10proxmox-apt-hook
new file mode 100644
index 000..c4e6978
--- /dev/null
+++ b/debian/apt-h

Re: [pve-devel] [RFC qemu-server] apt hook: warn against using 'upgrade' command

2024-09-09 Thread Fiona Ebner
Am 09.09.24 um 09:48 schrieb Fabian Grünbichler:
>> Thomas Lamprecht  hat am 06.09.2024 18:58 CEST 
>> geschrieben:  
>> Am 06/09/2024 um 12:40 schrieb Fiona Ebner:
>>> Many people will use 'upgrade' instead of 'full-upgrade' or
>>> 'dist-upgrade' (e.g. [0][1]) despite the documentation explicitly
>>> mentioning 'dist-upgrade' [3]. Proxmox VE uses different packaging
>>> guarantees than Debian and using 'upgrade' can lead to a broken
>>> system [2].
> 
> just a slight nit here: you should only end up with a broken system if we 
> miss properly tracking some inter-package relationship. it can happen happen 
> (and probably does, from time to time), but in the vast majority of cases 
> "apt[-get] upgrade" should at most leave you stuck with an outdated system 
> (with APT telling you that there are still packages to be upgraded), not a 
> broken one. we did get a lot better about accounting for these things over 
> the past few years (but of course, we don't have anywhere close to the 
> infrastructure that Debian has for automated tracking and testing).

Okay, will change the commit message in v2.

> 
>>> The match is kept simple, to not accidentally catch things like
>>>> -o 'foo=bar upgrade baz'
>>> and trip up advanced users.
>>>
>>> It does not catch invocations with '-y' either, making it less likely
>>> to break automated user scripts. Although they should not use
>>> 'upgrade' either, it still would be bad to break them. If the risk is
>>> still considered too high, this change should wait until a major or
>>> at least point release.
>>>
>>> To avoid false positives, it would be necessary to properly parse
>>> options, which is likely not worth the effort.
>>>
>>> A downside is that the hook is only invoked after the user confirms
>>> the upgrade, but there doesn't seem to be an early enough hook entry
>>> (DPkg::Pre-Invoke is also too late). Since this is just an additional
>>> safety warning to guide new users, it should still be good enough.
>>>
>>> [0]: https://forum.proxmox.com/threads/150217/post-680158
>>> [1]: https://forum.proxmox.com/threads/140580/post-630419
>>> [2]: 
>>> https://www.reddit.com/r/Proxmox/comments/ujqig9/use_apt_distupgrade_or_the_gui_not_apt_upgrade/
>>> [3]: 
>>> https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#system_software_updates
>>>
>>
>> yeah, it's something I considered here and then but never pulled through,
>> as it just somehow doesn't feel right...
>>
>> But it's definitively a real problem, and so I surely won't block this on
>> the basis of some gut feeling, I'd rather like to hear Fabian's opinion on
>> it.
> 
> given that I also use `apt upgrade` from time to time (habit from being an 
> unstable user ;)), and that it might alienate power users coming from Debian, 
> I'd prefer this to be a non-interactive warning with the text "disarmed" a 
> bit?
> 
> something like
> 
> !! WARNING !!
> Since Proxmox VE follows a rolling release model, using 'upgrade' can lead to 
> a system being stuck on outdated versions, or in rare cases, break upon 
> upgrading. Use 'dist-upgrade' or 'full-upgrade' instead.
> !! WARNING !!
> 
> with or without a prompt (it's a pity that the hook is not executed with the 
> config before the regular confirmation prompt, else we could just depend on 
> that)?

Sounds good. And since it is so rare, I'd just leave out the additional
confirmation prompt then.
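
So the hook branch could end up looking roughly like this (untested, wording
taken from your suggestion):

  if ($line && $line =~ m/^CommandLine::AsString=apt(-get)?%20upgrade$/) {
    $log->("!! WARNING !!\n");
    $log->("Since Proxmox VE follows a rolling release model, using 'upgrade' can lead to a\n");
    $log->("system being stuck on outdated versions, or in rare cases, break upon upgrading.\n");
    $log->("Use 'dist-upgrade' or 'full-upgrade' instead.\n");
    $log->("!! WARNING !!\n");
  }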


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [PATCH qemu-server 10/10] test: cfg2cmd: don't use QEMU binary version less than 8.0 for tests

2024-09-09 Thread Fiona Ebner
Am 06.09.24 um 16:39 schrieb Daniel Kral:
> Another small misspelling that could be corrected here:
> 
> s/depreacation/deprecation
> 
> On Tue, 23 Jul 2024 17:25:40 +0200, Fiona Ebner wrote:
>> diff --git a/test/cfg2cmd/cputype-icelake-client-deprecation.conf 
>> b/test/cfg2cmd/cputype-icelake-client-deprecation.conf
>> index 523dd275..2f595162 100644
>> --- a/test/cfg2cmd/cputype-icelake-client-deprecation.conf
>> +++ b/test/cfg2cmd/cputype-icelake-client-deprecation.conf
>> @@ -1,5 +1,4 @@
>>  # TEST: test CPU type depreacation for Icelake-Client (never existed in the 
>> wild)
> 
> Otherwise, I haven't found anything to add to the changes and the test
> cases are all passing on my machine as expected.
> 
> Therefore:
> 
> Reviewed-by: Daniel Kral 
> Tested-by: Daniel Kral 
> 

Thank you for the review/testing!

Note (for the future): If the tags here are supposed to be for the whole
series, it would be customary to add them in a reply to the cover letter.

Feel free to send patches for the pre-existing typos you found. You
might want to wait until my series has been applied to avoid merge conflicts.
If there's going to be a v2 of my series for some reason, I can also add
them as follow-ups myself.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH container/manager v2 0/2] add deny read/write options for device passthrough

2024-09-06 Thread Fiona Ebner
Am 24.07.24 um 19:18 schrieb Filip Schauer:
> Add the deny_read and deny_write options for device passthrough, to
> restrict container access to devices.
> 
> This allows for passing through a device in read-only mode without
> giving the container full access to it.
> 
> Up until now a container with a device passed through to it was granted
> full access to that device without an option to restrict that access as
> pointed out by @Fiona.
> 
> Changes since v1:
> * set default values for deny_read & deny_write
> * remove the deny_read checkbox from the UI, since it is expected to
>   only have a very niche use case.
> 

We could also use dashes instead of underscores, i.e.
"deny-read"/"deny-write" as we often do for new properties.

Still not fully sure we need deny_read in the backend until somebody
complains with a sensible use case, but I guess it doesn't hurt if it's
already there.

In any case:

Reviewed-by: Fiona Ebner 
Tested-by: Fiona Ebner 


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] applied: [PATCH docs] passthrough: viommu: replace host requirement with reference to pcie passthrough

2024-09-06 Thread Fiona Ebner
For reference, this already got applied:
https://git.proxmox.com/?p=pve-docs.git;a=commit;h=2173def64373a50a13390ee342fb95abc901f203


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [RFC qemu-server] apt hook: warn against using 'upgrade' command

2024-09-06 Thread Fiona Ebner
Sorry, messed up the subject-prefix, since I copy-pasted it in the
.git/config and forgot to change. This is for proxmox-ve of course.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC qemu-server] apt hook: warn against using 'upgrade' command

2024-09-06 Thread Fiona Ebner
Many people will use 'upgrade' instead of 'full-upgrade' or
'dist-upgrade' (e.g. [0][1]) despite the documentation explicitly
mentioning 'dist-upgrade' [3]. Proxmox VE uses different packaging
guarantees than Debian and using 'upgrade' can lead to a broken
system [2].

The match is kept simple, to not accidentally catch things like
> -o 'foo=bar upgrade baz'
and trip up advanced users.

It does not catch invocations with '-y' either, making it less likely
to break automated user scripts. Although they should not use
'upgrade' either, it still would be bad to break them. If the risk is
still considered too high, this change should wait until a major or
at least point release.

To avoid false positives, it would be necessary to properly parse
options, which is likely not worth the effort.

A downside is that the hook is only invoked after the user confirms
the upgrade, but there doesn't seem to be an early enough hook entry
(DPkg::Pre-Invoke is also too late). Since this is just an additional
safety warning to guide new users, it should still be good enough.

[0]: https://forum.proxmox.com/threads/150217/post-680158
[1]: https://forum.proxmox.com/threads/140580/post-630419
[2]: 
https://www.reddit.com/r/Proxmox/comments/ujqig9/use_apt_distupgrade_or_the_gui_not_apt_upgrade/
[3]: 
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#system_software_updates

Signed-off-by: Fiona Ebner 
---
 debian/apthook/pve-apt-hook | 8 
 1 file changed, 8 insertions(+)

diff --git a/debian/apthook/pve-apt-hook b/debian/apthook/pve-apt-hook
index 47629bc..b41d0fe 100755
--- a/debian/apthook/pve-apt-hook
+++ b/debian/apthook/pve-apt-hook
@@ -55,6 +55,14 @@ my $blank;
 while (my $line = <$fh>) {
   chomp $line;
 
+  if ($line && $line =~ m/^CommandLine::AsString=apt(-get)?%20upgrade$/) {
+$log->("!! WARNING !!\n");
+$log->("Using 'upgrade' can violate packaging assumptions made by Proxmox 
VE. Use 'dist-upgrade' instead.\n");
+$log->("\n");
+$log->("Press enter to continue, or C^c to abort.\n");
+$cleanup->(0, 1);
+  }
+
   if (!defined($blank)) {
 $blank = 1 if !$line;
 next;
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH v3 storage] fix #5191: api, cli: implement moving a volume between storages

2024-09-05 Thread Fiona Ebner
Am 03.07.24 um 14:59 schrieb Filip Schauer:
> Add the ability to move a backup, ISO, container template or snippet
> between storages and nodes via an API method. Moving a VMA backup to a
> Proxmox Backup Server requires the proxmox-vma-to-pbs package to be
> installed. Currently only VMA backups can be moved to a Proxmox Backup
> Server and moving backups from a Proxmox Backup Server is not yet
> supported. 

Can we split this? I.e. first add the "easy moves", export/import
preparations, then support moving backups without PBS, introduce the
decompress_archive_into_pipe() helper, finally support moving VMA to
PBS. API/CLI could be separate too.

> @@ -483,15 +485,173 @@ __PACKAGE__->register_method ({
>   return $upid;
>  }});
>  
> +sub volume_move {

Should this even be a new top-level method? Or can/should we extend
export/import instead, to not only cover guest images? Because with this
top-level method we block the way for external storage plugins to
support this functionality too if they don't adhere to our assumptions.

> +my ($cfg, $source_volid, $target_storeid, $delete) = @_;
> +
> +my ($source_storeid, $source_volname) = 
> PVE::Storage::parse_volume_id($source_volid, 0);
> +
> +die "source and target storage cannot be the same\n" if ($source_storeid 
> eq $target_storeid);

Style nit: superfluous parentheses for post-if (there are other
instances of the same below)

---snip 8<---

> +
> + if ($vtype eq 'backup' && $target_scfg->{type} eq 'pbs') {

IMHO this branch should be factored out into a helper. Either inside
PBSPlugin or inside PBSClient, not sure which one is the best fit.

---snip 8<---

> + if (my $err = $@) {
> + for my $created_file (@created_files) {
> + eval { unlink($created_file) };
> + warn $@ if $@;

I think you need to check the return value for unlink. And if it failed,
use $! (don't log anything if it's already ENOENT).
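
E.g. (untested):

    if (!unlink($created_file) && !$!{ENOENT}) {
        warn "unable to clean up '$created_file' - $!\n";
    }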

> + }
> + die $err;
> + }
> + }
> +
> +PVE::Storage::archive_remove($source_path) if $delete;
> +} elsif ($vtype eq 'images') {
> + die "use pct move-volume or qm disk move\n";
> +} elsif ($vtype eq 'rootdir') {
> + die "cannot move OpenVZ rootdir\n";
Maybe put these on top as early exits? Then you could save one level of
indentation. And I'd also catch the case when you encounter a new
content type with something like "not implemented yet".

Note that content type 'rootdir' is nowadays used for all container
images (that's how it's used in the storage configuration and
list_images()). It's just that the default implementation of
parse_volname() (wrongly) classifies those as 'images'.
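
So the early exits at the top could look something like (content type names
from memory):

    die "use pct move-volume or qm disk move\n" if $vtype eq 'images' || $vtype eq 'rootdir';
    die "moving volume of content type '$vtype' is not implemented yet\n"
        if $vtype ne 'backup' && $vtype ne 'iso' && $vtype ne 'vztmpl' && $vtype ne 'snippets';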

> +}
> +
> +return;
> +}
> +
>  __PACKAGE__->register_method ({
> -name => 'copy',
> +name => 'move',
>  path => '{volume}',
>  method => 'POST',
> -description => "Copy a volume. This is experimental code - do not use.",
> +description => "Move a volume.",
> +permissions => {
> + description => "You need the 'Datastore.Allocate' privilege on the 
> storages.",

The documentation says:

Datastore.Allocate: create/modify/remove a datastore and delete volumes
Datastore.AllocateSpace: allocate space on a datastore

and the DELTE method for volumes has:

You need 'Datastore.Allocate' privilege on the storage (or
'Datastore.AllocateSpace' for backup volumes if you have VM.Backup
privilege on the VM)

I think we can use that too here. Require Datastore.AllocateSpace if not
using --delete and require Datastore.Allocate on the source if using
--delete, as well as special casing the backup case.

And writing this later, after looking at the actual checks below: the
documentation could be much more precise here ;)

> + user => 'all',
> +},
>  protected => 1,
>  proxyto => 'node',
>  parameters => {
> - additionalProperties => 0,
> + additionalProperties => 0,
>   properties => {
>   node => get_standard_option('pve-node'),
>   storage => get_standard_option('pve-storage-id', { optional => 1}),

---snip 8<---

> + if ($delete) {
> + $rpcenv->check($user, "/storage/$src_storeid", 
> ["Datastore.Allocate"]);

Aha, so this is only required when using delete ;)

> + } else {
> + $rpcenv->check($user, "/storage/$src_storeid", ["Datastore.Audit"]);

But Datastore.Audit does not entail permission to read the volume
contents. There is check_volume_access() for that.

> + }
>  
> - my ($target_sid, $target_volname) = 
> PVE::Storage::parse_volume_id($dst_volid);
> - #my $target_ip = PVE::Cluster::remote_node_ip($target_node);
> + my ($vtype) = PVE::Storage::parse_volname($cfg, $src_volid);
> + die "use pct move-volume or qm disk move" if $vtype eq 'images';

|| $vtype eq 'rootdir'

---snip 8<---

> @@ -1661,16 +1677,21 @@ sub volume_import {
>  my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $file_format) =
>  

[pve-devel] [PATCH qemu 2/2] pick up stable fixes for 9.0

2024-09-05 Thread Fiona Ebner
Includes fixes for VirtIO-net, ARM and x86(_64) emulation, CVEs to
harden NBD server against malicious clients, as well as a few others
(VNC, physmem, Intel IOMMU, ...).

Signed-off-by: Fiona Ebner 
---
 ...net-Ensure-queue-index-fits-with-RSS.patch |  35 ++
 ...etwork-stall-at-the-host-side-waitin.patch | 338 ++
 ...t-nic-model-help-output-as-documente.patch |  70 
 ...net-nic-model-for-non-help-arguments.patch |  32 ++
 ...-assert-for-128-bit-tile-accesses-wh.patch |  57 +++
 ...arm-Fix-UMOPA-UMOPS-of-16-bit-values.patch |  59 +++
 ...-shifts-by-1-in-tszimm_shr-and-tszim.patch |  62 
 ...e-SMCR_EL2.LEN-and-SVCR_EL2.LEN-if-E.patch |  41 +++
 ...e-denormals-correctly-for-FMOPA-wide.patch | 164 +
 ...el_iommu-fix-FRCD-construction-macro.patch |  39 ++
 ...386-Do-not-apply-REX-to-MMX-operands.patch |  33 ++
 ...rash-by-resetting-local_err-in-modul.patch |  42 +++
 ...-Plumb-in-new-args-to-nbd_client_add.patch | 164 +
 ...024-7409-Cap-default-max-connections.patch | 172 +
 ...024-7409-Drop-non-negotiating-client.patch | 123 +++
 ...024-7409-Close-stray-clients-at-serv.patch | 161 +
 ...c-fix-crash-when-no-console-attached.patch |  47 +++
 ...024-7409-Avoid-use-after-free-when-c.patch |  89 +
 ...fix-memory-leak-in-dirty_memory_exte.patch | 134 +++
 ...ckup-Proxmox-backup-patches-for-QEMU.patch |   4 +-
 .../0046-PVE-backup-add-fleecing-option.patch |   4 +-
 debian/patches/series |  19 +
 22 files changed, 1885 insertions(+), 4 deletions(-)
 create mode 100644 
debian/patches/extra/0018-virtio-net-Ensure-queue-index-fits-with-RSS.patch
 create mode 100644 
debian/patches/extra/0019-virtio-net-Fix-network-stall-at-the-host-side-waitin.patch
 create mode 100644 
debian/patches/extra/0020-net-Reinstate-net-nic-model-help-output-as-documente.patch
 create mode 100644 
debian/patches/extra/0021-net-Fix-net-nic-model-for-non-help-arguments.patch
 create mode 100644 
debian/patches/extra/0022-target-arm-Don-t-assert-for-128-bit-tile-accesses-wh.patch
 create mode 100644 
debian/patches/extra/0023-target-arm-Fix-UMOPA-UMOPS-of-16-bit-values.patch
 create mode 100644 
debian/patches/extra/0024-target-arm-Avoid-shifts-by-1-in-tszimm_shr-and-tszim.patch
 create mode 100644 
debian/patches/extra/0025-target-arm-Ignore-SMCR_EL2.LEN-and-SVCR_EL2.LEN-if-E.patch
 create mode 100644 
debian/patches/extra/0026-target-arm-Handle-denormals-correctly-for-FMOPA-wide.patch
 create mode 100644 
debian/patches/extra/0027-intel_iommu-fix-FRCD-construction-macro.patch
 create mode 100644 
debian/patches/extra/0028-target-i386-Do-not-apply-REX-to-MMX-operands.patch
 create mode 100644 
debian/patches/extra/0029-module-Prevent-crash-by-resetting-local_err-in-modul.patch
 create mode 100644 
debian/patches/extra/0030-nbd-server-Plumb-in-new-args-to-nbd_client_add.patch
 create mode 100644 
debian/patches/extra/0031-nbd-server-CVE-2024-7409-Cap-default-max-connections.patch
 create mode 100644 
debian/patches/extra/0032-nbd-server-CVE-2024-7409-Drop-non-negotiating-client.patch
 create mode 100644 
debian/patches/extra/0033-nbd-server-CVE-2024-7409-Close-stray-clients-at-serv.patch
 create mode 100644 
debian/patches/extra/0034-vnc-fix-crash-when-no-console-attached.patch
 create mode 100644 
debian/patches/extra/0035-nbd-server-CVE-2024-7409-Avoid-use-after-free-when-c.patch
 create mode 100644 
debian/patches/extra/0036-softmmu-physmem-fix-memory-leak-in-dirty_memory_exte.patch

diff --git 
a/debian/patches/extra/0018-virtio-net-Ensure-queue-index-fits-with-RSS.patch 
b/debian/patches/extra/0018-virtio-net-Ensure-queue-index-fits-with-RSS.patch
new file mode 100644
index 000..1dcb129
--- /dev/null
+++ 
b/debian/patches/extra/0018-virtio-net-Ensure-queue-index-fits-with-RSS.patch
@@ -0,0 +1,35 @@
+From  Mon Sep 17 00:00:00 2001
+From: Akihiko Odaki 
+Date: Mon, 1 Jul 2024 20:58:04 +0900
+Subject: [PATCH] virtio-net: Ensure queue index fits with RSS
+
+Ensure the queue index points to a valid queue when software RSS
+enabled. The new calculation matches with the behavior of Linux's TAP
+device with the RSS eBPF program.
+
+Fixes: 4474e37a5b3a ("virtio-net: implement RX RSS processing")
+Reported-by: Zhibin Hu 
+Cc: qemu-sta...@nongnu.org
+Signed-off-by: Akihiko Odaki 
+Reviewed-by: Michael S. Tsirkin 
+Signed-off-by: Jason Wang 
+(cherry picked from commit f1595ceb9aad36a6c1da95bcb77ab9509b38822d)
+Signed-off-by: Fiona Ebner 
+---
+ hw/net/virtio-net.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
+index 3644bfd91b..f48588638d 100644
+--- a/hw/net/virtio-net.c
 b/hw/net/virtio-net.c
+@@ -1949,7 +1949,8 @@ static ssize_t virtio_net_receive_rcu(NetClientState 
*nc, const uint8_t *buf,
+ if (!no_rss && n->rss_data.enabled && n->rss_data.enabled_software_rss) {
+ int index = virtio_net_pro

[pve-devel] [PATCH qemu 1/2] pick up fix for VirtIO PCI regressions

2024-09-05 Thread Fiona Ebner
Commit f06b222 ("fixes for QEMU 9.0") included a revert for the QEMU
commit 2ce6cff94d ("virtio-pci: fix use of a released vector"). That
commit caused some regressions which sounded just as bad as the fix.
Those regressions have now been addressed upstream, so pick up the fix
and drop the revert. Dropping the revert fixes the original issue that
commit 2ce6cff94d ("virtio-pci: fix use of a released vector")
addressed.

Signed-off-by: Fiona Ebner 
---
 ...tio-pci-fix-use-of-a-released-vector.patch | 87 ---
 ...ck-copy-before-write-fix-permission.patch} |  0
 ...-write-support-unligned-snapshot-di.patch} |  0
 ...-write-create-block_copy-bitmap-in-.patch} |  0
 ...backup-add-discard-source-parameter.patch} |  0
 ...-de-initialization-of-vhost-user-de.patch} |  0
 ...se-float_status-copy-in-sme_fmopa_s.patch} |  0
 ...Use-FPST_F16-for-SME-FMOPA-widening.patch} |  0
 ...on-and-honor-bootindex-again-for-le.patch} |  0
 ...a-bump-instruction-limit-in-scripts.patch} |  0
 ...5-block-copy-Fix-missing-graph-lock.patch} |  0
 ...do-not-operate-on-sources-from-fina.patch} |  0
 ...ix-the-use-of-an-uninitialized-irqfd.patch | 77 
 debian/patches/series | 24 ++---
 14 files changed, 89 insertions(+), 99 deletions(-)
 delete mode 100644 
debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch
 rename debian/patches/extra/{0007-block-copy-before-write-fix-permission.patch 
=> 0006-block-copy-before-write-fix-permission.patch} (100%)
 rename 
debian/patches/extra/{0008-block-copy-before-write-support-unligned-snapshot-di.patch
 => 0007-block-copy-before-write-support-unligned-snapshot-di.patch} (100%)
 rename 
debian/patches/extra/{0009-block-copy-before-write-create-block_copy-bitmap-in-.patch
 => 0008-block-copy-before-write-create-block_copy-bitmap-in-.patch} (100%)
 rename 
debian/patches/extra/{0010-qapi-blockdev-backup-add-discard-source-parameter.patch
 => 0009-qapi-blockdev-backup-add-discard-source-parameter.patch} (100%)
 rename 
debian/patches/extra/{0011-hw-virtio-Fix-the-de-initialization-of-vhost-user-de.patch
 => 0010-hw-virtio-Fix-the-de-initialization-of-vhost-user-de.patch} (100%)
 rename 
debian/patches/extra/{0012-target-arm-Use-float_status-copy-in-sme_fmopa_s.patch
 => 0011-target-arm-Use-float_status-copy-in-sme_fmopa_s.patch} (100%)
 rename 
debian/patches/extra/{0013-target-arm-Use-FPST_F16-for-SME-FMOPA-widening.patch 
=> 0012-target-arm-Use-FPST_F16-for-SME-FMOPA-widening.patch} (100%)
 rename 
debian/patches/extra/{0014-scsi-fix-regression-and-honor-bootindex-again-for-le.patch
 => 0013-scsi-fix-regression-and-honor-bootindex-again-for-le.patch} (100%)
 rename 
debian/patches/extra/{0015-hw-scsi-lsi53c895a-bump-instruction-limit-in-scripts.patch
 => 0014-hw-scsi-lsi53c895a-bump-instruction-limit-in-scripts.patch} (100%)
 rename debian/patches/extra/{0016-block-copy-Fix-missing-graph-lock.patch => 
0015-block-copy-Fix-missing-graph-lock.patch} (100%)
 rename 
debian/patches/extra/{0017-Revert-qemu-char-do-not-operate-on-sources-from-fina.patch
 => 0016-Revert-qemu-char-do-not-operate-on-sources-from-fina.patch} (100%)
 create mode 100644 
debian/patches/extra/0017-virtio-pci-Fix-the-use-of-an-uninitialized-irqfd.patch

diff --git 
a/debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch
 
b/debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch
deleted file mode 100644
index d2de6d1..000
--- 
a/debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch
+++ /dev/null
@@ -1,87 +0,0 @@
-From 00000000 Mon Sep 17 00:00:00 2001
-From: Fiona Ebner 
-Date: Thu, 16 May 2024 12:59:52 +0200
-Subject: [PATCH] Revert "virtio-pci: fix use of a released vector"
-
-This reverts commit 2ce6cff94df2650c460f809e5ad263f1d22507c0.
-
-The fix causes some issues:
-https://gitlab.com/qemu-project/qemu/-/issues/2321
-https://gitlab.com/qemu-project/qemu/-/issues/2334
-
-The CVE fixed by commit 2ce6cff94d ("virtio-pci: fix use of a released
-vector") is CVE-2024-4693 [0] and allows a malicious guest that
-controls the boot process in the guest to crash its QEMU process.
-
-The issues sound worse than the CVE, so revert until there is a proper
-fix.
-
-[0]: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-4693
-
-Signed-off-by: Fiona Ebner 

- hw/virtio/virtio-pci.c | 37 ++---
- 1 file changed, 2 insertions(+), 35 deletions(-)
-
-diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
-index e04218a9fb..fd66713848 100644
 a/hw/virtio/virtio-pci.c
-+++ b/hw/virtio/virtio-pci.c
-@@ -1410,38 +1410,6 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
- return offset;
- }
- 
--static void virtio_pci_set_vector(VirtIODevice *vdev,
--  VirtIOPCI

Re: [pve-devel] [PATCH v2 qemu-server] remote migration: fix online migration via API clients

2024-09-04 Thread Fiona Ebner
Am 03.09.24 um 11:37 schrieb Fabian Grünbichler:
> On August 13, 2024 10:42 am, Fiona Ebner wrote:
>> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
>> index e71face4..d31589e7 100644
>> --- a/PVE/QemuMigrate.pm
>> +++ b/PVE/QemuMigrate.pm
>> @@ -1095,7 +1095,10 @@ sub phase2 {
>>  die "only UNIX sockets are supported for remote migration\n"
>>  if $tunnel_info->{proto} ne 'unix';
>>  
>> -my $remote_socket = $tunnel_info->{addr};
>> +# untaint
>> +my ($remote_socket) =
>> +$tunnel_info->{addr} =~ 
>> m|^(/run/qemu-server/(?:(?!\.\./).)+\.migrate)$|;
> 
> should we just switch to `\d+`, like we do for regular migration? in
> phase2_start_local_cluster we have:
> 
>   elsif ($line =~ m!^migration listens on 
> (unix):(/run/qemu-server/(\d+)\.migrate)$!) {
>   $tunnel_info->{addr} = $2;
>   die "Destination UNIX sockets VMID does not match source VMID" if 
> $vmid ne $3;
>   $tunnel_info->{proto} = $1;
>   }
> 
> and I don't really see a reason to deviate from that scheme any time
> soon?
> 

Sounds good, did so in v3:
https://lore.proxmox.com/pve-devel/20240904111231.106570-1-f.eb...@proxmox.com/T/#u


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


[pve-devel] [PATCH v3 qemu-server] remote migration: fix online migration via API clients

2024-09-04 Thread Fiona Ebner
As reported in the community forum [0], when a remote migration
request comes in via an API client, the -T flag for Perl is set, so an
insecure dependency in a call like unlink() in forward_unix_socket()
will fail with:

> failed to write forwarding command - Insecure dependency in unlink while 
> running with -T switch

To fix it, untaint the problematic socket addresses coming from the
remote side. Require that all sockets are below '/run/qemu-server/'
and end with '.migrate' with the main socket being matched more
strictly. This allows extensions in the future while still being quite
strict.

[0]: https://forum.proxmox.com/threads/123048/post-691958

Signed-off-by: Fiona Ebner 
---

Changes in v3:
* Match main socket address more strictly as suggested by Fabian.

 PVE/QemuMigrate.pm | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index e71face4..6591f3f7 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -1095,7 +1095,9 @@ sub phase2 {
die "only UNIX sockets are supported for remote migration\n"
if $tunnel_info->{proto} ne 'unix';
 
-   my $remote_socket = $tunnel_info->{addr};
+   # untaint
+   my ($remote_socket) = $tunnel_info->{addr} =~ 
m|^(/run/qemu-server/\d+\.migrate)$|
+   or die "unexpected socket address '$tunnel_info->{addr}'\n";
my $local_socket = $remote_socket;
$local_socket =~ s/$remote_vmid/$vmid/g;
$tunnel_info->{addr} = $local_socket;
@@ -1104,6 +1106,9 @@ sub phase2 {
PVE::Tunnel::forward_unix_socket($self->{tunnel}, $local_socket, 
$remote_socket);
 
foreach my $remote_socket (@{$tunnel_info->{unix_sockets}}) {
+   # untaint
+   ($remote_socket) = $remote_socket =~ 
m|^(/run/qemu-server/(?:(?!\.\./).)+\.migrate)$|
+   or die "unexpected socket address '$remote_socket'\n";
my $local_socket = $remote_socket;
$local_socket =~ s/$remote_vmid/$vmid/g;
next if $self->{tunnel}->{forwarded}->{$local_socket};
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] applied: [PATCH docs] pve-network: correct language errors

2024-08-13 Thread Fiona Ebner
Am 13.08.24 um 15:57 schrieb Fiona Ebner:
> Am 13.08.24 um 12:23 schrieb Alexander Zeidler:
>> Signed-off-by: Alexander Zeidler 
>> ---
>>  pve-network.adoc | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/pve-network.adoc b/pve-network.adoc
>> index acdcf39..434430d 100644
>> --- a/pve-network.adoc
>> +++ b/pve-network.adoc
>> @@ -447,7 +447,7 @@ slaves in the single logical bonded interface such that 
>> different
>>  network-peers use different MAC addresses for their network packet
>>  traffic.
>>  
>> -If your switch support the LACP (IEEE 802.3ad) protocol then we recommend 
>> using
>> +If your switch supports the LACP (IEEE 802.3ad) protocol then we recommend 
>> using
>>  the corresponding bonding mode (802.3ad). Otherwise you should generally 
>> use the
>>  active-backup mode.
>>  
>> @@ -490,7 +490,7 @@ iface vmbr0 inet static
>>  
>>  
>>  [thumbnail="default-network-setup-bond.svg"]
>> -Another possibility it to use the bond directly as bridge port.
>> +Another possibility is to use the bond directly as bridge port.
>>  This can be used to make the guest network fault-tolerant.
>>  
>>  .Example: Use a bond as bridge port
> 
> applied, thanks! While adding the missing comma Lukas noted and his R-b.

I also added a follow-up using "as the bridge port" instead of "as
bridge port" while at it.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] applied: [PATCH docs] local-zfs: correct language errors

2024-08-13 Thread Fiona Ebner
Am 13.08.24 um 12:23 schrieb Alexander Zeidler:
> Signed-off-by: Alexander Zeidler 
> ---
>  local-zfs.adoc | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 

applied, thanks! Squashed in a change to wrap lines that would've been
longer than 80 characters.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] applied: [PATCH docs] pve-network: correct language errors

2024-08-13 Thread Fiona Ebner
Am 13.08.24 um 12:23 schrieb Alexander Zeidler:
> Signed-off-by: Alexander Zeidler 
> ---
>  pve-network.adoc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/pve-network.adoc b/pve-network.adoc
> index acdcf39..434430d 100644
> --- a/pve-network.adoc
> +++ b/pve-network.adoc
> @@ -447,7 +447,7 @@ slaves in the single logical bonded interface such that 
> different
>  network-peers use different MAC addresses for their network packet
>  traffic.
>  
> -If your switch support the LACP (IEEE 802.3ad) protocol then we recommend 
> using
> +If your switch supports the LACP (IEEE 802.3ad) protocol then we recommend 
> using
>  the corresponding bonding mode (802.3ad). Otherwise you should generally use 
> the
>  active-backup mode.
>  
> @@ -490,7 +490,7 @@ iface vmbr0 inet static
>  
>  
>  [thumbnail="default-network-setup-bond.svg"]
> -Another possibility it to use the bond directly as bridge port.
> +Another possibility is to use the bond directly as bridge port.
>  This can be used to make the guest network fault-tolerant.
>  
>  .Example: Use a bond as bridge port

applied, thanks! While adding the missing comma Lukas noted and his R-b.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [RFC container v2 23/25] backup: implement restore for external providers

2024-08-13 Thread Fiona Ebner
First, the provider is asked about what restore mechanism to use.
Currently, 'directory' and 'tar' are possible, for restoring either
from a directory containing the full filesystem structure (for which
rsync is used) or a potentially compressed tar file containing the
same.

The new functions are copied and adapted from the existing ones for
PBS or tar, and it might be worth factoring out the common parts.

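To illustrate the provider side of this contract, here is a rough sketch of
how a plugin could implement the 'tar' mechanism consumed below; only the
method names and the 'tar-path' result key follow the interface used here,
while the fetch_archive_to_tmpdir() helper is made up for the example:

    # Hypothetical excerpt of a backup provider plugin (sketch only).
    sub restore_get_mechanism {
        my ($self, $volname, $storeid) = @_;

        # Restore container backups from a (potentially compressed) tar file.
        return ('tar', 'lxc');
    }

    sub restore_container_init {
        my ($self, $volname, $storeid, $info) = @_;

        # Make the archive available locally and hand back its path.
        my $tar_path = $self->fetch_archive_to_tmpdir($volname); # made-up helper
        return { 'tar-path' => $tar_path };
    }
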
Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Adapt to API changes.

 src/PVE/LXC/Create.pm | 141 ++
 1 file changed, 141 insertions(+)

diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm
index 117103c..9d1c337 100644
--- a/src/PVE/LXC/Create.pm
+++ b/src/PVE/LXC/Create.pm
@@ -25,6 +25,24 @@ sub restore_archive {
if ($scfg->{type} eq 'pbs') {
return restore_proxmox_backup_archive($storage_cfg, $archive, 
$rootdir, $conf, $no_unpack_error, $bwlimit);
}
+   my $log_function = sub {
+   my ($log_level, $message) = @_;
+   my $prefix = $log_level eq 'err' ? 'ERROR' : uc($log_level);
+   print "$prefix: $message\n";
+   };
+   my $backup_provider =
+   PVE::Storage::new_backup_provider($storage_cfg, $storeid, 
$log_function);
+   if ($backup_provider) {
+   return restore_external_archive(
+   $backup_provider,
+   $storeid,
+   $volname,
+   $rootdir,
+   $conf,
+   $no_unpack_error,
+   $bwlimit,
+   );
+   }
 }
 
 $archive = PVE::Storage::abs_filesystem_path($storage_cfg, $archive) if 
$archive ne '-';
@@ -118,6 +136,55 @@ sub restore_tar_archive {
 die $err if $err && !$no_unpack_error;
 }
 
+sub restore_external_archive {
+my ($backup_provider, $storeid, $volname, $rootdir, $conf, 
$no_unpack_error, $bwlimit) = @_;
+
+my ($mechanism, $vmtype) = 
$backup_provider->restore_get_mechanism($volname, $storeid);
+die "cannot restore non-LXC guest of type '$vmtype'\n" if $vmtype ne 'lxc';
+
+my $info = $backup_provider->restore_container_init($volname, $storeid, 
{});
+eval {
+   if ($mechanism eq 'tar') {
+   my $tar_path = $info->{'tar-path'}
+   or die "did not get path to tar file from backup provider\n";
+   die "not a regular file '$tar_path'" if !-f $tar_path;
+   restore_tar_archive($tar_path, $rootdir, $conf, $no_unpack_error, 
$bwlimit);
+   } elsif ($mechanism eq 'directory') {
+   my $directory = $info->{'archive-directory'}
+   or die "did not get path to archive directory from backup 
provider\n";
+   die "not a directory '$directory'" if !-d $directory;
+
+   my $rsync = ['rsync', '--stats', '-h', '-X', '-A', '--numeric-ids', 
'-aH', '--delete',
+   '--no-whole-file', '--sparse', '--one-file-system', 
'--relative'];
+   push $rsync->@*, '--bwlimit', $bwlimit if $bwlimit;
+   push $rsync->@*, "${directory}/./", $rootdir;
+
+   my $transferred = '';
+   my $outfunc = sub {
+   return if $_[0] !~ /^Total transferred file size: (.+)$/;
+   $transferred = $1;
+   };
+   my $errfunc = sub { log_warn($_[0]); };
+
+   my $starttime = time();
+   PVE::Tools::run_command($rsync, outfunc => $outfunc, errfunc => 
$errfunc);
+   my $delay = time () - $starttime;
+
+   print "sync finished - transferred ${transferred} in ${delay}s\n";
+   } else {
+   die "mechanism '$mechanism' requested by backup provider is not 
supported for LXCs\n";
+   }
+};
+my $err = $@;
+eval { $backup_provider->restore_container_cleanup($volname, $storeid, 
{}); };
+if (my $cleanup_err = $@) {
+   die $cleanup_err if !$err;
+   warn $cleanup_err;
+}
+die $err if $err;
+
+}
+
 sub recover_config {
 my ($storage_cfg, $volid, $vmid) = @_;
 
@@ -126,6 +193,8 @@ sub recover_config {
my $scfg = PVE::Storage::storage_check_enabled($storage_cfg, $storeid);
if ($scfg->{type} eq 'pbs') {
return recover_config_from_proxmox_backup($storage_cfg, $volid, 
$vmid);
+   } elsif (PVE::Storage::new_backup_provider($storage_cfg, $storeid, sub 
{})) {
+   return recover_config_from_external_backup($storage_cfg, $volid, 
$vmid);
}
 }
 
@@ -200,6 +269,26 @@ sub recover_config_from_tar {
 return wantarray ? ($conf, $mp_param) : $conf;
 }
 
+sub recover_config

[pve-devel] [RFC qemu-server v2 19/25] backup: implement backup for external providers

2024-08-13 Thread Fiona Ebner
The state of the VM's disk images at the time the backup is started is
preserved via a snapshot-access block node. Old data is moved to the
fleecing image when new guest writes come in. The snapshot-access
block node, as well as the associated bitmap in case of incremental
backup, will be made available to the external provider. They are
exported via NBD. For the 'nbd' mechanism, the NBD socket path is
passed to the provider, while for the 'block-device' mechanism, the NBD
export is first made accessible as a regular block device and the
bitmap information is made available via a $next_dirty_region->()
function. For 'block-device', the 'nbdinfo' binary is required.

The provider can indicate that it wants to do an incremental backup by
returning the bitmap ID that was used for a previous backup, and it
will then be told whether the bitmap was newly created (either first
backup or the old bitmap was invalid) or whether it can be reused.

The provider then reads the parts of the NBD or block device it needs,
either the full disk for full backup, or the dirty parts according to
the bitmap for incremental backup. The bitmap has to be respected;
reads to other parts of the image will return an error. After backing
up each part of the disk, it should be discarded in the export to
avoid unnecessary space usage in the fleecing image (requires the
storage underlying the fleecing image to support discard too).

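Below is a rough sketch of how a provider could consume the 'block-device'
mechanism described above; the exact behaviour of $next_dirty_region->() is
an assumption here, namely that it yields an offset/length pair per dirty
region and an empty list once everything has been handed out:

    use strict;
    use warnings;

    # Sketch of a provider-side copy loop for the 'block-device' mechanism.
    # ASSUMPTION: $next_dirty_region->() returns ($offset, $length) for each
    # dirty region and an empty list when nothing is left to back up.
    sub backup_block_device {
        my ($device_path, $next_dirty_region, $store_chunk) = @_;

        open(my $fh, '<', $device_path) or die "open '$device_path' failed - $!\n";
        binmode($fh);

        while (my ($offset, $length) = $next_dirty_region->()) {
            sysseek($fh, $offset, 0) or die "seek to $offset failed - $!\n";
            my $data = '';
            my $read = sysread($fh, $data, $length);
            die "short or failed read at offset $offset\n"
                if !defined($read) || $read != $length;
            # Hand the chunk to the caller-provided writer for the backup target.
            $store_chunk->($offset, $data);
        }
        close($fh);
    }
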
Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Support 'block-device' mechanism, by exposing the NBD export as a
  block device via qemu-nbd.
* Adapt to API changes, i.e. pass all volumes as well as configuration
  files to the provider at once.

 PVE/VZDump/QemuServer.pm | 308 ++-
 1 file changed, 307 insertions(+), 1 deletion(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 4ad4a154..9daebbc2 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -20,7 +20,7 @@ use PVE::QMPClient;
 use PVE::Storage::Plugin;
 use PVE::Storage::PBSPlugin;
 use PVE::Storage;
-use PVE::Tools;
+use PVE::Tools qw(run_command);
 use PVE::VZDump;
 use PVE::Format qw(render_duration render_bytes);
 
@@ -291,6 +291,8 @@ sub archive {
 
 if ($self->{vzdump}->{opts}->{pbs}) {
$self->archive_pbs($task, $vmid);
+} elsif ($self->{vzdump}->{'backup-provider'}) {
+   $self->archive_external($task, $vmid);
 } else {
$self->archive_vma($task, $vmid, $filename, $comp);
 }
@@ -1136,6 +1138,23 @@ sub cleanup {
 
 # If VM was started only for backup, it is already stopped now.
 if (PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
+   if ($task->{cleanup}->{'nbd-stop'}) {
+   eval { PVE::QemuServer::QMPHelpers::nbd_stop($vmid); };
+   $self->logerr($@) if $@;
+   }
+
+   if (my $info = $task->{cleanup}->{'backup-access-teardown'}) {
+   my $params = {
+   'target-id' => $info->{'target-id'},
+   timeout => 60,
+   success => $info->{success} ? JSON::true : JSON::false,
+   };
+
+   $self->loginfo("tearing down backup-access");
+   eval { mon_cmd($vmid, "backup-access-teardown", $params->%*) };
+   $self->logerr($@) if $@;
+   }
+
$detach_tpmstate_drive->($task, $vmid);
detach_fleecing_images($task->{disks}, $vmid) if 
$task->{'use-fleecing'};
 }
@@ -1147,4 +1166,291 @@ sub cleanup {
 }
 }
 
+my sub block_device_backup_cleanup {
+my ($self, $paths, $cpids) = @_;
+
+for my $path ($paths->@*) {
+   eval { run_command(["qemu-nbd", "-d", $path ]); };
+   $self->log('warn', "unable to disconnect NBD backup source '$path' - 
$@") if $@;
+}
+
+my $waited;
+my $wait_limit = 5;
+for ($waited = 0; $waited < $wait_limit && scalar(keys $cpids->%*); 
$waited++) {
+   while ((my $cpid = waitpid(-1, POSIX::WNOHANG)) > 0) {
+   delete($cpids->{$cpid});
+   }
+   if ($waited == 0) {
+   kill 15, $_ for keys $cpids->%*;
+   }
+   sleep 1;
+}
+if ($waited == $wait_limit && scalar(keys $cpids->%*)) {
+   kill 9, $_ for keys $cpids->%*;
+   sleep 1;
+   while ((my $cpid = waitpid(-1, POSIX::WNOHANG)) > 0) {
+   delete($cpids->{$cpid});
+   }
+   $self->log('warn', "unable to collect nbdinfo child process '$_'") for 
keys $cpids->%*;
+}
+}
+
+my sub block_device_backup_prepare {
+my ($self, $devicename, $size, $nbd_path, $bitmap_name, $count) = @_;
+
+my $nbd_info_uri = "nbd+unix:///${devicename}?socket=${nbd_path}";
+my $qemu_nbd_uri = "nbd:unix:$

[pve-devel] [POC storage v2 13/25] Borg plugin

2024-08-13 Thread Fiona Ebner
Archive names start with the guest type and ID, followed by the same
timestamp format as PBS.

Container archives have the following structure:
guest.config
firewall.config
filesystem/ # containing the whole filesystem structure

VM archives have the following structure:
guest.config
firewall.config
volumes/ # containing a raw file for each device

A bind mount (or symlinks, respectively) is used to achieve this
structure, because Borg doesn't seem to support renaming on-the-fly.
(Prefix stripping via the "slashdot hack" would have helped slightly,
but is only in Borg >= 1.4
https://github.com/borgbackup/borg/actions/runs/7967940995)

NOTE: running via SSH has not been tested yet. The bandwidth limit is
not honored yet and the task size is not calculated yet. Discard for VM
backups would also be nice to have, but it's not entirely clear how
(parsing the progress output and discarding accordingly is one idea).
There is no dirty bitmap support; it is not clear whether that is
feasible to add.

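As a small usage sketch of the naming scheme above (the timestamp is simply
what gmtime() yields for the given backup time):

    use POSIX qw(strftime);

    # archive_name('qemu', 100, 1718000000) from the plugin above boils down to:
    my $name = "pve-qemu-100-" . strftime("%FT%TZ", gmtime(1718000000));
    # $name is "pve-qemu-100-2024-06-10T06:13:20Z" and matches $ARCHIVE_RE_3
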
Signed-off-by: Fiona Ebner 
---

New in v2.

 src/PVE/BackupProvider/Plugin/Borg.pm  | 373 ++
 src/PVE/BackupProvider/Plugin/Makefile |   2 +-
 src/PVE/Storage.pm |   2 +
 src/PVE/Storage/BorgBackupPlugin.pm| 506 +
 src/PVE/Storage/Makefile   |   1 +
 5 files changed, 883 insertions(+), 1 deletion(-)
 create mode 100644 src/PVE/BackupProvider/Plugin/Borg.pm
 create mode 100644 src/PVE/Storage/BorgBackupPlugin.pm

diff --git a/src/PVE/BackupProvider/Plugin/Borg.pm 
b/src/PVE/BackupProvider/Plugin/Borg.pm
new file mode 100644
index 000..b65321c
--- /dev/null
+++ b/src/PVE/BackupProvider/Plugin/Borg.pm
@@ -0,0 +1,373 @@
+package PVE::BackupProvider::Plugin::Borg;
+
+use strict;
+use warnings;
+
+use File::chdir;
+use File::Basename qw(basename);
+use File::Path qw(make_path remove_tree);
+use POSIX qw(strftime);
+
+use PVE::Tools;
+
+# ($vmtype, $vmid, $time_string)
+our $ARCHIVE_RE_3 = 
qr!^pve-(lxc|qemu)-([0-9]+)-([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z)$!;
+
+sub archive_name {
+my ($vmtype, $vmid, $backup_time) = @_;
+
+return "pve-${vmtype}-${vmid}-" . strftime("%FT%TZ", gmtime($backup_time));
+}
+
+# remove_tree can be very verbose by default, do explicit error handling and 
limit to one message
+my sub _remove_tree {
+my ($path) = @_;
+
+remove_tree($path, { error => \my $err });
+if ($err && @$err) { # empty array if no error
+   for my $diag (@$err) {
+   my ($file, $message) = %$diag;
+   die "cannot remove_tree '$path': $message\n" if $file eq '';
+   die "cannot remove_tree '$path': unlinking $file failed - 
$message\n";
+   }
+}
+}
+
+my sub prepare_run_dir {
+my ($archive, $operation) = @_;
+
+my $run_dir = "/run/pve-storage-borg-plugin/${archive}.${operation}";
+_remove_tree($run_dir);
+make_path($run_dir);
+die "unable to create directory $run_dir\n" if !-d $run_dir;
+
+return $run_dir;
+}
+
+my sub log_info {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('info', $message);
+}
+
+my sub log_warning {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('warn', $message);
+}
+
+my sub log_error {
+my ($self, $message) = @_;
+
+$self->{'log-function'}->('err', $message);
+}
+
+my sub file_contents_from_archive {
+my ($self, $archive, $file) = @_;
+
+my $run_dir = prepare_run_dir($archive, "file-contents");
+
+my $raw;
+
+eval {
+   local $CWD = $run_dir;
+
+   $self->{'storage-plugin'}->borg_cmd_extract(
+   $self->{scfg},
+   $self->{storeid},
+   $archive,
+   [$file],
+   );
+
+   $raw = PVE::Tools::file_get_contents("${run_dir}/${file}");
+};
+my $err = $@;
+eval { _remove_tree($run_dir); };
+log_warning($self, $@) if $@;
+die $err if $err;
+
+return $raw;
+}
+
+# Plugin implementation
+
+sub new {
+my ($class, $storage_plugin, $scfg, $storeid, $log_function) = @_;
+
+my $self = bless {
+   scfg => $scfg,
+   storeid => $storeid,
+   'storage-plugin' => $storage_plugin,
+   'log-function' => $log_function,
+}, $class;
+
+return $self;
+}
+
+sub provider_name {
+my ($self) = @_;
+
+return "Borg";
+}
+
+sub job_hook {
+my ($self, $phase, $info) = @_;
+
+if ($phase eq 'start') {
+   $self->{'job-id'} = $info->{'start-time'};
+}
+
+return;
+}
+
+sub backup_hook {
+my ($self, $phase, $vmid, $vmtype, $info) = @_;
+
+if ($phase eq 'start') {
+   $self->{$vmid}->{'task-size'} = 0;
+}
+
+return;
+}
+
+sub backup_get_mechanism {

[pve-devel] [RFC qemu v2 09/25] PVE backup: implement bitmap support for external backup access

2024-08-13 Thread Fiona Ebner
There can be one dirty bitmap for each backup target ID (which are
tracked in the backup_access_bitmaps hash table). The QMP user can
specify the ID of the bitmap it would like to use. This ID is then compared
to the current one for the given target. If they match, the bitmap is
re-used (should it still exist on the drive, otherwise re-created). If
there is a mismatch, the old bitmap is removed and a new one is
created.

The return value of the QMP command includes information about what
bitmap action was taken, similar to what the query-backup QMP command
returns for a regular backup. It also includes the bitmap name and
associated block node, so the management layer can then set up an NBD
export with the bitmap.

While the backup access is active, a background bitmap is also
required. This is necessary to implement bitmap handling according to
the original reference [0]. In particular:

- in the error case, new writes since the backup access was set up are
  in the background bitmap. Because of failure, the previously tracked
  writes from the backup access bitmap are still required too. Thus,
  the bitmap is merged with the background bitmap to get all new
  writes since the last backup.

- in the success case, continue tracking for the next incremental
  backup in the backup access bitmap. New writes since the backup
  access was set up are in the background bitmap. Because the backup
  was successful, clear the backup access bitmap and merge back the
  background bitmap to get only the new writes.

Since QEMU cannot know if the backup was successful or not (except if
failure already happens during the setup QMP command), the management
layer needs to tell it via the teardown QMP command.

The bitmap action is also recorded in the device info now.

[0]: 
https://lore.kernel.org/qemu-devel/b68833dd-8864-4d72-7c61-c134a9835...@ya.ru/

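For context, the management layer reports the outcome on teardown roughly
like this (a sketch mirroring the qemu-server patch of this series; it
assumes the usual qemu-server helpers, i.e. mon_cmd and the JSON module, are
available and that $vmid, $target_id and $backup_ok are set by the caller):

    # Report success so QEMU clears the backup-access bitmap and merges the
    # background bitmap back in; on failure, the new writes are merged into
    # the bitmap so the next incremental backup still covers them.
    my $params = {
        'target-id' => $target_id,
        timeout => 60,
        success => $backup_ok ? JSON::true : JSON::false,
    };
    mon_cmd($vmid, "backup-access-teardown", $params->%*);
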
Signed-off-by: Fiona Ebner 
---

No changes in v2.

 pve-backup.c | 175 ++-
 pve-backup.h |   2 +-
 qapi/block-core.json |  22 +-
 system/runstate.c|   2 +-
 4 files changed, 193 insertions(+), 8 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index d3370d6744..5f8dd396d5 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -15,6 +15,7 @@
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/cutils.h"
+#include "qemu/error-report.h"
 
 #if defined(CONFIG_MALLOC_TRIM)
 #include 
@@ -41,6 +42,7 @@
  */
 
 const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
+const char *BACKGROUND_BITMAP_NAME = "backup-access-background-bitmap";
 
 static struct PVEBackupState {
 struct {
@@ -72,6 +74,7 @@ static struct PVEBackupState {
 CoMutex backup_mutex;
 CoMutex dump_callback_mutex;
 char *target_id;
+GHashTable *backup_access_bitmaps; // key=target_id, value=bitmap_name
 } backup_state;
 
 static void pvebackup_init(void)
@@ -99,6 +102,8 @@ typedef struct PVEBackupDevInfo {
 char* device_name;
 int completed_ret; // INT_MAX if not completed
 BdrvDirtyBitmap *bitmap;
+BdrvDirtyBitmap *background_bitmap; // used for external backup access
+PBSBitmapAction bitmap_action;
 BlockDriverState *target;
 BlockJob *job;
 } PVEBackupDevInfo;
@@ -362,6 +367,67 @@ static void coroutine_fn pvebackup_co_complete_stream(void 
*opaque)
 qemu_co_mutex_unlock(&backup_state.backup_mutex);
 }
 
+/*
+ * New writes since the backup access was set up are in the background bitmap. 
Because of failure,
+ * the previously tracked writes in di->bitmap are still required too. Thus, 
merge with the
+ * background bitmap to get all new writes since the last backup.
+ */
+static void handle_backup_access_bitmaps_in_error_case(PVEBackupDevInfo *di)
+{
+Error *local_err = NULL;
+
+if (di->bs && di->background_bitmap) {
+bdrv_drained_begin(di->bs);
+if (di->bitmap) {
+bdrv_enable_dirty_bitmap(di->bitmap);
+if (!bdrv_merge_dirty_bitmap(di->bitmap, di->background_bitmap, 
NULL, &local_err)) {
+warn_report("backup access: %s - could not merge bitmaps in 
error path - %s",
+di->device_name,
+local_err ? error_get_pretty(local_err) : "unknown 
error");
+/*
+ * Could not merge, drop original bitmap too.
+ */
+bdrv_release_dirty_bitmap(di->bitmap);
+}
+} else {
+warn_report("backup access: %s - expected bitmap not present", 
di->device_name);
+}
+bdrv_release_dirty_bitmap(di->background_bitmap);
+bdrv_drained_end(di->bs);
+}
+}
+
+/*
+ * Continue tracking for next incremental backup in di->bitmap. New writes 
since the backup access
+ * was set up are in the background

[pve-devel] [RFC qemu-server v2 18/25] backup: allow adding fleecing images also for EFI and TPM

2024-08-13 Thread Fiona Ebner
For the external backup API, it will be necessary to add a fleecing
image even for small disks like EFI and TPM, because there is no other
place the old data could be copied to when a new guest write comes in.

Signed-off-by: Fiona Ebner 
---

No changes in v2.

 PVE/VZDump/QemuServer.pm | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 98685127..4ad4a154 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -548,7 +548,7 @@ my sub cleanup_fleecing_images {
 }
 
 my sub allocate_fleecing_images {
-my ($self, $disks, $vmid, $fleecing_storeid, $format) = @_;
+my ($self, $disks, $vmid, $fleecing_storeid, $format, $all_images) = @_;
 
 die "internal error - no fleecing storage specified\n" if 
!$fleecing_storeid;
 
@@ -559,7 +559,8 @@ my sub allocate_fleecing_images {
my $n = 0; # counter for fleecing image names
 
for my $di ($disks->@*) {
-   next if $di->{virtdev} =~ m/^(?:tpmstate|efidisk)\d$/; # too small 
to be worth it
+   # EFI/TPM are usually too small to be worth it, but it's required 
for external providers
+   next if !$all_images && $di->{virtdev} =~ 
m/^(?:tpmstate|efidisk)\d$/;
if ($di->{type} eq 'block' || $di->{type} eq 'file') {
my $scfg = PVE::Storage::storage_config($self->{storecfg}, 
$fleecing_storeid);
my $name = "vm-$vmid-fleece-$n";
@@ -623,7 +624,7 @@ my sub attach_fleecing_images {
 }
 
 my sub check_and_prepare_fleecing {
-my ($self, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support) = 
@_;
+my ($self, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support, 
$all_images) = @_;
 
 # Even if the VM was started specifically for fleecing, it's possible that 
the VM is resumed and
 # then starts doing IO. For VMs that are not resumed the fleecing images 
will just stay empty,
@@ -644,7 +645,8 @@ my sub check_and_prepare_fleecing {
$self->{storecfg}, $fleecing_opts->{storage});
my $format = scalar(grep { $_ eq 'qcow2' } $valid_formats->@*) ? 
'qcow2' : 'raw';
 
-   allocate_fleecing_images($self, $disks, $vmid, 
$fleecing_opts->{storage}, $format);
+   allocate_fleecing_images(
+   $self, $disks, $vmid, $fleecing_opts->{storage}, $format, 
$all_images);
attach_fleecing_images($self, $disks, $vmid, $format);
 }
 
@@ -735,7 +737,7 @@ sub archive_pbs {
my $is_template = 
PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
 
$task->{'use-fleecing'} = check_and_prepare_fleecing(
-   $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support);
+   $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support, 0);
 
my $fs_frozen = $self->qga_fs_freeze($task, $vmid);
 
@@ -919,7 +921,7 @@ sub archive_vma {
$attach_tpmstate_drive->($self, $task, $vmid);
 
$task->{'use-fleecing'} = check_and_prepare_fleecing(
-   $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support);
+   $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, 
$qemu_support, 0);
 
my $outfh;
if ($opts->{stdout}) {
-- 
2.39.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


