Re: [pve-devel] [PATCH qemu] async snapshot: stop vCPU throttling after finishing
Ping, as there are still new affected users showing up in the forum thread. Am 30.10.24 um 10:52 schrieb Fiona Ebner: > In the community forum, users reported issues about RCU stalls and > sluggish VMs after taking a snapshot with RAM in Proxmox VE [0]. Mario > was also experiencing similar issues from time to time and recently, > obtained a GDB stacktrace. The stacktrace showed that, in his case, > the vCPU threads were waiting in cpu_throttle_thread(). It is a good > guess that the issues in the forum could also be because of that. > > From searching in the source code, it seems that migration is the only > user of the vCPU throttling functions in QEMU relevant for Proxmox VE > (the only other place where it is used is the Cocoa UI). In > particular, RAM migration will begin throttling vCPUs for > auto-converge. > > In migration_iteration_finish() there is an unconditional call to > cpu_throttle_stop(), so do the same in the async snapshot code > specific to Proxmox VE. > > It's not clear why the issue began to surface more prominently only > now, since the vCPU throttling was there since commit 070afca258 > ("migration: Dynamic cpu throttling for auto-converge") in QEMU > v2.10.0. However, there were a lot of changes in the migration code > between v8.1.5 and v9.0.2 and a few of them might have affected the > likelihood of cpu_throttle_set() being called, for example, 4e1871c450 > ("migration: Don't serialize devices in qemu_savevm_state_iterate()") > > [0]: https://forum.proxmox.com/threads/153483 > > Reported-by: Mario Loderer > Signed-off-by: Fiona Ebner > Tested-by: Mario Loderer ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] applied: [PATCH docs] fix #5665: add note about short-lived cert renewal
Am 09.09.24 um 14:39 schrieb Fabian Grünbichler: > not that obvious behaviour on the systemd side, and missing cert renewal can > have wide-reaching consequences. > > Signed-off-by: Fabian Grünbichler > --- > certificate-management.adoc | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/certificate-management.adoc b/certificate-management.adoc > index 71c6d71..3bb9bb9 100644 > --- a/certificate-management.adoc > +++ b/certificate-management.adoc > @@ -223,6 +223,9 @@ If a node has been successfully configured with an > ACME-provided certificate > renewed by the `pve-daily-update.service`. Currently, renewal will be > attempted > if the certificate has expired already, or will expire in the next 30 days. > > +NOTE: If you are using a custom directory that issues short-lived > certificates, > +disabling the random delay for the `pve-daily-update.timer` unit might be > +advisable to avoid missing a certificate renewal after a reboot. > > ACME Examples with `pvenode` > applied, thanks! ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC container v3 30/34] create: factor out compression option helper
In preparation to re-use it for checking potentially untrusted archives. Signed-off-by: Fiona Ebner --- New in v3. src/PVE/LXC/Create.pm | 51 +-- 1 file changed, 30 insertions(+), 21 deletions(-) diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm index 719f372..d2f675e 100644 --- a/src/PVE/LXC/Create.pm +++ b/src/PVE/LXC/Create.pm @@ -78,15 +78,38 @@ sub restore_proxmox_backup_archive { $scfg, $storeid, $cmd, $param, userns_cmd => $userns_cmd); } +my sub tar_compression_option { +my ($archive) = @_; + +my %compression_map = ( + '.gz' => '-z', + '.bz2' => '-j', + '.xz' => '-J', + '.lzo' => '--lzop', + '.zst' => '--zstd', +); +if ($archive =~ /\.tar(\.[^.]+)?$/) { + if (defined($1)) { + die "unrecognized compression format: $1\n" if !defined($compression_map{$1}); + return $compression_map{$1}; + } + return; +} else { + die "file does not look like a template archive: $archive\n"; +} +} + my sub restore_tar_archive_command { -my ($conf, $opts, $rootdir, $bwlimit) = @_; +my ($conf, $compression_opt, $rootdir, $bwlimit) = @_; my ($id_map, $root_uid, $root_gid) = PVE::LXC::parse_id_maps($conf); my $userns_cmd = PVE::LXC::userns_command($id_map); -my $cmd = [@$userns_cmd, 'tar', 'xpf', '-', $opts->@*, '--totals', - @PVE::Storage::Plugin::COMMON_TAR_FLAGS, - '-C', $rootdir]; +my $cmd = [@$userns_cmd, 'tar', 'xpf', '-']; +push $cmd->@*, $compression_opt if $compression_opt; +push $cmd->@*, '--totals'; +push $cmd->@*, @PVE::Storage::Plugin::COMMON_TAR_FLAGS; +push $cmd->@*, '-C', $rootdir; # skip-old-files doesn't have anything to do with time (old/new), but is # simply -k (annoyingly also called --keep-old-files) without the 'treat @@ -108,24 +131,10 @@ sub restore_tar_archive { my $archive_fh; my $tar_input = '<&STDIN'; -my @compression_opt; +my $compression_opt; if ($archive ne '-') { # GNU tar refuses to autodetect this... *sigh* - my %compression_map = ( - '.gz' => '-z', - '.bz2' => '-j', - '.xz' => '-J', - '.lzo' => '--lzop', - '.zst' => '--zstd', - ); - if ($archive =~ /\.tar(\.[^.]+)?$/) { - if (defined($1)) { - die "unrecognized compression format: $1\n" if !defined($compression_map{$1}); - @compression_opt = $compression_map{$1}; - } - } else { - die "file does not look like a template archive: $archive\n"; - } + $compression_opt = tar_compression_option($archive); sysopen($archive_fh, $archive, O_RDONLY) or die "failed to open '$archive': $!\n"; my $flags = $archive_fh->fcntl(Fcntl::F_GETFD(), 0); @@ -133,7 +142,7 @@ sub restore_tar_archive { $tar_input = '<&'.fileno($archive_fh); } -my $cmd = restore_tar_archive_command($conf, [@compression_opt], $rootdir, $bwlimit); +my $cmd = restore_tar_archive_command($conf, $compression_opt, $rootdir, $bwlimit); if ($archive eq '-') { print "extracting archive from STDIN\n"; -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC qemu/common/storage/qemu-server/container/manager v3 00/34] backup provider API
ss and, if used, bitmap are exported via NBD. Container mechanism 'directory': A copy or snapshot of the container's filesystem state is made available as a directory. The method is executed inside the user namespace associated to the container. 2. Restore API The restore_get_mechanism() method is used to decide on the restore mechanism. Currently, 'qemu-img' for VMs, and 'directory' or 'tar' for containers are possible. It is enough to implement one mechanism for VMs and one mechanism for containers. Next, methods for extracting the guest and firewall configuration and the implementations of the restore mechanism via a pair of methods: an init method, for making the data available to Proxmox VE and a cleanup method that is called after restore. For VMs, there also is a restore_vm_get_device_info() helper required, to get the disks included in the backup and their sizes. 2.1. Restore Mechanisms VM mechanism 'qemu-img': The backup provider gives a path to the disk image that will be restored. The path needs to be something 'qemu-img' can deal with, e.g. can also be an NBD URI or similar. Container mechanism 'directory': The backup provider gives the path to a directory with the full filesystem structure of the container. Container mechanism 'tar': The backup provider gives the path to a (potentially compressed) tar archive with the full filesystem structure of the container. See the PVE::BackupProvider::Plugin module for the full API documentation. == This series adapts the backup stack in Proxmox VE to allow using the above API. For QEMU, backup access setup and teardown QMP commands are implemented to be able to provide access to a consistent disk state to the backup provider. The series also provides an example implementation for a backup provider as a proof-of-concept, exposing the different features. == Open questions: Should the backup provider plugin system also follow the same API age+version schema with a Custom/ directory for external plugins derived from the base plugin? Should the bitmap action be passed directly to the backup provider? I.e. have 'not-used', 'not-used-removed', 'new', 'used', 'invalid', instead of only 'none', 'new' and 'reuse'. It makes API slightly more complicated. Is there any situation where backup provider could care if bitmap is new, because it was the first or bitmap is new because previous was invalid? Both cases require the backup provider to do a full backup. == The patches marked as PATCH rather than RFC can make sense independently, with QEMU patches 02 and 03 having been sent already before (touching same code, so included here): https://lore.proxmox.com/pve-devel/20240625133551.210636-1-f.eb...@proxmox.com/#r == Feedback is very welcome, especially from people wishing to implement such a backup provider plugin! Please tell me what issues you see with the proposed API, what would and wouldn't work from your perspective? == Dependencies: pve-manager, pve-container and qemu-server all depend on new libpve-storage-perl. pve-manager also build-depends on the new libpve-storage-perl for its tests. pve-container depends on new pve-common. To keep things clean, pve-manager should also depend on new pve-container and qemu-server. In qemu-server, there is no version guard added yet, as that depends on the QEMU version the feature will land in. == qemu: Fiona Ebner (9): block/reqlist: allow adding overlapping requests PVE backup: fixup error handling for fleecing PVE backup: factor out setting up snapshot access for fleecing PVE backup: save device name in device info structure PVE backup: include device name in error when setting up snapshot access fails PVE backup: add target ID in backup state PVE backup: get device info: allow caller to specify filter for which devices use fleecing PVE backup: implement backup access setup and teardown API for external providers PVE backup: implement bitmap support for external backup access block/copy-before-write.c | 3 +- block/reqlist.c | 2 - pve-backup.c | 620 +- pve-backup.h | 16 + qapi/block-core.json | 61 system/runstate.c | 6 + 6 files changed, 637 insertions(+), 71 deletions(-) create mode 100644 pve-backup.h common: Fiona Ebner (1): env: add module with helpers to run a Perl subroutine in a user namespace src/Makefile | 1 + src/PVE/Env.pm | 136 + 2 files changed, 137 insertions(+) create mode 100644 src/PVE/Env.pm storage: Fiona Ebner (5): add storage_has_feature() helper function plugin: introduce new_backup_provider() method extract backup config: delegate to backup provider for s
[pve-devel] [RFC container v3 32/34] api: add early check against restoring privileged container from external source
While restore_external_archive() already has a check, that happens after an existing container is destroyed. Signed-off-by: Fiona Ebner --- New in v3. src/PVE/API2/LXC.pm | 14 ++ 1 file changed, 14 insertions(+) diff --git a/src/PVE/API2/LXC.pm b/src/PVE/API2/LXC.pm index 213e518..dca0e35 100644 --- a/src/PVE/API2/LXC.pm +++ b/src/PVE/API2/LXC.pm @@ -39,6 +39,17 @@ BEGIN { } } +my sub assert_not_restore_from_external { +my ($archive, $storage_cfg) = @_; + +my ($storeid, undef) = PVE::Storage::parse_volume_id($archive, 1); + +return if !defined($storeid); +return if !PVE::Storage::storage_has_feature($storage_cfg, $storeid, 'backup-provider'); + +die "refusing to restore privileged container backup from external source\n"; +} + my $check_storage_access_migrate = sub { my ($rpcenv, $authuser, $storecfg, $storage, $node) = @_; @@ -408,6 +419,9 @@ __PACKAGE__->register_method({ $conf->{unprivileged} = $orig_conf->{unprivileged} if !defined($unprivileged) && defined($orig_conf->{unprivileged}); + assert_not_restore_from_external($archive, $storage_cfg) + if !$conf->{unprivileged}; + # implicit privileged change is checked here if ($old_conf->{unprivileged} && !$conf->{unprivileged}) { $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Allocate']); -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC qemu v3 08/34] PVE backup: implement backup access setup and teardown API for external providers
For external backup providers, the state of the VM's disk images at the time the backup is started is preserved via a snapshot-access block node. Old data is moved to the fleecing image when new guest writes come in. The snapshot-access block node, as well as the associated bitmap in case of incremental backup, will be exported via NBD to the external provider. The NBD export will be done by the management layer, the missing functionality is setting up and tearing down the snapshot-access block nodes, which this patch adds. It is necessary to also set up fleecing for EFI and TPM disks, so that old data can be moved out of the way when a new guest write comes in. There can only be one regular backup or one active backup access at a time, because both require replacing the original block node of the drive. Thus the backup state is re-used, and checks are added to prohibit regular backup while snapshot access is active and vice versa. The block nodes added by the backup-access-setup QMP call are not tracked anywhere else (there is no job they are associated to like for regular backup). This requires adding a callback for teardown when QEMU exits, i.e. in qemu_cleanup(). Otherwise, there will be an assertion failure that the block graph is not empty when QEMU exits before the backup-access-teardown QMP command is called. The code for the qmp_backup_access_setup() was based on the existing qmp_backup() routine. The return value for the setup QMP command contains information about the snapshot-access block nodes that can be used by the management layer to set up the NBD exports. Signed-off-by: Fiona Ebner --- No changes in v3. pve-backup.c | 273 +++ pve-backup.h | 16 +++ qapi/block-core.json | 45 +++ system/runstate.c| 6 + 4 files changed, 340 insertions(+) create mode 100644 pve-backup.h diff --git a/pve-backup.c b/pve-backup.c index d0593fc581..d3370d6744 100644 --- a/pve-backup.c +++ b/pve-backup.c @@ -1,4 +1,5 @@ #include "proxmox-backup-client.h" +#include "pve-backup.h" #include "vma.h" #include "qemu/osdep.h" @@ -585,6 +586,37 @@ static int setup_snapshot_access(PVEBackupDevInfo *di, Error **errp) return 0; } +static void setup_all_snapshot_access_bh(void *opaque) +{ +assert(!qemu_in_coroutine()); + +CoCtxData *data = (CoCtxData*)opaque; +Error **errp = (Error**)data->data; + +Error *local_err = NULL; + +GList *l = backup_state.di_list; +while (l) { +PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data; +l = g_list_next(l); + +bdrv_drained_begin(di->bs); + +if (setup_snapshot_access(di, &local_err) < 0) { +cleanup_snapshot_access(di); +bdrv_drained_end(di->bs); +error_setg(errp, "%s - setting up snapshot access failed: %s", di->device_name, + local_err ? error_get_pretty(local_err) : "unknown error"); +break; +} + +bdrv_drained_end(di->bs); +} + +/* return */ +aio_co_enter(data->ctx, data->co); +} + /* * backup_job_create can *not* be run from a coroutine, so this can't either. * The caller is responsible that backup_mutex is held nonetheless. @@ -722,6 +754,11 @@ static bool fleecing_no_efi_tpm(const char *device_id) return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, "drive-tpmstate", 14); } +static bool fleecing_all(const char *device_id) +{ +return true; +} + /* * Returns a list of device infos, which needs to be freed by the caller. In * case of an error, errp will be set, but the returned value might still be a @@ -810,6 +847,242 @@ err: return di_list; } +BackupAccessInfoList *coroutine_fn qmp_backup_access_setup( +const char *target_id, +const char *devlist, +Error **errp) +{ +assert(qemu_in_coroutine()); + +qemu_co_mutex_lock(&backup_state.backup_mutex); + +Error *local_err = NULL; +GList *di_list = NULL; +GList *l; + +if (backup_state.di_list) { +error_set(errp, ERROR_CLASS_GENERIC_ERROR, + "previous backup by provider '%s' not finished", backup_state.target_id); +qemu_co_mutex_unlock(&backup_state.backup_mutex); +return NULL; +} + +bdrv_graph_co_rdlock(); +di_list = get_device_info(devlist, fleecing_all, &local_err); +bdrv_graph_co_rdunlock(); +if (local_err) { +error_propagate(errp, local_err); +goto err; +} +assert(di_list); + +size_t total = 0; + +l = di_list; +while (l) { +PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data; +l = g_list_next(l); + +ssize_t size = bdrv_getlength(di->bs); +if (size < 0) { +error_setg_errno(errp, -size, "bdr
[pve-devel] [RFC container v3 28/34] backup: implement restore for external providers
First, the provider is asked about what restore mechanism to use. Currently, 'directory' and 'tar' are possible, for restoring either from a directory containing the full filesystem structure (for which rsync is used) or a potentially compressed tar file containing the same. The new functions are copied and adapted from the existing ones for PBS or tar and it might be worth to factor out the common parts. Restore of containers as privileged are prohibited, because the archives from an external provider are considered less trusted than from Proxmox VE storages. If ever allowing that in the future, at least it would be worth extracting the tar archive in a restricted context (e.g. user namespace with ID mapped mount or seccomp). Signed-off-by: Fiona Ebner --- Changes in v3: * Use user namespace when restoring directory (and use tar instead of rsync, because it is easier to split in privileged and unprivileged half) src/PVE/LXC/Create.pm | 141 ++ 1 file changed, 141 insertions(+) diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm index 8c8cb9a..8657ac1 100644 --- a/src/PVE/LXC/Create.pm +++ b/src/PVE/LXC/Create.pm @@ -7,6 +7,7 @@ use File::Path; use Fcntl; use PVE::RPCEnvironment; +use PVE::RESTEnvironment qw(log_warn); use PVE::Storage::PBSPlugin; use PVE::Storage::Plugin; use PVE::Storage; @@ -26,6 +27,24 @@ sub restore_archive { if ($scfg->{type} eq 'pbs') { return restore_proxmox_backup_archive($storage_cfg, $archive, $rootdir, $conf, $no_unpack_error, $bwlimit); } + if (PVE::Storage::storage_has_feature($storage_cfg, $storeid, 'backup-provider')) { + my $log_function = sub { + my ($log_level, $message) = @_; + my $prefix = $log_level eq 'err' ? 'ERROR' : uc($log_level); + print "$prefix: $message\n"; + }; + my $backup_provider = + PVE::Storage::new_backup_provider($storage_cfg, $storeid, $log_function); + return restore_external_archive( + $backup_provider, + $storeid, + $volname, + $rootdir, + $conf, + $no_unpack_error, + $bwlimit, + ); + } } $archive = PVE::Storage::abs_filesystem_path($storage_cfg, $archive) if $archive ne '-'; @@ -127,6 +146,54 @@ sub restore_tar_archive { die $err if $err && !$no_unpack_error; } +sub restore_external_archive { +my ($backup_provider, $storeid, $volname, $rootdir, $conf, $no_unpack_error, $bwlimit) = @_; + +die "refusing to restore privileged container backup from external source\n" + if !$conf->{unprivileged}; + +my ($mechanism, $vmtype) = $backup_provider->restore_get_mechanism($volname, $storeid); +die "cannot restore non-LXC guest of type '$vmtype'\n" if $vmtype ne 'lxc'; + +my $info = $backup_provider->restore_container_init($volname, $storeid, {}); +eval { + if ($mechanism eq 'tar') { + my $tar_path = $info->{'tar-path'} + or die "did not get path to tar file from backup provider\n"; + die "not a regular file '$tar_path'" if !-f $tar_path; + restore_tar_archive($tar_path, $rootdir, $conf, $no_unpack_error, $bwlimit); + } elsif ($mechanism eq 'directory') { + my $directory = $info->{'archive-directory'} + or die "did not get path to archive directory from backup provider\n"; + die "not a directory '$directory'" if !-d $directory; + + my $create_cmd = [ + 'tar', + 'cpf', + '-', + @PVE::Storage::Plugin::COMMON_TAR_FLAGS, + "--directory=$directory", + '.', + ]; + + my $extract_cmd = restore_tar_archive_command($conf, undef, $rootdir, $bwlimit); + + eval { PVE::Tools::run_command([$create_cmd, $extract_cmd]); }; + die $@ if $@ && !$no_unpack_error; + } else { + die "mechanism '$mechanism' requested by backup provider is not supported for LXCs\n"; + } +}; +my $err = $@; +eval { $backup_provider->restore_container_cleanup($volname, $storeid, {}); }; +if (my $cleanup_err = $@) { + die $cleanup_err if !$err; + warn $cleanup_err; +} +die $err if $err; + +} + sub recover_config { my ($storage_cfg, $volid, $vmid) = @_; @@ -135,6 +202,8 @@ sub recover_config { my $scfg = PVE::Storage::storage_check_enabled($storage_cfg, $storeid); if ($scfg->{type} eq 'pbs') {
[pve-devel] [PATCH qemu-server v3 18/34] backup: cleanup: check if VM is running before issuing QMP commands
When the VM is only started for backup, the VM will be stopped at that point again. While the detach helpers do not warn about errors currently, that might change in the future. This is also in preparation for other cleanup QMP helpers that are more verbose about failure. Signed-off-by: Fiona Ebner --- No changes in v3. PVE/VZDump/QemuServer.pm | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm index b2ced154..c46e607c 100644 --- a/PVE/VZDump/QemuServer.pm +++ b/PVE/VZDump/QemuServer.pm @@ -1118,13 +1118,14 @@ sub snapshot { sub cleanup { my ($self, $task, $vmid) = @_; -$detach_tpmstate_drive->($task, $vmid); - -if ($task->{'use-fleecing'}) { - detach_fleecing_images($task->{disks}, $vmid); - cleanup_fleecing_images($self, $task->{disks}); +# If VM was started only for backup, it is already stopped now. +if (PVE::QemuServer::Helpers::vm_running_locally($vmid)) { + $detach_tpmstate_drive->($task, $vmid); + detach_fleecing_images($task->{disks}, $vmid) if $task->{'use-fleecing'}; } +cleanup_fleecing_images($self, $task->{disks}) if $task->{'use-fleecing'}; + if ($self->{qmeventd_fh}) { close($self->{qmeventd_fh}); } -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC storage v3 11/34] add storage_has_feature() helper function
Which looks up whether a storage supports a given feature in its 'plugindata'. This is intentionally kept simple and not implemented as a plugin method for now. Should it ever become more complex requiring plugins to override the default implementation, it can later be changed to a method. Suggested-by: Fabian Grünbichler Signed-off-by: Fiona Ebner --- New in v3. src/PVE/Storage.pm| 8 src/PVE/Storage/Plugin.pm | 10 ++ 2 files changed, 18 insertions(+) diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm index 57b2038..e251056 100755 --- a/src/PVE/Storage.pm +++ b/src/PVE/Storage.pm @@ -204,6 +204,14 @@ sub storage_check_enabled { return storage_check_node($cfg, $storeid, $node, $noerr); } +sub storage_has_feature { +my ($cfg, $storeid, $feature) = @_; + +my $scfg = storage_config($cfg, $storeid); + +return PVE::Storage::Plugin::storage_has_feature($scfg->{type}, $feature); +} + # storage_can_replicate: # return true if storage supports replication # (volumes allocated with vdisk_alloc() has replication feature) diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm index 8cc693c..6071e45 100644 --- a/src/PVE/Storage/Plugin.pm +++ b/src/PVE/Storage/Plugin.pm @@ -244,6 +244,16 @@ sub dirs_hash_to_string { return join(',', map { "$_=$hash->{$_}" } sort keys %$hash); } +sub storage_has_feature { +my ($type, $feature) = @_; + +my $data = $defaultData->{plugindata}->{$type}; +if (my $features = $data->{features}) { + return $features->{$feature}; +} +return; +} + sub default_format { my ($scfg) = @_; -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC container v3 26/34] backup: implement backup for external providers
The filesystem structure is made available as a directory in a consistent manner (with details depending on the vzdump backup mode) just like for regular backup via tar. The backup_container() method of the backup provider is executed in a user namespace with the container's ID mapping applied. This allows the backup provider to see the container's filesystem from the container's perspective. The 'prepare' phase of the backup hook is executed right before and allows the backup provider to prepare for the (usually) unprivileged execution context in the user namespace. The backup provider needs to back up the guest and firewall configuration and the filesystem structure of the container, honoring file exclusions and the bandwidth limit. Signed-off-by: Fiona Ebner --- Changes in v3: * pass in config as raw data instead of file name * run backup_container() method in user namespace associated to the container * warn if backing up privileged container to external provider src/PVE/VZDump/LXC.pm | 38 +- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/src/PVE/VZDump/LXC.pm b/src/PVE/VZDump/LXC.pm index 1928548..d201d8a 100644 --- a/src/PVE/VZDump/LXC.pm +++ b/src/PVE/VZDump/LXC.pm @@ -14,6 +14,7 @@ use PVE::LXC::Config; use PVE::LXC; use PVE::Storage; use PVE::Tools; +use PVE::Env; use PVE::VZDump; use base qw (PVE::VZDump::Plugin); @@ -124,6 +125,7 @@ sub prepare { my ($id_map, $root_uid, $root_gid) = PVE::LXC::parse_id_maps($conf); $task->{userns_cmd} = PVE::LXC::userns_command($id_map); +$task->{id_map} = $id_map; $task->{root_uid} = $root_uid; $task->{root_gid} = $root_gid; @@ -373,7 +375,41 @@ sub archive { my $userns_cmd = $task->{userns_cmd}; my $findexcl = $self->{vzdump}->{findexcl}; -if ($self->{vzdump}->{opts}->{pbs}) { +if (my $backup_provider = $self->{vzdump}->{'backup-provider'}) { + $self->loginfo("starting external backup via " . $backup_provider->provider_name()); + + if (!scalar($task->{id_map}->@*) || $task->{root_uid} == 0 || $task->{root_gid} == 0) { + $self->log("warn", "external backup of privileged container can only be restored as" + ." unprivileged which might not work in all cases"); + } + + my ($mechanism) = $backup_provider->backup_get_mechanism($vmid, 'lxc'); + die "mechanism '$mechanism' requested by backup provider is not supported for containers\n" + if $mechanism ne 'directory'; + + my $guest_config = PVE::Tools::file_get_contents("$tmpdir/etc/vzdump/pct.conf"); + my $firewall_file = "$tmpdir/etc/vzdump/pct.fw"; + + my $info = { + directory => $snapdir, + sources => [@sources], + 'backup-user-id' => $task->{root_uid}, + }; + $info->{'firewall-config'} = PVE::Tools::file_get_contents($firewall_file) + if -e $firewall_file; + $info->{'bandwidth-limit'} = $opts->{bwlimit} * 1024 if $opts->{bwlimit}; + + $backup_provider->backup_hook('prepare', $vmid, 'lxc', $info); + + if (scalar($task->{id_map}->@*)) { + PVE::Env::run_in_userns( + sub { $backup_provider->backup_container($vmid, $guest_config, $findexcl, $info); }, + $task->{id_map}, + ); + } else { + $backup_provider->backup_container($vmid, $guest_config, $findexcl, $info); + } +} elsif ($self->{vzdump}->{opts}->{pbs}) { my $param = []; push @$param, "pct.conf:$tmpdir/etc/vzdump/pct.conf"; -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC container v3 31/34] restore tar archive: check potentially untrusted archive
'tar' itself already protects against '..' in component names and strips absolute member names when extracting (if not used with the --absolute-names option) and in general seems sane for extracting. Additionally, the extraction already happens in the user namespace associated to the container. So for now, start out with some basic sanity checks. The helper can still be extended with more checks. Checks: * list files in archive - will already catch many corrupted/bogus archives. * check that there are at least 10 members - should also catch archives not actually containing a container root filesystem or structural issues early. * check that /sbin directory or link exists in archive - ideally the check would be for /sbin/init, but this cannot be done efficiently before extraction, because it would require to keep track of the whole archive to be able to follow symlinks. * abort if there is a multi-volume member in the archive - cheap and is never expected. Checks that were considered, but not (yet) added: * abort when a file has unrealistically large size - while this could help to detect certain kinds of bogus archives, there can be valid. use cases for extremely large sparse files, so it's not clear what a good limit would be (1 EiB maybe?). Also, an attacker could just adapt to such a limit creating multiple files and the actual extraction is already limited by the size of the allocated container volume. * check that /sbin/init exists after extracting - cannot be done efficiently before extraction, because it would require to keep track of the whole archive to be able to follow symlinks. During setup there already is detection of /etc/os-release, so issues with the structure will already be noticed. Adding a hard fail for untrusted archives would require either passing that information to the setup phase or extracting the protected_call method from there into a helper. * adding 'restrict' to the (common) tar flags - the tar manual (not the man page) documents: "Disable use of some potentially harmful 'tar' options. Currently this option disables shell invocation from multi-volume menu.". The flag was introduced in 2005 and this is still the only thing it is used for. Trying to restore a multi-volume archive already fails without giving multiple '--file' arguments and '--multi-volume', so don't bother adding the flag. * check format of tar file - would require yet another invocation of the decompressor and there seems to be no built-in way to just display the format with 'tar'. The 'file' program could be used, but it seems to not make a distinction between old GNU and GNU and old POSIX and POSIX formats, with the old ones being candidates to prohibit. So that would leave just detecting the old 'v7' format. Suggested-by: Fabian Grünbichler Signed-off-by: Fiona Ebner --- New in v3. src/PVE/LXC/Create.pm | 67 --- 1 file changed, 63 insertions(+), 4 deletions(-) diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm index d2f675e..bf424f6 100644 --- a/src/PVE/LXC/Create.pm +++ b/src/PVE/LXC/Create.pm @@ -99,12 +99,65 @@ my sub tar_compression_option { } } +# Basic checks trying to detect issues with a potentially untrusted or bogus tar archive. +# Just listing the files is already a good check against corruption. +# 'tar' itself already protects against '..' in component names and strips absolute member names +# when extracting, so no need to check for those here. +my sub check_tar_archive { +my ($archive) = @_; + +print "checking archive..\n"; + +# To resolve links to get to 'sbin/init' would mean keeping track of everything in the archive, +# because the target might be ordered first. Check only that 'sbin' exists here. +my $found_sbin; + +# Just to detect bogus archives, any valid container filesystem should have more than this. +my $required_members = 10; +my $member_count = 0; + +my $check_file_list = sub { + my ($line) = @_; + + # The date is in ISO 8601 format. The last part contains the potentially quoted file name, + # potentially followed by some additional info (e.g. where a link points to). + my ($type, $perms, $uid, $gid, $size, $date, $time, $file_info) = + $line =~ m!^([a-zA-Z\-])(\S+)\s+(\d+)/(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(.*)$!; + + die "found multi-volume member in archive\n" if $type eq 'M'; + + if (!$found_sbin && ( + ($file_info =~ m!^(?:\./)?sbin/$! && $type eq 'd') + || ($file_info =~ m!^(?:\./)?sbin ->! && $type eq 'l') + || ($file_info =~ m!^(?:\./)?sbin link to! && $type eq 'h') +
[pve-devel] [PATCH manager v3 33/34] ui: backup: also check for backup subtype to classify archive
In anticipation of future storage plugins that might not have PBS-specific formats or adhere to the vzdump naming scheme for backups. Signed-off-by: Fiona Ebner --- No changes in v3. www/manager6/Utils.js | 10 ++ www/manager6/grid/BackupView.js| 4 ++-- www/manager6/storage/BackupView.js | 4 ++-- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/www/manager6/Utils.js b/www/manager6/Utils.js index db86fa9a..a8e4e8ee 100644 --- a/www/manager6/Utils.js +++ b/www/manager6/Utils.js @@ -693,12 +693,14 @@ Ext.define('PVE.Utils', { 'snippets': gettext('Snippets'), }, -volume_is_qemu_backup: function(volid, format) { - return format === 'pbs-vm' || volid.match(':backup/vzdump-qemu-'); +volume_is_qemu_backup: function(volume) { + return volume.format === 'pbs-vm' || volume.volid.match(':backup/vzdump-qemu-') || + volume.subtype === 'qemu'; }, -volume_is_lxc_backup: function(volid, format) { - return format === 'pbs-ct' || volid.match(':backup/vzdump-(lxc|openvz)-'); +volume_is_lxc_backup: function(volume) { + return volume.format === 'pbs-ct' || volume.volid.match(':backup/vzdump-(lxc|openvz)-') || + volume.subtype === 'lxc'; }, authSchema: { diff --git a/www/manager6/grid/BackupView.js b/www/manager6/grid/BackupView.js index e71d1c88..ef3649c6 100644 --- a/www/manager6/grid/BackupView.js +++ b/www/manager6/grid/BackupView.js @@ -29,11 +29,11 @@ Ext.define('PVE.grid.BackupView', { var vmtypeFilter; if (vmtype === 'lxc' || vmtype === 'openvz') { vmtypeFilter = function(item) { - return PVE.Utils.volume_is_lxc_backup(item.data.volid, item.data.format); + return PVE.Utils.volume_is_lxc_backup(item.data); }; } else if (vmtype === 'qemu') { vmtypeFilter = function(item) { - return PVE.Utils.volume_is_qemu_backup(item.data.volid, item.data.format); + return PVE.Utils.volume_is_qemu_backup(item.data); }; } else { throw "unsupported VM type '" + vmtype + "'"; diff --git a/www/manager6/storage/BackupView.js b/www/manager6/storage/BackupView.js index db184def..749c2136 100644 --- a/www/manager6/storage/BackupView.js +++ b/www/manager6/storage/BackupView.js @@ -86,9 +86,9 @@ Ext.define('PVE.storage.BackupView', { disabled: true, handler: function(b, e, rec) { let vmtype; - if (PVE.Utils.volume_is_qemu_backup(rec.data.volid, rec.data.format)) { + if (PVE.Utils.volume_is_qemu_backup(rec.data)) { vmtype = 'qemu'; - } else if (PVE.Utils.volume_is_lxc_backup(rec.data.volid, rec.data.format)) { + } else if (PVE.Utils.volume_is_lxc_backup(rec.data)) { vmtype = 'lxc'; } else { return; -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC container v3 27/34] create: factor out tar restore command helper
In preparation to re-use it for restore from backup providers. Signed-off-by: Fiona Ebner --- New in v3. src/PVE/LXC/Create.pm | 42 +- 1 file changed, 25 insertions(+), 17 deletions(-) diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm index 7c5bf0a..8c8cb9a 100644 --- a/src/PVE/LXC/Create.pm +++ b/src/PVE/LXC/Create.pm @@ -59,12 +59,34 @@ sub restore_proxmox_backup_archive { $scfg, $storeid, $cmd, $param, userns_cmd => $userns_cmd); } -sub restore_tar_archive { -my ($archive, $rootdir, $conf, $no_unpack_error, $bwlimit) = @_; +my sub restore_tar_archive_command { +my ($conf, $opts, $rootdir, $bwlimit) = @_; my ($id_map, $root_uid, $root_gid) = PVE::LXC::parse_id_maps($conf); my $userns_cmd = PVE::LXC::userns_command($id_map); +my $cmd = [@$userns_cmd, 'tar', 'xpf', '-', $opts->@*, '--totals', + @PVE::Storage::Plugin::COMMON_TAR_FLAGS, + '-C', $rootdir]; + +# skip-old-files doesn't have anything to do with time (old/new), but is +# simply -k (annoyingly also called --keep-old-files) without the 'treat +# existing files as errors' part... iow. it's bsdtar's interpretation of -k +# *sigh*, gnu... +push @$cmd, '--skip-old-files'; +push @$cmd, '--anchored'; +push @$cmd, '--exclude' , './dev/*'; + +if (defined($bwlimit)) { + $cmd = [ ['cstream', '-t', $bwlimit*1024], $cmd ]; +} + +return $cmd; +} + +sub restore_tar_archive { +my ($archive, $rootdir, $conf, $no_unpack_error, $bwlimit) = @_; + my $archive_fh; my $tar_input = '<&STDIN'; my @compression_opt; @@ -92,21 +114,7 @@ sub restore_tar_archive { $tar_input = '<&'.fileno($archive_fh); } -my $cmd = [@$userns_cmd, 'tar', 'xpf', '-', @compression_opt, '--totals', - @PVE::Storage::Plugin::COMMON_TAR_FLAGS, - '-C', $rootdir]; - -# skip-old-files doesn't have anything to do with time (old/new), but is -# simply -k (annoyingly also called --keep-old-files) without the 'treat -# existing files as errors' part... iow. it's bsdtar's interpretation of -k -# *sigh*, gnu... -push @$cmd, '--skip-old-files'; -push @$cmd, '--anchored'; -push @$cmd, '--exclude' , './dev/*'; - -if (defined($bwlimit)) { - $cmd = [ ['cstream', '-t', $bwlimit*1024], $cmd ]; -} +my $cmd = restore_tar_archive_command($conf, [@compression_opt], $rootdir, $bwlimit); if ($archive eq '-') { print "extracting archive from STDIN\n"; -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [POC storage v3 14/34] add backup provider example
The example uses a simple directory structure to save the backups, grouped by guest ID. VM backups are saved as configuration files and qcow2 images, with backing files when doing incremental backups. Container backups are saved as configuration files and a tar file or squashfs image (added to test the 'directory' restore mechanism). Whether to use incremental VM backups and which backup mechanisms to use can be configured in the storage configuration. The 'nbdinfo' binary from the 'libnbd-bin' package is required for backup mechanism 'nbd' for VM backups, the 'mksquashfs' binary from the 'squashfs-tools' package is required for backup mechanism 'squashfs' for containers. Signed-off-by: Fiona Ebner --- Changes in v3: * adapt to API changes * use NBD export when restoring VM image, to make incremental backups using qcow2 chains work again .../BackupProvider/Plugin/DirectoryExample.pm | 697 ++ src/PVE/BackupProvider/Plugin/Makefile| 2 +- .../Custom/BackupProviderDirExamplePlugin.pm | 307 src/PVE/Storage/Custom/Makefile | 5 + src/PVE/Storage/Makefile | 1 + 5 files changed, 1011 insertions(+), 1 deletion(-) create mode 100644 src/PVE/BackupProvider/Plugin/DirectoryExample.pm create mode 100644 src/PVE/Storage/Custom/BackupProviderDirExamplePlugin.pm create mode 100644 src/PVE/Storage/Custom/Makefile diff --git a/src/PVE/BackupProvider/Plugin/DirectoryExample.pm b/src/PVE/BackupProvider/Plugin/DirectoryExample.pm new file mode 100644 index 000..99825ef --- /dev/null +++ b/src/PVE/BackupProvider/Plugin/DirectoryExample.pm @@ -0,0 +1,697 @@ +package PVE::BackupProvider::Plugin::DirectoryExample; + +use strict; +use warnings; + +use Fcntl qw(SEEK_SET); +use File::Path qw(make_path remove_tree); +use IO::File; +use IPC::Open3; + +use PVE::Storage::Plugin; +use PVE::Tools qw(file_get_contents file_read_firstline file_set_contents run_command); + +use base qw(PVE::BackupProvider::Plugin::Base); + +use constant { +BLKDISCARD => 0x1277, # see linux/fs.h +}; + +# Private helpers + +my sub log_info { +my ($self, $message) = @_; + +$self->{'log-function'}->('info', $message); +} + +my sub log_warning { +my ($self, $message) = @_; + +$self->{'log-function'}->('warn', $message); +} + +my sub log_error { +my ($self, $message) = @_; + +$self->{'log-function'}->('err', $message); +} + +# Try to use the same bitmap ID as last time for incremental backup if the storage is configured for +# incremental VM backup. Need to start fresh if there is no previous ID or the associated backup +# doesn't exist. +my sub get_bitmap_id { +my ($self, $vmid, $vmtype) = @_; + +return if $self->{'storage-plugin'}->get_vm_backup_mode($self->{scfg}) ne 'incremental'; + +my $previous_info_dir = "$self->{scfg}->{path}/$vmid/"; + +my $previous_info_file = "$previous_info_dir/previous-info"; +my $info = file_read_firstline($previous_info_file) // ''; +$self->{$vmid}->{'old-previous-info'} = $info; +my ($bitmap_id, $previous_backup_id) = $info =~ m/^(\d+)\s+(\d+)$/; +my $previous_backup_dir = + $previous_backup_id ? "$self->{scfg}->{path}/$vmid/$vmtype-$previous_backup_id" : undef; + +if ($bitmap_id && -d $previous_backup_dir) { + $self->{$vmid}->{'previous-backup-dir'} = $previous_backup_dir; +} else { + # need to start fresh if there is no previous ID or the associated backup doesn't exist + $bitmap_id = $self->{$vmid}->{'backup-time'}; +} + +$self->{$vmid}->{'bitmap-id'} = $bitmap_id; +make_path($previous_info_dir); +die "unable to create directory $previous_info_dir\n" if !-d $previous_info_dir; +file_set_contents($previous_info_file, "$bitmap_id $self->{$vmid}->{'backup-time'}"); + +return $bitmap_id; +} + +# Backup Provider API + +sub new { +my ($class, $storage_plugin, $scfg, $storeid, $log_function) = @_; + +my $self = bless { + scfg => $scfg, + storeid => $storeid, + 'storage-plugin' => $storage_plugin, + 'log-function' => $log_function, +}, $class; + +return $self; +} + +sub provider_name { +my ($self) = @_; + +return 'dir provider example'; +} + +# Hooks + +my sub job_start { +my ($self, $start_time) = @_; + +log_info($self, "job start hook called"); + +run_command(["modprobe", "nbd"]); + +log_info($self, "backup provider initialized successfully for new job $start_time"); +} + +sub job_hook { +my ($self, $phase, $info) = @_; + +
[pve-devel] [RFC qemu v3 09/34] PVE backup: implement bitmap support for external backup access
There can be one dirty bitmap for each backup target ID (which are tracked in the backup_access_bitmaps hash table). The QMP user can specify the ID of the bitmap it likes to use. This ID is then compared to the current one for the given target. If they match, the bitmap is re-used (should it still exist on the drive, otherwise re-created). If there is a mismatch, the old bitmap is removed and a new one is created. The return value of the QMP command includes information about what bitmap action was taken. Similar to what the query-backup QMP command returns for regular backup. It also includes the bitmap name and associated block node, so the management layer can then set up an NBD export with the bitmap. While the backup access is active, a background bitmap is also required. This is necessary to implement bitmap handling according to the original reference [0]. In particular: - in the error case, new writes since the backup access was set up are in the background bitmap. Because of failure, the previously tracked writes from the backup access bitmap are still required too. Thus, the bitmap is merged with the background bitmap to get all new writes since the last backup. - in the success case, continue tracking for the next incremental backup in the backup access bitmap. New writes since the backup access was set up are in the background bitmap. Because the backup was successfully, clear the backup access bitmap and merge back the background bitmap to get only the new writes. Since QEMU cannot know if the backup was successful or not (except if failure already happens during the setup QMP command), the management layer needs to tell it via the teardown QMP command. The bitmap action is also recorded in the device info now. [0]: https://lore.kernel.org/qemu-devel/b68833dd-8864-4d72-7c61-c134a9835...@ya.ru/ Signed-off-by: Fiona Ebner --- No changes in v3. pve-backup.c | 175 ++- pve-backup.h | 2 +- qapi/block-core.json | 22 +- system/runstate.c| 2 +- 4 files changed, 193 insertions(+), 8 deletions(-) diff --git a/pve-backup.c b/pve-backup.c index d3370d6744..5f8dd396d5 100644 --- a/pve-backup.c +++ b/pve-backup.c @@ -15,6 +15,7 @@ #include "qapi/qmp/qdict.h" #include "qapi/qmp/qerror.h" #include "qemu/cutils.h" +#include "qemu/error-report.h" #if defined(CONFIG_MALLOC_TRIM) #include @@ -41,6 +42,7 @@ */ const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap"; +const char *BACKGROUND_BITMAP_NAME = "backup-access-background-bitmap"; static struct PVEBackupState { struct { @@ -72,6 +74,7 @@ static struct PVEBackupState { CoMutex backup_mutex; CoMutex dump_callback_mutex; char *target_id; +GHashTable *backup_access_bitmaps; // key=target_id, value=bitmap_name } backup_state; static void pvebackup_init(void) @@ -99,6 +102,8 @@ typedef struct PVEBackupDevInfo { char* device_name; int completed_ret; // INT_MAX if not completed BdrvDirtyBitmap *bitmap; +BdrvDirtyBitmap *background_bitmap; // used for external backup access +PBSBitmapAction bitmap_action; BlockDriverState *target; BlockJob *job; } PVEBackupDevInfo; @@ -362,6 +367,67 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque) qemu_co_mutex_unlock(&backup_state.backup_mutex); } +/* + * New writes since the backup access was set up are in the background bitmap. Because of failure, + * the previously tracked writes in di->bitmap are still required too. Thus, merge with the + * background bitmap to get all new writes since the last backup. + */ +static void handle_backup_access_bitmaps_in_error_case(PVEBackupDevInfo *di) +{ +Error *local_err = NULL; + +if (di->bs && di->background_bitmap) { +bdrv_drained_begin(di->bs); +if (di->bitmap) { +bdrv_enable_dirty_bitmap(di->bitmap); +if (!bdrv_merge_dirty_bitmap(di->bitmap, di->background_bitmap, NULL, &local_err)) { +warn_report("backup access: %s - could not merge bitmaps in error path - %s", +di->device_name, +local_err ? error_get_pretty(local_err) : "unknown error"); +/* + * Could not merge, drop original bitmap too. + */ +bdrv_release_dirty_bitmap(di->bitmap); +} +} else { +warn_report("backup access: %s - expected bitmap not present", di->device_name); +} +bdrv_release_dirty_bitmap(di->background_bitmap); +bdrv_drained_end(di->bs); +} +} + +/* + * Continue tracking for next incremental backup in di->bitmap. New writes since the backup access + * was set up are in the background
[pve-devel] [POC storage v3 15/34] WIP Borg plugin
Archive names start with the guest type and ID and then the same timestamp format as PBS. Container archives have the following structure: guest.config firewall.config filesystem/ # containing the whole filesystem structure VM archives have the following structure guest.config firewall.config volumes/ # containing a raw file for each device A bindmount (respectively symlinks) are used to achieve this structure, because Borg doesn't seem to support renaming on-the-fly. (Prefix stripping via the "slashdot hack" would have helped slightly, but is only in Borg >= 1.4 https://github.com/borgbackup/borg/actions/runs/7967940995) NOTE: Bandwidth limit is not yet honored and the task size is not calculated yet. Discard for VM backups would also be nice to have, but it's not entirely clear how (parsing progress and discarding according to that is one idea). There is no dirty bitmap support, not sure if that is feasible to add. Signed-off-by: Fiona Ebner --- Changes in v3: * make SSH work. * adapt to API changes, i.e. config as raw data and user namespace execution context for containers. src/PVE/API2/Storage/Config.pm | 2 +- src/PVE/BackupProvider/Plugin/Borg.pm | 439 ++ src/PVE/BackupProvider/Plugin/Makefile | 2 +- src/PVE/Storage.pm | 2 + src/PVE/Storage/BorgBackupPlugin.pm| 595 + src/PVE/Storage/Makefile | 1 + 6 files changed, 1039 insertions(+), 2 deletions(-) create mode 100644 src/PVE/BackupProvider/Plugin/Borg.pm create mode 100644 src/PVE/Storage/BorgBackupPlugin.pm diff --git a/src/PVE/API2/Storage/Config.pm b/src/PVE/API2/Storage/Config.pm index e04b6ab..1cbf09d 100755 --- a/src/PVE/API2/Storage/Config.pm +++ b/src/PVE/API2/Storage/Config.pm @@ -190,7 +190,7 @@ __PACKAGE__->register_method ({ return &$api_storage_config($cfg, $param->{storage}); }}); -my $sensitive_params = [qw(password encryption-key master-pubkey keyring)]; +my $sensitive_params = [qw(password encryption-key master-pubkey keyring ssh-key)]; __PACKAGE__->register_method ({ name => 'create', diff --git a/src/PVE/BackupProvider/Plugin/Borg.pm b/src/PVE/BackupProvider/Plugin/Borg.pm new file mode 100644 index 000..7bb3ae3 --- /dev/null +++ b/src/PVE/BackupProvider/Plugin/Borg.pm @@ -0,0 +1,439 @@ +package PVE::BackupProvider::Plugin::Borg; + +use strict; +use warnings; + +use File::chdir; +use File::Basename qw(basename); +use File::Path qw(make_path remove_tree); +use Net::IP; +use POSIX qw(strftime); + +use PVE::Tools; + +# ($vmtype, $vmid, $time_string) +our $ARCHIVE_RE_3 = qr!^pve-(lxc|qemu)-([0-9]+)-([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z)$!; + +sub archive_name { +my ($vmtype, $vmid, $backup_time) = @_; + +return "pve-${vmtype}-${vmid}-" . strftime("%FT%TZ", gmtime($backup_time)); +} + +# remove_tree can be very verbose by default, do explicit error handling and limit to one message +my sub _remove_tree { +my ($path) = @_; + +remove_tree($path, { error => \my $err }); +if ($err && @$err) { # empty array if no error + for my $diag (@$err) { + my ($file, $message) = %$diag; + die "cannot remove_tree '$path': $message\n" if $file eq ''; + die "cannot remove_tree '$path': unlinking $file failed - $message\n"; + } +} +} + +my sub prepare_run_dir { +my ($archive, $operation) = @_; + +my $run_dir = "/run/pve-storage-borg-plugin/${archive}.${operation}"; +_remove_tree($run_dir); +make_path($run_dir); +die "unable to create directory $run_dir\n" if !-d $run_dir; + +return $run_dir; +} + +my sub log_info { +my ($self, $message) = @_; + +$self->{'log-function'}->('info', $message); +} + +my sub log_warning { +my ($self, $message) = @_; + +$self->{'log-function'}->('warn', $message); +} + +my sub log_error { +my ($self, $message) = @_; + +$self->{'log-function'}->('err', $message); +} + +my sub file_contents_from_archive { +my ($self, $archive, $file) = @_; + +my $run_dir = prepare_run_dir($archive, "file-contents"); + +my $raw; + +eval { + local $CWD = $run_dir; + + $self->{'storage-plugin'}->borg_cmd_extract( + $self->{scfg}, + $self->{storeid}, + $archive, + [$file], + ); + + $raw = PVE::Tools::file_get_contents("${run_dir}/${file}"); +}; +my $err = $@; +eval { _remove_tree($run_dir); }; +log_warning($self, $@) if $@; +die $err if $err; + +return $raw; +} + +# Plugin implementation + +sub new { +my ($class, $storage_plugin, $scfg, $storeid, $log_function) = @_; + +my $self = bless { + sc
[pve-devel] [RFC qemu-server v3 21/34] backup: implement backup for external providers
The state of the VM's disk images at the time the backup is started is preserved via a snapshot-access block node. Old data is moved to the fleecing image when new guest writes come in. The snapshot-access block node, as well as the associated bitmap in case of incremental backup, will be made available to the external provider. They are exported via NBD and for 'nbd' mechanism, the NBD socket path is passed to the provider, while for 'block-device' mechanism, the NBD export is made accessible as a regular block device first and the bitmap information is made available via a $next_dirty_region->() function. For 'block-device', the 'nbdinfo' binary is required. The provider can indicate that it wants to do an incremental backup by returning the bitmap ID that was used for a previous backup and it will then be told if the bitmap was newly created (either first backup or old bitmap was invalid) or if the bitmap can be reused. The provider then reads the parts of the NBD or block device it needs, either the full disk for full backup, or the dirty parts according to the bitmap for incremental backup. The bitmap has to be respected, reads to other parts of the image will return an error. After backing up each part of the disk, it should be discarded in the export to avoid unnecessary space usage in the fleecing image (requires the storage underlying the fleecing image to support discard too). Signed-off-by: Fiona Ebner --- Changes in v3: * adapt to API changes, config files are now passed as raw PVE/VZDump/QemuServer.pm | 309 ++- 1 file changed, 308 insertions(+), 1 deletion(-) diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm index b6dcd6cc..d0218c9b 100644 --- a/PVE/VZDump/QemuServer.pm +++ b/PVE/VZDump/QemuServer.pm @@ -20,7 +20,7 @@ use PVE::QMPClient; use PVE::Storage::Plugin; use PVE::Storage::PBSPlugin; use PVE::Storage; -use PVE::Tools; +use PVE::Tools qw(run_command); use PVE::VZDump; use PVE::Format qw(render_duration render_bytes); @@ -277,6 +277,8 @@ sub archive { if ($self->{vzdump}->{opts}->{pbs}) { $self->archive_pbs($task, $vmid); +} elsif ($self->{vzdump}->{'backup-provider'}) { + $self->archive_external($task, $vmid); } else { $self->archive_vma($task, $vmid, $filename, $comp); } @@ -1149,6 +1151,23 @@ sub cleanup { # If VM was started only for backup, it is already stopped now. if (PVE::QemuServer::Helpers::vm_running_locally($vmid)) { + if ($task->{cleanup}->{'nbd-stop'}) { + eval { PVE::QemuServer::QMPHelpers::nbd_stop($vmid); }; + $self->logerr($@) if $@; + } + + if (my $info = $task->{cleanup}->{'backup-access-teardown'}) { + my $params = { + 'target-id' => $info->{'target-id'}, + timeout => 60, + success => $info->{success} ? JSON::true : JSON::false, + }; + + $self->loginfo("tearing down backup-access"); + eval { mon_cmd($vmid, "backup-access-teardown", $params->%*) }; + $self->logerr($@) if $@; + } + $detach_tpmstate_drive->($task, $vmid); detach_fleecing_images($task->{disks}, $vmid) if $task->{'use-fleecing'}; } @@ -1160,4 +1179,292 @@ sub cleanup { } } +my sub block_device_backup_cleanup { +my ($self, $paths, $cpids) = @_; + +for my $path ($paths->@*) { + eval { run_command(["qemu-nbd", "-d", $path ]); }; + $self->log('warn', "unable to disconnect NBD backup source '$path' - $@") if $@; +} + +my $waited; +my $wait_limit = 5; +for ($waited = 0; $waited < $wait_limit && scalar(keys $cpids->%*); $waited++) { + while ((my $cpid = waitpid(-1, POSIX::WNOHANG)) > 0) { + delete($cpids->{$cpid}); + } + if ($waited == 0) { + kill 15, $_ for keys $cpids->%*; + } + sleep 1; +} +if ($waited == $wait_limit && scalar(keys $cpids->%*)) { + kill 9, $_ for keys $cpids->%*; + sleep 1; + while ((my $cpid = waitpid(-1, POSIX::WNOHANG)) > 0) { + delete($cpids->{$cpid}); + } + $self->log('warn', "unable to collect nbdinfo child process '$_'") for keys $cpids->%*; +} +} + +my sub block_device_backup_prepare { +my ($self, $devicename, $size, $nbd_path, $bitmap_name, $count) = @_; + +my $nbd_info_uri = "nbd+unix:///${devicename}?socket=${nbd_path}"; +my $qemu_nbd_uri = "nbd:unix:${nbd_path}:exportname=${devicename}"; + +my $cpid; +my $error_fh; +my $next_dirty_region; + +# If there is no dirty bitmap, it c
[pve-devel] [RFC container v3 29/34] external restore: don't use 'one-file-system' tar flag when restoring from a directory
This gives backup providers more freedom, e.g. mount backed-up mount point volumes individually. Suggested-by: Fabian Grünbichler Signed-off-by: Fiona Ebner --- New in v3. src/PVE/LXC/Create.pm | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm index 8657ac1..719f372 100644 --- a/src/PVE/LXC/Create.pm +++ b/src/PVE/LXC/Create.pm @@ -167,11 +167,15 @@ sub restore_external_archive { or die "did not get path to archive directory from backup provider\n"; die "not a directory '$directory'" if !-d $directory; + # Give backup provider more freedom, e.g. mount backed-up mount point volumes + # individually. + my @flags = grep { $_ ne '--one-file-system' } @PVE::Storage::Plugin::COMMON_TAR_FLAGS; + my $create_cmd = [ 'tar', 'cpf', '-', - @PVE::Storage::Plugin::COMMON_TAR_FLAGS, + @flags, "--directory=$directory", '.', ]; -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC qemu-server v3 23/34] backup: implement restore for external providers
First, the provider is asked about what restore mechanism to use. Currently, only 'qemu-img' is possible. Then the configuration files are restored, the provider gives information about volumes contained in the backup and finally the volumes are restored via 'qemu-img convert'. The code for the restore_external_archive() function was copied and adapted from the restore_proxmox_backup_archive() function. Together with restore_vma_archive() it seems sensible to extract the common parts and use a dedicated module for restore code. The parse_restore_archive() helper was renamed, because it's not just parsing. Signed-off-by: Fiona Ebner --- Changes in v3: * use new storage_has_feature() helper PVE/API2/Qemu.pm | 30 +- PVE/QemuServer.pm | 139 ++ 2 files changed, 166 insertions(+), 3 deletions(-) diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm index 1c3cb271..319518e8 100644 --- a/PVE/API2/Qemu.pm +++ b/PVE/API2/Qemu.pm @@ -845,7 +845,7 @@ __PACKAGE__->register_method({ return $res; }}); -my $parse_restore_archive = sub { +my $classify_restore_archive = sub { my ($storecfg, $archive) = @_; my ($archive_storeid, $archive_volname) = PVE::Storage::parse_volume_id($archive, 1); @@ -859,6 +859,22 @@ my $parse_restore_archive = sub { $res->{type} = 'pbs'; return $res; } + if (PVE::Storage::storage_has_feature($storecfg, $archive_storeid, 'backup-provider')) { + my $log_function = sub { + my ($log_level, $message) = @_; + my $prefix = $log_level eq 'err' ? 'ERROR' : uc($log_level); + print "$prefix: $message\n"; + }; + my $backup_provider = PVE::Storage::new_backup_provider( + $storecfg, + $archive_storeid, + $log_function, + ); + + $res->{type} = 'external'; + $res->{'backup-provider'} = $backup_provider; + return $res; + } } my $path = PVE::Storage::abs_filesystem_path($storecfg, $archive); $res->{type} = 'file'; @@ -1011,7 +1027,7 @@ __PACKAGE__->register_method({ 'backup', ); - $archive = $parse_restore_archive->($storecfg, $archive); + $archive = $classify_restore_archive->($storecfg, $archive); } } @@ -1069,7 +1085,15 @@ __PACKAGE__->register_method({ PVE::QemuServer::check_restore_permissions($rpcenv, $authuser, $merged); } } - if ($archive->{type} eq 'file' || $archive->{type} eq 'pipe') { + if (my $backup_provider = $archive->{'backup-provider'}) { + PVE::QemuServer::restore_external_archive( + $backup_provider, + $archive->{volid}, + $vmid, + $authuser, + $restore_options, + ); + } elsif ($archive->{type} eq 'file' || $archive->{type} eq 'pipe') { die "live-restore is only compatible with backup images from a Proxmox Backup Server\n" if $live_restore; PVE::QemuServer::restore_file_archive($archive->{path} // '-', $vmid, $authuser, $restore_options); diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm index 30e51a8c..f484d048 100644 --- a/PVE/QemuServer.pm +++ b/PVE/QemuServer.pm @@ -7303,6 +7303,145 @@ sub restore_proxmox_backup_archive { } } +sub restore_external_archive { +my ($backup_provider, $archive, $vmid, $user, $options) = @_; + +die "live restore from backup provider is not implemented\n" if $options->{live}; + +my $storecfg = PVE::Storage::config(); + +my ($storeid, $volname) = PVE::Storage::parse_volume_id($archive); +my $scfg = PVE::Storage::storage_config($storecfg, $storeid); + +my $tmpdir = "/var/tmp/vzdumptmp$$"; +rmtree $tmpdir; +mkpath $tmpdir; + +my $conffile = PVE::QemuConfig->config_file($vmid); +# disable interrupts (always do cleanups) +local $SIG{INT} = + local $SIG{TERM} = + local $SIG{QUIT} = + local $SIG{HUP} = sub { print STDERR "got interrupt - ignored\n"; }; + +# Note: $oldconf is undef if VM does not exists +my $cfs_path = PVE::QemuConfig->cfs_config_path($vmid); +my $oldconf = PVE::Cluster::cfs_read_file($cfs_path); +my $new_conf_raw = ''; + +my $rpcenv = PVE::RPCEnvironment::get(); +my $devinfo = {}; # info about drives included in backup +my $virtdev_hash = {}; # info about allocated drives + +ev
[pve-devel] [RFC qemu-server v3 24/34] backup restore: external: hardening check for untrusted source image
Suggested-by: Fabian Grünbichler Signed-off-by: Fiona Ebner --- New in v3. Actual checking being done depends on Fabian's hardening patches: https://lore.proxmox.com/pve-devel/20241104104221.228730-1-f.gruenbich...@proxmox.com/ PVE/QemuServer.pm | 6 ++ 1 file changed, 6 insertions(+) diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm index f484d048..c2e7b4a5 100644 --- a/PVE/QemuServer.pm +++ b/PVE/QemuServer.pm @@ -7385,6 +7385,12 @@ sub restore_external_archive { $backup_provider->restore_vm_volume_init($volname, $storeid, $d->{devname}, {}); my $source_path = $info->{'qemu-img-path'} or die "did not get source image path from backup provider\n"; + + print "importing drive '$d->{devname}' from '$source_path'\n"; + + # safety check for untrusted source image + PVE::Storage::file_size_info($source_path, undef, 1); + eval { qemu_img_convert( $source_path, $d->{volid}, $d->{size}, undef, 0, $options->{bwlimit}); -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH container v3 25/34] create: add missing include of PVE::Storage::Plugin
used for the shared 'COMMON_TAR_FLAGS' variable. Signed-off-by: Fiona Ebner --- New in v3. src/PVE/LXC/Create.pm | 1 + 1 file changed, 1 insertion(+) diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm index 117103c..7c5bf0a 100644 --- a/src/PVE/LXC/Create.pm +++ b/src/PVE/LXC/Create.pm @@ -8,6 +8,7 @@ use Fcntl; use PVE::RPCEnvironment; use PVE::Storage::PBSPlugin; +use PVE::Storage::Plugin; use PVE::Storage; use PVE::DataCenterConfig; use PVE::LXC; -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH qemu-server v3 17/34] backup: move cleanup of fleecing images to cleanup method
TPM drives are already detached there and it's better to group these things together. Signed-off-by: Fiona Ebner --- No changes in v3. PVE/VZDump/QemuServer.pm | 25 + 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm index 012c9210..b2ced154 100644 --- a/PVE/VZDump/QemuServer.pm +++ b/PVE/VZDump/QemuServer.pm @@ -690,7 +690,6 @@ sub archive_pbs { # get list early so we die on unkown drive types before doing anything my $devlist = _get_task_devlist($task); -my $use_fleecing; $self->enforce_vm_running_for_backup($vmid); $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid); @@ -721,7 +720,7 @@ sub archive_pbs { my $is_template = PVE::QemuConfig->is_template($self->{vmlist}->{$vmid}); - $use_fleecing = check_and_prepare_fleecing( + $task->{'use-fleecing'} = check_and_prepare_fleecing( $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support); my $fs_frozen = $self->qga_fs_freeze($task, $vmid); @@ -735,7 +734,7 @@ sub archive_pbs { devlist => $devlist, 'config-file' => $conffile, }; - $params->{fleecing} = JSON::true if $use_fleecing; + $params->{fleecing} = JSON::true if $task->{'use-fleecing'}; if (defined(my $ns = $scfg->{namespace})) { $params->{'backup-ns'} = $ns; @@ -784,11 +783,6 @@ sub archive_pbs { } $self->restore_vm_power_state($vmid); -if ($use_fleecing) { - detach_fleecing_images($task->{disks}, $vmid); - cleanup_fleecing_images($self, $task->{disks}); -} - die $err if $err; } @@ -891,7 +885,6 @@ sub archive_vma { } my $devlist = _get_task_devlist($task); -my $use_fleecing; $self->enforce_vm_running_for_backup($vmid); $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid); @@ -911,7 +904,7 @@ sub archive_vma { $attach_tpmstate_drive->($self, $task, $vmid); - $use_fleecing = check_and_prepare_fleecing( + $task->{'use-fleecing'} = check_and_prepare_fleecing( $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support); my $outfh; @@ -942,7 +935,7 @@ sub archive_vma { devlist => $devlist }; $params->{'firewall-file'} = $firewall if -e $firewall; - $params->{fleecing} = JSON::true if $use_fleecing; + $params->{fleecing} = JSON::true if $task->{'use-fleecing'}; add_backup_performance_options($params, $opts->{performance}, $qemu_support); $qmpclient->queue_cmd($vmid, $backup_cb, 'backup', %$params); @@ -984,11 +977,6 @@ sub archive_vma { $self->restore_vm_power_state($vmid); -if ($use_fleecing) { - detach_fleecing_images($task->{disks}, $vmid); - cleanup_fleecing_images($self, $task->{disks}); -} - if ($err) { if ($cpid) { kill(9, $cpid); @@ -1132,6 +1120,11 @@ sub cleanup { $detach_tpmstate_drive->($task, $vmid); +if ($task->{'use-fleecing'}) { + detach_fleecing_images($task->{disks}, $vmid); + cleanup_fleecing_images($self, $task->{disks}); +} + if ($self->{qmeventd_fh}) { close($self->{qmeventd_fh}); } -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH qemu-server v3 16/34] move nbd_stop helper to QMPHelpers module
Like this nbd_stop() can be called from a module that cannot include QemuServer.pm. Signed-off-by: Fiona Ebner --- No changes in v3. PVE/API2/Qemu.pm | 3 ++- PVE/CLI/qm.pm| 3 ++- PVE/QemuServer.pm| 6 -- PVE/QemuServer/QMPHelpers.pm | 6 ++ 4 files changed, 10 insertions(+), 8 deletions(-) diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm index 848001b6..1c3cb271 100644 --- a/PVE/API2/Qemu.pm +++ b/PVE/API2/Qemu.pm @@ -35,6 +35,7 @@ use PVE::QemuServer::Monitor qw(mon_cmd); use PVE::QemuServer::Machine; use PVE::QemuServer::Memory qw(get_current_memory); use PVE::QemuServer::PCI; +use PVE::QemuServer::QMPHelpers; use PVE::QemuServer::USB; use PVE::QemuMigrate; use PVE::RPCEnvironment; @@ -5910,7 +5911,7 @@ __PACKAGE__->register_method({ return; }, 'nbdstop' => sub { - PVE::QemuServer::nbd_stop($state->{vmid}); + PVE::QemuServer::QMPHelpers::nbd_stop($state->{vmid}); return; }, 'resume' => sub { diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm index 8d8ce10a..47b87782 100755 --- a/PVE/CLI/qm.pm +++ b/PVE/CLI/qm.pm @@ -35,6 +35,7 @@ use PVE::QemuServer::Agent qw(agent_available); use PVE::QemuServer::ImportDisk; use PVE::QemuServer::Monitor qw(mon_cmd); use PVE::QemuServer::OVF; +use PVE::QemuServer::QMPHelpers; use PVE::QemuServer; use PVE::CLIHandler; @@ -385,7 +386,7 @@ __PACKAGE__->register_method ({ my $vmid = $param->{vmid}; - eval { PVE::QemuServer::nbd_stop($vmid) }; + eval { PVE::QemuServer::QMPHelpers::nbd_stop($vmid) }; warn $@ if $@; return; diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm index 0df3bda0..49b6ca17 100644 --- a/PVE/QemuServer.pm +++ b/PVE/QemuServer.pm @@ -8606,12 +8606,6 @@ sub generate_smbios1_uuid { return "uuid=".generate_uuid(); } -sub nbd_stop { -my ($vmid) = @_; - -mon_cmd($vmid, 'nbd-server-stop', timeout => 25); -} - sub create_reboot_request { my ($vmid) = @_; open(my $fh, '>', "/run/qemu-server/$vmid.reboot") diff --git a/PVE/QemuServer/QMPHelpers.pm b/PVE/QemuServer/QMPHelpers.pm index 0269ea46..826938de 100644 --- a/PVE/QemuServer/QMPHelpers.pm +++ b/PVE/QemuServer/QMPHelpers.pm @@ -15,6 +15,12 @@ qemu_objectadd qemu_objectdel ); +sub nbd_stop { +my ($vmid) = @_; + +mon_cmd($vmid, 'nbd-server-stop', timeout => 25); +} + sub qemu_deviceadd { my ($vmid, $devicefull) = @_; -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC common v3 10/34] env: add module with helpers to run a Perl subroutine in a user namespace
The first use case is running the container backup subroutine for external providers inside a user namespace. That allows them to see the filesystem to back-up from the containers perspective and also improves security because of isolation. Copied and adapted the relevant parts from the pve-buildpkg repository. Originally-by: Wolfgang Bumiller [FE: add $idmap parameter, drop $aux_groups parameter] Signed-off-by: Fiona Ebner --- New in v3. src/Makefile | 1 + src/PVE/Env.pm | 136 + 2 files changed, 137 insertions(+) create mode 100644 src/PVE/Env.pm diff --git a/src/Makefile b/src/Makefile index 2d8bdc4..dba26e3 100644 --- a/src/Makefile +++ b/src/Makefile @@ -15,6 +15,7 @@ LIB_SOURCES = \ Certificate.pm \ CpuSet.pm \ Daemon.pm \ + Env.pm \ Exception.pm \ Format.pm \ INotify.pm \ diff --git a/src/PVE/Env.pm b/src/PVE/Env.pm new file mode 100644 index 000..e11bec0 --- /dev/null +++ b/src/PVE/Env.pm @@ -0,0 +1,136 @@ +package PVE::Env; + +use strict; +use warnings; + +use Fcntl qw(O_WRONLY); +use POSIX qw(EINTR); +use Socket; + +require qw(syscall.ph); + +use constant {CLONE_NEWNS => 0x0002, + CLONE_NEWUSER => 0x1000}; + +sub unshare($) { +my ($flags) = @_; +return 0 == syscall(272, $flags); +} + +sub __set_id_map($$$) { +my ($pid, $what, $value) = @_; +sysopen(my $fd, "/proc/$pid/${what}_map", O_WRONLY) + or die "failed to open child process' ${what}_map\n"; +my $rc = syswrite($fd, $value); +if (!$rc || $rc != length($value)) { + die "failed to set sub$what: $!\n"; +} +close($fd); +} + +sub set_id_map($$) { +my ($pid, $id_map) = @_; + +my $gid_map = ''; +my $uid_map = ''; + +for my $map ($id_map->@*) { + my ($type, $ct, $host, $length) = $map->@*; + + $gid_map .= "$ct $host $length\n" if $type eq 'g'; + $uid_map .= "$ct $host $length\n" if $type eq 'u'; +} + +__set_id_map($pid, 'gid', $gid_map) if $gid_map; +__set_id_map($pid, 'uid', $uid_map) if $uid_map; +} + +sub wait_for_child($;$) { +my ($pid, $noerr) = @_; +my $interrupts = 0; +while (waitpid($pid, 0) != $pid) { + if ($! == EINTR) { + warn "interrupted...\n"; + kill(($interrupts > 3 ? 9 : 15), $pid); + $interrupts++; + } +} +my $status = POSIX::WEXITSTATUS($?); +return $status if $noerr; + +if ($? == -1) { + die "failed to execute\n"; +} elsif (POSIX::WIFSIGNALED($?)) { + my $sig = POSIX::WTERMSIG($?); + die "got signal $sig\n"; +} elsif ($status != 0) { + warn "exit code $status\n"; +} +return $status; +} + +sub forked(&%) { +my ($code, %opts) = @_; + +pipe(my $except_r, my $except_w) or die "pipe: $!\n"; + +my $pid = fork(); +die "fork failed: $!\n" if !defined($pid); + +if ($pid == 0) { + close($except_r); + eval { $code->() }; + if ($@) { + print {$except_w} $@; + $except_w->flush(); + POSIX::_exit(1); + } + POSIX::_exit(0); +} +close($except_w); + +my $err; +if (my $afterfork = $opts{afterfork}) { + eval { $afterfork->($pid); }; + if ($err = $@) { + kill(15, $pid); + $opts{noerr} = 1; + } +} +if (!$err) { + $err = do { local $/ = undef; <$except_r> }; +} +my $rv = wait_for_child($pid, $opts{noerr}); +die $err if $err; +die "an unknown error occurred\n" if $rv != 0; +return $rv; +} + +sub run_in_userns(&;$) { +my ($code, $id_map) = @_; +socketpair(my $sp, my $sc, AF_UNIX, SOCK_STREAM, PF_UNSPEC) + or die "socketpair: $!\n"; +forked(sub { + close($sp); + unshare(CLONE_NEWUSER|CLONE_NEWNS) or die "unshare(NEWUSER|NEWNS): $!\n"; + syswrite($sc, "1\n") == 2 or die "write: $!\n"; + shutdown($sc, 1); + my $two = <$sc>; + die "failed to sync with parent process\n" if $two ne "2\n"; + close($sc); + $! = undef; + ($(, $)) = (0, 0); die "$!\n" if $!; + ($<, $>) = (0, 0); die "$!\n" if $!; + $code->(); +}, afterfork => sub { + my ($pid) = @_; + close($sc); + my $one = <$sp>; + die "failed to sync with userprocess\n" if $one ne "1\n"; + set_id_map($pid, $id_map); + syswrite($sp, "2\n") == 2 or die "write: $!\n"; + close($sp); +}); +} + +1; -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC storage v3 13/34] extract backup config: delegate to backup provider for storages that support it
Signed-off-by: Fiona Ebner --- Changes in v3: * use new storage_has_feature() helper src/PVE/Storage.pm | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm index 69500bf..9f9a86b 100755 --- a/src/PVE/Storage.pm +++ b/src/PVE/Storage.pm @@ -1734,6 +1734,17 @@ sub extract_vzdump_config { storage_check_enabled($cfg, $storeid); return PVE::Storage::PBSPlugin->extract_vzdump_config($scfg, $volname, $storeid); } + + if (storage_has_feature($cfg, $storeid, 'backup-provider')) { + my $plugin = PVE::Storage::Plugin->lookup($scfg->{type}); + my $log_function = sub { + my ($log_level, $message) = @_; + my $prefix = $log_level eq 'err' ? 'ERROR' : uc($log_level); + print "$prefix: $message\n"; + }; + my $backup_provider = $plugin->new_backup_provider($scfg, $storeid, $log_function); + return $backup_provider->restore_get_guest_config($volname, $storeid); + } } my $archive = abs_filesystem_path($cfg, $volid); -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC storage v3 12/34] plugin: introduce new_backup_provider() method
The new_backup_provider() method can be used by storage plugins for external backup providers. If the method returns a provider, Proxmox VE will use callbacks to that provider for backups and restore instead of using its usual backup/restore mechanisms. API age and version are both bumped. The backup provider API is split into two parts, both of which again need different implementations for VM and LXC guests: 1. Backup API There are two hook callback functions, namely: 1. job_hook() is called during the start/end/abort phases of the whole backup job. 2. backup_hook() is called during the start/end/abort phases of the backup of an individual guest. There also is a 'prepare' phase useful for container backups, because the backup method for containers itself is executed in the user namespace context associated to the container. The backup_get_mechanism() method is used to decide on the backup mechanism. Currently, 'block-device' or 'nbd' for VMs, and 'directory' for containers is possible. The method also let's the plugin indicate whether to use a bitmap for incremental VM backup or not. It is enough to implement one mechanism for VMs and one mechanism for containers. Next, there are methods for backing up the guest's configuration and data, backup_vm() for VM backup and backup_container() for container backup, with the latter running Finally, some helpers like getting the provider name or volume ID for the backup target, as well as for handling the backup log. 1.1 Backup Mechanisms VM: Access to the data on the VM's disk from the time the backup started is made available via a so-called "snapshot access". This is either the full image, or in case a bitmap is used, the dirty parts of the image since the last time the bitmap was used for a successful backup. Reading outside of the dirty parts will result in an error. After backing up each part of the disk, it should be discarded in the export to avoid unnecessary space usage on the Proxmox VE side (there is an associated fleecing image). VM mechanism 'block-device': The snapshot access is exposed as a block device. If used, a bitmap is passed along. VM mechanism 'nbd': The snapshot access and, if used, bitmap are exported via NBD. Container mechanism 'directory': A copy or snapshot of the container's filesystem state is made available as a directory. The method is executed inside the user namespace associated to the container. 2. Restore API The restore_get_mechanism() method is used to decide on the restore mechanism. Currently, 'qemu-img' for VMs, and 'directory' or 'tar' for containers are possible. It is enough to implement one mechanism for VMs and one mechanism for containers. Next, methods for extracting the guest and firewall configuration and the implementations of the restore mechanism via a pair of methods: an init method, for making the data available to Proxmox VE and a cleanup method that is called after restore. For VMs, there also is a restore_vm_get_device_info() helper required, to get the disks included in the backup and their sizes. 2.1. Restore Mechanisms VM mechanism 'qemu-img': The backup provider gives a path to the disk image that will be restored. The path needs to be something 'qemu-img' can deal with, e.g. can also be an NBD URI or similar. Container mechanism 'directory': The backup provider gives the path to a directory with the full filesystem structure of the container. Container mechanism 'tar': The backup provider gives the path to a (potentially compressed) tar archive with the full filesystem structure of the container. See the PVE::BackupProvider::Plugin module for the full API documentation. Signed-off-by: Fiona Ebner --- Changes in v3: * update docs regarding API changes: - prepare phase for backup hook - pass in configs as data instead of filenames src/PVE/BackupProvider/Makefile|3 + src/PVE/BackupProvider/Plugin/Base.pm | 1158 src/PVE/BackupProvider/Plugin/Makefile |5 + src/PVE/Makefile |1 + src/PVE/Storage.pm | 12 +- src/PVE/Storage/Plugin.pm | 15 + 6 files changed, 1192 insertions(+), 2 deletions(-) create mode 100644 src/PVE/BackupProvider/Makefile create mode 100644 src/PVE/BackupProvider/Plugin/Base.pm create mode 100644 src/PVE/BackupProvider/Plugin/Makefile diff --git a/src/PVE/BackupProvider/Makefile b/src/PVE/BackupProvider/Makefile new file mode 100644 index 000..f018cef --- /dev/null +++ b/src/PVE/BackupProvider/Makefile @@ -0,0 +1,3 @@ +.PHONY: install +install: + make -C Plugin install diff --git a/src/PVE/BackupProvider/Plugin/Base.pm b/src/PVE/BackupProvider/Plugin/Base.pm new file mode 100644 index 000..a8d0a88 --- /dev/null +++ b/src/PVE/BackupProvid
[pve-devel] [RFC manager v3 34/34] backup: implement backup for external providers
Hooks from the backup provider are called during start/end/abort for both job and backup. And it is necessary to adapt some log messages and special case some things like is already done for PBS, e.g. log file handling. Signed-off-by: Fiona Ebner --- Changes in v3: * use new storage_has_feature() helper PVE/VZDump.pm | 57 - test/vzdump_new_test.pl | 3 +++ 2 files changed, 53 insertions(+), 7 deletions(-) diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm index fd89945e..0b1b5dc1 100644 --- a/PVE/VZDump.pm +++ b/PVE/VZDump.pm @@ -217,7 +217,10 @@ sub storage_info { $info->{'prune-backups'} = PVE::JSONSchema::parse_property_string('prune-backups', $scfg->{'prune-backups'}) if defined($scfg->{'prune-backups'}); -if ($type eq 'pbs') { +if (PVE::Storage::storage_has_feature($cfg, $storage, 'backup-provider')) { + $info->{'backup-provider'} = + PVE::Storage::new_backup_provider($cfg, $storage, sub { debugmsg($_[0], $_[1]); }); +} elsif ($type eq 'pbs') { $info->{pbs} = 1; } else { $info->{dumpdir} = PVE::Storage::get_backup_dir($cfg, $storage); @@ -717,6 +720,7 @@ sub new { $opts->{scfg} = $info->{scfg}; $opts->{pbs} = $info->{pbs}; $opts->{'prune-backups'} //= $info->{'prune-backups'}; + $self->{'backup-provider'} = $info->{'backup-provider'} if $info->{'backup-provider'}; } } elsif ($opts->{dumpdir}) { $add_error->("dumpdir '$opts->{dumpdir}' does not exist") @@ -1001,7 +1005,7 @@ sub exec_backup_task { } } - if (!$self->{opts}->{pbs}) { + if (!$self->{opts}->{pbs} && !$self->{'backup-provider'}) { $task->{logfile} = "$opts->{dumpdir}/$basename.log"; } @@ -1011,7 +1015,11 @@ sub exec_backup_task { $ext .= ".${comp_ext}"; } - if ($self->{opts}->{pbs}) { + if ($self->{'backup-provider'}) { + die "unable to pipe backup to stdout\n" if $opts->{stdout}; + $task->{target} = $self->{'backup-provider'}->backup_get_archive_name( + $vmid, $vmtype, $task->{backup_time}); + } elsif ($self->{opts}->{pbs}) { die "unable to pipe backup to stdout\n" if $opts->{stdout}; $task->{target} = $pbs_snapshot_name; } else { @@ -1029,7 +1037,7 @@ sub exec_backup_task { my $pid = $$; if ($opts->{tmpdir}) { $task->{tmpdir} = "$opts->{tmpdir}/vzdumptmp${pid}_$vmid/"; - } elsif ($self->{opts}->{pbs}) { + } elsif ($self->{opts}->{pbs} || $self->{'backup-provider'}) { $task->{tmpdir} = "/var/tmp/vzdumptmp${pid}_$vmid"; } else { # dumpdir is posix? then use it as temporary dir @@ -1101,6 +1109,10 @@ sub exec_backup_task { if ($mode eq 'stop') { $plugin->prepare ($task, $vmid, $mode); + if ($self->{'backup-provider'}) { + $self->{'backup-provider'}->backup_hook( + 'start', $vmid, $vmtype, { 'start-time' => $task->{backup_time} }); + } $self->run_hook_script ('backup-start', $task, $logfd); if ($running) { @@ -1115,6 +1127,10 @@ sub exec_backup_task { } elsif ($mode eq 'suspend') { $plugin->prepare ($task, $vmid, $mode); + if ($self->{'backup-provider'}) { + $self->{'backup-provider'}->backup_hook( + 'start', $vmid, $vmtype, { 'start-time' => $task->{backup_time} }); + } $self->run_hook_script ('backup-start', $task, $logfd); if ($vmtype eq 'lxc') { @@ -1141,6 +1157,10 @@ sub exec_backup_task { } } elsif ($mode eq 'snapshot') { + if ($self->{'backup-provider'}) { + $self->{'backup-provider'}->backup_hook( + 'start', $vmid, $vmtype, { 'start-time' => $task->{backup_time} }); + } $self->run_hook_script ('backup-start', $task, $logfd); my $snapshot_count = $task->{snapshot_count} || 0; @@ -1183,11 +1203,13 @@ sub exec_backup_task { return; } - my $archive_txt = $self->{opts}->{pbs} ? 'Proxmox Backup Server' : 'vzdump'; + my $arch
[pve-devel] [PATCH qemu v3 05/34] PVE backup: include device name in error when setting up snapshot access fails
Signed-off-by: Fiona Ebner --- No changes in v3. pve-backup.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/pve-backup.c b/pve-backup.c index 33c23e53c2..d931746453 100644 --- a/pve-backup.c +++ b/pve-backup.c @@ -626,7 +626,8 @@ static void create_backup_jobs_bh(void *opaque) { bool discard_source = false; if (di->fleecing.bs) { if (setup_snapshot_access(di, &local_err) < 0) { -error_setg(errp, "setting up snapshot access for fleecing failed: %s", +error_setg(errp, "%s - setting up snapshot access for fleecing failed: %s", + di->device_name, local_err ? error_get_pretty(local_err) : "unknown error"); cleanup_snapshot_access(di); bdrv_drained_end(di->bs); -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH qemu-server v3 19/34] backup: keep track of block-node size for fleecing
For fleecing, the size needs to match exactly what QEMU sees. In particular, EFI disks might be attached with a 'size=' option, meaning that size can be different from the volume's size. Commit 36377acf ("backup: disk info: also keep track of size") introduced size tracking and it was used for fleecing since then, but the accurate size information needs to be queried via QMP. Should also help with the following issue reported in the community forum: https://forum.proxmox.com/threads/152202 Signed-off-by: Fiona Ebner --- Changes in v3: * only use query-block QMP command after the VM is enforced running PVE/VZDump/QemuServer.pm | 37 - 1 file changed, 32 insertions(+), 5 deletions(-) diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm index c46e607c..1ebafe6d 100644 --- a/PVE/VZDump/QemuServer.pm +++ b/PVE/VZDump/QemuServer.pm @@ -551,7 +551,7 @@ my sub allocate_fleecing_images { my $name = "vm-$vmid-fleece-$n"; $name .= ".$format" if $scfg->{path}; - my $size = PVE::Tools::convert_size($di->{size}, 'b' => 'kb'); + my $size = PVE::Tools::convert_size($di->{'block-node-size'}, 'b' => 'kb'); $di->{'fleece-volid'} = PVE::Storage::vdisk_alloc( $self->{storecfg}, $fleecing_storeid, $vmid, $format, $name, $size); @@ -600,7 +600,7 @@ my sub attach_fleecing_images { my $drive = "file=$path,if=none,id=$devid,format=$format,discard=unmap"; # Specify size explicitly, to make it work if storage backend rounded up size for # fleecing image when allocating. - $drive .= ",size=$di->{size}" if $format eq 'raw'; + $drive .= ",size=$di->{'block-node-size'}" if $format eq 'raw'; $drive =~ s/\\//g; my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_add auto \"$drive\"", 60); die "attaching fleecing image $volid failed - $ret\n" if $ret !~ m/OK/s; @@ -609,7 +609,7 @@ my sub attach_fleecing_images { } my sub check_and_prepare_fleecing { -my ($self, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support) = @_; +my ($self, $task, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support) = @_; # Even if the VM was started specifically for fleecing, it's possible that the VM is resumed and # then starts doing IO. For VMs that are not resumed the fleecing images will just stay empty, @@ -626,6 +626,8 @@ my sub check_and_prepare_fleecing { } if ($use_fleecing) { + $self->query_block_node_sizes($vmid, $task); + my ($default_format, $valid_formats) = PVE::Storage::storage_default_format( $self->{storecfg}, $fleecing_opts->{storage}); my $format = scalar(grep { $_ eq 'qcow2' } $valid_formats->@*) ? 'qcow2' : 'raw'; @@ -721,7 +723,7 @@ sub archive_pbs { my $is_template = PVE::QemuConfig->is_template($self->{vmlist}->{$vmid}); $task->{'use-fleecing'} = check_and_prepare_fleecing( - $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support); + $self, $task, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support); my $fs_frozen = $self->qga_fs_freeze($task, $vmid); @@ -905,7 +907,7 @@ sub archive_vma { $attach_tpmstate_drive->($self, $task, $vmid); $task->{'use-fleecing'} = check_and_prepare_fleecing( - $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support); + $self, $task, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support); my $outfh; if ($opts->{stdout}) { @@ -1042,6 +1044,31 @@ sub qga_fs_thaw { $self->logerr($@) if $@; } +# The size for fleecing images needs to be exactly the same size as QEMU sees. E.g. EFI disk can bex +# attached with a smaller size then the underyling image on the storage. +sub query_block_node_sizes { +my ($self, $vmid, $task) = @_; + +my $block_info = mon_cmd($vmid, "query-block"); +$block_info = { map { $_->{device} => $_ } $block_info->@* }; + +for my $diskinfo ($task->{disks}->@*) { + my $drive_key = $diskinfo->{virtdev}; + $drive_key .= "-backup" if $drive_key eq 'tpmstate0'; + my $block_node_size = + eval { $block_info->{"drive-$drive_key"}->{inserted}->{image}->{'virtual-size'}; }; + if (!$block_node_size) { + $self->loginfo( + "could not determine block node
[pve-devel] [RFC qemu-server v3 20/34] backup: allow adding fleecing images also for EFI and TPM
For the external backup API, it will be necessary to add a fleecing image even for small disks like EFI and TPM, because there is no other place the old data could be copied to when a new guest write comes in. Signed-off-by: Fiona Ebner --- Changes in v3: * adapt to context changes from previous patch PVE/VZDump/QemuServer.pm | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm index 1ebafe6d..b6dcd6cc 100644 --- a/PVE/VZDump/QemuServer.pm +++ b/PVE/VZDump/QemuServer.pm @@ -534,7 +534,7 @@ my sub cleanup_fleecing_images { } my sub allocate_fleecing_images { -my ($self, $disks, $vmid, $fleecing_storeid, $format) = @_; +my ($self, $disks, $vmid, $fleecing_storeid, $format, $all_images) = @_; die "internal error - no fleecing storage specified\n" if !$fleecing_storeid; @@ -545,7 +545,8 @@ my sub allocate_fleecing_images { my $n = 0; # counter for fleecing image names for my $di ($disks->@*) { - next if $di->{virtdev} =~ m/^(?:tpmstate|efidisk)\d$/; # too small to be worth it + # EFI/TPM are usually too small to be worth it, but it's required for external providers + next if !$all_images && $di->{virtdev} =~ m/^(?:tpmstate|efidisk)\d$/; if ($di->{type} eq 'block' || $di->{type} eq 'file') { my $scfg = PVE::Storage::storage_config($self->{storecfg}, $fleecing_storeid); my $name = "vm-$vmid-fleece-$n"; @@ -609,7 +610,7 @@ my sub attach_fleecing_images { } my sub check_and_prepare_fleecing { -my ($self, $task, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support) = @_; +my ($self, $task, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support, $all_images) = @_; # Even if the VM was started specifically for fleecing, it's possible that the VM is resumed and # then starts doing IO. For VMs that are not resumed the fleecing images will just stay empty, @@ -632,7 +633,8 @@ my sub check_and_prepare_fleecing { $self->{storecfg}, $fleecing_opts->{storage}); my $format = scalar(grep { $_ eq 'qcow2' } $valid_formats->@*) ? 'qcow2' : 'raw'; - allocate_fleecing_images($self, $disks, $vmid, $fleecing_opts->{storage}, $format); + allocate_fleecing_images( + $self, $disks, $vmid, $fleecing_opts->{storage}, $format, $all_images); attach_fleecing_images($self, $disks, $vmid, $format); } @@ -723,7 +725,7 @@ sub archive_pbs { my $is_template = PVE::QemuConfig->is_template($self->{vmlist}->{$vmid}); $task->{'use-fleecing'} = check_and_prepare_fleecing( - $self, $task, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support); + $self, $task, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support, 0); my $fs_frozen = $self->qga_fs_freeze($task, $vmid); @@ -907,7 +909,7 @@ sub archive_vma { $attach_tpmstate_drive->($self, $task, $vmid); $task->{'use-fleecing'} = check_and_prepare_fleecing( - $self, $task, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support); + $self, $task, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support, 0); my $outfh; if ($opts->{stdout}) { -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH qemu v3 01/34] block/reqlist: allow adding overlapping requests
Allow overlapping request by removing the assert that made it impossible. There are only two callers: 1. block_copy_task_create() It already asserts the very same condition before calling reqlist_init_req(). 2. cbw_snapshot_read_lock() There is no need to have read requests be non-overlapping in copy-before-write when used for snapshot-access. In fact, there was no protection against two callers of cbw_snapshot_read_lock() calling reqlist_init_req() with overlapping ranges and this could lead to an assertion failure [1]. In particular, with the reproducer script below [0], two cbw_co_snapshot_block_status() callers could race, with the second calling reqlist_init_req() before the first one finishes and removes its conflicting request. [0]: > #!/bin/bash -e > dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024 > ./qemu-img create /tmp/fleecing.raw -f raw 1G > ( > ./qemu-system-x86_64 --qmp stdio \ > --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \ > --blockdev > raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \ > < {"execute": "qmp_capabilities"} > {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", > "file": "node0", "target": "node1", "node-name": "node3" } } > {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", > "file": "node3", "node-name": "snap0" } } > {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", > "data": { "path": "/tmp/nbd.socket" } } } } > {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": > "snap0", "type": "nbd", "name": "exp0"}} > EOF > ) & > sleep 5 > while true; do > ./qemu-nbd -d /dev/nbd0 > ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r > nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket' > done [1]: > #5 0x71e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101 > #6 0x615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23 > #7 0x6152853e2d98 in cbw_snapshot_read_lock (...) at > ../block/copy-before-write.c:237 > #8 0x6152853e3068 in cbw_co_snapshot_block_status (...) at > ../block/copy-before-write.c:304 > #9 0x6152853f4d22 in bdrv_co_snapshot_block_status (...) at > ../block/io.c:3726 > #10 0x61528543a63e in snapshot_access_co_block_status (...) at > ../block/snapshot-access.c:48 > #11 0x6152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474 > #12 0x6152853f2016 in bdrv_co_common_block_status_above (...) at > ../block/io.c:2652 > #13 0x6152853f22cf in bdrv_co_block_status_above (...) at > ../block/io.c:2732 > #14 0x6152853d9a86 in blk_co_block_status_above (...) at > ../block/block-backend.c:1473 > #15 0x61528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374 > #16 0x61528538deb1 in nbd_co_send_block_status (...) at > ../nbd/server.c:2481 > #17 0x61528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978 > #18 0x61528538f906 in nbd_trip (...) at ../nbd/server.c:3121 > #19 0x6152855a7caf in coroutine_trampoline (...) at > ../util/coroutine-ucontext.c:175 Cc: qemu-sta...@nongnu.org Suggested-by: Vladimir Sementsov-Ogievskiy Signed-off-by: Fiona Ebner Reviewed-by: Vladimir Sementsov-Ogievskiy --- No changes in v3. block/copy-before-write.c | 3 ++- block/reqlist.c | 2 -- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/block/copy-before-write.c b/block/copy-before-write.c index 50cc4c7aae..a5bb4d14f6 100644 --- a/block/copy-before-write.c +++ b/block/copy-before-write.c @@ -67,7 +67,8 @@ typedef struct BDRVCopyBeforeWriteState { /* * @frozen_read_reqs: current read requests for fleecing user in bs->file - * node. These areas must not be rewritten by guest. + * node. These areas must not be rewritten by guest. There can be multiple + * overlapping read requests. */ BlockReqList frozen_read_reqs; diff --git a/block/reqlist.c b/block/reqlist.c index 08cb57cfa4..098e807378 100644 --- a/block/reqlist.c +++ b/block/reqlist.c @@ -20,8 +20,6 @@ void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset, int64_t bytes) { -assert(!reqlist_find_conflict(reqs, offset, bytes)); - *req = (BlockReq) { .offset = offset, .bytes = bytes, -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH qemu-server v3 22/34] restore: die early when there is no size for a device
Makes it a clean error for buggy (external) backup providers where the size might not be set at all. Signed-off-by: Fiona Ebner --- No changes in v3. PVE/QemuServer.pm | 1 + 1 file changed, 1 insertion(+) diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm index 49b6ca17..30e51a8c 100644 --- a/PVE/QemuServer.pm +++ b/PVE/QemuServer.pm @@ -6813,6 +6813,7 @@ my $restore_allocate_devices = sub { my $map = {}; foreach my $virtdev (sort keys %$virtdev_hash) { my $d = $virtdev_hash->{$virtdev}; + die "got no size for '$virtdev'\n" if !defined($d->{size}); my $alloc_size = int(($d->{size} + 1024 - 1)/1024); my $storeid = $d->{storeid}; my $scfg = PVE::Storage::storage_config($storecfg, $storeid); -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH qemu v3 03/34] PVE backup: factor out setting up snapshot access for fleecing
Avoids some line bloat in the create_backup_jobs_bh() function and is in preparation for setting up the snapshot access independently of fleecing, in particular that will be useful for providing access to the snapshot via NBD. Signed-off-by: Fiona Ebner --- No changes in v3. pve-backup.c | 95 1 file changed, 58 insertions(+), 37 deletions(-) diff --git a/pve-backup.c b/pve-backup.c index c4178758b3..051ebffe48 100644 --- a/pve-backup.c +++ b/pve-backup.c @@ -525,6 +525,62 @@ static int coroutine_fn pvebackup_co_add_config( goto out; } +/* + * Setup a snapshot-access block node for a device with associated fleecing image. + */ +static int setup_snapshot_access(PVEBackupDevInfo *di, Error **errp) +{ +Error *local_err = NULL; + +if (!di->fleecing.bs) { +error_setg(errp, "no associated fleecing image"); +return -1; +} + +QDict *cbw_opts = qdict_new(); +qdict_put_str(cbw_opts, "driver", "copy-before-write"); +qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs)); +qdict_put_str(cbw_opts, "target", bdrv_get_node_name(di->fleecing.bs)); + +if (di->bitmap) { +/* + * Only guest writes to parts relevant for the backup need to be intercepted with + * old data being copied to the fleecing image. + */ +qdict_put_str(cbw_opts, "bitmap.node", bdrv_get_node_name(di->bs)); +qdict_put_str(cbw_opts, "bitmap.name", bdrv_dirty_bitmap_name(di->bitmap)); +} +/* + * Fleecing storage is supposed to be fast and it's better to break backup than guest + * writes. Certain guest drivers like VirtIO-win have 60 seconds timeout by default, so + * abort a bit before that. + */ +qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot"); +qdict_put_int(cbw_opts, "cbw-timeout", 45); + +di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, &local_err); + +if (!di->fleecing.cbw) { +error_setg(errp, "appending cbw node for fleecing failed: %s", + local_err ? error_get_pretty(local_err) : "unknown error"); +return -1; +} + +QDict *snapshot_access_opts = qdict_new(); +qdict_put_str(snapshot_access_opts, "driver", "snapshot-access"); +qdict_put_str(snapshot_access_opts, "file", bdrv_get_node_name(di->fleecing.cbw)); + +di->fleecing.snapshot_access = +bdrv_open(NULL, NULL, snapshot_access_opts, BDRV_O_RDWR | BDRV_O_UNMAP, &local_err); +if (!di->fleecing.snapshot_access) { +error_setg(errp, "setting up snapshot access for fleecing failed: %s", + local_err ? error_get_pretty(local_err) : "unknown error"); +return -1; +} + +return 0; +} + /* * backup_job_create can *not* be run from a coroutine, so this can't either. * The caller is responsible that backup_mutex is held nonetheless. @@ -569,49 +625,14 @@ static void create_backup_jobs_bh(void *opaque) { const char *job_id = bdrv_get_device_name(di->bs); bdrv_graph_co_rdunlock(); if (di->fleecing.bs) { -QDict *cbw_opts = qdict_new(); -qdict_put_str(cbw_opts, "driver", "copy-before-write"); -qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs)); -qdict_put_str(cbw_opts, "target", bdrv_get_node_name(di->fleecing.bs)); - -if (di->bitmap) { -/* - * Only guest writes to parts relevant for the backup need to be intercepted with - * old data being copied to the fleecing image. - */ -qdict_put_str(cbw_opts, "bitmap.node", bdrv_get_node_name(di->bs)); -qdict_put_str(cbw_opts, "bitmap.name", bdrv_dirty_bitmap_name(di->bitmap)); -} -/* - * Fleecing storage is supposed to be fast and it's better to break backup than guest - * writes. Certain guest drivers like VirtIO-win have 60 seconds timeout by default, so - * abort a bit before that. - */ -qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot"); -qdict_put_int(cbw_opts, "cbw-timeout", 45); - -di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, &local_err); - -if (!di->fleecing.cbw) { -error_setg(errp, "appending cbw node for fleecing failed: %s", - local_err ? error_get_pretty(local_err) : "unknown error"); -bdrv_drained_end(di->bs); -break; -
[pve-devel] [RFC qemu v3 06/34] PVE backup: add target ID in backup state
In preparation for allowing multiple backup providers. Each backup target can then have its own dirty bitmap and there can be additional checks that the current backup state is actually associated to the expected target. Signed-off-by: Fiona Ebner --- No changes in v3. pve-backup.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/pve-backup.c b/pve-backup.c index d931746453..e8031bb89c 100644 --- a/pve-backup.c +++ b/pve-backup.c @@ -70,6 +70,7 @@ static struct PVEBackupState { JobTxn *txn; CoMutex backup_mutex; CoMutex dump_callback_mutex; +char *target_id; } backup_state; static void pvebackup_init(void) @@ -848,7 +849,7 @@ UuidInfo coroutine_fn *qmp_backup( if (backup_state.di_list) { error_set(errp, ERROR_CLASS_GENERIC_ERROR, - "previous backup not finished"); + "previous backup by provider '%s' not finished", backup_state.target_id); qemu_co_mutex_unlock(&backup_state.backup_mutex); return NULL; } @@ -1100,6 +1101,11 @@ UuidInfo coroutine_fn *qmp_backup( backup_state.vmaw = vmaw; backup_state.pbs = pbs; +if (backup_state.target_id) { +g_free(backup_state.target_id); +} +backup_state.target_id = g_strdup("Proxmox"); + backup_state.di_list = di_list; uuid_info = g_malloc0(sizeof(*uuid_info)); -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC qemu v3 07/34] PVE backup: get device info: allow caller to specify filter for which devices use fleecing
For providing snapshot-access to external backup providers, EFI and TPM also need an associated fleecing image. The new caller will thus need a different filter. Signed-off-by: Fiona Ebner --- No changes in v3. pve-backup.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/pve-backup.c b/pve-backup.c index e8031bb89c..d0593fc581 100644 --- a/pve-backup.c +++ b/pve-backup.c @@ -717,7 +717,7 @@ static void create_backup_jobs_bh(void *opaque) { /* * EFI disk and TPM state are small and it's just not worth setting up fleecing for them. */ -static bool device_uses_fleecing(const char *device_id) +static bool fleecing_no_efi_tpm(const char *device_id) { return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, "drive-tpmstate", 14); } @@ -729,7 +729,7 @@ static bool device_uses_fleecing(const char *device_id) */ static GList coroutine_fn GRAPH_RDLOCK *get_device_info( const char *devlist, -bool fleecing, +bool (*device_uses_fleecing)(const char*), Error **errp) { gchar **devs = NULL; @@ -755,7 +755,7 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info( di->bs = bs; di->device_name = g_strdup(bdrv_get_device_name(bs)); -if (fleecing && device_uses_fleecing(*d)) { +if (device_uses_fleecing && device_uses_fleecing(*d)) { g_autofree gchar *fleecing_devid = g_strconcat(*d, "-fleecing", NULL); BlockBackend *fleecing_blk = blk_by_name(fleecing_devid); if (!fleecing_blk) { @@ -858,7 +858,8 @@ UuidInfo coroutine_fn *qmp_backup( format = has_format ? format : BACKUP_FORMAT_VMA; bdrv_graph_co_rdlock(); -di_list = get_device_info(devlist, has_fleecing && fleecing, &local_err); +di_list = get_device_info(devlist, (has_fleecing && fleecing) ? fleecing_no_efi_tpm : NULL, + &local_err); bdrv_graph_co_rdunlock(); if (local_err) { error_propagate(errp, local_err); -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH qemu v3 04/34] PVE backup: save device name in device info structure
The device name needs to be queried while holding the graph read lock and since it doesn't change during the whole operation, just get it once during setup and avoid the need to query it again in different places. Also in preparation to use it more often in error messages and for the upcoming external backup access API. Signed-off-by: Fiona Ebner --- No changes in v3. pve-backup.c | 29 +++-- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/pve-backup.c b/pve-backup.c index 051ebffe48..33c23e53c2 100644 --- a/pve-backup.c +++ b/pve-backup.c @@ -94,6 +94,7 @@ typedef struct PVEBackupDevInfo { size_t size; uint64_t block_size; uint8_t dev_id; +char* device_name; int completed_ret; // INT_MAX if not completed BdrvDirtyBitmap *bitmap; BlockDriverState *target; @@ -327,6 +328,8 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque) } di->bs = NULL; +g_free(di->device_name); +di->device_name = NULL; assert(di->target == NULL); @@ -621,9 +624,6 @@ static void create_backup_jobs_bh(void *opaque) { BlockDriverState *source_bs = di->bs; bool discard_source = false; -bdrv_graph_co_rdlock(); -const char *job_id = bdrv_get_device_name(di->bs); -bdrv_graph_co_rdunlock(); if (di->fleecing.bs) { if (setup_snapshot_access(di, &local_err) < 0) { error_setg(errp, "setting up snapshot access for fleecing failed: %s", @@ -654,7 +654,7 @@ static void create_backup_jobs_bh(void *opaque) { } BlockJob *job = backup_job_create( -job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap, +di->device_name, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap, bitmap_mode, false, discard_source, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn, &local_err); @@ -751,6 +751,7 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info( } PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1); di->bs = bs; +di->device_name = g_strdup(bdrv_get_device_name(bs)); if (fleecing && device_uses_fleecing(*d)) { g_autofree gchar *fleecing_devid = g_strconcat(*d, "-fleecing", NULL); @@ -789,6 +790,7 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info( PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1); di->bs = bs; +di->device_name = g_strdup(bdrv_get_device_name(bs)); di_list = g_list_append(di_list, di); } } @@ -956,9 +958,6 @@ UuidInfo coroutine_fn *qmp_backup( di->block_size = dump_cb_block_size; -bdrv_graph_co_rdlock(); -const char *devname = bdrv_get_device_name(di->bs); -bdrv_graph_co_rdunlock(); PBSBitmapAction action = PBS_BITMAP_ACTION_NOT_USED; size_t dirty = di->size; @@ -973,7 +972,8 @@ UuidInfo coroutine_fn *qmp_backup( } action = PBS_BITMAP_ACTION_NEW; } else { -expect_only_dirty = proxmox_backup_check_incremental(pbs, devname, di->size) != 0; +expect_only_dirty = +proxmox_backup_check_incremental(pbs, di->device_name, di->size) != 0; } if (expect_only_dirty) { @@ -997,7 +997,8 @@ UuidInfo coroutine_fn *qmp_backup( } } -int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, errp); +int dev_id = proxmox_backup_co_register_image(pbs, di->device_name, di->size, + expect_only_dirty, errp); if (dev_id < 0) { goto err_mutex; } @@ -1009,7 +1010,7 @@ UuidInfo coroutine_fn *qmp_backup( di->dev_id = dev_id; PBSBitmapInfo *info = g_malloc(sizeof(*info)); -info->drive = g_strdup(devname); +info->drive = g_strdup(di->device_name); info->action = action; info->size = di->size; info->dirty = dirty; @@ -1034,10 +1035,7 @@ UuidInfo coroutine_fn *qmp_backup( goto err_mutex; } -bdrv_graph_co_rdlock(); -const char *devname = bdrv_get_device_name(di->bs); -bdrv_graph_co_rdunlock(); -di->dev_id = vma_writer_register_stream(vmaw, devname, di->size); +di->dev_id = vma_writer_register_stream(vmaw, di->device_name, di->size);
[pve-devel] [PATCH qemu v3 02/34] PVE backup: fixup error handling for fleecing
The drained section needs to be terminated before breaking out of the loop in the error scenarios. Otherwise, guest IO on the drive would become stuck. If the job is created successfully, then the job completion callback will clean up the snapshot access block nodes. In case failure happened before the job is created, there was no cleanup for the snapshot access block nodes yet. Add it. Signed-off-by: Fiona Ebner --- No changes in v3. pve-backup.c | 38 +- 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/pve-backup.c b/pve-backup.c index 4e730aa3da..c4178758b3 100644 --- a/pve-backup.c +++ b/pve-backup.c @@ -357,22 +357,23 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque) qemu_co_mutex_unlock(&backup_state.backup_mutex); } +static void cleanup_snapshot_access(PVEBackupDevInfo *di) +{ +if (di->fleecing.snapshot_access) { +bdrv_unref(di->fleecing.snapshot_access); +di->fleecing.snapshot_access = NULL; +} +if (di->fleecing.cbw) { +bdrv_cbw_drop(di->fleecing.cbw); +di->fleecing.cbw = NULL; +} +} + static void pvebackup_complete_cb(void *opaque, int ret) { PVEBackupDevInfo *di = opaque; di->completed_ret = ret; -/* - * Handle block-graph specific cleanup (for fleecing) outside of the coroutine, because the work - * won't be done as a coroutine anyways: - * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via bdrv_co_unref() would - * just spawn a BH calling bdrv_unref(). - * - For cbw, draining would need to spawn a BH. - */ -if (di->fleecing.snapshot_access) { -bdrv_unref(di->fleecing.snapshot_access); -di->fleecing.snapshot_access = NULL; -} if (di->fleecing.cbw) { /* * With fleecing, failure for cbw does not fail the guest write, but only sets the snapshot @@ -383,10 +384,17 @@ static void pvebackup_complete_cb(void *opaque, int ret) if (di->completed_ret == -EACCES && snapshot_error) { di->completed_ret = snapshot_error; } -bdrv_cbw_drop(di->fleecing.cbw); -di->fleecing.cbw = NULL; } +/* + * Handle block-graph specific cleanup (for fleecing) outside of the coroutine, because the work + * won't be done as a coroutine anyways: + * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via bdrv_co_unref() would + * just spawn a BH calling bdrv_unref(). + * - For cbw, draining would need to spawn a BH. + */ +cleanup_snapshot_access(di); + /* * Needs to happen outside of coroutine, because it takes the graph write lock. */ @@ -587,6 +595,7 @@ static void create_backup_jobs_bh(void *opaque) { if (!di->fleecing.cbw) { error_setg(errp, "appending cbw node for fleecing failed: %s", local_err ? error_get_pretty(local_err) : "unknown error"); +bdrv_drained_end(di->bs); break; } @@ -599,6 +608,8 @@ static void create_backup_jobs_bh(void *opaque) { if (!di->fleecing.snapshot_access) { error_setg(errp, "setting up snapshot access for fleecing failed: %s", local_err ? error_get_pretty(local_err) : "unknown error"); +cleanup_snapshot_access(di); +bdrv_drained_end(di->bs); break; } source_bs = di->fleecing.snapshot_access; @@ -637,6 +648,7 @@ static void create_backup_jobs_bh(void *opaque) { } if (!job || local_err) { +cleanup_snapshot_access(di); error_setg(errp, "backup_job_create failed: %s", local_err ? error_get_pretty(local_err) : "null"); break; -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH v2 storage 1/1] file_size_info: implement untrusted mode
Am 04.11.24 um 11:42 schrieb Fabian Grünbichler: > this allows checking some extra attributes for images which come from a > potentially malicious source. > > since file_size_info is not part of the plugin API, no API bump is needed. if > desired, a similar check could also be implemented in volume_size_info, which > would entail bumping both APIVER and APIAGE (since the additional parameter > would make checking untrusted volumes opt-in for external plugins). > > Signed-off-by: Fabian Grünbichler > Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner (FWIW it breaks my directory-based backup provider example in case of incremental backups, because that relied on qcow2 backing files O:P) > @@ -975,18 +985,34 @@ sub file_size_info { > warn $err_output; > } > if (!$json) { > + die "failed to query file information with qemu-img\n" if $untrusted; git complains about "space before tab" here ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH master+bookworm-6.8 kernel] cherry-pick fix mitigating host reboot issue affecting certain AMD Zen4 CPU models
Reported in the community forum: Issue: https://forum.proxmox.com/threads/139500/ Fix: https://forum.proxmox.com/threads/139500/post-717968 Signed-off-by: Fiona Ebner --- Should also be applied to the 6.8 kernel while renumbering the newly added patch. ...r-virtualized-VMLOAD-VMSAVE-on-Zen4-.patch | 44 +++ 1 file changed, 44 insertions(+) create mode 100644 patches/kernel/0015-x86-CPU-AMD-Clear-virtualized-VMLOAD-VMSAVE-on-Zen4-.patch diff --git a/patches/kernel/0015-x86-CPU-AMD-Clear-virtualized-VMLOAD-VMSAVE-on-Zen4-.patch b/patches/kernel/0015-x86-CPU-AMD-Clear-virtualized-VMLOAD-VMSAVE-on-Zen4-.patch new file mode 100644 index 000..76957b6 --- /dev/null +++ b/patches/kernel/0015-x86-CPU-AMD-Clear-virtualized-VMLOAD-VMSAVE-on-Zen4-.patch @@ -0,0 +1,44 @@ +From Mon Sep 17 00:00:00 2001 +From: Mario Limonciello +Date: Tue, 5 Nov 2024 10:02:34 -0600 +Subject: [PATCH] x86/CPU/AMD: Clear virtualized VMLOAD/VMSAVE on Zen4 client + +A number of Zen4 client SoCs advertise the ability to use virtualized +VMLOAD/VMSAVE, but using these instructions is reported to be a cause +of a random host reboot. + +These instructions aren't intended to be advertised on Zen4 client +so clear the capability. + +Signed-off-by: Mario Limonciello +Signed-off-by: Borislav Petkov (AMD) +Cc: sta...@vger.kernel.org +Link: https://bugzilla.kernel.org/show_bug.cgi?id=219009 +(cherry picked from commit a5ca1dc46a6b610dd4627d8b633d6c84f9724ef0) +Signed-off-by: Fiona Ebner +--- + arch/x86/kernel/cpu/amd.c | 11 +++ + 1 file changed, 11 insertions(+) + +diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c +index 1e0fe5f8ab84..ee87f997d31f 100644 +--- a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c +@@ -924,6 +924,17 @@ static void init_amd_zen4(struct cpuinfo_x86 *c) + { + if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) + msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT); ++ ++ /* ++ * These Zen4 SoCs advertise support for virtualized VMLOAD/VMSAVE ++ * in some BIOS versions but they can lead to random host reboots. ++ */ ++ switch (c->x86_model) { ++ case 0x18 ... 0x1f: ++ case 0x60 ... 0x7f: ++ clear_cpu_cap(c, X86_FEATURE_V_VMSAVE_VMLOAD); ++ break; ++ } + } + + static void init_amd_zen5(struct cpuinfo_x86 *c) -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH kernel] d/rules: enable CONFIG_MEMCG_V1 needed for legacy containers
As reported in the community forum [0], containers requiring cgroup v1 would not start anymore, even when systemd.unified_cgroup_hierarchy=0 was set on the kernel commandline. The error message would be: > cgfsng_setup_limits_legacy: 3442 No such file or directory - Failed to set > "memory.limit_in_bytes" to "536870912" Kernel commit e93d4166b40a ("mm: memcg: put cgroup v1-specific code under a config option") made it necessary to explicitly enable the new associated Kconfig flag. [0]: https://forum.proxmox.com/threads/156830/ Signed-off-by: Fiona Ebner --- debian/rules | 1 + 1 file changed, 1 insertion(+) diff --git a/debian/rules b/debian/rules index f19a5b8..e1444b7 100755 --- a/debian/rules +++ b/debian/rules @@ -65,6 +65,7 @@ PMX_CONFIG_OPTS= \ -e CONFIG_MODULE_SIG_SHA512 \ -d CONFIG_MEMCG_DISABLED \ -e CONFIG_MEMCG_SWAP_ENABLED \ +-e CONFIG_MEMCG_V1 \ -e CONFIG_HYPERV \ -m CONFIG_VFIO_IOMMU_TYPE1 \ -e CONFIG_VFIO_VIRQFD \ -- 2.39.5 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH manager] ui: cloudinit: disallow unprivileged users to regenerate images
Am 31.10.24 um 15:06 schrieb Daniel Kral: > Disables the "Regenerate image" button in the VM CloudInit tab for > users, which lack the necessary permissions to do so, which are > "VM.Config.CloudInit" and "VM.Config.CDROM". This is checked by the VM > config update API endpoint in qemu-server, when ejecting and re-adding > CloudInit drive images. > > This is consistent with the permissions restricting access to adding > CloudInit drives in the Hardware tab. > > This is a cosmetic change as the VM config update API endpoint would > fail because of insufficient permissions anyway. > Since those users can already use the cloudinit_update endpoint (just not via UI) and thus effectively don't need "VM.Config.CDROM", I think it's better to change the UI to use that endpoint as well. The endpoint was added here [0] and the switch in the UI was already proposed [1] as part of that but it seems never applied. [0]: https://lore.proxmox.com/pve-devel/20220622115206.3295425-1-aderum...@odiso.com/ [1]: https://lore.proxmox.com/pve-devel/20220622141335.3616129-2-aderum...@odiso.com/ > Signed-off-by: Daniel Kral > --- > I stumbled upon this while checking on BugZilla #3105 and trying to use > the "Regenerate image" button with a user, that had only VM.Audit and > VM.Config.CloudInit permissions. where the user missed the > "VM.Config.CDROM" permission and the action couldn't be completed. > > www/manager6/qemu/CloudInit.js | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/www/manager6/qemu/CloudInit.js b/www/manager6/qemu/CloudInit.js > index 49519726..aeef8c15 100644 > --- a/www/manager6/qemu/CloudInit.js > +++ b/www/manager6/qemu/CloudInit.js > @@ -142,7 +142,10 @@ Ext.define('PVE.qemu.CloudInit', { > } > }); > > - me.down('#savebtn').setDisabled(!found); > + let caps = Ext.state.Manager.get('GuiCap'); > + let canSaveImage = !!caps.vms['VM.Config.CDROM'] && > !!caps.vms['VM.Config.Cloudinit']; > + me.down('#savebtn').setDisabled(!found || !canSaveImage); > + > me.setDisabled(!found); > if (!found) { > me.getView().mask(gettext('No CloudInit Drive found'), > ['pve-static-mask']); ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH qemu] async snapshot: stop vCPU throttling after finishing
In the community forum, users reported issues about RCU stalls and sluggish VMs after taking a snapshot with RAM in Proxmox VE [0]. Mario was also experiencing similar issues from time to time and recently, obtained a GDB stacktrace. The stacktrace showed that, in his case, the vCPU threads were waiting in cpu_throttle_thread(). It is a good guess that the issues in the forum could also be because of that. >From searching in the source code, it seems that migration is the only user of the vCPU throttling functions in QEMU relevant for Proxmox VE (the only other place where it is used is the Cocoa UI). In particular, RAM migration will begin throttling vCPUs for auto-converge. In migration_iteration_finish() there is an unconditional call to cpu_throttle_stop(), so do the same in the async snapshot code specific to Proxmox VE. It's not clear why the issue began to surface more prominently only now, since the vCPU throttling was there since commit 070afca258 ("migration: Dynamic cpu throttling for auto-converge") in QEMU v2.10.0. However, there were a lot of changes in the migration code between v8.1.5 and v9.0.2 and a few of them might have affected the likelihood of cpu_throttle_set() being called, for example, 4e1871c450 ("migration: Don't serialize devices in qemu_savevm_state_iterate()") [0]: https://forum.proxmox.com/threads/153483 Reported-by: Mario Loderer Signed-off-by: Fiona Ebner Tested-by: Mario Loderer --- ...-async-for-background-state-snapshots.patch | 18 +- ...-add-optional-buffer-size-to-QEMUFile.patch | 6 +++--- 2 files changed, 16 insertions(+), 8 deletions(-) diff --git a/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch b/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch index 964caf3..b0e75e9 100644 --- a/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch +++ b/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch @@ -28,7 +28,8 @@ Signed-off-by: Stefan Reiter adapt to removal of QEMUFileOps improve condition for entering final stage adapt to QAPI and other changes for 8.2 - make sure to not call vm_start() from coroutine] + make sure to not call vm_start() from coroutine + stop CPU throttling after finishing] Signed-off-by: Fiona Ebner --- hmp-commands-info.hx | 13 + @@ -36,13 +37,13 @@ Signed-off-by: Fiona Ebner include/migration/snapshot.h | 2 + include/monitor/hmp.h| 3 + migration/meson.build| 1 + - migration/savevm-async.c | 538 +++ + migration/savevm-async.c | 545 +++ monitor/hmp-cmds.c | 38 +++ qapi/migration.json | 34 +++ qapi/misc.json | 18 ++ qemu-options.hx | 12 + system/vl.c | 10 + - 11 files changed, 686 insertions(+) + 11 files changed, 693 insertions(+) create mode 100644 migration/savevm-async.c diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx @@ -140,10 +141,10 @@ index 95d1cf2250..800f12a60d 100644 'threadinfo.c', diff --git a/migration/savevm-async.c b/migration/savevm-async.c new file mode 100644 -index 00..72cf6588c2 +index 00..1af32604c7 --- /dev/null +++ b/migration/savevm-async.c -@@ -0,0 +1,538 @@ +@@ -0,0 +1,545 @@ +#include "qemu/osdep.h" +#include "migration/channel-savevm-async.h" +#include "migration/migration.h" @@ -154,6 +155,7 @@ index 00..72cf6588c2 +#include "migration/global_state.h" +#include "migration/ram.h" +#include "migration/qemu-file.h" ++#include "sysemu/cpu-throttle.h" +#include "sysemu/sysemu.h" +#include "sysemu/runstate.h" +#include "block/block.h" @@ -342,6 +344,12 @@ index 00..72cf6588c2 +ret || aborted ? MIGRATION_STATUS_FAILED : MIGRATION_STATUS_COMPLETED); +ms->to_dst_file = NULL; + ++/* ++ * Same as in migration_iteration_finish(): saving RAM might've turned on CPU throttling for ++ * auto-converge, make sure to disable it. ++ */ ++cpu_throttle_stop(); ++ +qemu_savevm_state_cleanup(); + +ret = save_snapshot_cleanup(); diff --git a/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch b/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch index 043b249..92bc9f2 100644 --- a/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch +++ b/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch @@ -193,10 +193,10 @@ index 32fd4a34fd..36a0cd8cc8 100644 /* diff --git a/migration/savevm-async.c b/migration/savevm-async.c -index 72cf6588c2..fb4e8ea689 100644 +index 1af32604c7..be2035cd2e 100644 --- a/migration/savevm-async.c +++ b/migrati
Re: [pve-devel] [PATCH container] fix #5761: add the "discard" mount option
Am 09.10.24 um 16:22 schrieb Filip Schauer: > Introduce the "discard" mount option for rootfs and mount points. This > ensures that unused container volume blocks are discarded from the > underlying storage backend when deleting files within the container. > > Signed-off-by: Filip Schauer Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner However, this misses the UI part or you can get non-editable volumes there. > --- > src/PVE/LXC/Config.pm | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm > index ce64c4c..e980f8a 100644 > --- a/src/PVE/LXC/Config.pm > +++ b/src/PVE/LXC/Config.pm > @@ -311,7 +311,7 @@ sub __snapshot_rollback_get_unused { > cfs_register_file('/lxc/', \&parse_pct_config, \&write_pct_config); > > > -my $valid_mount_option_re = qr/(noatime|lazytime|nodev|nosuid|noexec)/; > +my $valid_mount_option_re = > qr/(discard|lazytime|noatime|nodev|noexec|nosuid)/; > > sub is_valid_mount_option { > my ($option) = @_; ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH qemu 1/2] vm-network-scripts: move scripts to /usr/libexec
It's the "qemu-server" repository, not "qemu". Am 09.10.24 um 14:55 schrieb Maximiliano Sandoval: > Moves the network scripts from /var/lib/qemu-server into > /usr/libexec/qemu-server. > > /usr/libexec is described as binaries run by programs which are not > intended to be directly executed by the user on [FHS 4.7]. On the other > hand /var/lib corresponds to variable state information, which does not > fit the use case here, see [FHS 5.8]. > > [FHS 4.7]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s07.html > [FHS 5.8]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05s08.html > Can this cause issues for a brief time during upgrade/unpacking, i.e. A) if the old QemuServer.pm is still loaded while the scripts get removed from its old location (or to be more precise, a QEMU process with the old parameter is started at that time) B) if the new QemuServer.pm gets loaded before the scripts are extracted to their new location (or to be more precise, a QEMU process with the new parameter is started at that time) ? Is there something preventing those from happening? Otherwise, we might need to ship the scripts to both locations until the next major release and drop them from the old location only then. > Signed-off-by: Maximiliano Sandoval > --- > PVE/QemuServer.pm | 4 ++-- > vm-network-scripts/Makefile | 10 +- > vm-network-scripts/pve-bridge-hotplug | 2 +- > 3 files changed, 8 insertions(+), 8 deletions(-) > > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm > index b26da505..88100638 100644 > --- a/PVE/QemuServer.pm > +++ b/PVE/QemuServer.pm > @@ -1759,8 +1759,8 @@ sub print_netdev_full { > my $script = $hotplug ? "pve-bridge-hotplug" : "pve-bridge"; > > if ($net->{bridge}) { > - $netdev = > "type=tap,id=$netid,ifname=${ifname},script=/var/lib/qemu-server/$script" > - .",downscript=/var/lib/qemu-server/pve-bridgedown$vhostparam"; > + $netdev = > "type=tap,id=$netid,ifname=${ifname},script=/usr/libexec/qemu-server/$script" > + .",downscript=/usr/libexec/qemu-server/pve-bridgedown$vhostparam"; > } else { > $netdev = "type=user,id=$netid,hostname=$vmname"; > } > diff --git a/vm-network-scripts/Makefile b/vm-network-scripts/Makefile > index 5ba537d0..dc2c85ff 100644 > --- a/vm-network-scripts/Makefile > +++ b/vm-network-scripts/Makefile > @@ -1,12 +1,12 @@ > DESTDIR= > -VARLIBDIR=$(DESTDIR)/var/lib/qemu-server > +LIBEXECDIR=$(DESTDIR)/usr/libexec/qemu-server > > .PHONY: install > install: pve-bridge pve-bridge-hotplug pve-bridgedown > - install -d ${VARLIBDIR} > - install -m 0755 pve-bridge ${VARLIBDIR}/pve-bridge > - install -m 0755 pve-bridge-hotplug ${VARLIBDIR}/pve-bridge-hotplug > - install -m 0755 pve-bridgedown ${VARLIBDIR}/pve-bridgedown > + install -d ${LIBEXECDIR} > + install -m 0755 pve-bridge ${LIBEXECDIR}/pve-bridge > + install -m 0755 pve-bridge-hotplug ${LIBEXECDIR}/pve-bridge-hotplug > + install -m 0755 pve-bridgedown ${LIBEXECDIR}/pve-bridgedown > > .PHONY: clean > clean: > diff --git a/vm-network-scripts/pve-bridge-hotplug > b/vm-network-scripts/pve-bridge-hotplug > index f36ed408..3ae01ea5 100755 > --- a/vm-network-scripts/pve-bridge-hotplug > +++ b/vm-network-scripts/pve-bridge-hotplug > @@ -1,3 +1,3 @@ > #!/bin/sh > > -exec /var/lib/qemu-server/pve-bridge --hotplug "$@" > +exec /usr/libexec/qemu-server/pve-bridge --hotplug "$@" ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] applied: [PATCH qemu] fix typos in user-visible strings
Am 08.10.24 um 17:14 schrieb Maximiliano Sandoval: > This includes docs, and strings printed to stderr or stdout. > > These were caught with: > > typos --exclude test --exclude changelog > > Signed-off-by: Maximiliano Sandoval applied, thanks! But please note that it's the "qemu-server" repository, not "qemu" ;) ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH qemu] api: qemu: create: default cpu to x86-64-v2-AES
Am 22.10.24 um 15:35 schrieb Thomas Lamprecht: > Am 22/10/2024 um 14:15 schrieb Maximiliano Sandoval: >> Thomas Lamprecht writes: >> >>> And even then, would this still break restoring backups made before that >>> change? >>> If, this would fall under the changes that need versioning the guest >>> configs. Which naturally is possible but is IMO also not something I'd >>> do lightly, as that's something that might have wide-reaching consequences. >> >> Would it make more sense to do this only for the CLI tool, and only when >> creating new VMs? >> > > Hmm, it would feel a bit odd to me to single that specific "mismatch" between > UI default and CLI/schema out but ignore the others, like e.g., the memory > default of 2048, or some dynamic defaults like the OS-type dependent ones. > > And while it would have reduced impact, it would be still a breaking change > that might affect peoples automation/scripts. > But if more than just the CPU type would be looked at, and adapted such that > UI and CLI behave more the same, it might be indeed a feasible way to improve > UX slightly on the next major release next year. > Conveying the reason for different defaults and how one can look into which > one will be applied would warrant a short paragraph in the docs then too. > And taking the chance and looking at the CT story on this would be great too, > e.g., like is 512 MB really still a good default for UI/CLI, or unprivileged > vs. privileged. Maybe we could have (datacenter-wide?) default profiles (one for containers, one for VMs) whose values are used for new guest creation? Users could modify them if desired and they could also be used to pre-fill values in the UI. Could then be shipped for PVE 9 with sensible values. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [[PATCH kernel]] fix 5683: netfs: reset subreq iov iter before tail clean
Am 02.10.24 um 16:36 schrieb Christian Ebner: > Fixes rare read corruption issues using the in kernel ceph client. > > On incomplete read requests, the clean tail flag should make sure to > zero fill the remaining bytes for the subrequest. > If the iov iterator is not at the correct position, this can however > zero fill downloaded data, corrupting the read content. > > Link to issue: > https://bugzilla.proxmox.com/show_bug.cgi?id=5683 > > Link to upstream issue: > https://bugzilla.kernel.org/show_bug.cgi?id=219237 > > Signed-off-by: Christian Ebner > --- > This fixes the read corruption issue with my local reproducer. > > Providing a patched kernel to users affected by the issue for testing > would be probably the best way to verify the fix. > > Also, I reached out once again to the kernel developers asking if > this fix is a valid approach, hoping this can be included in current > stable (as the patch does fix the issue also when applied on 6.11.1). > > ...et-subreq-iov-iter-before-tail-clean.patch | 31 +++ > 1 file changed, 31 insertions(+) > create mode 100644 > patches/kernel/0021-netfs-reset-subreq-iov-iter-before-tail-clean.patch > > diff --git > a/patches/kernel/0021-netfs-reset-subreq-iov-iter-before-tail-clean.patch > b/patches/kernel/0021-netfs-reset-subreq-iov-iter-before-tail-clean.patch > new file mode 100644 > index 000..a87e722 > --- /dev/null > +++ b/patches/kernel/0021-netfs-reset-subreq-iov-iter-before-tail-clean.patch > @@ -0,0 +1,31 @@ > +From cd27abf0c555f39b12c05f9f6a8cb59ff25dfe45 Mon Sep 17 00:00:00 2001 > +From: Christian Ebner > +Date: Wed, 2 Oct 2024 15:24:31 +0200 > +Subject: [PATCH] netfs: reset subreq iov iter before tail clean > + > +Make sure the iter is at the correct location when cleaning up tail > +bytes for incomplete read subrequests. > + Disclaimer that I'm not familiar at all with the code. So AFAIU, after short IO, the iov_iter_count() and subreq->len - subreq->transferred might disagree. That is why before resubmission, netfs_reset_subreq_iter() is called. That function aligns the iterator position, so it will match the information from 'subreq'. In your edge case, there is no resubmission though, because the NETFS_SREQ_CLEAR_TAIL flag is set. But it still was short IO, so the mentioned mismatch happened. Now netfs_clear_unread() relies on the information from iov_iter_count(), which does not match the actual 'subreq'. To fix it, you call netfs_reset_subreq_iter() (like is done before resubmission) to align that information. Before commit 92b6cc5d1e7c ("netfs: Add iov_iters to (sub)requests to describe various buffers"), the information from the 'subreq' was used to set up the iterator: > diff --git a/fs/netfs/io.c b/fs/netfs/io.c > index 7f753380e047..e9d408e211b8 100644 > --- a/fs/netfs/io.c > +++ b/fs/netfs/io.c > @@ -21,12 +21,7 @@ > */ > static void netfs_clear_unread(struct netfs_io_subrequest *subreq) > { > - struct iov_iter iter; > - > - iov_iter_xarray(&iter, ITER_DEST, &subreq->rreq->mapping->i_pages, > - subreq->start + subreq->transferred, > - subreq->len - subreq->transferred); > - iov_iter_zero(iov_iter_count(&iter), &iter); > + iov_iter_zero(iov_iter_count(&subreq->io_iter), &subreq->io_iter); > } so that sounds good :) So with and without your change, after the netfs_clear_unread() call, the iterator will be in the final position, i.e. iov_iter_count() == 0? Then the information in 'subreq' is updated manually in the same branch and it moves on to completion. How far off from reality am I ;)? FWIW, the change looks okay to me, but again, I'm not familiar with the code and I haven't done any testing (and have no reproducer). Of course it would be much nicer to have some confirmation from upstream and/or users about this. > +Fixes: 92b6cc5d ("netfs: Add iov_iters to (sub)requests to describe various > buffers") > +Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219237 > + > +Signed-off-by: Christian Ebner > +--- > + fs/netfs/io.c | 1 + > + 1 file changed, 1 insertion(+) > + > +diff --git a/fs/netfs/io.c b/fs/netfs/io.c > +index d6ada4eba744..500119285346 100644 > +--- a/fs/netfs/io.c > b/fs/netfs/io.c > +@@ -528,6 +528,7 @@ void netfs_subreq_terminated(struct netfs_io_subrequest > *subreq, > + > + incomplete: > + if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) { > ++netfs_reset_subreq_iter(rreq, subreq); > + netfs_clear_unread(subreq); > + subreq->transferred = subreq->len; > + goto complete; > +-- > +2.39.5 > + ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH qemu] api: qemu: create: default cpu to x86-64-v2-AES
Am 01.10.24 um 16:08 schrieb Maximiliano Sandoval: > This makes it so newly created VMs, e.g. with `qm create` will have the > same default value as VMs created via the web UI. > > Signed-off-by: Maximiliano Sandoval > --- > I am not entirely sure if there is side-effect that I am not taking into > account. > This is a breaking change, because existing API callers now suddenly get a new default. Even if it were just CLI and not API, it would be breaking for the same reason (there are scripts using the CLI tools out there). If we do this, then in a major release and prominently communicate it to all users in the release notes. And it should also be documented it in the API schema, that creation uses another default than the schema default. > PVE/API2/Qemu.pm | 4 > 1 file changed, 4 insertions(+) > > diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm > index d25a79fe..383218fd 100644 > --- a/PVE/API2/Qemu.pm > +++ b/PVE/API2/Qemu.pm > @@ -1152,6 +1152,10 @@ __PACKAGE__->register_method({ > $conf->{vmgenid} = PVE::QemuServer::generate_uuid(); > } > > + if (!defined($param->{cpu})) { > + $conf->{cpu} = 'x86-64-v2-AES'; > + } > + > my $machine_conf = > PVE::QemuServer::Machine::parse_machine($conf->{machine}); > my $machine = $machine_conf->{type}; > if (!$machine || $machine =~ m/^(?:pc|q35|virt)$/) { ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH qemu-server] vm_start: add syslog info with which PID a VM was started
Am 26.09.24 um 10:36 schrieb Daniel Kral: > Adds a syslog entry to log the process id that has been given to the > QEMU VM process at start. This is helpful debugging information if the > pid shows up at other places, like a kernel stack trace, while the VM > has been running, but cannot be retrieved anymore (e.g. the pidfile has > been deleted or only the syslog is available). > > The syslog has been put in the `PVE::QemuServer::vm_start_nolock` > subroutine to make sure that the PID is logged not only when the VM has > been started by the API endpoint `vm_start`, but also when the VM is > started by a remote migration. > > Signed-off-by: Daniel Kral > Suggested-by: Hannes Dürr > Suggested-by: Friedrich Weber Nit: the Signed-off-by trailer should come after the Suggested-by trailers, because it was the last thing that happened chronologically (the change was first suggested, then you wrote it ;)). With the nits addressed: Reviewed-by: Fiona Ebner > --- > PVE/QemuServer.pm | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm > index b26da505..d998b360 100644 > --- a/PVE/QemuServer.pm > +++ b/PVE/QemuServer.pm > @@ -43,6 +43,7 @@ use PVE::ProcFSTools; > use PVE::PBSClient; > use PVE::RESTEnvironment qw(log_warn); > use PVE::RPCEnvironment; > +use PVE::SafeSyslog; > use PVE::Storage; > use PVE::SysFSTools; > use PVE::Systemd; > @@ -5971,6 +5972,8 @@ sub vm_start_nolock { > eval { PVE::QemuServer::PCI::reserve_pci_usage($pci_reserve_list, $vmid, > undef, $pid) }; > warn $@ if $@; > > +syslog("info", "VM $vmid started with pid $pid."); Nit: I'd capitalize "PID" > + > if (defined(my $migrate = $res->{migrate})) { > if ($migrate->{proto} eq 'tcp') { > my $nodename = nodename(); ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] applied: [PATCH pve-storage] lvmplugin: fix: activate|deactivate volume, add missing storeid param in path sub
Am 25.09.24 um 11:00 schrieb Alexandre Derumier via pve-devel: > $storeid param is missing and $snapname is used as third param. > > This seem to works actually because $snapname is always empty in lvm > > Signed-off-by: Alexandre Derumier applied, thanks! ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH manager] fix #5787: ui: display vm description in confirm dialogs
Am 22.10.24 um 08:58 schrieb Thomas Lamprecht: > Am 21/10/2024 um 14:17 schrieb Timothy Nicholson: >> Signed-off-by: Timothy Nicholson >> --- >> >> This patch adds a new function to the PVE Utils that formats a task >> confirmation message to display in confirm dialogs, for example when >> shutting down a VM. As requested by the Bugzilla entry #5787, the message >> now includes the VM name in addition to the VM ID. > > why isn't above stated in the commit message but only as patch note? > Also please note (from [0]): > Make sure the line-length (text-width, column count) of the commit's message > is not longer than 72 characters. which is a good limit for readability in mails in general :) [0]: https://pve.proxmox.com/wiki/Developer_Documentation#Commits_and_Commit_Messages ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [RFC PATCH 1/2] config: allow hypens and dots in config keys
Am 24.09.24 um 16:35 schrieb Maximiliano Sandoval: > The first character still has to be a letter. > > Signed-off-by: Maximiliano Sandoval > --- > PVE/QemuServer.pm | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm > index b26da505..1566cf91 100644 > --- a/PVE/QemuServer.pm > +++ b/PVE/QemuServer.pm > @@ -2382,7 +2382,7 @@ sub parse_vm_config { > } else { > $handle_error->("vm $vmid - property 'delete' is only allowed > in [PENDING]\n"); > } > - } elsif ($line =~ m/^([a-z][a-z_]*\d*):\s*(.+?)\s*$/) { > + } elsif ($line =~ m/^([a-z][a-z_\.-]*\d*):\s*(.+?)\s*$/) { > my $key = $1; > my $value = $2; > if ($section eq 'cloudinit') { Regarding the hyphen, see also: https://lore.proxmox.com/pve-devel/7c911e11-d1cb-415b-9282-b40faa34d...@proxmox.com/#t Also friendly ping for @Thomas for that mail, in particular, for where this should be documented. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH v3 storage] fix #5191: api, cli: implement moving a volume between storages
Am 18.09.24 um 16:56 schrieb Filip Schauer: > On 05/09/2024 14:12, Fiona Ebner wrote: >>> @@ -483,15 +485,173 @@ __PACKAGE__->register_method ({ >>> return $upid; >>> }}); >>> +sub volume_move { >> Should this even be a new top-level method? Or can/should we extend >> export/import instead, to not only cover guest images? Because with this >> top-level method we block the way for external storage plugins to >> support this functionality too if they don't adhere to our assumptions. > > > We do not use the existing export/import because we want to use `move` > instead of copying when `--delete` is passed. This improves performance > when moving between storages on the same storage backend. > Is that really a common enough use case? I mean, can be fine if the added complexity is not too much, but it feels a bit like duplication since we could also re-use storage_migrate() and add the special casing there. > Other than that the feedback was incorporated in patch v4: > https://lists.proxmox.com/pipermail/pve-devel/2024-September/065380.html > ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH v4 storage 2/6] api: content: implement moving a volume between storages
Am 18.09.24 um 16:49 schrieb Filip Schauer: > +sub volume_move { > +my ($cfg, $source_volid, $target_storeid, $delete) = @_; > + > +my ($source_storeid, $source_volname) = > PVE::Storage::parse_volume_id($source_volid, 0); > + > +die "source and target storage cannot be the same\n" if $source_storeid > eq $target_storeid; > + > +my $source_scfg = PVE::Storage::storage_config($cfg, $source_storeid); > +my $source_plugin = PVE::Storage::Plugin->lookup($source_scfg->{type}); > +my ($vtype) = $source_plugin->parse_volname($source_volname); > + > +die "source storage '$source_storeid' does not support volumes of type > '$vtype'\n" > + if !$source_scfg->{content}->{$vtype}; > + > +my $target_scfg = PVE::Storage::storage_config($cfg, $target_storeid); > +die "target storage '$target_storeid' does not support volumes of type > '$vtype'\n" > + if !$target_scfg->{content}->{$vtype}; > + > +die "use pct move-volume or qm disk move\n" if $vtype eq 'images' || > $vtype eq 'rootdir'; > +die "moving volume of type '$vtype' not implemented\n" > + if ($vtype ne 'iso' && $vtype ne 'vztmpl' && $vtype ne 'snippets'); > + > +PVE::Storage::activate_storage($cfg, $source_storeid); > + > +die "expected storage '$source_storeid' to be path based\n" if > !$source_scfg->{path}; > +my $source_path = $source_plugin->path($source_scfg, $source_volname, > $source_storeid); > +die "$source_path does not exist" if (!-e $source_path); > +my $source_dirname = dirname($source_path); > + > +die "expected storage '$target_storeid' to be path based\n" if > !$target_scfg->{path}; > +my $target_plugin = PVE::Storage::Plugin->lookup($target_scfg->{type}); > +my $target_path = $target_plugin->path($target_scfg, $source_volname, > $target_storeid); > + > +PVE::Storage::activate_storage($cfg, $target_storeid); > +die "$target_path already exists" if (-e $target_path); > + What if a volume with the same name is created after this check, but before the move/copy? I guess you'll need to lock the storage (see e.g. vdisk_alloc() in Storage.pm for comparison). > +my @created_files = (); > + > +eval { > + if ($delete) { > + move($source_path, $target_path) or die "failed to move $vtype: $!"; > + } else { > + copy($source_path, $target_path) or die "failed to copy $vtype: $!"; > + } > +}; > +if (my $err = $@) { > + for my $created_file (@created_files) { > + unlink $created_file or $!{ENOENT} or warn $!; > + } > + die $err; > +} > + > +PVE::Storage::archive_remove($source_path) if $delete; > + > +return; ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH pve-firmware] d/control: add Conflicts and Replaces for firmware-realtek-rtl8723cs-bt
Reported in the community forum [0]. Both firmware files that are in the package firmware-realtek-rtl8723cs-bt [1] > /lib/firmware/rtl_bt/rtl8723cs_xx_config.bin > /lib/firmware/rtl_bt/rtl8723cs_xx_fw.bin are present in pve-firmware since the linux-firmware submodule includes commit ed9c1349 ("rtl_bt: Add firmware file for the the RTL8723CS Bluetooth part"). [0]: https://forum.proxmox.com/threads/132671/post-706514 [1]: https://packages.debian.org/bookworm/all/firmware-realtek-rtl8723cs-bt/filelist Signed-off-by: Fiona Ebner --- debian/control | 2 ++ 1 file changed, 2 insertions(+) diff --git a/debian/control b/debian/control index 1fc601b..57e45c6 100644 --- a/debian/control +++ b/debian/control @@ -31,6 +31,7 @@ Conflicts: firmware-amd-graphics, firmware-qlogic, firmware-ralink, firmware-realtek, + firmware-realtek-rtl8723cs-bt, firmware-siano, firmware-ti-connectivity, Replaces: firmware-amd-graphics, @@ -54,6 +55,7 @@ Replaces: firmware-amd-graphics, firmware-qlogic, firmware-ralink, firmware-realtek, + firmware-realtek-rtl8723cs-bt, firmware-siano, firmware-ti-connectivity, Description: Binary firmware code for the pve-kernel -- 2.39.2 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [RFC container v2 22/25] backup: implement backup for external providers
Am 13.09.24 um 08:19 schrieb Fabian Grünbichler: >> Fiona Ebner hat am 12.09.2024 15:38 CEST geschrieben: >> >> Am 12.09.24 um 14:43 schrieb Fabian Grünbichler: >>> On August 13, 2024 3:28 pm, Fiona Ebner wrote: >>>> + $info->{'firewall-config'} = $firewall_file if -e $firewall_file; >>>> + $info->{'bandwidth-limit'} = $opts->{bwlimit} * 1024 if >>>> $opts->{bwlimit}; >>>> + $backup_provider->backup_container($vmid, $config_file, $id_map, >>>> $findexcl, $info); >>> >>> it might be easier to hide the idmapping from the backup provider? e.g., >>> hand it a idmapped bindmount or something like that? >>> >> >> Yes, that would be nicer. But could that potentially lead to permission >> issues? A mid/long term plan is to have the backup provider code run >> with lower privileges. I suppose to later implement that, the subroutine >> for the provider could run within a matching user namespace too? > > yeah, I think there are a few options here > - run the provider as root-in-user-ns, give it access to the mapped FS (this > is how we do regular backups, but requires some glue code/forking) Gave this a try. Issue is that the backup provider also needs access to the backup target/etc. Can network access also be an issue (I guess it is not for PBS)? E.g. directory example plugin fails with > ERROR: Backup of VM 112 failed - unable to open file > '/mnt/pve/sparschwein/112/lxc-1726484790/guest.conf.tmp.125275' - Permission > denied and Borg plugin fails with > ERROR: Backup of VM 112 failed - mkdir /run/pve-storage-borg-plugin: > Permission denied at /usr/share/perl5/PVE/BackupProvider/Plugin/Borg.pm line > 41 or after switching to /tmp with > ERROR: Backup of VM 112 failed - file '/etc/pve/priv/storage/borg.pw' exists > but open for reading failed - Permission denied Less coupling with the associated storage plugin or a special kind of "unprivileged" storage plugin would help. In PBS we do the storage-plugin-related stuff first with root privileges and only run the final pbs-client command in user namespace. Maybe we need something like that here too, a preparatory method run as root that prepares for the unprivileged backup operation? But that makes life more complicated for provider implementers (and also us). > - run the provider as root-on-host, give it access to a reverse-mapped FS > somehow (well, it would be nicer to run the backup code in the userns instead > of as root) I'd try and go with this option for now if that is okay. > - run the provider as root-on-host, give it access to the mapped FS and let > it handle the (un)mapping itself (if they are not familiar with namespaces, > this might go wrong) > > so if we find a generic way to do the first variant, we are both closer to > how we do backups, and err on the side of caution w.r.t. context of execution. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [RFC container v2 23/25] backup: implement restore for external providers
Am 13.09.24 um 08:35 schrieb Fabian Grünbichler: > >> Fiona Ebner hat am 12.09.2024 16:08 CEST geschrieben: >> >> >> Am 12.09.24 um 15:56 schrieb Fiona Ebner: >>> Am 12.09.24 um 14:43 schrieb Fabian Grünbichler: >>>> On August 13, 2024 3:28 pm, Fiona Ebner wrote: >>>>> + } elsif ($mechanism eq 'directory') { >>>>> + my $directory = $info->{'archive-directory'} >>>>> + or die "did not get path to archive directory from backup >>>>> provider\n"; >>>>> + die "not a directory '$directory'" if !-d $directory; >>>>> + >>>>> + my $rsync = ['rsync', '--stats', '-h', '-X', '-A', '--numeric-ids', >>>>> '-aH', '--delete', >>>>> + '--no-whole-file', '--sparse', '--one-file-system', >>>>> '--relative']; >>>>> + push $rsync->@*, '--bwlimit', $bwlimit if $bwlimit; >>>>> + push $rsync->@*, "${directory}/./", $rootdir; >>>> >>>> and this as well? >>>> >>> >>> Good catch, will fix! >>> >> >> Hmm, then rsync won't be able to access the source (for my Borg example) >> anymore :/ >> >> WARN: rsync: [sender] change_dir >> "/run/pve-storage-borg-plugin/pve-lxc-111-2024-08-13T09:34:25Z.restore-container/filesystem" >> failed: Permission denied (13) >> >> Wit restore_tar_archive we stream the contents via stdin, can't do that >> here. But maybe some kind of bind mount to make it accessible? > > or rsync-on-host piped to rsync-in-ns ? haven't tried though Would that require setting up an rsync daemon process? Or how would you achieve the split? The man page says that --server/sender should not be used: > INTERNAL OPTIONS >The options --server and --sender are used internally by rsync, and > should never be typed by a user under normal circumstances. Some awareness > of these options may be needed in certain sce‐ >narios, such as when setting up a login that can only run an rsync > command. For instance, the support directory of the rsync distribution has > an example script named rrsync (for restricted >rsync) that can be used with a restricted ssh login. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] Bug 2582 roadmap
Hi Pavel, Am 10.09.24 um 12:18 schrieb Pavel Tide via pve-devel: > Hi Proxmox team, > > Would you provide any delivery estimates on this item? > https://bugzilla.proxmox.com/show_bug.cgi?id=2582 > > As far as I understand it's been implemented already, but currently stays in > the development branch - our lab is up to date and yet we don't see how we > can create a separate non-root account to work with PVE cluster. > a patch series had been proposed, but the implementation was not finished AFAIK, see Fabian's review[0]. Since Oguz left the company, nobody else has picked up work on the series yet. Maybe you can describe what exactly your use case is, which privileges you'd need in particular. Of course, proposing patches for what you need (or a rebased version of Oguz's full series) is welcome too :) [0]: https://lore.proxmox.com/pve-devel/1658908014.zeyifvlr1o.astr...@nora.none/ Best Regards, Fiona ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [RFC container v2 23/25] backup: implement restore for external providers
Am 12.09.24 um 15:56 schrieb Fiona Ebner: > Am 12.09.24 um 14:43 schrieb Fabian Grünbichler: >> On August 13, 2024 3:28 pm, Fiona Ebner wrote: >>> + } elsif ($mechanism eq 'directory') { >>> + my $directory = $info->{'archive-directory'} >>> + or die "did not get path to archive directory from backup >>> provider\n"; >>> + die "not a directory '$directory'" if !-d $directory; >>> + >>> + my $rsync = ['rsync', '--stats', '-h', '-X', '-A', '--numeric-ids', >>> '-aH', '--delete', >>> + '--no-whole-file', '--sparse', '--one-file-system', >>> '--relative']; >>> + push $rsync->@*, '--bwlimit', $bwlimit if $bwlimit; >>> + push $rsync->@*, "${directory}/./", $rootdir; >> >> and this as well? >> > > Good catch, will fix! > Hmm, then rsync won't be able to access the source (for my Borg example) anymore :/ WARN: rsync: [sender] change_dir "/run/pve-storage-borg-plugin/pve-lxc-111-2024-08-13T09:34:25Z.restore-container/filesystem" failed: Permission denied (13) Wit restore_tar_archive we stream the contents via stdin, can't do that here. But maybe some kind of bind mount to make it accessible? ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [RFC container v2 23/25] backup: implement restore for external providers
Am 12.09.24 um 14:43 schrieb Fabian Grünbichler: > On August 13, 2024 3:28 pm, Fiona Ebner wrote: >> @@ -118,6 +136,55 @@ sub restore_tar_archive { >> die $err if $err && !$no_unpack_error; >> } >> >> +sub restore_external_archive { >> +my ($backup_provider, $storeid, $volname, $rootdir, $conf, >> $no_unpack_error, $bwlimit) = @_; >> + >> +my ($mechanism, $vmtype) = >> $backup_provider->restore_get_mechanism($volname, $storeid); >> +die "cannot restore non-LXC guest of type '$vmtype'\n" if $vmtype ne >> 'lxc'; >> + >> +my $info = $backup_provider->restore_container_init($volname, $storeid, >> {}); >> +eval { >> +if ($mechanism eq 'tar') { >> +my $tar_path = $info->{'tar-path'} >> +or die "did not get path to tar file from backup provider\n"; >> +die "not a regular file '$tar_path'" if !-f $tar_path; >> +restore_tar_archive($tar_path, $rootdir, $conf, $no_unpack_error, >> $bwlimit); > > shouldn't this be `lxc-userns-exec`-ed? > The restore_tar_archive() function does that AFAICS. >> +} elsif ($mechanism eq 'directory') { >> +my $directory = $info->{'archive-directory'} >> +or die "did not get path to archive directory from backup >> provider\n"; >> +die "not a directory '$directory'" if !-d $directory; >> + >> +my $rsync = ['rsync', '--stats', '-h', '-X', '-A', '--numeric-ids', >> '-aH', '--delete', >> +'--no-whole-file', '--sparse', '--one-file-system', >> '--relative']; >> +push $rsync->@*, '--bwlimit', $bwlimit if $bwlimit; >> +push $rsync->@*, "${directory}/./", $rootdir; > > and this as well? > Good catch, will fix! > also, for both tar and rsync we probably need to think about how to > prevent bogus input here (which might be user-creatable if they have > write access to the backup storage) from violating our assumptions.. > What assumptions do you mean exactly? ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [RFC container v2 22/25] backup: implement backup for external providers
Am 12.09.24 um 14:43 schrieb Fabian Grünbichler: > On August 13, 2024 3:28 pm, Fiona Ebner wrote: >> +$info->{'firewall-config'} = $firewall_file if -e $firewall_file; >> +$info->{'bandwidth-limit'} = $opts->{bwlimit} * 1024 if >> $opts->{bwlimit}; >> +$backup_provider->backup_container($vmid, $config_file, $id_map, >> $findexcl, $info); > > it might be easier to hide the idmapping from the backup provider? e.g., > hand it a idmapped bindmount or something like that? > Yes, that would be nicer. But could that potentially lead to permission issues? A mid/long term plan is to have the backup provider code run with lower privileges. I suppose to later implement that, the subroutine for the provider could run within a matching user namespace too? ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [RFC qemu-server v2 21/25] backup: implement restore for external providers
Am 12.09.24 um 14:44 schrieb Fabian Grünbichler: > On August 13, 2024 3:28 pm, Fiona Ebner wrote: >> + >> +# allocate volumes >> +my $map = $restore_allocate_devices->($storecfg, $virtdev_hash, $vmid); >> + >> +for my $virtdev (sort keys $virtdev_hash->%*) { >> +my $d = $virtdev_hash->{$virtdev}; >> +next if $d->{is_cloudinit}; # no need to restore cloudinit >> + >> +my $info = >> +$backup_provider->restore_vm_volume_init($volname, $storeid, >> $d->{devname}, {}); >> +my $source_path = $info->{'qemu-img-path'} >> +or die "did not get source image path from backup provider\n"; >> +eval { >> +qemu_img_convert( >> +$source_path, $d->{volid}, $d->{size}, undef, 0, >> $options->{bwlimit}); >> +}; > > this definitely needs to get a call to file_size_info with import > hardening patches applied ;) > Sure, added a reminder to myself for now :) ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [RFC storage v2 10/25] plugin: introduce new_backup_provider() method
Am 12.09.24 um 14:43 schrieb Fabian Grünbichler: > On August 13, 2024 3:28 pm, Fiona Ebner wrote: >> >> Container mechanism 'directory': >> >> The backup provider gives the path to a directory with the full >> filesystem structure of the container. >> >> Container mechanism 'directory': >> >> The backup provider gives the path to a (potentially compressed) tar >> archive with the full filesystem structure of the container. > > same as in the cover letter ;) base on the code here I assume the second > one should be tar. I wonder whether just tar wouldn't be enough (easier > to not get ACLs/xattrs/ownership/.. right)? > Yes, should be tar, will fix! I'd guess it is more convenient for many providers to expose (a FUSE mount of) a directory. Using tar can mean more work. E.g. with Borg, mounting the archive seems much cheaper than creating the tar. It's also that the archive looks like: guest.config firewall.config filesystem/ and while borg has "export-tar" where one can specify specific paths, the tar will still contain the "filesystem/" prefix. Not sure if there is an easy way to get rid of that. >> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm >> index 6444390..d5b76ae 100644 >> --- a/src/PVE/Storage/Plugin.pm >> +++ b/src/PVE/Storage/Plugin.pm >> @@ -1755,6 +1755,21 @@ sub rename_volume { >> return "${storeid}:${base}${target_vmid}/${target_volname}"; >> } >> >> +# Used by storage plugins for external backup providers. See >> PVE::BackupProvider::Plugin for the API >> +# the provider needs to implement. >> +# >> +# $scfg - the storage configuration >> +# $storeid - the storage ID >> +# $log_function($log_level, $message) - this log function can be used to >> write to the backup task >> +# log in Proxmox VE. $log_level is 'info', 'warn' or 'err', $message is >> the message to be printed. >> +# >> +# Returns a blessed reference to the backup provider class. >> +sub new_backup_provider { >> +my ($class, $scfg, $storeid, $log_function) = @_; >> + >> +return; >> +} > > would it maybe make sense to make this a "die implement me" and make the > opt-in via the storage plugin features? it would be more in line with > what we do in other parts and less subtle.. > We don't have a method for storage plugin features yet, only volume_has_feature() and the stand-alone storage_can_replicate(). We could generalize (and deprecate) the latter though. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH v2 container] fix #3367: prevent adding bind and device mountpoints to template
There is a check in the clone API call that prohibits cloning when a bind or device mountpoint is present. Converting to a template also will fail when such a mountpoint is present. In particular, that failure will also trigger at the end of a restore operation, because restore will also try to convert to a template again. Add a safeguard so that users won't get into the problematic situation in the first place. Signed-off-by: Fiona Ebner --- Changes in v2: * different approach with safeguard instead of silently dropping bind mounts during conversion to template Previous discussion: https://lore.proxmox.com/pve-devel/20210402123636.27037-1-f.eb...@proxmox.com/T/#u src/PVE/LXC/Config.pm | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm index ce64c4c..09687ab 100644 --- a/src/PVE/LXC/Config.pm +++ b/src/PVE/LXC/Config.pm @@ -1181,7 +1181,13 @@ sub update_pct_config { if ($opt =~ m/^mp(\d+)$/ || $opt eq 'rootfs') { $class->check_protection($conf, "can't update CT $vmid drive '$opt'"); my $mp = $class->parse_volume($opt, $value); - $check_content_type->($mp) if ($mp->{type} eq 'volume'); + if ($mp->{type} eq 'volume') { + $check_content_type->($mp); + } elsif ($class->is_template($conf)) { + # TODO allow bind mounts once admin-defined bind mount sources and corresponding ACL + # support is implemented + die "cannot add mountpoint '$opt' of type '$mp->{type}' to template\n"; + } } elsif ($opt eq 'hookscript') { PVE::GuestHelpers::check_hookscript($value); } elsif ($opt eq 'nameserver') { -- 2.39.2 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC v2 qemu] savevm-async: reuse migration blocker check for snapshots/hibernation
Same rationale as with upstream QEMU commit 5aaac46793 ("migration: savevm: consult migration blockers"), migration and (async) snapshot are essentially the same operation and thus snapshot also needs to check for migration blockers. For example, this catches passed-through PCI devices, where the driver does not support migration. However, the commit notes: > There is really no difference between live migration and savevm, except > that savevm does not require bdrv_invalidate_cache to be implemented > by all disks. However, it is unlikely that savevm is used with anything > except qcow2 disks, so the penalty is small and worth the improvement > in catching bad usage of savevm. and for Proxmox VE, suspend-to-disk with VMDK does use savevm-async and would be broken by simply using migration_is_blocked(). To keep this working, introduce a new helper that filters blockers with the prefix used by the VMDK migration blocker. The function qemu_savevm_state_blocked() is called as part of savevm_async_is_blocked() so no check is lost with this patch. The helper is declared in migration/migration.c to be able to access the 'migration_blockers'. The VMDK blocker message is declared via a '#define', because using a 'const char*' led to the linker to complain about multiple declarations. The message does not include the reference to the block node anymore, but it would've just been a generated one like '#block334' in any case, which didn't help users identify the VMDK disk before the change either. Note, this also "breaks" snapshot and hibernate with VNC clipboard by preventing it. Previously, this would "work", because the Proxmox VE API has no check yet, but the clipboard will be broken after rollback, in the sense that it cannot be used anymore, not just lost contents. So some users might consider adding the check here a breaking change even if it's technically correct to prevent snapshot and hibernate with VNC clipboard. But other users might rightfully complain about broken clipboard. And again, the check also prevents blockers from passed-through PCI devices, etc. so it seems worth tolerating that breakage. Signed-off-by: Fiona Ebner --- Changes in v2: * hard-code blocker message to get a conflict if it changes upstream. * avoid making helper more general than it needs to be (had a parameter for matching the prefix of blocker messages). block/vmdk.c| 4 +--- include/migration/blocker.h | 2 ++ migration/migration.c | 24 migration/migration.h | 1 + migration/savevm-async.c| 2 +- 5 files changed, 29 insertions(+), 4 deletions(-) diff --git a/block/vmdk.c b/block/vmdk.c index 3b82979fdf..eb15109620 100644 --- a/block/vmdk.c +++ b/block/vmdk.c @@ -1403,9 +1403,7 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, int flags, qemu_co_mutex_init(&s->lock); /* Disable migration when VMDK images are used */ -error_setg(&s->migration_blocker, "The vmdk format used by node '%s' " - "does not support live migration", - bdrv_get_device_or_node_name(bs)); +error_setg(&s->migration_blocker, "%s", MIGRATION_BLOCKER_VMDK); ret = migrate_add_blocker_normal(&s->migration_blocker, errp); if (ret < 0) { goto fail; diff --git a/include/migration/blocker.h b/include/migration/blocker.h index a687ac0efe..f36bfb2df1 100644 --- a/include/migration/blocker.h +++ b/include/migration/blocker.h @@ -18,6 +18,8 @@ #define MIG_MODE_ALL MIG_MODE__MAX +#define MIGRATION_BLOCKER_VMDK "The vmdk format used by a disk does not support live migration" + /** * @migrate_add_blocker - prevent all modes of migration from proceeding * diff --git a/migration/migration.c b/migration/migration.c index b8d7e471a4..b18c3720b2 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1913,6 +1913,30 @@ bool migration_is_blocked(Error **errp) return false; } +bool savevm_async_is_blocked(Error **errp) +{ +GSList *blockers = migration_blockers[migrate_mode()]; + +if (qemu_savevm_state_blocked(errp)) { +return true; +} + +/* + * The limitation for VMDK images only applies to live-migration, not + * snapshots, see commit 5aaac46793 ("migration: savevm: consult migration + * blockers"). + */ +while (blockers) { +if (strcmp(error_get_pretty(blockers->data), MIGRATION_BLOCKER_VMDK)) { +error_propagate(errp, error_copy(blockers->data)); +return true; +} +blockers = g_slist_next(blockers); +} + +return false; +} + /* Returns true if continue to migrate, or false if error detected */ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
Re: [pve-devel] [PATCH v2 common] REST environment: warn helpers: also log messages to syslog
Ping Am 12.06.24 um 11:51 schrieb Fiona Ebner: > Ping > > Am 09.02.24 um 15:15 schrieb Fiona Ebner: >> for better visibility. When not in a task, warnings from these helpers >> are only logged to STDERR, which is particularly unhelpful in case of >> daemons. This is the main motivation behind this change. >> >> For tasks, warnings from these helpers are already more visible on the >> UI side, but when looking at the syslog, one can only see the warning >> count from the task right now, not the actual messages. This is >> another reason in favor of the change. >> >> Reported-by: Friedrich Weber >> Suggested-by: Thomas Lamprecht >> Signed-off-by: Fiona Ebner >> --- >> >> Changes in v2: >> * Go with the approach suggested by Thomas. >> >> src/PVE/RESTEnvironment.pm | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm >> index 191c6eb..bfde7e6 100644 >> --- a/src/PVE/RESTEnvironment.pm >> +++ b/src/PVE/RESTEnvironment.pm >> @@ -723,6 +723,7 @@ sub log_warn { >> $rest_env->warn($message); >> } else { >> chomp($message); >> +syslog('warning', "%s", $message); >> print STDERR "WARN: $message\n"; >> } >> } >> @@ -732,6 +733,7 @@ sub warn { >> >> chomp($message); >> >> +syslog('warning', "%s", $message); >> print STDERR "WARN: $message\n"; >> >> $self->{warning_count}++; > > > ___ > pve-devel mailing list > pve-devel@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel > > ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH pve-storage] qcow2 format: enable subcluster allocation by default
Am 03.07.24 um 16:24 schrieb Alexandre Derumier via pve-devel: > > > extended_l2 is an optimisation to reduce write amplification. > Currently,without it, when a vm write 4k, a full 64k cluster s/write/writes/ > need to be writen. needs to be written. > > When enabled, the cluster is splitted in 32 subclusters. s/splitted/split/ > > We use a 128k cluster by default, to have 32 * 4k subclusters > > https://blogs.igalia.com/berto/2020/12/03/subcluster-allocation-for-qcow2-images/ > https://static.sched.com/hosted_files/kvmforum2020/d9/qcow2-subcluster-allocation.pdf > > some stats for 4k randwrite benchmark Can you please share the exact command you used? What kind of underlying disks do you have? > > Cluster size Without subclusters With subclusters > 16 KB 5859 IOPS 8063 IOPS > 32 KB 5674 IOPS 11107 IOPS > 64 KB 2527 IOPS 12731 IOPS > 128 KB 1576 IOPS 11808 IOPS > 256 KB 976 IOPS 9195 IOPS > 512 KB 510 IOPS 7079 IOPS > 1 MB 448 IOPS 3306 IOPS > 2 MB 262 IOPS 2269 IOPS > How does read performance compare for you (with 128 KiB cluster size)? I don't see any noticeable difference in my testing with an ext4 directory storage on an SSD, attaching the qcow2 images as SCSI disks to the VM, neither for reading nor writing. I only tested without your change and with your change using 4k (rand)read and (rand)write. I'm not sure we should enable this for everybody, there's always a risk to break stuff with added complexity. Maybe it's better to have a storage configuration option that people can opt-in to, e.g. qcow2-create-opts extended_l2=on,cluster_size=128k If we get enough positive feedback, we can still change the default in a future (major) release. > Signed-off-by: Alexandre Derumier > --- > src/PVE/Storage/Plugin.pm | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm > index 6444390..31b20fe 100644 > --- a/src/PVE/Storage/Plugin.pm > +++ b/src/PVE/Storage/Plugin.pm > @@ -561,7 +561,7 @@ sub preallocation_cmd_option { > die "preallocation mode '$prealloc' not supported by format '$fmt'\n" > if !$QCOW2_PREALLOCATION->{$prealloc}; > > - return "preallocation=$prealloc"; > + return "preallocation=$prealloc,extended_l2=on,cluster_size=128k"; Also, it doesn't really fit here in the preallocation helper as the helper is specific to that setting. > } elsif ($fmt eq 'raw') { > $prealloc = $prealloc // 'off'; > $prealloc = 'off' if $prealloc eq 'metadata'; > -- > 2.39.2 > > ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] qcow2 internal snapshot: 4k write speed reduce to 100~200 iops on ssd ?
Hi, Am 30.08.24 um 18:01 schrieb DERUMIER, Alexandre via pve-devel: > Hi, > > I was doing tests with gfs2 && ocfs2, > > and I have notice 4k randwrite iops going from 2 iops to > 100~200iops when a snapshot is present. > > I thinked it was related to gfs2 && ocfs2 allocation, > but I can reproduce too with a simple qcow2 file on > a local ssd drive. > Is the performance drop also this big with the local storage? > is it expected ? > > (I don't have use qcow2 snapshot since 10year, so I really don't > remember performance). > A performance drop is of course expected, because AFAIU it needs to do COW for the sectors that the snapshot references, but the drop does sound rather dramatic in your case. The performance should be better again once the COW is done, e.g. when running another test afterwards. I quickly tested in a nested VM with qcow2 disk on a directory storage and --numjobs=4: ran test, got ~33k IOPS made a snapshot ran test, got ~20k IOPS (it starts out very slow and then recovers) ran test, got ~33k IOPS again Best Regards, Fiona > > Using external snapshot file, I'm around 1iops. > > test inside the vm: > > fio --filename=/dev/sdX --ioengine=libaio --direct 1 --bs=4k -- > rw=randwrite --numjobs=64 > > ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH storage 1/2] fix #3873: btrfs: check for correct subvolume taking snapshot
Am 09.07.24 um 13:51 schrieb Maximiliano Sandoval: > Suppose we are doing a snapshot of disk 0 for VM 100. The > dir_glob_foreach runs over $path=/subvolume/images/100, lists all > snapshot names and appends their names to the path of the disk, e.g. > /subvolume/images/vm-100-disk-0@SNAP_NAME, but the original directory > $path might contain a second disk `vm-100-disk-1` which is also listed > by the dir_glib_foreach. > > The patch skips images which reference other disks. > > Signed-off-by: Maximiliano Sandoval > --- > src/PVE/Storage/BTRFSPlugin.pm | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/PVE/Storage/BTRFSPlugin.pm b/src/PVE/Storage/BTRFSPlugin.pm > index 42815cb..dc0420d 100644 > --- a/src/PVE/Storage/BTRFSPlugin.pm > +++ b/src/PVE/Storage/BTRFSPlugin.pm > @@ -767,6 +767,9 @@ sub volume_export { > push @$cmd, (map { "$path\@$_" } ($with_snapshots // [])->@*), $path; > } else { > dir_glob_foreach(dirname($path), $BTRFS_VOL_REGEX, sub { I feel like this would be a bit nicer (and slightly more robust) if it used foreach_subvol() and would check the basename like done in free_image(). Or even better, introduce a new helper that iterates over all snapshots of a specific volume to not repeat ourselves here and in free_image(). Since foreach_subvol() is only used once (i.e. in free_image()), we could even replace that. There's another pre-existing thing that's a bit confusing, which is that BTRFS_VOL_REGEX only matches snapshot volumes and not the actual volume itself AFAICS. > + if (index($path, $_[1]) < 0) { > + return > + } > push @$cmd, "$path\@$_[2]" if !(defined($snapshot) && $_[2] eq > $snapshot); > }); > } ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH storage v3 2/2] d/control: dependencies: add bzip2, gzip, lzop, zstd
Am 10.09.24 um 14:52 schrieb Fabian Grünbichler: >> Fiona Ebner hat am 10.09.2024 14:24 CEST geschrieben: >> Am 12.08.24 um 13:44 schrieb Maximiliano Sandoval: >>> The decompressor_info method calls binaries provided by these packages >>> so they are (alphabetically) added explicitly as dependencies. >>> >>> To avoid a build-time error >>> >>> E: libpve-storage-perl: >>> depends-on-essential-package-without-using-version Depends: gzip >>> >>> the current minor version available in bullseye was set for gzip. >>> >> >> Since I didn't get that error, I'm interested: what command are you >> building with? Why the version for Bullseye? > > it's wrong in any case - gzip is Essential (installed on all Debian systems). > you don't ever need a dependency (build or otherwise) for it. this is what > lintian is trying to tell you - it's wrong to depend on gzip without a > version constraint (and okay to use it without a dependency). *only* if you > have a certain version requirement you need to add the dependency (with the > proper version qualifier) to be able to encode that version requirement. > I see, thank you for the explanation! >> I guess many systems already have bzip2 installed, but do we really want >> to require it for everybody? Or should we rather keep it optional (could >> be a Recommends or Suggests dependency) and add a clean error if not >> installed? > > I am not sure about *how* many (most dev machines yes, since it's > build-essential and required by a lot of build-related packages ;)). it's > only 124kB though, and libbz2-1.0 is always installed (it's transitively > Essential via dpkg), so it doesn't pull in anything that isn't installed on > all standard Debian-based systems anyway since it only depends on that and > libc6. I don't really see a reason to use Recommends or Suggests here, even > if bz2 compress isos are not that common. Okay :) ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH storage v3 1/2] fix #5267: storage: add bzip2 support
Am 12.08.24 um 13:44 schrieb Maximiliano Sandoval: > A popular ISO compressed exclusively with bz2 is OPNsense [2]. > > Since this requires adding `bz2` to the list of known compression > formats we add decompression methods for vmz a tar. Typo here I presume. > diff --git a/src/test/list_volumes_test.pm b/src/test/list_volumes_test.pm > index d155cb9..ba959ff 100644 > --- a/src/test/list_volumes_test.pm > +++ b/src/test/list_volumes_test.pm > @@ -189,6 +189,7 @@ my @tests = ( > "$storage_dir/dump/vzdump-lxc-16112-2020_03_30-21_49_30.tar.gz", > "$storage_dir/dump/vzdump-lxc-16112-2020_03_30-21_49_30.tar.zst", > "$storage_dir/dump/vzdump-lxc-16112-2020_03_30-21_59_30.tgz", > + "$storage_dir/dump/vzdump-openvz-16112-2020_03_30-21_39_30.tar.bz2", > ], > expected => [ > { > @@ -237,6 +238,15 @@ my @tests = ( > 'vmid'=> '16112', > 'volid' => > 'local:backup/vzdump-lxc-16112-2020_03_30-21_59_30.tgz', > }, > + { > + 'content' => 'backup', > + 'ctime' => 1585604370, > + 'format' => 'tar.bz2', > + 'size'=> DEFAULT_SIZE, > + 'subtype' => 'openvz', > + 'vmid'=> '16112', > + 'volid' => > 'local:backup/vzdump-openvz-16112-2020_03_30-21_39_30.tar.bz2', > + }, > ], > }, > { > @@ -440,7 +450,6 @@ my @tests = ( > > "$storage_dir/template/cache/debian-10.0-standard_10.0-1_amd64.zip.gz", > > "$storage_dir/template/cache/debian-10.0-standard_10.0-1_amd64.tar.bz2", Why is this one still listed here in the non-matching test cases? Shouldn't this be recognized too now? > "$storage_dir/private/subvol-19254-disk-0/19254", > - "$storage_dir/dump/vzdump-openvz-16112-2020_03_30-21_39_30.tar.bz2", > "$storage_dir/dump/vzdump-openvz-16112-2020_03_30-21_39_30.zip.gz", > "$storage_dir/dump/vzdump-openvz-16112-2020_03_30-21_39_30.tgz.lzo", > "$storage_dir/dump/vzdump-qemu-16110-2020_03_30-21_12_40.vma.xz", ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH storage v3 2/2] d/control: dependencies: add bzip2, gzip, lzop, zstd
Am 12.08.24 um 13:44 schrieb Maximiliano Sandoval: > The decompressor_info method calls binaries provided by these packages > so they are (alphabetically) added explicitly as dependencies. > > To avoid a build-time error > > E: libpve-storage-perl: > depends-on-essential-package-without-using-version Depends: gzip > > the current minor version available in bullseye was set for gzip. > Since I didn't get that error, I'm interested: what command are you building with? Why the version for Bullseye? I guess many systems already have bzip2 installed, but do we really want to require it for everybody? Or should we rather keep it optional (could be a Recommends or Suggests dependency) and add a clean error if not installed? ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH guest-common 1/1] storage tunnel: check just-imported image files
Am 09.08.24 um 13:22 schrieb Fabian Grünbichler: > remote migration requires elevated privileges already and can thus only be > triggered by trusted sources, but an additional safeguard of checking the > image > for external references doesn't hurt. > > Signed-off-by: Fabian Grünbichler > --- > > Notes: > requires pve-storage change to actually have an effect > > src/PVE/StorageTunnel.pm | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/src/PVE/StorageTunnel.pm b/src/PVE/StorageTunnel.pm > index c880889..21780bd 100644 > --- a/src/PVE/StorageTunnel.pm > +++ b/src/PVE/StorageTunnel.pm > @@ -280,6 +280,13 @@ sub handle_query_disk_import { > delete $state->{sockets}->{$unix}; > delete $state->{disk_import}; > $state->{cleanup}->{volumes}->{$volid} = 1; > + my $cfg = PVE::Storage::config(); > + my ($storage, $volume) = PVE::Storage::parse_volume_id($volid); > + my $scfg = PVE::Storage::storage_config($cfg, $storage); > + # check imported image for bad references > + if ($scfg->{path}) { Seems a bit like a plugin method would be nicest here. But I guess we can still add that if/when it becomes necessary for non-path-based plugins to do such checks. > + PVE::Storage::file_size_info(PVE::Storage::path($cfg, $volid), > undef, 1); Isn't this broken because PVE::Storage::path() uses wantarray? At least a small test program does "break" in the same scenario: > febner@enia ~ % cat asdf.pm > use strict; > use warnings; > > sub a { > return wantarray ? (1, 2) : 1; > } > > sub b { > my ($a, $b, $c) = @_; > > print "$a $b $c\n"; > } > > b(a(), 3, 4); > febner@enia ~ % perl asdf.pm > 1 2 3 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH storage 1/1] file_size_info: implement untrusted mode
Am 09.08.24 um 13:22 schrieb Fabian Grünbichler: > this allows checking some extra attributes for images which come from a > potentially malicious source. > > since file_size_info is not part of the plugin API, no API bump is needed. if > desired, a similar check could also be implemented in volume_size_info, which > would entail bumping both APIVER and APIAGE (since the additional parameter > would make checking untrusted volumes opt-in for external plugins). > > Signed-off-by: Fabian Grünbichler Reviewed-by: Fiona Ebner > @@ -977,12 +987,27 @@ sub file_size_info { > There's a new early return now because of commit 851cc07 ("base plugin: do not decode the empty string") where we should also die if $untrusted. Although it is very unlikely that we'll reach that since you do not set a timeout at the call sites with $untrusted set. > my $info = eval { decode_json($json) }; > if (my $err = $@) { ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH qemu-server 1/1] disk import: add additional safeguards for imported image files
Am 09.08.24 um 13:22 schrieb Fabian Grünbichler: > creating non-raw disk images with arbitrary content is only possible with raw > access to the storage, but checking for references to external files doesn't > hurt. > > Signed-off-by: Fabian Grünbichler > --- > > Notes: > requires pve-storage change to actually have an effect > > PVE/API2/Qemu.pm | 15 --- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm > index d25a79fe..30839745 100644 > --- a/PVE/API2/Qemu.pm > +++ b/PVE/API2/Qemu.pm > @@ -412,12 +412,14 @@ my sub create_disks : prototype($$) { > > $needs_creation = $live_import; > > - if (PVE::Storage::parse_volume_id($source, 1)) { # PVE-managed > volume > + if (my ($source_storage, $source_volid) = > PVE::Storage::parse_volume_id($source, 1)) { # PVE-managed volume Nit: line too long and honestly, becomes a bit hard to read. > if ($live_import && $ds ne 'efidisk0') { > my $path = PVE::Storage::path($storecfg, $source) > or die "failed to get a path for '$source'\n"; > $source = $path; > - ($size, my $source_format) = > PVE::Storage::file_size_info($source); > + # check potentially untrusted image file! > + ($size, my $source_format) = > PVE::Storage::file_size_info($source, undef, 1); > + > die "could not get file size of $source\n" if !$size; > $live_import_mapping->{$ds} = { > path => $source, > @@ -431,6 +433,12 @@ my sub create_disks : prototype($$) { > format => $disk->{format}, > }; > > + my $scfg = PVE::Storage::storage_config($storecfg, > $source_storage); > + # check potentially untrusted image file! > + if ($scfg->{path}) { > + > PVE::Storage::file_size_info(PVE::Storage::path($storecfg, $source), undef, > 1); This call is probably broken, see reply to guest-common patch. Nit: I'd move this to the beginning of the branch to avoid having it between $dest_info assignments. > + } > + > $dest_info->{efisize} = > PVE::QemuServer::get_efivars_size($conf, $disk) > if $ds eq 'efidisk0'; > > @@ -441,7 +449,8 @@ my sub create_disks : prototype($$) { > } > } else { > $source = PVE::Storage::abs_filesystem_path($storecfg, > $source, 1); > - ($size, my $source_format) = > PVE::Storage::file_size_info($source); > + # check potentially untrusted image file! > + ($size, my $source_format) = > PVE::Storage::file_size_info($source, undef, 1); > die "could not get file size of $source\n" if !$size; > > if ($live_import && $ds ne 'efidisk0') { ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] applied: [PATCH manager] ui: qemu: hardware: fix permission check for adding tpmstate volume
Am 05.08.24 um 15:33 schrieb Friedrich Weber: > Previously, the "Add -> TPM State" menu item in the GUI was disabled > if the user did not have Sys.Console privileges. This deviated from > the permission check in the backend, which does not require > Sys.Console but (among others) VM.Config.Disk. > > Fix this inconsistency by checking for VM.Config.Disk in the frontend > instead of Sys.Console. > > Reported in enterprise support. > > Signed-off-by: Friedrich Weber applied, thanks! ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH storage] base plugin: do not decode the empty string
Am 02.09.24 um 14:47 schrieb Maximiliano Sandoval: > If the json was empty, for example if the qemu-img command times out, a > message > > warn "could not parse qemu-img info command output for '$filename' - > $err\n"; > > would have been printed. > > This message could lead one to think the issue lies in the contents of > the json, even if the previous warning said that there was a timeout. > I know it's already applied, so mentioning it for the future: the commit title (and actually also message) really should've included the context, e.g. "when querying file size". ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH common v4 3/7] inotify: interfaces: make sure bridge_vids use space as separator
Am 29.07.24 um 13:55 schrieb Aaron Lauterer: > Because the API accepts multiple possible list separators we need to > make sure that we write the bridge_vids with space as separator, no > matter which separator was used when passing it to the API. > Nit: The reason why it's necessary to write out with space as the separator is missing. Who gets confused otherwise ;)? Should we do the conversion earlier, i.e. in the API endpoint when writing the interface parameters? Like that we don't end up carrying the invalid value around. > Signed-off-by: Aaron Lauterer > Tested-By: Stefan Hanreich > Reviewed-by: Shannon Sterz Still: Reviewed-by: Fiona Ebner ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH common v4 2/7] fix #3893: network: add vlan id and range parameter definitions
Am 29.07.24 um 13:55 schrieb Aaron Lauterer: > @@ -595,6 +601,34 @@ sub pve_verify_iface { > return $id; > } > > +# vlan id (vids), single or range > +register_format('pve-vlan-id-or-range', \&pve_verify_vlan_id_or_range); > +sub pve_verify_vlan_id_or_range { > +my ($vlan, $noerr) = @_; > + > +my $check_vid = sub { > + my $tag = shift; > + if ( $tag < 2 || $tag > 4094) { Style nit: extra space after parenthesis > + return undef if $noerr; > + die "invalid VLAN tag '$tag'\n"; > + } > +}; > + > +if ($vlan !~ m/^(\d+)(?:-(\d+))?$/) { > + return undef if $noerr; > + die "invalid VLAN configuration '$vlan'\n"; > +} > +my $start = $1; > +my $end = $2; > +return undef if (!defined $check_vid->($start)); check_vid does not explicitly return anything in the success case, so this check seems very brittle and I guess relies on implicit return? Style nits: parentheses for post-if missing parentheses for defined() perlcritic prefers 'return' over 'return undef' > +if ($end) { > + return undef if (!defined $check_vid->($end)); > + die "VLAN range must go from lower to higher tag '$vlan'" if $start >= > $end && !$noerr; What if $noerr is set, but $start >= $end? You want to have a 'return' then, right? > +} > + > +return $vlan; > +} > + > # general addresses by name or IP > register_format('address', \&pve_verify_address); > sub pve_verify_address { ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH common v4 1/7] tools: add check_list_empty function
Am 29.07.24 um 13:55 schrieb Aaron Lauterer: > In some situations we don't want a total empty list. I opted for a > dedicated function instead of integrating it as error in the > `split_list` function. It is used in many places and the potential > fallout from unintended behavior changes is too big. > > Signed-off-by: Aaron Lauterer > Tested-By: Stefan Hanreich > Reviewed-by: Shannon Sterz > --- > changes since: v3: none > v2: newly added > > src/PVE/Tools.pm | 8 > 1 file changed, 8 insertions(+) > > diff --git a/src/PVE/Tools.pm b/src/PVE/Tools.pm > index bd305bd..f796bd0 100644 > --- a/src/PVE/Tools.pm > +++ b/src/PVE/Tools.pm > @@ -718,6 +718,14 @@ sub split_list { > return @data; > } > > +sub check_list_empty { > +my ($list) = @_; > +if (scalar(PVE::Tools::split_list($list)) < 1) { > + return 0; > +} > +return 1; > +} This can be very confusing IMHO. Intuitively, I'd expect the expression check_list_empty($list) to be truthy if $list is empty. I'd rather call it list_not_empty. But looking at the caller you introduce later, it might be better to avoid the double negative, flip the truth table and call it list_is_empty. > + > sub trim { > my $txt = shift; > ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH v2 proxmox-apt-hook] initial commit
Many people will use 'upgrade' instead of 'full-upgrade' or 'dist-upgrade' (e.g. [0][1]) despite the documentation explicitly mentioning 'dist-upgrade' [3]. Proxmox projects use different packaging guarantees than Debian (necessary for a rolling release model) and using 'upgrade' can lead to the system being stuck on outdated versions, or in rare cases, even break the system [2]. The match is kept simple, to not accidentally catch things like > -o 'foo=bar upgrade baz' and trip up advanced users. It does not catch invocations with '-y' either, making it less likely to break automated user scripts. Although they should not use 'upgrade' either, it still would be bad to break them. If the risk is still considered too high, this change should wait until a major or at least point release. To avoid false positives, it would be necessary to properly parse options, which is likely not worth the effort. A downside is that the hook is only invoked after the user confirms the upgrade and fetching the packages, but there doesn't seem to be an early enough hook entry that provides access to the command line. Since this is just an additional safety warning to guide new users, it should still be good enough. It is intended that meta-packages for Proxmox projects recommend this package. The same postinst/postrm logic for the hook like in proxmox-ve and apt-listchanges is used to not have disable/re-enable the hook upon removal/re-install of the package. [0]: https://forum.proxmox.com/threads/150217/post-680158 [1]: https://forum.proxmox.com/threads/140580/post-630419 [2]: https://www.reddit.com/r/Proxmox/comments/ujqig9/use_apt_distupgrade_or_the_gui_not_apt_upgrade/ [3]: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#system_software_updates Signed-off-by: Fiona Ebner --- Changes in v2: * Mention that actual breakage is rare, being stuck on outdated versions is much more common. * Do not ask for confirmation, only log the warning. * Split into own package, so it can be re-used by different products. .gitignore | 7 Makefile | 47 +++ debian/apt-hook/10proxmox-apt-hook | 4 +++ debian/apt-hook/proxmox-apt-hook | 52 ++ debian/changelog | 5 +++ debian/control | 17 ++ debian/copyright | 14 debian/docs| 1 + debian/install | 2 ++ debian/postrm | 35 debian/preinst | 13 debian/rules | 8 + debian/source/format | 1 + 13 files changed, 206 insertions(+) create mode 100644 .gitignore create mode 100644 Makefile create mode 100644 debian/apt-hook/10proxmox-apt-hook create mode 100755 debian/apt-hook/proxmox-apt-hook create mode 100644 debian/changelog create mode 100644 debian/control create mode 100644 debian/copyright create mode 100644 debian/docs create mode 100644 debian/install create mode 100644 debian/postrm create mode 100644 debian/preinst create mode 100755 debian/rules create mode 100644 debian/source/format diff --git a/.gitignore b/.gitignore new file mode 100644 index 000..5e6053d --- /dev/null +++ b/.gitignore @@ -0,0 +1,7 @@ +/*.build +/*.buildinfo +/*.changes +/*.deb +/*.dsc +/*.tar* +/proxmox-apt-hook-*/ diff --git a/Makefile b/Makefile new file mode 100644 index 000..e0cd704 --- /dev/null +++ b/Makefile @@ -0,0 +1,47 @@ +include /usr/share/dpkg/default.mk + +PACKAGE=proxmox-apt-hook + +GITVERSION:=$(shell git rev-parse HEAD) + +BUILDDIR ?= $(PACKAGE)-$(DEB_VERSION) +DSC=$(PACKAGE)_$(DEB_VERSION).dsc + +DEB=$(PACKAGE)_$(DEB_VERSION_UPSTREAM_REVISION)_all.deb + +all: $(DEB) + +$(BUILDDIR): debian + rm -rf $@ $@.tmp + mkdir -p $@.tmp/debian + cp -a debian/ $@.tmp/ + echo "git clone git://git.proxmox.com/git/proxmox-apt-hook.git\\ngit checkout $(GITVERSION)" > $@.tmp/debian/SOURCE + mv $@.tmp $@ + +.PHONY: deb +deb: $(DEB) +$(DEB): $(BUILDDIR) + cd $(BUILDDIR); dpkg-buildpackage -b -us -uc + lintian $(DEB) + +.PHONY: dsc +dsc: $(DSC) +$(DSC): $(BUILDDIR) + cd $(BUILDDIR); dpkg-buildpackage -S -us -uc -d + lintian $(DSC) + +.PHONY: sbuild +sbuild: $(DSC) + sbuild $(DSC) + +.PHONY: upload +upload: UPLOAD_DIST ?= $(DEB_DISTRIBUTION) +upload: $(DEB) + tar cf - $(DEB)|ssh repo...@repo.proxmox.com -- upload --product pve --dist $(UPLOAD_DIST) + +.PHONY: distclean +distclean: clean + +.PHONY: clean +clean: + rm -rf *~ $(PACKAGE)-[0-9]*/ $(PACKAGE)*.tar.* *.deb *.dsc *.changes *.build *.buildinfo diff --git a/debian/apt-hook/10proxmox-apt-hook b/debian/apt-hook/10proxmox-apt-hook new file mode 100644 index 000..c4e6978 --- /dev/null +++ b/debian/apt-h
Re: [pve-devel] [RFC qemu-server] apt hook: warn against using 'upgrade' command
Am 09.09.24 um 09:48 schrieb Fabian Grünbichler: >> Thomas Lamprecht hat am 06.09.2024 18:58 CEST >> geschrieben: >> Am 06/09/2024 um 12:40 schrieb Fiona Ebner: >>> Many people will use 'upgrade' instead of 'full-upgrade' or >>> 'dist-upgrade' (e.g. [0][1]) despite the documentation explicitly >>> mentioning 'dist-upgrade' [3]. Proxmox VE uses different packaging >>> guarantees than Debian and using 'upgrade' can lead to a broken >>> system [2]. > > just a slight nit here: you should only end up with a broken system if we > miss properly tracking some inter-package relationship. it can happen happen > (and probably does, from time to time), but in the vast majority of cases > "apt[-get] upgrade" should at most leave you stuck with an outdated system > (with APT telling you that there are still packages to be upgraded), not a > broken one. we did get a lot better about accounting for these things over > the past few years (but of course, we don't have anywhere close to the > infrastructure that Debian has for automated tracking and testing). Okay, will change the commit message in v2. > >>> The match is kept simple, to not accidentally catch things like >>>> -o 'foo=bar upgrade baz' >>> and trip up advanced users. >>> >>> It does not catch invocations with '-y' either, making it less likely >>> to break automated user scripts. Although they should not use >>> 'upgrade' either, it still would be bad to break them. If the risk is >>> still considered too high, this change should wait until a major or >>> at least point release. >>> >>> To avoid false positives, it would be necessary to properly parse >>> options, which is likely not worth the effort. >>> >>> A downside is that the hook is only invoked after the user confirms >>> the upgrade, but there doesn't seem to be an early enough hook entry >>> (DPkg::Pre-Invoke is also too late). Since this is just an additional >>> safety warning to guide new users, it should still be good enough. >>> >>> [0]: https://forum.proxmox.com/threads/150217/post-680158 >>> [1]: https://forum.proxmox.com/threads/140580/post-630419 >>> [2]: >>> https://www.reddit.com/r/Proxmox/comments/ujqig9/use_apt_distupgrade_or_the_gui_not_apt_upgrade/ >>> [3]: >>> https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#system_software_updates >>> >> >> yeah, it's something I considered here and then but never pulled through, >> as it just somehow doesn't feel right... >> >> But it's definitively a real problem, and so I surely won't block this on >> the basis of some gut feeling, I'd rather like to hear Fabian's opinion on >> it. > > given that I also use `apt upgrade` from time to time (habit from being an > unstable user ;)), and that it might alienate power users coming from Debian, > I'd prefer this to be a non-interactive warning with the text "disarmed" a > bit? > > something like > > !! WARNING !! > Since Proxmox VE follows a rolling release model, using 'upgrade' can lead to > a system being stuck on outdated versions, or in rare cases, break upon > upgrading. Use 'dist-upgrade' or 'full-upgrade' instead. > !! WARNING !! > > with or without a prompt (it's a pity that the hook is not executed with the > config before the regular confirmation prompt, else we could just depend on > that)? Sounds good. And since it is so rare, I'd just leave out the additional confirmation prompt then. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH qemu-server 10/10] test: cfg2cmd: don't use QEMU binary version less than 8.0 for tests
Am 06.09.24 um 16:39 schrieb Daniel Kral: > Another small misspelling that could be corrected here: > > s/depreacation/deprecation > > On Tue, 23 Jul 2024 17:25:40 +0200, Fiona Ebner wrote: >> diff --git a/test/cfg2cmd/cputype-icelake-client-deprecation.conf >> b/test/cfg2cmd/cputype-icelake-client-deprecation.conf >> index 523dd275..2f595162 100644 >> --- a/test/cfg2cmd/cputype-icelake-client-deprecation.conf >> +++ b/test/cfg2cmd/cputype-icelake-client-deprecation.conf >> @@ -1,5 +1,4 @@ >> # TEST: test CPU type depreacation for Icelake-Client (never existed in the >> wild) > > Otherwise, I haven't found anything to add to the changes and the test > cases are all passing on my machine as expected. > > Therefore: > > Reviewed-by: Daniel Kral > Tested-by: Daniel Kral > Thank you for the review/testing! Note (for the future): If the tags here are supposed to be for the whole series, it would be customary to add them in a reply to the cover letter. Feel free to send patches for the pre-existing typos you found. You might want to wait until my series got applied to avoid merge-conflicts. If there's going to be a v2 of my series for some reason, I can also add them as follow-ups myself. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH container/manager v2 0/2] add deny read/write options for device passthrough
Am 24.07.24 um 19:18 schrieb Filip Schauer: > Add the deny_read and deny_write options for device passthrough, to > restrict container access to devices. > > This allows for passing through a device in read-only mode without > giving the container full access it. > > Up until now a container with a device passed through to it was granted > full access to that device without an option to restrict that access as > pointed out by @Fiona. > > Changes since v1: > * set default values for deny_read & deny_write > * remove the deny_read checkbox from the UI, since it is expected to > only have a very niche use case. > We could also use dashes instead of underscores, i.e. "deny-read"/"deny-write" as we often do for new properties. Still not fully sure we need deny_read in the backend until somebody complains with a sensible use case, but I guess it doesn't hurt if it's already there. In any case: Reviewed-by: Fiona Ebner Tested-by: Fiona Ebner ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] applied: [PATCH docs] passthrough: viommu: replace host requirement with reference to pcie passthrough
For reference, this already got applied: https://git.proxmox.com/?p=pve-docs.git;a=commit;h=2173def64373a50a13390ee342fb95abc901f203 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [RFC qemu-server] apt hook: warn against using 'upgrade' command
Sorry, messed up the subject-prefix, since I copy-pasted it in the .git/config and forgot to change. This is for proxmox-ve of course. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC qemu-server] apt hook: warn against using 'upgrade' command
Many people will use 'upgrade' instead of 'full-upgrade' or 'dist-upgrade' (e.g. [0][1]) despite the documentation explicitly mentioning 'dist-upgrade' [3]. Proxmox VE uses different packaging guarantees than Debian and using 'upgrade' can lead to a broken system [2]. The match is kept simple, to not accidentally catch things like > -o 'foo=bar upgrade baz' and trip up advanced users. It does not catch invocations with '-y' either, making it less likely to break automated user scripts. Although they should not use 'upgrade' either, it still would be bad to break them. If the risk is still considered too high, this change should wait until a major or at least point release. To avoid false positives, it would be necessary to properly parse options, which is likely not worth the effort. A downside is that the hook is only invoked after the user confirms the upgrade, but there doesn't seem to be an early enough hook entry (DPkg::Pre-Invoke is also too late). Since this is just an additional safety warning to guide new users, it should still be good enough. [0]: https://forum.proxmox.com/threads/150217/post-680158 [1]: https://forum.proxmox.com/threads/140580/post-630419 [2]: https://www.reddit.com/r/Proxmox/comments/ujqig9/use_apt_distupgrade_or_the_gui_not_apt_upgrade/ [3]: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#system_software_updates Signed-off-by: Fiona Ebner --- debian/apthook/pve-apt-hook | 8 1 file changed, 8 insertions(+) diff --git a/debian/apthook/pve-apt-hook b/debian/apthook/pve-apt-hook index 47629bc..b41d0fe 100755 --- a/debian/apthook/pve-apt-hook +++ b/debian/apthook/pve-apt-hook @@ -55,6 +55,14 @@ my $blank; while (my $line = <$fh>) { chomp $line; + if ($line && $line =~ m/^CommandLine::AsString=apt(-get)?%20upgrade$/) { +$log->("!! WARNING !!\n"); +$log->("Using 'upgrade' can violate packaging assumptions made by Proxmox VE. Use 'dist-upgrade' instead.\n"); +$log->("\n"); +$log->("Press enter to continue, or C^c to abort.\n"); +$cleanup->(0, 1); + } + if (!defined($blank)) { $blank = 1 if !$line; next; -- 2.39.2 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH v3 storage] fix #5191: api, cli: implement moving a volume between storages
Am 03.07.24 um 14:59 schrieb Filip Schauer: > Add the ability to move a backup, ISO, container template or snippet > between storages and nodes via an API method. Moving a VMA backup to a > Proxmox Backup Server requires the proxmox-vma-to-pbs package to be > installed. Currently only VMA backups can be moved to a Proxmox Backup > Server and moving backups from a Proxmox Backup Server is not yet > supported. Can we split this? I.e. first add the "easy moves", export/import preparations, then support moving backups without PBS, introduce the decompress_archive_into_pipe() helper, finally support moving VMA to PBS. API/CLI could be separate too. > @@ -483,15 +485,173 @@ __PACKAGE__->register_method ({ > return $upid; > }}); > > +sub volume_move { Should this even be a new top-level method? Or can/should we extend export/import instead, to not only cover guest images? Because with this top-level method we block the way for external storage plugins to support this functionality too if they don't adhere to our assumptions. > +my ($cfg, $source_volid, $target_storeid, $delete) = @_; > + > +my ($source_storeid, $source_volname) = > PVE::Storage::parse_volume_id($source_volid, 0); > + > +die "source and target storage cannot be the same\n" if ($source_storeid > eq $target_storeid); Style nit: superfluous parentheses for post-if (there are other instances of the same below) ---snip 8<--- > + > + if ($vtype eq 'backup' && $target_scfg->{type} eq 'pbs') { IMHO this branch should be factored out into a helper. Either inside PBSPlugin or inside PBSClient, not sure which one is the best fit. ---snip 8<--- > + if (my $err = $@) { > + for my $created_file (@created_files) { > + eval { unlink($created_file) }; > + warn $@ if $@; I think you need to check the return value for unlink. And if it failed, use $! (don't log anything if it's already ENOENT). > + } > + die $err; > + } > + } > + > +PVE::Storage::archive_remove($source_path) if $delete; > +} elsif ($vtype eq 'images') { > + die "use pct move-volume or qm disk move\n"; > +} elsif ($vtype eq 'rootdir') { > + die "cannot move OpenVZ rootdir\n"; Maybe put these on top as early exits? Then you could save one level of indentation. And I'd also catch the case when you encounter a new content type with something like "not implemented yet". Note that content type 'rootdir' is nowadays used for all container images (that's how it's used in the storage configuration and list_images()). It's just that the default implementation of parse_volname() (wrongly) classifies those as 'images'. > +} > + > +return; > +} > + > __PACKAGE__->register_method ({ > -name => 'copy', > +name => 'move', > path => '{volume}', > method => 'POST', > -description => "Copy a volume. This is experimental code - do not use.", > +description => "Move a volume.", > +permissions => { > + description => "You need the 'Datastore.Allocate' privilege on the > storages.", The documentation says: Datastore.Allocate: create/modify/remove a datastore and delete volumes Datastore.AllocateSpace: allocate space on a datastore and the DELTE method for volumes has: You need 'Datastore.Allocate' privilege on the storage (or 'Datastore.AllocateSpace' for backup volumes if you have VM.Backup privilege on the VM) I think we can use that too here. Require Datastore.AllocateSpace if not using --delete and require Datastore.Allocate on the source if using --delete, as well as special casing the backup case. And writing this later, after looking at the actual checks below: the documentation could be much more precise here ;) > + user => 'all', > +}, > protected => 1, > proxyto => 'node', > parameters => { > - additionalProperties => 0, > + additionalProperties => 0, > properties => { > node => get_standard_option('pve-node'), > storage => get_standard_option('pve-storage-id', { optional => 1}), ---snip 8<--- > + if ($delete) { > + $rpcenv->check($user, "/storage/$src_storeid", > ["Datastore.Allocate"]); Aha, so this is only required when using delete ;) > + } else { > + $rpcenv->check($user, "/storage/$src_storeid", ["Datastore.Audit"]); But Datastore.Audit does not entail permission to read the volume contents. There is check_volume_access() for that. > + } > > - my ($target_sid, $target_volname) = > PVE::Storage::parse_volume_id($dst_volid); > - #my $target_ip = PVE::Cluster::remote_node_ip($target_node); > + my ($vtype) = PVE::Storage::parse_volname($cfg, $src_volid); > + die "use pct move-volume or qm disk move" if $vtype eq 'images'; || $vtype eq 'rootdir' ---snip 8<--- > @@ -1661,16 +1677,21 @@ sub volume_import { > my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $file_format) = >
[pve-devel] [PATCH qemu 2/2] pick up stable fixes for 9.0
Includes fixes for VirtIO-net, ARM and x86(_64) emulation, CVEs to harden NBD server against malicious clients, as well as a few others (VNC, physmem, Intel IOMMU, ...). Signed-off-by: Fiona Ebner --- ...net-Ensure-queue-index-fits-with-RSS.patch | 35 ++ ...etwork-stall-at-the-host-side-waitin.patch | 338 ++ ...t-nic-model-help-output-as-documente.patch | 70 ...net-nic-model-for-non-help-arguments.patch | 32 ++ ...-assert-for-128-bit-tile-accesses-wh.patch | 57 +++ ...arm-Fix-UMOPA-UMOPS-of-16-bit-values.patch | 59 +++ ...-shifts-by-1-in-tszimm_shr-and-tszim.patch | 62 ...e-SMCR_EL2.LEN-and-SVCR_EL2.LEN-if-E.patch | 41 +++ ...e-denormals-correctly-for-FMOPA-wide.patch | 164 + ...el_iommu-fix-FRCD-construction-macro.patch | 39 ++ ...386-Do-not-apply-REX-to-MMX-operands.patch | 33 ++ ...rash-by-resetting-local_err-in-modul.patch | 42 +++ ...-Plumb-in-new-args-to-nbd_client_add.patch | 164 + ...024-7409-Cap-default-max-connections.patch | 172 + ...024-7409-Drop-non-negotiating-client.patch | 123 +++ ...024-7409-Close-stray-clients-at-serv.patch | 161 + ...c-fix-crash-when-no-console-attached.patch | 47 +++ ...024-7409-Avoid-use-after-free-when-c.patch | 89 + ...fix-memory-leak-in-dirty_memory_exte.patch | 134 +++ ...ckup-Proxmox-backup-patches-for-QEMU.patch | 4 +- .../0046-PVE-backup-add-fleecing-option.patch | 4 +- debian/patches/series | 19 + 22 files changed, 1885 insertions(+), 4 deletions(-) create mode 100644 debian/patches/extra/0018-virtio-net-Ensure-queue-index-fits-with-RSS.patch create mode 100644 debian/patches/extra/0019-virtio-net-Fix-network-stall-at-the-host-side-waitin.patch create mode 100644 debian/patches/extra/0020-net-Reinstate-net-nic-model-help-output-as-documente.patch create mode 100644 debian/patches/extra/0021-net-Fix-net-nic-model-for-non-help-arguments.patch create mode 100644 debian/patches/extra/0022-target-arm-Don-t-assert-for-128-bit-tile-accesses-wh.patch create mode 100644 debian/patches/extra/0023-target-arm-Fix-UMOPA-UMOPS-of-16-bit-values.patch create mode 100644 debian/patches/extra/0024-target-arm-Avoid-shifts-by-1-in-tszimm_shr-and-tszim.patch create mode 100644 debian/patches/extra/0025-target-arm-Ignore-SMCR_EL2.LEN-and-SVCR_EL2.LEN-if-E.patch create mode 100644 debian/patches/extra/0026-target-arm-Handle-denormals-correctly-for-FMOPA-wide.patch create mode 100644 debian/patches/extra/0027-intel_iommu-fix-FRCD-construction-macro.patch create mode 100644 debian/patches/extra/0028-target-i386-Do-not-apply-REX-to-MMX-operands.patch create mode 100644 debian/patches/extra/0029-module-Prevent-crash-by-resetting-local_err-in-modul.patch create mode 100644 debian/patches/extra/0030-nbd-server-Plumb-in-new-args-to-nbd_client_add.patch create mode 100644 debian/patches/extra/0031-nbd-server-CVE-2024-7409-Cap-default-max-connections.patch create mode 100644 debian/patches/extra/0032-nbd-server-CVE-2024-7409-Drop-non-negotiating-client.patch create mode 100644 debian/patches/extra/0033-nbd-server-CVE-2024-7409-Close-stray-clients-at-serv.patch create mode 100644 debian/patches/extra/0034-vnc-fix-crash-when-no-console-attached.patch create mode 100644 debian/patches/extra/0035-nbd-server-CVE-2024-7409-Avoid-use-after-free-when-c.patch create mode 100644 debian/patches/extra/0036-softmmu-physmem-fix-memory-leak-in-dirty_memory_exte.patch diff --git a/debian/patches/extra/0018-virtio-net-Ensure-queue-index-fits-with-RSS.patch b/debian/patches/extra/0018-virtio-net-Ensure-queue-index-fits-with-RSS.patch new file mode 100644 index 000..1dcb129 --- /dev/null +++ b/debian/patches/extra/0018-virtio-net-Ensure-queue-index-fits-with-RSS.patch @@ -0,0 +1,35 @@ +From Mon Sep 17 00:00:00 2001 +From: Akihiko Odaki +Date: Mon, 1 Jul 2024 20:58:04 +0900 +Subject: [PATCH] virtio-net: Ensure queue index fits with RSS + +Ensure the queue index points to a valid queue when software RSS +enabled. The new calculation matches with the behavior of Linux's TAP +device with the RSS eBPF program. + +Fixes: 4474e37a5b3a ("virtio-net: implement RX RSS processing") +Reported-by: Zhibin Hu +Cc: qemu-sta...@nongnu.org +Signed-off-by: Akihiko Odaki +Reviewed-by: Michael S. Tsirkin +Signed-off-by: Jason Wang +(cherry picked from commit f1595ceb9aad36a6c1da95bcb77ab9509b38822d) +Signed-off-by: Fiona Ebner +--- + hw/net/virtio-net.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c +index 3644bfd91b..f48588638d 100644 +--- a/hw/net/virtio-net.c b/hw/net/virtio-net.c +@@ -1949,7 +1949,8 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf, + if (!no_rss && n->rss_data.enabled && n->rss_data.enabled_software_rss) { + int index = virtio_net_pro
[pve-devel] [PATCH qemu 1/2] pick up fix for VirtIO PCI regressions
Commit f06b222 ("fixes for QEMU 9.0") included a revert for the QEMU commit 2ce6cff94d ("virtio-pci: fix use of a released vector"). That commit caused some regressions which sounded just as bad as the fix. Those regressions have now been addressed upstream, so pick up the fix and drop the revert. Dropping the revert fixes the original issue that commit 2ce6cff94d ("virtio-pci: fix use of a released vector") addressed. Signed-off-by: Fiona Ebner --- ...tio-pci-fix-use-of-a-released-vector.patch | 87 --- ...ck-copy-before-write-fix-permission.patch} | 0 ...-write-support-unligned-snapshot-di.patch} | 0 ...-write-create-block_copy-bitmap-in-.patch} | 0 ...backup-add-discard-source-parameter.patch} | 0 ...-de-initialization-of-vhost-user-de.patch} | 0 ...se-float_status-copy-in-sme_fmopa_s.patch} | 0 ...Use-FPST_F16-for-SME-FMOPA-widening.patch} | 0 ...on-and-honor-bootindex-again-for-le.patch} | 0 ...a-bump-instruction-limit-in-scripts.patch} | 0 ...5-block-copy-Fix-missing-graph-lock.patch} | 0 ...do-not-operate-on-sources-from-fina.patch} | 0 ...ix-the-use-of-an-uninitialized-irqfd.patch | 77 debian/patches/series | 24 ++--- 14 files changed, 89 insertions(+), 99 deletions(-) delete mode 100644 debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch rename debian/patches/extra/{0007-block-copy-before-write-fix-permission.patch => 0006-block-copy-before-write-fix-permission.patch} (100%) rename debian/patches/extra/{0008-block-copy-before-write-support-unligned-snapshot-di.patch => 0007-block-copy-before-write-support-unligned-snapshot-di.patch} (100%) rename debian/patches/extra/{0009-block-copy-before-write-create-block_copy-bitmap-in-.patch => 0008-block-copy-before-write-create-block_copy-bitmap-in-.patch} (100%) rename debian/patches/extra/{0010-qapi-blockdev-backup-add-discard-source-parameter.patch => 0009-qapi-blockdev-backup-add-discard-source-parameter.patch} (100%) rename debian/patches/extra/{0011-hw-virtio-Fix-the-de-initialization-of-vhost-user-de.patch => 0010-hw-virtio-Fix-the-de-initialization-of-vhost-user-de.patch} (100%) rename debian/patches/extra/{0012-target-arm-Use-float_status-copy-in-sme_fmopa_s.patch => 0011-target-arm-Use-float_status-copy-in-sme_fmopa_s.patch} (100%) rename debian/patches/extra/{0013-target-arm-Use-FPST_F16-for-SME-FMOPA-widening.patch => 0012-target-arm-Use-FPST_F16-for-SME-FMOPA-widening.patch} (100%) rename debian/patches/extra/{0014-scsi-fix-regression-and-honor-bootindex-again-for-le.patch => 0013-scsi-fix-regression-and-honor-bootindex-again-for-le.patch} (100%) rename debian/patches/extra/{0015-hw-scsi-lsi53c895a-bump-instruction-limit-in-scripts.patch => 0014-hw-scsi-lsi53c895a-bump-instruction-limit-in-scripts.patch} (100%) rename debian/patches/extra/{0016-block-copy-Fix-missing-graph-lock.patch => 0015-block-copy-Fix-missing-graph-lock.patch} (100%) rename debian/patches/extra/{0017-Revert-qemu-char-do-not-operate-on-sources-from-fina.patch => 0016-Revert-qemu-char-do-not-operate-on-sources-from-fina.patch} (100%) create mode 100644 debian/patches/extra/0017-virtio-pci-Fix-the-use-of-an-uninitialized-irqfd.patch diff --git a/debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch b/debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch deleted file mode 100644 index d2de6d1..000 --- a/debian/patches/extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch +++ /dev/null @@ -1,87 +0,0 @@ -From 00000000 Mon Sep 17 00:00:00 2001 -From: Fiona Ebner -Date: Thu, 16 May 2024 12:59:52 +0200 -Subject: [PATCH] Revert "virtio-pci: fix use of a released vector" - -This reverts commit 2ce6cff94df2650c460f809e5ad263f1d22507c0. - -The fix causes some issues: -https://gitlab.com/qemu-project/qemu/-/issues/2321 -https://gitlab.com/qemu-project/qemu/-/issues/2334 - -The CVE fixed by commit 2ce6cff94d ("virtio-pci: fix use of a released -vector") is CVE-2024-4693 [0] and allows a malicious guest that -controls the boot process in the guest to crash its QEMU process. - -The issues sound worse than the CVE, so revert until there is a proper -fix. - -[0]: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-4693 - -Signed-off-by: Fiona Ebner - hw/virtio/virtio-pci.c | 37 ++--- - 1 file changed, 2 insertions(+), 35 deletions(-) - -diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c -index e04218a9fb..fd66713848 100644 a/hw/virtio/virtio-pci.c -+++ b/hw/virtio/virtio-pci.c -@@ -1410,38 +1410,6 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy, - return offset; - } - --static void virtio_pci_set_vector(VirtIODevice *vdev, -- VirtIOPCI
Re: [pve-devel] [PATCH v2 qemu-server] remote migration: fix online migration via API clients
Am 03.09.24 um 11:37 schrieb Fabian Grünbichler: > On August 13, 2024 10:42 am, Fiona Ebner wrote: >> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm >> index e71face4..d31589e7 100644 >> --- a/PVE/QemuMigrate.pm >> +++ b/PVE/QemuMigrate.pm >> @@ -1095,7 +1095,10 @@ sub phase2 { >> die "only UNIX sockets are supported for remote migration\n" >> if $tunnel_info->{proto} ne 'unix'; >> >> -my $remote_socket = $tunnel_info->{addr}; >> +# untaint >> +my ($remote_socket) = >> +$tunnel_info->{addr} =~ >> m|^(/run/qemu-server/(?:(?!\.\./).)+\.migrate)$|; > > should we just switch to `\d+`, like we do for regular migration? in > phase2_start_local_cluster we have: > > elsif ($line =~ m!^migration listens on > (unix):(/run/qemu-server/(\d+)\.migrate)$!) { > $tunnel_info->{addr} = $2; > die "Destination UNIX sockets VMID does not match source VMID" if > $vmid ne $3; > $tunnel_info->{proto} = $1; > } > > and I don't really see a reason to deviate from that scheme any time > soon? > Sounds good, did so in v3: https://lore.proxmox.com/pve-devel/20240904111231.106570-1-f.eb...@proxmox.com/T/#u ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH v3 qemu-server] remote migration: fix online migration via API clients
As reported in the community forum [0], when a remote migration request comes in via an API client, the -T flag for Perl is set, so an insecure dependency in a call like unlink() in forward_unix_socket() will fail with: > failed to write forwarding command - Insecure dependency in unlink while > running with -T switch To fix it, untaint the problematic socket addresses coming from the remote side. Require that all sockets are below '/run/qemu-server/' and end with '.migrate' with the main socket being matched more strictly. This allows extensions in the future while still being quite strict. [0]: https://forum.proxmox.com/threads/123048/post-691958 Signed-off-by: Fiona Ebner --- Changes in v3: * Match main socket address more strictly as suggested by Fabian. PVE/QemuMigrate.pm | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm index e71face4..6591f3f7 100644 --- a/PVE/QemuMigrate.pm +++ b/PVE/QemuMigrate.pm @@ -1095,7 +1095,9 @@ sub phase2 { die "only UNIX sockets are supported for remote migration\n" if $tunnel_info->{proto} ne 'unix'; - my $remote_socket = $tunnel_info->{addr}; + # untaint + my ($remote_socket) = $tunnel_info->{addr} =~ m|^(/run/qemu-server/\d+\.migrate)$| + or die "unexpected socket address '$tunnel_info->{addr}'\n"; my $local_socket = $remote_socket; $local_socket =~ s/$remote_vmid/$vmid/g; $tunnel_info->{addr} = $local_socket; @@ -1104,6 +1106,9 @@ sub phase2 { PVE::Tunnel::forward_unix_socket($self->{tunnel}, $local_socket, $remote_socket); foreach my $remote_socket (@{$tunnel_info->{unix_sockets}}) { + # untaint + ($remote_socket) = $remote_socket =~ m|^(/run/qemu-server/(?:(?!\.\./).)+\.migrate)$| + or die "unexpected socket address '$remote_socket'\n"; my $local_socket = $remote_socket; $local_socket =~ s/$remote_vmid/$vmid/g; next if $self->{tunnel}->{forwarded}->{$local_socket}; -- 2.39.2 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] applied: [PATCH docs] pve-network: correct language errors
Am 13.08.24 um 15:57 schrieb Fiona Ebner: > Am 13.08.24 um 12:23 schrieb Alexander Zeidler: >> Signed-off-by: Alexander Zeidler >> --- >> pve-network.adoc | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/pve-network.adoc b/pve-network.adoc >> index acdcf39..434430d 100644 >> --- a/pve-network.adoc >> +++ b/pve-network.adoc >> @@ -447,7 +447,7 @@ slaves in the single logical bonded interface such that >> different >> network-peers use different MAC addresses for their network packet >> traffic. >> >> -If your switch support the LACP (IEEE 802.3ad) protocol then we recommend >> using >> +If your switch supports the LACP (IEEE 802.3ad) protocol then we recommend >> using >> the corresponding bonding mode (802.3ad). Otherwise you should generally >> use the >> active-backup mode. >> >> @@ -490,7 +490,7 @@ iface vmbr0 inet static >> >> >> [thumbnail="default-network-setup-bond.svg"] >> -Another possibility it to use the bond directly as bridge port. >> +Another possibility is to use the bond directly as bridge port. >> This can be used to make the guest network fault-tolerant. >> >> .Example: Use a bond as bridge port > > applied, thanks! While adding the missing comma Lukas noted and his R-b. I also added a follow-up using "as the bridge port" instead of "as bridge port" while at it. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] applied: [PATCH docs] local-zfs: correct language errors
Am 13.08.24 um 12:23 schrieb Alexander Zeidler: > Signed-off-by: Alexander Zeidler > --- > local-zfs.adoc | 14 +++--- > 1 file changed, 7 insertions(+), 7 deletions(-) > applied, thanks! Squashed in a change to wrap lines that would've been longer than 80 characters. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] applied: [PATCH docs] pve-network: correct language errors
Am 13.08.24 um 12:23 schrieb Alexander Zeidler: > Signed-off-by: Alexander Zeidler > --- > pve-network.adoc | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/pve-network.adoc b/pve-network.adoc > index acdcf39..434430d 100644 > --- a/pve-network.adoc > +++ b/pve-network.adoc > @@ -447,7 +447,7 @@ slaves in the single logical bonded interface such that > different > network-peers use different MAC addresses for their network packet > traffic. > > -If your switch support the LACP (IEEE 802.3ad) protocol then we recommend > using > +If your switch supports the LACP (IEEE 802.3ad) protocol then we recommend > using > the corresponding bonding mode (802.3ad). Otherwise you should generally use > the > active-backup mode. > > @@ -490,7 +490,7 @@ iface vmbr0 inet static > > > [thumbnail="default-network-setup-bond.svg"] > -Another possibility it to use the bond directly as bridge port. > +Another possibility is to use the bond directly as bridge port. > This can be used to make the guest network fault-tolerant. > > .Example: Use a bond as bridge port applied, thanks! While adding the missing comma Lukas noted and his R-b. ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [RFC container v2 23/25] backup: implement restore for external providers
First, the provider is asked about what restore mechanism to use. Currently, 'directory' and 'tar' are possible, for restoring either from a directory containing the full filesystem structure (for which rsync is used) or a potentially compressed tar file containing the same. The new functions are copied and adapted from the existing ones for PBS or tar and it might be worth to factor out the common parts. Signed-off-by: Fiona Ebner --- Changes in v2: * Adapt to API changes. src/PVE/LXC/Create.pm | 141 ++ 1 file changed, 141 insertions(+) diff --git a/src/PVE/LXC/Create.pm b/src/PVE/LXC/Create.pm index 117103c..9d1c337 100644 --- a/src/PVE/LXC/Create.pm +++ b/src/PVE/LXC/Create.pm @@ -25,6 +25,24 @@ sub restore_archive { if ($scfg->{type} eq 'pbs') { return restore_proxmox_backup_archive($storage_cfg, $archive, $rootdir, $conf, $no_unpack_error, $bwlimit); } + my $log_function = sub { + my ($log_level, $message) = @_; + my $prefix = $log_level eq 'err' ? 'ERROR' : uc($log_level); + print "$prefix: $message\n"; + }; + my $backup_provider = + PVE::Storage::new_backup_provider($storage_cfg, $storeid, $log_function); + if ($backup_provider) { + return restore_external_archive( + $backup_provider, + $storeid, + $volname, + $rootdir, + $conf, + $no_unpack_error, + $bwlimit, + ); + } } $archive = PVE::Storage::abs_filesystem_path($storage_cfg, $archive) if $archive ne '-'; @@ -118,6 +136,55 @@ sub restore_tar_archive { die $err if $err && !$no_unpack_error; } +sub restore_external_archive { +my ($backup_provider, $storeid, $volname, $rootdir, $conf, $no_unpack_error, $bwlimit) = @_; + +my ($mechanism, $vmtype) = $backup_provider->restore_get_mechanism($volname, $storeid); +die "cannot restore non-LXC guest of type '$vmtype'\n" if $vmtype ne 'lxc'; + +my $info = $backup_provider->restore_container_init($volname, $storeid, {}); +eval { + if ($mechanism eq 'tar') { + my $tar_path = $info->{'tar-path'} + or die "did not get path to tar file from backup provider\n"; + die "not a regular file '$tar_path'" if !-f $tar_path; + restore_tar_archive($tar_path, $rootdir, $conf, $no_unpack_error, $bwlimit); + } elsif ($mechanism eq 'directory') { + my $directory = $info->{'archive-directory'} + or die "did not get path to archive directory from backup provider\n"; + die "not a directory '$directory'" if !-d $directory; + + my $rsync = ['rsync', '--stats', '-h', '-X', '-A', '--numeric-ids', '-aH', '--delete', + '--no-whole-file', '--sparse', '--one-file-system', '--relative']; + push $rsync->@*, '--bwlimit', $bwlimit if $bwlimit; + push $rsync->@*, "${directory}/./", $rootdir; + + my $transferred = ''; + my $outfunc = sub { + return if $_[0] !~ /^Total transferred file size: (.+)$/; + $transferred = $1; + }; + my $errfunc = sub { log_warn($_[0]); }; + + my $starttime = time(); + PVE::Tools::run_command($rsync, outfunc => $outfunc, errfunc => $errfunc); + my $delay = time () - $starttime; + + print "sync finished - transferred ${transferred} in ${delay}s\n"; + } else { + die "mechanism '$mechanism' requested by backup provider is not supported for LXCs\n"; + } +}; +my $err = $@; +eval { $backup_provider->restore_container_cleanup($volname, $storeid, {}); }; +if (my $cleanup_err = $@) { + die $cleanup_err if !$err; + warn $cleanup_err; +} +die $err if $err; + +} + sub recover_config { my ($storage_cfg, $volid, $vmid) = @_; @@ -126,6 +193,8 @@ sub recover_config { my $scfg = PVE::Storage::storage_check_enabled($storage_cfg, $storeid); if ($scfg->{type} eq 'pbs') { return recover_config_from_proxmox_backup($storage_cfg, $volid, $vmid); + } elsif (PVE::Storage::new_backup_provider($storage_cfg, $storeid, sub {})) { + return recover_config_from_external_backup($storage_cfg, $volid, $vmid); } } @@ -200,6 +269,26 @@ sub recover_config_from_tar { return wantarray ? ($conf, $mp_param) : $conf; } +sub recover_config
[pve-devel] [RFC qemu-server v2 19/25] backup: implement backup for external providers
The state of the VM's disk images at the time the backup is started is preserved via a snapshot-access block node. Old data is moved to the fleecing image when new guest writes come in. The snapshot-access block node, as well as the associated bitmap in case of incremental backup, will be made available to the external provider. They are exported via NBD and for 'nbd' mechanism, the NBD socket path is passed to the provider, while for 'block-device' mechanism, the NBD export is made accessible as a regular block device first and the bitmap information is made available via a $next_dirty_region->() function. For 'block-device', the 'nbdinfo' binary is required. The provider can indicate that it wants to do an incremental backup by returning the bitmap ID that was used for a previous backup and it will then be told if the bitmap was newly created (either first backup or old bitmap was invalid) or if the bitmap can be reused. The provider then reads the parts of the NBD or block device it needs, either the full disk for full backup, or the dirty parts according to the bitmap for incremental backup. The bitmap has to be respected, reads to other parts of the image will return an error. After backing up each part of the disk, it should be discarded in the export to avoid unnecessary space usage in the fleecing image (requires the storage underlying the fleecing image to support discard too). Signed-off-by: Fiona Ebner --- Changes in v2: * Support 'block-device' mechanism, by exposing the NBD export as a block device via qemu-nbd. * Adapt to API changes, i.e. pass all volumes as well as configuration files to the provider at once. PVE/VZDump/QemuServer.pm | 308 ++- 1 file changed, 307 insertions(+), 1 deletion(-) diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm index 4ad4a154..9daebbc2 100644 --- a/PVE/VZDump/QemuServer.pm +++ b/PVE/VZDump/QemuServer.pm @@ -20,7 +20,7 @@ use PVE::QMPClient; use PVE::Storage::Plugin; use PVE::Storage::PBSPlugin; use PVE::Storage; -use PVE::Tools; +use PVE::Tools qw(run_command); use PVE::VZDump; use PVE::Format qw(render_duration render_bytes); @@ -291,6 +291,8 @@ sub archive { if ($self->{vzdump}->{opts}->{pbs}) { $self->archive_pbs($task, $vmid); +} elsif ($self->{vzdump}->{'backup-provider'}) { + $self->archive_external($task, $vmid); } else { $self->archive_vma($task, $vmid, $filename, $comp); } @@ -1136,6 +1138,23 @@ sub cleanup { # If VM was started only for backup, it is already stopped now. if (PVE::QemuServer::Helpers::vm_running_locally($vmid)) { + if ($task->{cleanup}->{'nbd-stop'}) { + eval { PVE::QemuServer::QMPHelpers::nbd_stop($vmid); }; + $self->logerr($@) if $@; + } + + if (my $info = $task->{cleanup}->{'backup-access-teardown'}) { + my $params = { + 'target-id' => $info->{'target-id'}, + timeout => 60, + success => $info->{success} ? JSON::true : JSON::false, + }; + + $self->loginfo("tearing down backup-access"); + eval { mon_cmd($vmid, "backup-access-teardown", $params->%*) }; + $self->logerr($@) if $@; + } + $detach_tpmstate_drive->($task, $vmid); detach_fleecing_images($task->{disks}, $vmid) if $task->{'use-fleecing'}; } @@ -1147,4 +1166,291 @@ sub cleanup { } } +my sub block_device_backup_cleanup { +my ($self, $paths, $cpids) = @_; + +for my $path ($paths->@*) { + eval { run_command(["qemu-nbd", "-d", $path ]); }; + $self->log('warn', "unable to disconnect NBD backup source '$path' - $@") if $@; +} + +my $waited; +my $wait_limit = 5; +for ($waited = 0; $waited < $wait_limit && scalar(keys $cpids->%*); $waited++) { + while ((my $cpid = waitpid(-1, POSIX::WNOHANG)) > 0) { + delete($cpids->{$cpid}); + } + if ($waited == 0) { + kill 15, $_ for keys $cpids->%*; + } + sleep 1; +} +if ($waited == $wait_limit && scalar(keys $cpids->%*)) { + kill 9, $_ for keys $cpids->%*; + sleep 1; + while ((my $cpid = waitpid(-1, POSIX::WNOHANG)) > 0) { + delete($cpids->{$cpid}); + } + $self->log('warn', "unable to collect nbdinfo child process '$_'") for keys $cpids->%*; +} +} + +my sub block_device_backup_prepare { +my ($self, $devicename, $size, $nbd_path, $bitmap_name, $count) = @_; + +my $nbd_info_uri = "nbd+unix:///${devicename}?socket=${nbd_path}"; +my $qemu_nbd_uri = "nbd:unix:$
[pve-devel] [POC storage v2 13/25] Borg plugin
Archive names start with the guest type and ID and then the same timestamp format as PBS. Container archives have the following structure: guest.config firewall.config filesystem/ # containing the whole filesystem structure VM archives have the following structure guest.config firewall.config volumes/ # containing a raw file for each device A bindmount (respectively symlinks) are used to achieve this structure, because Borg doesn't seem to support renaming on-the-fly. (Prefix stripping via the "slashdot hack" would have helped slightly, but is only in Borg >= 1.4 https://github.com/borgbackup/borg/actions/runs/7967940995) NOTE: running via SSH was not yet tested. Bandwidth limit is not yet honored and the task size is not calculated yet. Discard for VM backups would also be nice to have, but it's not entirely clear how (parsing progress and discarding according to that is one idea). There is no dirty bitmap support, not sure if that is feasible to add. Signed-off-by: Fiona Ebner --- New in v2. src/PVE/BackupProvider/Plugin/Borg.pm | 373 ++ src/PVE/BackupProvider/Plugin/Makefile | 2 +- src/PVE/Storage.pm | 2 + src/PVE/Storage/BorgBackupPlugin.pm| 506 + src/PVE/Storage/Makefile | 1 + 5 files changed, 883 insertions(+), 1 deletion(-) create mode 100644 src/PVE/BackupProvider/Plugin/Borg.pm create mode 100644 src/PVE/Storage/BorgBackupPlugin.pm diff --git a/src/PVE/BackupProvider/Plugin/Borg.pm b/src/PVE/BackupProvider/Plugin/Borg.pm new file mode 100644 index 000..b65321c --- /dev/null +++ b/src/PVE/BackupProvider/Plugin/Borg.pm @@ -0,0 +1,373 @@ +package PVE::BackupProvider::Plugin::Borg; + +use strict; +use warnings; + +use File::chdir; +use File::Basename qw(basename); +use File::Path qw(make_path remove_tree); +use POSIX qw(strftime); + +use PVE::Tools; + +# ($vmtype, $vmid, $time_string) +our $ARCHIVE_RE_3 = qr!^pve-(lxc|qemu)-([0-9]+)-([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z)$!; + +sub archive_name { +my ($vmtype, $vmid, $backup_time) = @_; + +return "pve-${vmtype}-${vmid}-" . strftime("%FT%TZ", gmtime($backup_time)); +} + +# remove_tree can be very verbose by default, do explicit error handling and limit to one message +my sub _remove_tree { +my ($path) = @_; + +remove_tree($path, { error => \my $err }); +if ($err && @$err) { # empty array if no error + for my $diag (@$err) { + my ($file, $message) = %$diag; + die "cannot remove_tree '$path': $message\n" if $file eq ''; + die "cannot remove_tree '$path': unlinking $file failed - $message\n"; + } +} +} + +my sub prepare_run_dir { +my ($archive, $operation) = @_; + +my $run_dir = "/run/pve-storage-borg-plugin/${archive}.${operation}"; +_remove_tree($run_dir); +make_path($run_dir); +die "unable to create directory $run_dir\n" if !-d $run_dir; + +return $run_dir; +} + +my sub log_info { +my ($self, $message) = @_; + +$self->{'log-function'}->('info', $message); +} + +my sub log_warning { +my ($self, $message) = @_; + +$self->{'log-function'}->('warn', $message); +} + +my sub log_error { +my ($self, $message) = @_; + +$self->{'log-function'}->('err', $message); +} + +my sub file_contents_from_archive { +my ($self, $archive, $file) = @_; + +my $run_dir = prepare_run_dir($archive, "file-contents"); + +my $raw; + +eval { + local $CWD = $run_dir; + + $self->{'storage-plugin'}->borg_cmd_extract( + $self->{scfg}, + $self->{storeid}, + $archive, + [$file], + ); + + $raw = PVE::Tools::file_get_contents("${run_dir}/${file}"); +}; +my $err = $@; +eval { _remove_tree($run_dir); }; +log_warning($self, $@) if $@; +die $err if $err; + +return $raw; +} + +# Plugin implementation + +sub new { +my ($class, $storage_plugin, $scfg, $storeid, $log_function) = @_; + +my $self = bless { + scfg => $scfg, + storeid => $storeid, + 'storage-plugin' => $storage_plugin, + 'log-function' => $log_function, +}, $class; + +return $self; +} + +sub provider_name { +my ($self) = @_; + +return "Borg"; +} + +sub job_hook { +my ($self, $phase, $info) = @_; + +if ($phase eq 'start') { + $self->{'job-id'} = $info->{'start-time'}; +} + +return; +} + +sub backup_hook { +my ($self, $phase, $vmid, $vmtype, $info) = @_; + +if ($phase eq 'start') { + $self->{$vmid}->{'task-size'} = 0; +} + +return; +} + +sub backup_get_mechanism {
[pve-devel] [RFC qemu v2 09/25] PVE backup: implement bitmap support for external backup access
There can be one dirty bitmap for each backup target ID (which are tracked in the backup_access_bitmaps hash table). The QMP user can specify the ID of the bitmap it likes to use. This ID is then compared to the current one for the given target. If they match, the bitmap is re-used (should it still exist on the drive, otherwise re-created). If there is a mismatch, the old bitmap is removed and a new one is created. The return value of the QMP command includes information about what bitmap action was taken. Similar to what the query-backup QMP command returns for regular backup. It also includes the bitmap name and associated block node, so the management layer can then set up an NBD export with the bitmap. While the backup access is active, a background bitmap is also required. This is necessary to implement bitmap handling according to the original reference [0]. In particular: - in the error case, new writes since the backup access was set up are in the background bitmap. Because of failure, the previously tracked writes from the backup access bitmap are still required too. Thus, the bitmap is merged with the background bitmap to get all new writes since the last backup. - in the success case, continue tracking for the next incremental backup in the backup access bitmap. New writes since the backup access was set up are in the background bitmap. Because the backup was successfully, clear the backup access bitmap and merge back the background bitmap to get only the new writes. Since QEMU cannot know if the backup was successful or not (except if failure already happens during the setup QMP command), the management layer needs to tell it via the teardown QMP command. The bitmap action is also recorded in the device info now. [0]: https://lore.kernel.org/qemu-devel/b68833dd-8864-4d72-7c61-c134a9835...@ya.ru/ Signed-off-by: Fiona Ebner --- No changes in v2. pve-backup.c | 175 ++- pve-backup.h | 2 +- qapi/block-core.json | 22 +- system/runstate.c| 2 +- 4 files changed, 193 insertions(+), 8 deletions(-) diff --git a/pve-backup.c b/pve-backup.c index d3370d6744..5f8dd396d5 100644 --- a/pve-backup.c +++ b/pve-backup.c @@ -15,6 +15,7 @@ #include "qapi/qmp/qdict.h" #include "qapi/qmp/qerror.h" #include "qemu/cutils.h" +#include "qemu/error-report.h" #if defined(CONFIG_MALLOC_TRIM) #include @@ -41,6 +42,7 @@ */ const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap"; +const char *BACKGROUND_BITMAP_NAME = "backup-access-background-bitmap"; static struct PVEBackupState { struct { @@ -72,6 +74,7 @@ static struct PVEBackupState { CoMutex backup_mutex; CoMutex dump_callback_mutex; char *target_id; +GHashTable *backup_access_bitmaps; // key=target_id, value=bitmap_name } backup_state; static void pvebackup_init(void) @@ -99,6 +102,8 @@ typedef struct PVEBackupDevInfo { char* device_name; int completed_ret; // INT_MAX if not completed BdrvDirtyBitmap *bitmap; +BdrvDirtyBitmap *background_bitmap; // used for external backup access +PBSBitmapAction bitmap_action; BlockDriverState *target; BlockJob *job; } PVEBackupDevInfo; @@ -362,6 +367,67 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque) qemu_co_mutex_unlock(&backup_state.backup_mutex); } +/* + * New writes since the backup access was set up are in the background bitmap. Because of failure, + * the previously tracked writes in di->bitmap are still required too. Thus, merge with the + * background bitmap to get all new writes since the last backup. + */ +static void handle_backup_access_bitmaps_in_error_case(PVEBackupDevInfo *di) +{ +Error *local_err = NULL; + +if (di->bs && di->background_bitmap) { +bdrv_drained_begin(di->bs); +if (di->bitmap) { +bdrv_enable_dirty_bitmap(di->bitmap); +if (!bdrv_merge_dirty_bitmap(di->bitmap, di->background_bitmap, NULL, &local_err)) { +warn_report("backup access: %s - could not merge bitmaps in error path - %s", +di->device_name, +local_err ? error_get_pretty(local_err) : "unknown error"); +/* + * Could not merge, drop original bitmap too. + */ +bdrv_release_dirty_bitmap(di->bitmap); +} +} else { +warn_report("backup access: %s - expected bitmap not present", di->device_name); +} +bdrv_release_dirty_bitmap(di->background_bitmap); +bdrv_drained_end(di->bs); +} +} + +/* + * Continue tracking for next incremental backup in di->bitmap. New writes since the backup access + * was set up are in the background
[pve-devel] [RFC qemu-server v2 18/25] backup: allow adding fleecing images also for EFI and TPM
For the external backup API, it will be necessary to add a fleecing image even for small disks like EFI and TPM, because there is no other place the old data could be copied to when a new guest write comes in. Signed-off-by: Fiona Ebner --- No changes in v2. PVE/VZDump/QemuServer.pm | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm index 98685127..4ad4a154 100644 --- a/PVE/VZDump/QemuServer.pm +++ b/PVE/VZDump/QemuServer.pm @@ -548,7 +548,7 @@ my sub cleanup_fleecing_images { } my sub allocate_fleecing_images { -my ($self, $disks, $vmid, $fleecing_storeid, $format) = @_; +my ($self, $disks, $vmid, $fleecing_storeid, $format, $all_images) = @_; die "internal error - no fleecing storage specified\n" if !$fleecing_storeid; @@ -559,7 +559,8 @@ my sub allocate_fleecing_images { my $n = 0; # counter for fleecing image names for my $di ($disks->@*) { - next if $di->{virtdev} =~ m/^(?:tpmstate|efidisk)\d$/; # too small to be worth it + # EFI/TPM are usually too small to be worth it, but it's required for external providers + next if !$all_images && $di->{virtdev} =~ m/^(?:tpmstate|efidisk)\d$/; if ($di->{type} eq 'block' || $di->{type} eq 'file') { my $scfg = PVE::Storage::storage_config($self->{storecfg}, $fleecing_storeid); my $name = "vm-$vmid-fleece-$n"; @@ -623,7 +624,7 @@ my sub attach_fleecing_images { } my sub check_and_prepare_fleecing { -my ($self, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support) = @_; +my ($self, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support, $all_images) = @_; # Even if the VM was started specifically for fleecing, it's possible that the VM is resumed and # then starts doing IO. For VMs that are not resumed the fleecing images will just stay empty, @@ -644,7 +645,8 @@ my sub check_and_prepare_fleecing { $self->{storecfg}, $fleecing_opts->{storage}); my $format = scalar(grep { $_ eq 'qcow2' } $valid_formats->@*) ? 'qcow2' : 'raw'; - allocate_fleecing_images($self, $disks, $vmid, $fleecing_opts->{storage}, $format); + allocate_fleecing_images( + $self, $disks, $vmid, $fleecing_opts->{storage}, $format, $all_images); attach_fleecing_images($self, $disks, $vmid, $format); } @@ -735,7 +737,7 @@ sub archive_pbs { my $is_template = PVE::QemuConfig->is_template($self->{vmlist}->{$vmid}); $task->{'use-fleecing'} = check_and_prepare_fleecing( - $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support); + $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support, 0); my $fs_frozen = $self->qga_fs_freeze($task, $vmid); @@ -919,7 +921,7 @@ sub archive_vma { $attach_tpmstate_drive->($self, $task, $vmid); $task->{'use-fleecing'} = check_and_prepare_fleecing( - $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support); + $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support, 0); my $outfh; if ($opts->{stdout}) { -- 2.39.2 ___ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel