Re: [libvirt] [Qemu-devel] [PATCH 2/2] target-i386: Disable kvm_mmu_op by default on pc-1.4
On Fri, Jan 4, 2013 at 2:52 PM, Eduardo Habkost ehabk...@redhat.com wrote: The kvm_mmu_op feature was removed from the kernel since v3.3 (released in March 2012), it was marked for removal since January 2011 and it's slower than shadow or hardware assisted paging (see kernel commit fb92045843). It doesn't make sense to keep it enabled by default. Also, keeping it enabled by default would cause unnecessary hassle when libvirt start using the enforce option. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Cc: k...@vger.kernel.org Cc: Michael S. Tsirkin m...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com Cc: libvir-list@redhat.com Cc: Jiri Denemark jdene...@redhat.com I was planning to reverse the logic of the compat init functions and make pc_init_pci_1_3() enable kvm_mmu_op and then call pc_init_pci_1_4(). But that would require changing pc_init_pci_no_kvmclock() and pc_init_isa() as well. So to keep the changes simple, I am keeping the pattern used when pc_init_pci_1_3() was introduced, making pc_init_pci_1_4() disable kvm_mmu_op and then call pc_init_pci_1_3(). --- hw/pc_piix.c | 11 ++- target-i386/cpu.c | 8 target-i386/cpu.h | 1 + 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 99747a7..a6bf645 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -217,6 +217,7 @@ static void pc_init1(MemoryRegion *system_memory, } } +/* machine init function for pc-0.14 - pc-1.2 */ static void pc_init_pci(QEMUMachineInitArgs *args) { ram_addr_t ram_size = args-ram_size; @@ -232,12 +233,20 @@ static void pc_init_pci(QEMUMachineInitArgs *args) initrd_filename, cpu_model, 1, 1); } +/* machine init function for pc-1.3 */ The comment does give much information compared to the function name. static void pc_init_pci_1_3(QEMUMachineInitArgs *args) { enable_kvm_pv_eoi(); pc_init_pci(args); } +/* machine init function for pc-1.4 */ Ditto. +static void pc_init_pci_1_4(QEMUMachineInitArgs *args) +{ +disable_kvm_mmu_op(); +pc_init_pci_1_3(args); +} + static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs *args) { ram_addr_t ram_size = args-ram_size; @@ -285,7 +294,7 @@ static QEMUMachine pc_machine_v1_4 = { .name = pc-1.4, .alias = pc, .desc = Standard PC, -.init = pc_init_pci_1_3, +.init = pc_init_pci_1_4, .max_cpus = 255, .is_default = 1, }; diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 808001a..ec877c7 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -157,6 +157,14 @@ void enable_kvm_pv_eoi(void) #endif } +void disable_kvm_mmu_op(void) +{ +#ifdef CONFIG_KVM +if (kvm_enabled()) Braces. +kvm_default_features = ~(1UL KVM_FEATURE_MMU_OP); +#endif +} + void host_cpuid(uint32_t function, uint32_t count, uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx) { diff --git a/target-i386/cpu.h b/target-i386/cpu.h index 1283537..27c8d0c 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -1219,5 +1219,6 @@ void do_smm_enter(CPUX86State *env1); void cpu_report_tpr_access(CPUX86State *env, TPRAccess access); void enable_kvm_pv_eoi(void); +void disable_kvm_mmu_op(void); #endif /* CPU_I386_H */ -- 1.7.11.7 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH 3/9] target-i386: check/enforce: Fix CPUID leaf numbers on error messages
On Fri, Jan 4, 2013 at 3:37 PM, Eduardo Habkost ehabk...@redhat.com wrote: The -cpu check/enforce warnings are printing incorrect information about the missing flags. There are no feature flags on CPUID leaves 0 and 0x8000, but there were references to 0 and 0x8000 in the table at kvm_check_features_against_host(). This changes the model_features_t struct to contain the register number as well, so the error messages print the correct CPUID leaf+register information, instead of wrong CPUID leaf numbers. This also changes the format of the error messages, so they follow the CPUID.leaf.register.name [bit offset] convention used on Intel documentation. Example output: $ qemu-system-x86_64 -machine pc-1.0,accel=kvm -cpu Opteron_G4,+ia64,enforce warning: host doesn't support requested feature: CPUID.01H:EDX.ia64 [bit 30] warning: host doesn't support requested feature: CPUID.01H:ECX.xsave [bit 26] warning: host doesn't support requested feature: CPUID.01H:ECX.avx [bit 28] warning: host doesn't support requested feature: CPUID.8001H:ECX.abm [bit 5] warning: host doesn't support requested feature: CPUID.8001H:ECX.sse4a [bit 6] warning: host doesn't support requested feature: CPUID.8001H:ECX.misalignsse [bit 7] warning: host doesn't support requested feature: CPUID.8001H:ECX.3dnowprefetch [bit 8] warning: host doesn't support requested feature: CPUID.8001H:ECX.xop [bit 11] warning: host doesn't support requested feature: CPUID.8001H:ECX.fma4 [bit 16] Unable to find x86 CPU definition $ Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Cc: Gleb Natapov g...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com Cc: k...@vger.kernel.org --- target-i386/cpu.c | 38 +- target-i386/cpu.h | 3 +++ 2 files changed, 32 insertions(+), 9 deletions(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 4e26b11..6c43ace 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -124,6 +124,24 @@ static const char *cpuid_7_0_ebx_feature_name[] = { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, }; +const char *get_register_name_32(unsigned int reg) +{ +static const char *reg_names[CPU_NB_REGS32] = { +[R_EAX] = EAX, +[R_ECX] = ECX, +[R_EDX] = EDX, +[R_EBX] = EBX, +[R_ESP] = ESP, +[R_EBP] = EBP, +[R_ESI] = ESI, +[R_EDI] = EDI, +}; + +if (reg CPU_NB_REGS32) Missing braces. +return NULL; +return reg_names[reg]; +} + /* collects per-function cpuid data */ typedef struct model_features_t { @@ -132,7 +150,8 @@ typedef struct model_features_t { uint32_t check_feat; const char **flag_names; uint32_t cpuid; -} model_features_t; +int reg; +} model_features_t; int check_cpuid = 0; int enforce_cpuid = 0; @@ -921,10 +940,11 @@ static int unavailable_host_feature(struct model_features_t *f, uint32_t mask) for (i = 0; i 32; ++i) if (1 i mask) { -fprintf(stderr, warning: host cpuid %04x_%04x lacks requested - flag '%s' [0x%08x]\n, -f-cpuid 16, f-cpuid 0x, -f-flag_names[i] ? f-flag_names[i] : [reserved], mask); +fprintf(stderr, warning: host doesn't support requested feature: +CPUID.%02XH:%s%s%s [bit %d]\n, +f-cpuid, get_register_name_32(f-reg), This could attempt to print NULL via %s format, which is not OK with all C libraries. If we trust that the callers always pass valid numbers, the check above could be turned into assert(). +f-flag_names[i] ? . : , +f-flag_names[i] ? f-flag_names[i] : , i); break; } return 0; @@ -943,13 +963,13 @@ static int kvm_check_features_against_host(x86_def_t *guest_def) int rv, i; struct model_features_t ft[] = { {guest_def-features, host_def.features, -~0, feature_name, 0x}, +~0, feature_name, 0x0001, R_EDX}, {guest_def-ext_features, host_def.ext_features, -~CPUID_EXT_HYPERVISOR, ext_feature_name, 0x0001}, +~CPUID_EXT_HYPERVISOR, ext_feature_name, 0x0001, R_ECX}, {guest_def-ext2_features, host_def.ext2_features, -~PPRO_FEATURES, ext2_feature_name, 0x8000}, +~PPRO_FEATURES, ext2_feature_name, 0x8001, R_EDX}, {guest_def-ext3_features, host_def.ext3_features, -~CPUID_EXT3_SVM, ext3_feature_name, 0x8001}}; +~CPUID_EXT3_SVM, ext3_feature_name, 0x8001, R_ECX}}; assert(kvm_enabled()); diff --git a/target-i386/cpu.h b/target-i386/cpu.h index 27c8d0c..ab81a5c 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -1221,4 +1221,7 @@ void cpu_report_tpr_access(CPUX86State
Re: [libvirt] [Qemu-devel] [PATCH 3/4] qemu-config: Add -drive fd and opaque options
On Fri, Oct 5, 2012 at 6:07 PM, Corey Bryant cor...@linux.vnet.ibm.com wrote: These new options can be used for passing drive file descriptors on the command line, instead of using the file option to specify a file name. These new command line options mirror the existing add-fd QMP command which allows an fd to be passed to QEMU via SCM_RIGHTS and added to an fd set. The opaque option is also available with add-fd, and allows a free-form string to be stored in the fd set along with the fd. Signed-off-by: Corey Bryant cor...@linux.vnet.ibm.com --- qemu-config.c | 8 qemu-options.hx | 15 +-- 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/qemu-config.c b/qemu-config.c index cd1ec21..91053dd 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -114,6 +114,14 @@ static QemuOptsList qemu_drive_opts = { .name = copy-on-read, .type = QEMU_OPT_BOOL, .help = copy read data from backing file into image file, +},{ +.name = fd, +.type = QEMU_OPT_NUMBER, +.help = disk image file descriptor, +},{ +.name = opaque, 'opaque' is not very descriptive and it's also not obvious (except from the help text) that it's only interesting for file descriptors. How about fd_name, fd_tag or fd_descr? +.type = QEMU_OPT_STRING, +.help = free-form string used to describe fd, }, { /* end of list */ } }, diff --git a/qemu-options.hx b/qemu-options.hx index 7d97f96..513530f 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -149,7 +149,7 @@ using @file{/dev/cdrom} as filename (@pxref{host_drives}). ETEXI DEF(drive, HAS_ARG, QEMU_OPTION_drive, --drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n +-drive [file=file|fd=fd[,opaque=o]][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n [,serial=s][,addr=A][,id=name][,aio=threads|native]\n @@ -170,6 +170,12 @@ this drive. If the filename contains comma, you must double it Special files such as iSCSI devices can be specified using protocol specific URLs. See the section for Device URL Syntax for more information. +@item fd=@var{fd} +This option defines which disk image (@pxref{disk_images}) file descriptor to +use with this drive. +@item opaque=@var{opaque} +This option defines a free-form string that describes @var{fd}. This is used +when storing @var{fd} in a file descriptor set. @item if=@var{interface} This option defines on which type on interface the drive is connected. Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio. @@ -257,12 +263,17 @@ qemu-system-i386 -drive file=file,index=2,media=disk qemu-system-i386 -drive file=file,index=3,media=disk @end example +You can open an image using a pre-opened file descriptor: +@example +qemu-system-i386 -drive fd=24,opaque=rdwr:/path/to/file,index=0,media=disk +@end example + You can connect a CDROM to the slave of ide0: @example qemu-system-i386 -drive file=file,if=ide,index=1,media=cdrom @end example -If you don't specify the file= argument, you define an empty drive: +If you don't specify the file= or fd= arguments, you define an empty drive: @example qemu-system-i386 -drive if=ide,index=1,media=cdrom @end example -- 1.7.11.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v9 5/7] block: Convert close calls to qemu_close
On Sat, Aug 11, 2012 at 1:14 PM, Corey Bryant cor...@linux.vnet.ibm.com wrote: This patch converts all block layer close calls, that correspond to qemu_open calls, to qemu_close. Signed-off-by: Corey Bryant cor...@linux.vnet.ibm.com --- v5: -This patch is new in v5. (kw...@redhat.com, ebl...@redhat.com) v6-v9: -No changes block/raw-posix.c | 24 block/raw-win32.c |2 +- block/vmdk.c |4 ++-- block/vpc.c |2 +- block/vvfat.c | 12 ++-- osdep.c |5 + qemu-common.h |1 + savevm.c |4 ++-- 8 files changed, 30 insertions(+), 24 deletions(-) diff --git a/block/raw-posix.c b/block/raw-posix.c index 08b997e..6be20b1 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -271,7 +271,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename, out_free_buf: qemu_vfree(s-aligned_buf); out_close: -close(fd); +qemu_close(fd); return -errno; } @@ -376,7 +376,7 @@ static void raw_close(BlockDriverState *bs) { BDRVRawState *s = bs-opaque; if (s-fd = 0) { -close(s-fd); +qemu_close(s-fd); s-fd = -1; if (s-aligned_buf != NULL) qemu_vfree(s-aligned_buf); @@ -580,7 +580,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options) if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) { result = -errno; } -if (close(fd) != 0) { +if (qemu_close(fd) != 0) { result = -errno; } } @@ -850,7 +850,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags) if (fd 0) { bsdPath[strlen(bsdPath)-1] = '1'; } else { -close(fd); +qemu_close(fd); } filename = bsdPath; } @@ -889,7 +889,7 @@ static int fd_open(BlockDriverState *bs) last_media_present = (s-fd = 0); if (s-fd = 0 (get_clock() - s-fd_open_time) = FD_OPEN_TIMEOUT) { -close(s-fd); +qemu_close(s-fd); s-fd = -1; #ifdef DEBUG_FLOPPY printf(Floppy closed\n); @@ -988,7 +988,7 @@ static int hdev_create(const char *filename, QEMUOptionParameter *options) else if (lseek(fd, 0, SEEK_END) total_size * BDRV_SECTOR_SIZE) ret = -ENOSPC; -close(fd); +qemu_close(fd); return ret; } @@ -1038,7 +1038,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags) return ret; /* close fd so that we can reopen it as needed */ -close(s-fd); +qemu_close(s-fd); s-fd = -1; s-fd_media_changed = 1; @@ -1072,7 +1072,7 @@ static int floppy_probe_device(const char *filename) prio = 100; outc: -close(fd); +qemu_close(fd); out: return prio; } @@ -1107,14 +1107,14 @@ static void floppy_eject(BlockDriverState *bs, bool eject_flag) int fd; if (s-fd = 0) { -close(s-fd); +qemu_close(s-fd); s-fd = -1; } fd = qemu_open(bs-filename, s-open_flags | O_NONBLOCK); if (fd = 0) { if (ioctl(fd, FDEJECT, 0) 0) perror(FDEJECT); -close(fd); +qemu_close(fd); } } @@ -1175,7 +1175,7 @@ static int cdrom_probe_device(const char *filename) prio = 100; outc: -close(fd); +qemu_close(fd); out: return prio; } @@ -1283,7 +1283,7 @@ static int cdrom_reopen(BlockDriverState *bs) * FreeBSD seems to not notice sometimes... */ if (s-fd = 0) -close(s-fd); +qemu_close(s-fd); fd = qemu_open(bs-filename, s-open_flags, 0644); if (fd 0) { s-fd = -1; diff --git a/block/raw-win32.c b/block/raw-win32.c index 8d7838d..c56bf83 100644 --- a/block/raw-win32.c +++ b/block/raw-win32.c @@ -261,7 +261,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options) return -EIO; set_sparse(fd); ftruncate(fd, total_size * 512); -close(fd); +qemu_close(fd); return 0; } diff --git a/block/vmdk.c b/block/vmdk.c index 557dc1b..daee426 100644 --- a/block/vmdk.c +++ b/block/vmdk.c @@ -1258,7 +1258,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize, ret = 0; exit: -close(fd); +qemu_close(fd); return ret; } @@ -1506,7 +1506,7 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options) } ret = 0; exit: -close(fd); +qemu_close(fd); return ret; } diff --git a/block/vpc.c b/block/vpc.c index 60ebf5a..c0b82c4 100644 --- a/block/vpc.c +++ b/block/vpc.c @@ -744,7 +744,7 @@ static int vpc_create(const char *filename, QEMUOptionParameter *options) } fail: -close(fd); +qemu_close(fd); return
Re: [libvirt] [Qemu-devel] [PATCH 4/7] compiler: add macro for GCC weak symbols
On Sat, Jul 28, 2012 at 6:50 AM, Peter Maydell peter.mayd...@linaro.org wrote: On 27 July 2012 16:31, Anthony Liguori aligu...@us.ibm.com wrote: Peter Maydell peter.mayd...@linaro.org writes: My approach to this is to avoid non-standard things http://en.wikipedia.org/wiki/C99#Implementations So unless you plan on compiling QEMU with xlc, pgi, or icc, I don't think relying on standard things really helps. QEMU doesn't support C99, it supports GCC. OK, you could perhaps rephrase that as 'mainstream' rather than 'standards-compliant'. I don't think we need to be strict C99; I do think we have more than one working host OS and that patches that use functionality that's documented not to work on all gcc targets ought to come attached to a statement that they've been tested. (MacOSX isn't actually in MAINTAINERS as a host so is a bit of a red herring. Windows is listed.) I'd also like to avoid a world where everything only targets GCC on x86_64 on Linux with KVM. Embrace and extend may also be seen to apply to GCC extensions. So if you really like weak symbols, go ahead. I'm just saying you're imposing a bigger testing burden on yourself than if you handled this some other way. (I do think it would be nice to care about building with CLANG, because there are some static analysis tools that we would then be able to run. That doesn't mean dropping all GCC extensions, though, because CLANG does support a lot of them.) -- PMM -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH 4/7] compiler: add macro for GCC weak symbols
On Fri, Jul 27, 2012 at 3:31 PM, Anthony Liguori aligu...@us.ibm.com wrote: Peter Maydell peter.mayd...@linaro.org writes: On 27 July 2012 15:27, Anthony Liguori aligu...@us.ibm.com wrote: Peter Maydell peter.mayd...@linaro.org writes: The GCC manual says Weak symbols are supported for ELF targets, and also for a.out targets when using the GNU assembler and linker. Have you tested this on Windows and MacOSX ? Weak symbols are supposed to be supported by mingw32. I have no idea about MacOS X. I have no way to develop or test for MacOS X using free software so I honestly don't care about it. My approach to this is to avoid non-standard things http://en.wikipedia.org/wiki/C99#Implementations So unless you plan on compiling QEMU with xlc, pgi, or icc, I don't think relying on standard things really helps. LLVM/Clang should definitely be in the plan. QEMU doesn't support C99, it supports GCC. There's no point in debating about whether we should rely on extensions or not. We already do--extensively. Not so extensively. There are a few extensions for which there is no simple alternative (like QEMU_PACKED) but other compilers likely need similar extensions. Then there are other extensions (like :? without middle expression) which can be easily avoided. We should avoid to use the non-standard extensions whenever possible. Regards, Anthony Liguori -- if I write a patch which is pretty much standard C then it's the platform's problem if it mishandles it. If I write a patch that uses a compiler-specific or OS-specific thing then I have to also provide the relevant alternatives...so I try to avoid doing that :-) -- PMM -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH 4/7] compiler: add macro for GCC weak symbols
On Fri, Jul 27, 2012 at 8:51 PM, Anthony Liguori aligu...@us.ibm.com wrote: Blue Swirl blauwir...@gmail.com writes: On Fri, Jul 27, 2012 at 3:31 PM, Anthony Liguori aligu...@us.ibm.com wrote: Peter Maydell peter.mayd...@linaro.org writes: On 27 July 2012 15:27, Anthony Liguori aligu...@us.ibm.com wrote: Peter Maydell peter.mayd...@linaro.org writes: The GCC manual says Weak symbols are supported for ELF targets, and also for a.out targets when using the GNU assembler and linker. Have you tested this on Windows and MacOSX ? Weak symbols are supposed to be supported by mingw32. I have no idea about MacOS X. I have no way to develop or test for MacOS X using free software so I honestly don't care about it. My approach to this is to avoid non-standard things http://en.wikipedia.org/wiki/C99#Implementations So unless you plan on compiling QEMU with xlc, pgi, or icc, I don't think relying on standard things really helps. LLVM/Clang should definitely be in the plan. weak symbols are supported by clang. QEMU doesn't support C99, it supports GCC. There's no point in debating about whether we should rely on extensions or not. We already do--extensively. Not so extensively. There are a few extensions for which there is no simple alternative (like QEMU_PACKED) but other compilers likely need similar extensions. Then there are other extensions (like :? without middle expression) which can be easily avoided. We should avoid to use the non-standard extensions whenever possible. I disagree. I don't see a point to it. QEMU has never been routinely built on anything other than GCC. Why go to a lot of trouble to support a user base that doesn't exist? If someone comes along and actively maintains support for another compiler, we can revisit. But otherwise, there's no practical reason to avoid extensions. Because it's more compliant to standards. There's also very little benefit from using the nonessential extensions. Regards, Anthony Liguori Regards, Anthony Liguori -- if I write a patch which is pretty much standard C then it's the platform's problem if it mishandles it. If I write a patch that uses a compiler-specific or OS-specific thing then I have to also provide the relevant alternatives...so I try to avoid doing that :-) -- PMM -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH qemu 4/6] vl.c: change 'defconfig' variable to bool
On Tue, Apr 24, 2012 at 20:32, Eduardo Habkost ehabk...@redhat.com wrote: Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- vl.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/vl.c b/vl.c index 1e5e593..a4f4676 100644 --- a/vl.c +++ b/vl.c @@ -2279,7 +2279,7 @@ int main(int argc, char **argv, char **envp) #ifdef CONFIG_VNC int show_vnc_port = 0; #endif - int defconfig = 1; + int defconfig = true; The type is still 'int', is that intentional? const char *log_mask = NULL; const char *log_file = NULL; GMemVTable mem_trace = { @@ -2346,7 +2346,7 @@ int main(int argc, char **argv, char **envp) popt = lookup_opt(argc, argv, optarg, optind); switch (popt-index) { case QEMU_OPTION_nodefconfig: - defconfig=0; + defconfig = false; break; } } -- 1.7.3.2 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4] Add support for fd: protocol
On Mon, Aug 22, 2011 at 5:42 PM, Corey Bryant cor...@linux.vnet.ibm.com wrote: On 08/22/2011 01:25 PM, Anthony Liguori wrote: On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: I don't think it makes sense to have qemu-fe do dynamic labelling. You certainly could avoid the fd passing by having qemu-fe do the open though and just let qemu-fe run without the restricted security context. qemu-fe would also not be entirely simple, Indeed. I do like the idea of a privileged qemu-fe performing the open and passing the fd to a restricted qemu. Me too. However, I get the impression that this won't get delivered nearly as quickly as fd: passing could be. How soon do we need image isolation for NFS? Btw, this sounds similar to what Blue Swirl recommended here on v1 of this patch: http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02187.html I was thinking about simply doing fork() + setuid() at some point and using the FD passing structures directly. But would it bring advantages to have two separate executables, are they different from access control point of view vs. single but forked one? Regards, Corey because it will need to act as a proxy for the monitor, in order to make hotplug work. ie the mgmt app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would then have to open the file and send 'drive_add fd:NN' onto the real QEMU, and then pass the results on back. In addition qemu-fe would still have to be under some kind of restricted security context for it to be acceptable. This is going to want to be as locked down as possible. I think there's got to be some give and take here. It should at least be as locked down as libvirtd. From a security point of view, we should be able to agree that we want libvirtd to be as locked down as possible. But there shouldn't be a hard requirement to lock down qemu-fe more than libvirtd. Instead, the requirement should be for qemu-fe to be as/more vigilant in not trusting qemu-system-x86_64 as libvirtd is. The fundamental problem here, is that there is some logic in libvirtd that rightly belongs in QEMU. In order to preserve the security model, that means that we're going to have to take a subsection of QEMU and trust it more. So I'd see that you'd likely end up with the qemu-fe security policy being identical to the qemu security policy, Then there's no point in doing qemu-fe. qemu-fe should be thought of as QEMU supplied libvirtd plugin. with the exception that it would be allowed to open files on NFS without needing them to be labelled. So I don't really see that all this gives us any tangible benefits over just allowing the mgmt app to pass in the FDs directly. But libvirt would still need to parse image files. Not neccessarily. As mentioned below, it is entirely possible to enable the mgmt app to pass in details of the backing files, at which point no image parsing is required by libvirt. Hence my assertion that the question of who does image parsing is irrelevant to this discussion. That's certainly true. Regards, Anthony Liguori -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4] Add support for fd: protocol
On Mon, Aug 22, 2011 at 6:22 PM, Daniel P. Berrange berra...@redhat.com wrote: On Mon, Aug 22, 2011 at 12:25:25PM -0500, Anthony Liguori wrote: On 08/22/2011 11:50 AM, Daniel P. Berrange wrote: On Mon, Aug 22, 2011 at 11:29:12AM -0500, Anthony Liguori wrote: I don't think it makes sense to have qemu-fe do dynamic labelling. You certainly could avoid the fd passing by having qemu-fe do the open though and just let qemu-fe run without the restricted security context. qemu-fe would also not be entirely simple, Indeed. because it will need to act as a proxy for the monitor, in order to make hotplug work. ie the mgmt app would be sending 'drive_add file:/foo/bar' to qemu-fe, which would then have to open the file and send 'drive_add fd:NN' onto the real QEMU, and then pass the results on back. In addition qemu-fe would still have to be under some kind of restricted security context for it to be acceptable. This is going to want to be as locked down as possible. I think there's got to be some give and take here. It should at least be as locked down as libvirtd. From a security point of view, we should be able to agree that we want libvirtd to be as locked down as possible. But there shouldn't be a hard requirement to lock down qemu-fe more than libvirtd. Instead, the requirement should be for qemu-fe to be as/more vigilant in not trusting qemu-system-x86_64 as libvirtd is. The fundamental problem here, is that there is some logic in libvirtd that rightly belongs in QEMU. In order to preserve the security model, that means that we're going to have to take a subsection of QEMU and trust it more. Well we have a process that makes security decisions, and a process which applies those security decisions and a process which is confined by those decisions. Currently libvirtd makes applies the decisions, and qemu is confined. A qemu-fe model would mean that libvirt is making the decisions, but is then relying on qemu-fe to apply them. IMHO that split is undesirable, but that's besides the point, since this is not a decision that needs to be made now. 'qemu-fe' needs to have a way to communicate with the confined process ('qemu-system-XXX') to supply it the resources (file FDs) it needs to access. The requirements of such a comms channel for qemu-fe are going to be the same as those needed by libvirtd talking to QEMU today, or indeed by any process that is applying security decisions to QEMU. So inventing a 'qemu-fe' does not make the need for passing FDs into QEMU go away, nor does it improve/change the overall security of the architecture, it is merely a different architectural choice. It does however require alot more development work to create this new 'qemu-fe' program and get it debugged generally usable in production deployments So adding FD passing to QEMU blocks creation of a 'qemu-fe' program, but is not dependant on it. Thus we can add FD passing to QEMU today and be a step closer to being able to create a 'qemu-fe' in the future, while also enabling libvirtd other mgmt apps to take advantage of this to solve the immediate security problem we have with NFS today, without having to wait a months or years for 'qemu-fe' to get into a usable state. The advantage of this qemu-fe approach is that block format internals does not need to be shared between QEMU and libvirt. FD passing can still be useful for other purposes. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v3] Add support for fd: protocol
On Tue, Jul 26, 2011 at 3:51 PM, Corey Bryant cor...@linux.vnet.ibm.com wrote: sVirt provides SELinux MAC isolation for Qemu guest processes and their corresponding resources (image files). sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, which is needed for image file isolation when using the sVirt SELinux security driver in libvirt. The proposed solution entails a combination of Qemu, libvirt, and SELinux patches that work together to isolate multiple guests' images when they're stored in the same NFS mount. This results in an environment where sVirt isolation and NFS image file isolation can both be provided. This patch contains the Qemu code to support this solution. I would like to solicit input from the libvirt community prior to starting the libvirt patch. Currently, Qemu opens an image file in addition to performing the necessary read and write operations. The proposed solution will move the open out of Qemu and into libvirt. Once libvirt opens an image file for the guest, it will pass the file descriptor to Qemu via a new fd: protocol. If the image file resides in an NFS mount, the following SELinux policy changes will provide image isolation: - A new SELinux boolean is created (e.g. virt_read_write_nfs) to allow Qemu (svirt_t) to only have SELinux read and write permissions on nfs_t files - Qemu (svirt_t) also gets SELinux use permissions on libvirt (virtd_t) file descriptors Following is a sample invocation of Qemu using the fd: protocol on the command line: qemu -drive file=fd:4,format=qcow2 The fd: protocol is also supported by the drive_add monitor command. This requires that the specified file descriptor is passed to the monitor alongside a prior getfd monitor command. There are some additional features provided by certain image types where Qemu reopens the image file. All of these scenarios will be unsupported for the fd: protocol, at least for this patch: - The -snapshot command line option - The savevm monitor command - The snapshot_blkdev monitor command - Use of copy-on-write image files - The -cdrom command line option - The -drive command line option with media=cdrom - The change monitor command In the earlier discussion, virtio-blk and iSCSI were identified as interesting protocols to be used with file descriptors in the future. This patch would bind fd: protocol only to raw file interface to be used by qcow2 etc. higher levels. I guess with small changes the protocol could be made selectable. Maybe it's just changing command line to format=virtio-blk or format=iscsi for those uses, though I fear there may be a layering violation. Considering the privilege separation side, this patch seems to be somewhat orthogonal to that (I don't expect any command line passing and parsing to happen between QEMU processes) but the low level fd handling seems useful modulo the comments made by others. The thought is that this support can be added in the future, but is not required for the initial fd: support. This patch was tested with the following formats: raw, cow, qcow, qcow2, qed, and vmdk, using the fd: protocol from the command line and the monitor. Tests were also run to verify existing file name support and qemu-img were not regressed. Non-valid file descriptors, fd: without format, snapshot and backing files, and cdrom were also tested. v2: - Add drive_add monitor command support - Fence off unsupported features that re-open image file v3: - Fence off cdrom and change monitor command support Signed-off-by: Corey Bryant cor...@linux.vnet.ibm.com --- block.c | 16 ++ block.h | 1 + block/cow.c | 5 +++ block/qcow.c | 5 +++ block/qcow2.c | 5 +++ block/qed.c | 4 ++ block/raw-posix.c | 81 +++-- block/vmdk.c | 5 +++ block_int.h | 1 + blockdev.c | 19 monitor.c | 5 +++ monitor.h | 1 + qemu-options.hx | 8 +++-- qemu-tool.c | 5 +++ 14 files changed, 149 insertions(+), 12 deletions(-) diff --git a/block.c b/block.c index 24a25d5..500db84 100644 --- a/block.c +++ b/block.c @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags, char tmp_filename[PATH_MAX]; char backing_filename[PATH_MAX]; + if (bdrv_is_fd_protocol(bs)) { + return -ENOTSUP; + } + /* if snapshot, we create a temporary backing file and open it instead of opening 'filename' directly */ @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags, /* Find the right image format driver */ if (!drv) {
Re: [libvirt] [Qemu-devel] live snapshot wiki updated
On Fri, Jul 22, 2011 at 8:06 AM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, Jul 21, 2011 at 8:42 PM, Blue Swirl blauwir...@gmail.com wrote: On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote: Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt. The backing file information changes when image streaming completes. Before: fedora.img - my_vm.qed After: my_vm.qed (fedora.img is no longer referenced) The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file. I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. Today we seem to be sprinkling a little bit of storage management into QEMU and a little bit into libvirt :(. In that spirit it is much nicer to think of storage like a SAN appliance where you have LUNs that you access as block devices. It also provides an API for snapshotting, cloning LUNs, etc. Let's move to that model instead of worrying about how to spread storage logic across QEMU and libvirt. Would NBD protocol fit to this purpose, or is it too simple? Then libvirt would handle the storage format completely and present an NBD interface to QEMU (or give an fd to an external service) and QEMU would not care about the storage format in this mode at all. NBD does not support flush (fdatasync). Therefore it only supports the slow cache=writethrough mode in a safe manner. Maybe NBD could still be used in networked setups as a secondary alternative. It would be neat to use virtio-blk as the interface because it can be passed through to the guest. The guest talks directly to the storage management service without going through QEMU. The trick is to do something like vhost: 1. An ioeventfd for virtqueue (guest-host) kicks 2. An irqfd for host-guest kicks 3. Shared memory for vring and zero-copy data access The storage management service provides a UNIX domain socket over which fds can be passed to set up the vhost-like virtio-blk interface. Moving the image format code into a separate program makes it possible to safely write to a backing file while VMs are using it because the storage service can be host-wide, not per-VM. For example, streaming a shared backing file over NFS while running VMs using copy-on-write images. If we ever want to do deduplication or other global operations, then this approach is nice too. To summarize: The storage service manages image files including creation, deletion, snapshotting, and actual I/O. QEMU uses a vhost-like virtio-blk interface and can pass it directly into the guest. libvirt uses the storage service API without needing to parse image files or keep track of backing file relationships. Excellent plan. If one day kernel provides builtin virtio-blk services which can be passed via libvirt and QEMU to the guest, we'll even have zero copy all the way. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] live snapshot wiki updated
On Fri, Jul 22, 2011 at 12:11 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Fri, Jul 22, 2011 at 8:22 AM, Kevin Wolf kw...@redhat.com wrote: Am 21.07.2011 17:01, schrieb Stefan Hajnoczi: On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote: Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt. The backing file information changes when image streaming completes. Before: fedora.img - my_vm.qed After: my_vm.qed (fedora.img is no longer referenced) The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file. I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. And how do you implement that in a way that works on all platforms, and without root privileges? I can't see this happen unless it stays completely optional. The cross-platform way would be an iSCSI target that understands image formats. But iSCSI requires copying when doing I/O and we can't pass through virtio-blk. The guest could use iSCSI directly using the network interface without virtio-blk. This setup wouldn't give max performance in local use but it could also be useful in some networked setups and probably more useful than NBD. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] live snapshot wiki updated
On Fri, Jul 22, 2011 at 11:11 AM, Kevin Wolf kw...@redhat.com wrote: Am 22.07.2011 09:36, schrieb Avi Kivity: On 07/20/2011 04:51 PM, Kevin Wolf wrote: The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev. It can be done without blockdev. Have a dictionary that translates filenames, and populate it from the command line (for a bonus, translate a filename to a file descriptor inherited from the caller or passed via the monitor). Sure, you can always add ugly hacks, but it isn't the right solution that we want to use for all times. However, once we use it, it will show up in the external API and we'll never get rid of it again. Fully agree. This would also be a highly specific API for QCOW2 and similar formats. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] live snapshot wiki updated
On Thu, Jul 21, 2011 at 11:25 AM, Kevin Wolf kw...@redhat.com wrote: Am 20.07.2011 19:20, schrieb Blue Swirl: On Wed, Jul 20, 2011 at 4:51 PM, Kevin Wolf kw...@redhat.com wrote: Am 20.07.2011 15:25, schrieb Jes Sorensen: On 07/20/11 12:01, Kevin Wolf wrote: Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen. My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least interesting... Also you have to add some magic to all places opening something. I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image. The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev. There could still be some issues: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage? This could be handled so that instead of naming the backing file, QEMU asks for a descriptor for the backing file by presenting the descriptor to main file C, qemu shouldn't ask for anything. libvirt shouldn't give it a filename in the first place. It should do something like this: { execute: blockdev_add, arguments= { driver: fd, fd: 4, backing-file: { driver: fd, fd: 5 } }} And then qemu doesn't even have a reason to know that there is something called CA. Yes, that's better. but I think the real solution is that libvirt should handle the storage formats completely and it should present QEMU with only a raw file like interface (read/write/seek) for the data. Then any backing files would be handled within libvirt. Performance could suffer, though. I like your humour. :-) Well, for some applications, security is more important than performance or convenience. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] live snapshot wiki updated
On Thu, Jul 21, 2011 at 11:07 AM, Jes Sorensen jes.soren...@redhat.com wrote: On 07/20/11 21:51, Blue Swirl wrote: And the snapshot_blkdev monitor command is a case where qemu needs to create a new qcow2 image on the fly, while referencing the name of an existing file. What backing name do you put in the new qcow2 file unless you already have a name association for all fds already open for the existing backing file? For backing file with original name of /path/in/storage, QEMU could present the fd and the backin path by requesting something like fd:12,/path/in/storage. The next file in chain /path2/file would be fd:12,fd:34,/path2/file. Or if possible, -fd 12 -backing /path/in/storage with spaces and funny characters escaped etc. Rather than trying to do this by mangling files on the disk, I would suggest we allow registering a call-back open function, which calls back into libvirt and requests it to open a given file. It can then do all it's security foo to decide whether or not to allow the file to be open. Just to clarify: I was not proposing any mangling of the files. This is relatively clean and avoids the mess of relying on outside processes messing about in the images. Cheers, Jes -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] live snapshot wiki updated
On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote: Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt. The backing file information changes when image streaming completes. Before: fedora.img - my_vm.qed After: my_vm.qed (fedora.img is no longer referenced) The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file. I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. Today we seem to be sprinkling a little bit of storage management into QEMU and a little bit into libvirt :(. In that spirit it is much nicer to think of storage like a SAN appliance where you have LUNs that you access as block devices. It also provides an API for snapshotting, cloning LUNs, etc. Let's move to that model instead of worrying about how to spread storage logic across QEMU and libvirt. Would NBD protocol fit to this purpose, or is it too simple? Then libvirt would handle the storage format completely and present an NBD interface to QEMU (or give an fd to an external service) and QEMU would not care about the storage format in this mode at all. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] live snapshot wiki updated
On Wed, Jul 20, 2011 at 4:51 PM, Kevin Wolf kw...@redhat.com wrote: Am 20.07.2011 15:25, schrieb Jes Sorensen: On 07/20/11 12:01, Kevin Wolf wrote: Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen. My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least interesting... Also you have to add some magic to all places opening something. I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image. The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev. There could still be some issues: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage? This could be handled so that instead of naming the backing file, QEMU asks for a descriptor for the backing file by presenting the descriptor to main file C, but I think the real solution is that libvirt should handle the storage formats completely and it should present QEMU with only a raw file like interface (read/write/seek) for the data. Then any backing files would be handled within libvirt. Performance could suffer, though. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] live snapshot wiki updated
On Wed, Jul 20, 2011 at 4:46 PM, Eric Blake ebl...@redhat.com wrote: On 07/20/2011 07:25 AM, Jes Sorensen wrote: I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image. The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? We've already told you - qemu must have a way to be passed fds which are associated with names, and when a file refers to another backing file by name, then qemu falls back on its fd/name mapping to use the already-passed fd instead. Which implies that someone else, either libvirt or a qemu-maintained libblockformat.so, needs to have a stable interface for parsing the backing file name out of an arbitrary qcow2 file, and that this interface must work no matter how many other extensions are added to qcow2. I'd avoid any name based access in this case. If QEMU has write access to main file, it could forge the backing file name in main file to point to for example /etc/shadow and then request libvirt to perform the opening. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] live snapshot wiki updated
On Wed, Jul 20, 2011 at 8:41 PM, Eric Blake ebl...@redhat.com wrote: On 07/20/2011 11:20 AM, Blue Swirl wrote: There could still be some issues: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage? The fix for CVE-2010-2238 already deals with this: if primary image C refers to backing file CA of raw format, but does not state what file format CA contains, then a malicious guest can modify the contents of CA to appear to be yet another qcow2 image. At which point, if libvirt follows the backing file specified in CA, then yes, the malicious guest really can cause libvirt to expose arbitrary file CB for manipulation by the guest. But that security hole was already plugged - by default, libvirt refuses to probe backing files parsed from qcow2 headers for file format, but instead requires the outer qcow2 header to also include the a file format designation for the backing file. At which point, you then have a safe chain: if C refers to CA, then libvirt knows that both C and CA are essential to the storage presented by giving qemu the file name C, and the guest will already be modifying CA, but there is no storage corruption involved. But what if CA is accessed even if C is not? For example, QEMU opens C (to determine CA and write new information about the path), closes it and then requests CA? That is, as long as libvirt can already accurately read the chain of backing files from any starting point, then it can hand that entire chain of backing files (whether by the topmost file name as it does now, or whether by a series of fds as is being proposed) to qemu. This could be handled so that instead of naming the backing file, QEMU asks for a descriptor for the backing file by presenting the descriptor to main file C, but I think the real solution is that libvirt should handle the storage formats completely and it should present QEMU with only a raw file like interface (read/write/seek) for the data. Then any backing files would be handled within libvirt. Performance could suffer, though. The monitor interface was not designed to throw the read()/write()/seek() burden back on libvirt, and indeed that would kill performance so it is a non-starter idea. All we need for security is the open() burden to be shifted out of qemu and into libvirt. Obviously the interface should be faster than monitor, for example a pair of sockets with some efficient protocol. Monitor could still be used to set up these. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] live snapshot wiki updated
On Wed, Jul 20, 2011 at 8:47 PM, Eric Blake ebl...@redhat.com wrote: On 07/20/2011 11:27 AM, Blue Swirl wrote: We've already told you - qemu must have a way to be passed fds which are associated with names, and when a file refers to another backing file by name, then qemu falls back on its fd/name mapping to use the already-passed fd instead. Which implies that someone else, either libvirt or a qemu-maintained libblockformat.so, needs to have a stable interface for parsing the backing file name out of an arbitrary qcow2 file, and that this interface must work no matter how many other extensions are added to qcow2. I'd avoid any name based access in this case. If QEMU has write access to main file, it could forge the backing file name in main file to point to for example /etc/shadow and then request libvirt to perform the opening. Won't work. Well, it might work within the context of a single qemu process. But when that process ends, then libvirt would have to touch up the qcow2 headers of that file to replace the /etc/shadow name with the real backing file name, otherwise, the next time you restart qemu-img or a new qemu guest using the same image, the information has been lost, since the fd has been closed in the meantime. How would libvirt know to do this touch up? We really _do_ need a way to give qemu both an fd and the name of the file that the fd is tied to. On Linux, qemu could use /proc/self/fd to reconstruct the name from fd, but that's not portable to other OS. And we've already discussed how in the libvirt model, that libvirt is deemed more secure than qemu. Therefore, I think it is reasonable for qemu to make the assumptions that if it exposes a monitor command where the supervisor (libvirt or otherwise) can pass in both an fd and a file name, that either the supervisor is passing in correct information, or that the bug is in the supervisor and not in qemu if the supervisor passes in wrong information and things blow up. Yes, I'm not worried about QEMU getting confused by libvirt. And the snapshot_blkdev monitor command is a case where qemu needs to create a new qcow2 image on the fly, while referencing the name of an existing file. What backing name do you put in the new qcow2 file unless you already have a name association for all fds already open for the existing backing file? For backing file with original name of /path/in/storage, QEMU could present the fd and the backin path by requesting something like fd:12,/path/in/storage. The next file in chain /path2/file would be fd:12,fd:34,/path2/file. Or if possible, -fd 12 -backing /path/in/storage with spaces and funny characters escaped etc. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] live snapshot wiki updated
On Wed, Jul 20, 2011 at 9:17 PM, Eric Blake ebl...@redhat.com wrote: On 07/20/2011 12:00 PM, Blue Swirl wrote: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage? The fix for CVE-2010-2238 already deals with this: if primary image C refers to backing file CA of raw format, but does not state what file format CA contains, then a malicious guest can modify the contents of CA to appear to be yet another qcow2 image. At which point, if libvirt follows the backing file specified in CA, then yes, the malicious guest really can cause libvirt to expose arbitrary file CB for manipulation by the guest. But that security hole was already plugged - by default, libvirt refuses to probe backing files parsed from qcow2 headers for file format, but instead requires the outer qcow2 header to also include the a file format designation for the backing file. At which point, you then have a safe chain: if C refers to CA, then libvirt knows that both C and CA are essential to the storage presented by giving qemu the file name C, and the guest will already be modifying CA, but there is no storage corruption involved. But what if CA is accessed even if C is not? For example, QEMU opens C (to determine CA and write new information about the path), closes it and then requests CA? Why is qemu trying to access CA? Either because CA was mentioned as a backing file for C (in which case libvirt already knows about it, because either libvirt handed C to qemu at startup time after already parsing C's headers to learn that CA is a backing file, or because libvirt called the snapshot_blkdev command with the intent of having qemu populate CA with C as its backing file), or because qemu has a bug (in which case, libvirt should refuse the access to CA). So no new backing files can be introduced by QEMU after it has started without libvirt knowing it? Libvirt is already perfectly capable of tracking all files that qemu might need to access, and whether it is qemu or libvirt that does the open() of those files, we can still have libvirt validate whether each request for a file makes sense given the context of all previous files in use from the time the qemu command line was invoked and across all monitor commands in the meantime. On non-NFS solutions, where every file can have a SELinux label, then the security is then present by merely having libvirt relabel all such files to a unique label for that particular qemu process, and SELinux merely enforces that qemu cannot open() anything but what libvirt has already labeled. And since libvirt already knows which files to label in the non-NFS scenario, it already knows which fds to pass in the NFS scenario, at which point the ability to prevent qemu from open()ing an NFS file is a security enhancement. Your question about qemu wanting to use CA is thus answered independently of whether the fd management solution is solved by libvirt handing an fd for CA to qemu prior to any monitor command where qemu will then need to use CA, or whether qemu is taught to asynchronously ask libvirt to open an fd for CA on qemu's behalf. The answer is that libvirt already tracks whether qemu should access CA, and just needs a way to enforce that knowledge. The enforcement already exists for non-NFS via SELinux labels, and the proposal to add fd handling will expand that enforcement to also cover NFS. OK. I think fds would be useful internally in a privilege separation mode in plain QEMU too. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v2] Add support for fd: protocol
On Thu, Jun 16, 2011 at 5:48 PM, Corey Bryant brynt...@us.ibm.com wrote: On 06/15/2011 03:12 PM, Blue Swirl wrote: On Tue, Jun 14, 2011 at 4:31 PM, Corey Bryantbrynt...@us.ibm.com wrote: sVirt provides SELinux MAC isolation for Qemu guest processes and their corresponding resources (image files). sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, which is needed for image file isolation when using the sVirt SELinux security driver in libvirt. The proposed solution entails a combination of Qemu, libvirt, and SELinux patches that work together to isolate multiple guests' images when they're stored in the same NFS mount. This results in an environment where sVirt isolation and NFS image file isolation can both be provided. This patch contains the Qemu code to support this solution. I would like to solicit input from the libvirt community prior to starting the libvirt patch. Currently, Qemu opens an image file in addition to performing the necessary read and write operations. The proposed solution will move the open out of Qemu and into libvirt. Once libvirt opens an image file for the guest, it will pass the file descriptor to Qemu via a new fd: protocol. If the image file resides in an NFS mount, the following SELinux policy changes will provide image isolation: - A new SELinux boolean is created (e.g. virt_read_write_nfs) to allow Qemu (svirt_t) to only have SELinux read and write permissions on nfs_t files - Qemu (svirt_t) also gets SELinux use permissions on libvirt (virtd_t) file descriptors Following is a sample invocation of Qemu using the fd: protocol on the command line: qemu -drive file=fd:4,format=qcow2 The fd: protocol is also supported by the drive_add monitor command. This requires that the specified file descriptor is passed to the monitor alongside a prior getfd monitor command. There are some additional features provided by certain image types where Qemu reopens the image file. All of these scenarios will be unsupported for the fd: protocol, at least for this patch: - The -snapshot command line option - The savevm monitor command - The snapshot_blkdev monitor command - Starting Qemu with a backing file There's also native CDROM device. Did you consider adding an explicit reopen method to block layer? Thanks. Yes it looks like I overlooked CDROM reopens. I'm not sure that I'm clear on the purpose of the reopen function. Would the goal be to funnel all block layer reopens through a single function, enabling potential future support where a privileged layer of Qemu, or libvirt, performs the open? Eventually yes, but I think it would help also now by moving the checks to a single place. It's a bit orthogonal to this patch though. The thought is that this support can be added in the future, but is not required for the initial fd: support. This patch was tested with the following formats: raw, cow, qcow, qcow2, qed, and vmdk, using the fd: protocol from the command line and the monitor. Tests were also run to verify existing file name support and qemu-img were not regressed. Non-valid file descriptors, fd: without format, snapshot and backing files were also tested. Signed-off-by: Corey Bryantcor...@linux.vnet.ibm.com --- block.c | 16 ++ block.h | 1 + block/cow.c | 5 +++ block/qcow.c | 5 +++ block/qcow2.c | 5 +++ block/qed.c | 4 ++ block/raw-posix.c | 81 +++-- block/vmdk.c | 5 +++ block_int.h | 1 + blockdev.c | 10 ++ monitor.c | 5 +++ monitor.h | 1 + qemu-options.hx | 8 +++-- qemu-tool.c | 5 +++ 14 files changed, 140 insertions(+), 12 deletions(-) diff --git a/block.c b/block.c index 24a25d5..500db84 100644 --- a/block.c +++ b/block.c @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags, char tmp_filename[PATH_MAX]; char backing_filename[PATH_MAX]; + if (bdrv_is_fd_protocol(bs)) { + return -ENOTSUP; + } + /* if snapshot, we create a temporary backing file and open it instead of opening 'filename' directly */ @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags, /* Find the right image format driver */ if (!drv) { + /* format must be specified for fd: protocol */ + if (bdrv_is_fd_protocol(bs)) { + return -ENOTSUP
Re: [libvirt] [Qemu-devel] [PATCH v2] Add support for fd: protocol
On Tue, Jun 14, 2011 at 4:31 PM, Corey Bryant brynt...@us.ibm.com wrote: sVirt provides SELinux MAC isolation for Qemu guest processes and their corresponding resources (image files). sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, which is needed for image file isolation when using the sVirt SELinux security driver in libvirt. The proposed solution entails a combination of Qemu, libvirt, and SELinux patches that work together to isolate multiple guests' images when they're stored in the same NFS mount. This results in an environment where sVirt isolation and NFS image file isolation can both be provided. This patch contains the Qemu code to support this solution. I would like to solicit input from the libvirt community prior to starting the libvirt patch. Currently, Qemu opens an image file in addition to performing the necessary read and write operations. The proposed solution will move the open out of Qemu and into libvirt. Once libvirt opens an image file for the guest, it will pass the file descriptor to Qemu via a new fd: protocol. If the image file resides in an NFS mount, the following SELinux policy changes will provide image isolation: - A new SELinux boolean is created (e.g. virt_read_write_nfs) to allow Qemu (svirt_t) to only have SELinux read and write permissions on nfs_t files - Qemu (svirt_t) also gets SELinux use permissions on libvirt (virtd_t) file descriptors Following is a sample invocation of Qemu using the fd: protocol on the command line: qemu -drive file=fd:4,format=qcow2 The fd: protocol is also supported by the drive_add monitor command. This requires that the specified file descriptor is passed to the monitor alongside a prior getfd monitor command. There are some additional features provided by certain image types where Qemu reopens the image file. All of these scenarios will be unsupported for the fd: protocol, at least for this patch: - The -snapshot command line option - The savevm monitor command - The snapshot_blkdev monitor command - Starting Qemu with a backing file There's also native CDROM device. Did you consider adding an explicit reopen method to block layer? The thought is that this support can be added in the future, but is not required for the initial fd: support. This patch was tested with the following formats: raw, cow, qcow, qcow2, qed, and vmdk, using the fd: protocol from the command line and the monitor. Tests were also run to verify existing file name support and qemu-img were not regressed. Non-valid file descriptors, fd: without format, snapshot and backing files were also tested. Signed-off-by: Corey Bryant cor...@linux.vnet.ibm.com --- block.c | 16 ++ block.h | 1 + block/cow.c | 5 +++ block/qcow.c | 5 +++ block/qcow2.c | 5 +++ block/qed.c | 4 ++ block/raw-posix.c | 81 +++-- block/vmdk.c | 5 +++ block_int.h | 1 + blockdev.c | 10 ++ monitor.c | 5 +++ monitor.h | 1 + qemu-options.hx | 8 +++-- qemu-tool.c | 5 +++ 14 files changed, 140 insertions(+), 12 deletions(-) diff --git a/block.c b/block.c index 24a25d5..500db84 100644 --- a/block.c +++ b/block.c @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags, char tmp_filename[PATH_MAX]; char backing_filename[PATH_MAX]; + if (bdrv_is_fd_protocol(bs)) { + return -ENOTSUP; + } + /* if snapshot, we create a temporary backing file and open it instead of opening 'filename' directly */ @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags, /* Find the right image format driver */ if (!drv) { + /* format must be specified for fd: protocol */ + if (bdrv_is_fd_protocol(bs)) { + return -ENOTSUP; + } ret = find_image_format(filename, drv); } @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState *bs) return bs-enable_write_cache; } +int bdrv_is_fd_protocol(BlockDriverState *bs) +{ + return bs-fd_protocol; +} + /* XXX: no longer used */ void bdrv_set_change_cb(BlockDriverState *bs, void (*change_cb)(void *opaque, int reason), @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs, BlockDriver *drv = bs-drv; if (!drv) return -ENOMEDIUM; + if (bdrv_is_fd_protocol(bs)) { + return -ENOTSUP; + } if (drv-bdrv_snapshot_create) return drv-bdrv_snapshot_create(bs, sn_info); if (bs-file) diff --git a/block.h b/block.h index
Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change
On Mon, Apr 4, 2011 at 5:43 PM, Anthony Liguori anth...@codemonkey.ws wrote: On 04/04/2011 09:26 AM, Daniel P. Berrange wrote: On Mon, Apr 04, 2011 at 09:19:36AM -0500, Anthony Liguori wrote: On 04/04/2011 08:16 AM, Daniel P. Berrange wrote: That doesn't really have any impact. If a desktop user is logged in, udev may change the ownership to match that user, but if they aren't, then udev may reset it to root:disk. Either way, QEMU may loose permissions to the disk. Then if you create a guest without being in the 'disk' group, it'll fail. That's pretty expected AFAICT. We don't *ever* want to put QEMU in the 'disk' group because that gives it access to any disk on the system in general. If that's what the user wants to do, what's the problem with doing it? Setting the global user/group is not enough because just because you have one VM that you want in disk doesn't mean you want all of them in disk. Privilege separated QEMU sounds so interesting that I'd go for that direction. There could be helper processes which retain privileges and communicate with the main unprivileged QEMU with only file descriptors. The helpers could even execute setgid disk group re-opener for the CD-ROM case, or ask libvirt to do the reopen. For unprivileged QEMU part it wouldn't matter, all it sees are the descriptors. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change
On Sun, Apr 3, 2011 at 2:57 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Tue, Mar 29, 2011 at 8:04 PM, Stefan Hajnoczi stefa...@linux.vnet.ibm.com wrote: Piggy-back on the guest CD-ROM polling to poll on the host. Open and close the host CD-ROM file descriptor to ensure we read the new size and not a stale size. Two things are going on here: 1. If hald/udisks is not already polling CD-ROMs on the host then re-opening the CD-ROM causes the host to read the new medium's size. 2. There is a bug in Linux which means the CD-ROM file descriptor must be re-opened in order for lseek(2) to see the new size. The inode size gets out of sync with the underlying device (which you can confirm by checking that /sys/block/sr0/size and lseek(2) do not match after media change). I have raised this with the maintainers but we need a workaround for the foreseeable future. Note that these changes are all in a #ifdef __linux__ section. Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- block/raw-posix.c | 26 ++ 1 files changed, 22 insertions(+), 4 deletions(-) diff --git a/block/raw-posix.c b/block/raw-posix.c index 6b72470..8b5205c 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs) BDRVRawState *s = bs-opaque; int ret; - ret = ioctl(s-fd, CDROM_DRIVE_STATUS, CDSL_CURRENT); - if (ret == CDS_DISC_OK) - return 1; - return 0; + /* + * Close the file descriptor if no medium is present and open it to poll + * again. This ensures the medium size is refreshed. If the file + * descriptor is kept open the size can become stale. This is essentially + * replicating CD-ROM polling but is driven by the guest. As the guest + * polls, we poll the host. + */ + + if (s-fd == -1) { + s-fd = qemu_open(bs-filename, s-open_flags, 0644); + if (s-fd 0) { + return 0; + } + } + + ret = (ioctl(s-fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK); + + if (!ret) { + close(s-fd); + s-fd = -1; + } + return ret; } static int cdrom_eject(BlockDriverState *bs, int eject_flag) -- 1.7.4.1 There is an issue with reopening host devices in QEMU when running under libvirt. It appears that libvirt chowns image files (including device nodes) so that the launched QEMU process can access them. Unfortunately after media change on host devices udev will reset the ownership of the device node. This causes open(2) to fail with EACCES since the QEMU process does not have the right uid/gid/groups and libvirt is unaware that the file's ownership has changed. In order for media change to work with Linux host CD-ROM it is necessary to reopen the file (otherwise the inode size will not refresh, this is an issue with existing kernels). How can libvirt's security model be made to support this case? In theory udev could be temporarily configured with libvirt permissions for the CD-ROM device while passed through to the guest, but is that feasible? How about something like this: Add an explicit reopen method to BlockDriver. Make a special block device for passed file descriptors. Pass descriptors in libvirt for CD-ROMs instead of the device paths. The reopen method for file descriptors should notify libvirt about need to pass a reopened descriptor and then block all accesses until a new descriptor is available. This should also solve your earlier problem. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] Re: [Qemu-devel] [PATCH 2/6] Introduce monitor 'wait' command (v2)
On 4/8/09, Anthony Liguori aligu...@us.ibm.com wrote: The wait command will pause the monitor the command was issued in until a new event becomes available. Events are queued if there isn't a waiter present. The wait command completes after a single event is available. +if (!w-polling) +monitor_resume(w-mon); CODING_STYLE, chapter 4: Every indented statement is braced; even if the block contains just one statement. ;-) -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list