Re: [Qemu-block] [PATCH v2 3/5] pci: Add INTERFACE_PCIE_DEVICE to all PCIe devices
On Wed, Sep 27, 2017 at 04:56:33PM -0300, Eduardo Habkost wrote: > Change all devices that set is_express=1 to implement > INTERFACE_PCIE_DEVICE. > > Cc: Keith Busch > Cc: Kevin Wolf > Cc: Max Reitz > Cc: Dmitry Fleytman > Cc: Jason Wang > Cc: "Michael S. Tsirkin" > Cc: Marcel Apfelbaum > Cc: Paul Burton > Cc: Paolo Bonzini > Cc: Hannes Reinecke > Cc: qemu-block@nongnu.org > Reviewed-by: Alistair Francis > Signed-off-by: Eduardo Habkost Reviewed-by: David Gibson > --- > Changes v1 -> v2: > * base-xhci is marked as hybrid, now (in another patch) > * Included pcie-pci-bridge > --- > hw/block/nvme.c| 4 > hw/net/e1000e.c| 4 > hw/pci-bridge/pcie_pci_bridge.c| 1 + > hw/pci-bridge/pcie_root_port.c | 4 > hw/pci-bridge/xio3130_downstream.c | 4 > hw/pci-bridge/xio3130_upstream.c | 4 > hw/pci-host/xilinx-pcie.c | 4 > hw/scsi/megasas.c | 6 ++ > 8 files changed, 31 insertions(+) > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c > index 9aa32692a3..441e21ed1f 100644 > --- a/hw/block/nvme.c > +++ b/hw/block/nvme.c > @@ -1110,6 +1110,10 @@ static const TypeInfo nvme_info = { > .instance_size = sizeof(NvmeCtrl), > .class_init= nvme_class_init, > .instance_init = nvme_instance_init, > +.interfaces = (InterfaceInfo[]) { > +{ INTERFACE_PCIE_DEVICE }, > +{ } > +}, > }; > > static void nvme_register_types(void) > diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c > index 6c42b4478c..81f7934a59 100644 > --- a/hw/net/e1000e.c > +++ b/hw/net/e1000e.c > @@ -708,6 +708,10 @@ static const TypeInfo e1000e_info = { > .instance_size = sizeof(E1000EState), > .class_init = e1000e_class_init, > .instance_init = e1000e_instance_init, > +.interfaces = (InterfaceInfo[]) { > +{ INTERFACE_PCIE_DEVICE }, > +{ } > +}, > }; > > static void e1000e_register_types(void) > diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c > index 9aa5cc3e45..88db143633 100644 > --- a/hw/pci-bridge/pcie_pci_bridge.c > +++ b/hw/pci-bridge/pcie_pci_bridge.c > @@ -180,6 +180,7 @@ static const TypeInfo pcie_pci_bridge_info = { > .class_init = pcie_pci_bridge_class_init, > .interfaces = (InterfaceInfo[]) { > { TYPE_HOTPLUG_HANDLER }, > +{ INTERFACE_PCIE_DEVICE }, > { }, > } > }; > diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c > index 4d588cb22e..9b6e4ce512 100644 > --- a/hw/pci-bridge/pcie_root_port.c > +++ b/hw/pci-bridge/pcie_root_port.c > @@ -161,6 +161,10 @@ static const TypeInfo rp_info = { > .class_init= rp_class_init, > .abstract = true, > .class_size = sizeof(PCIERootPortClass), > +.interfaces = (InterfaceInfo[]) { > +{ INTERFACE_PCIE_DEVICE }, > +{ } > +}, > }; > > static void rp_register_types(void) > diff --git a/hw/pci-bridge/xio3130_downstream.c > b/hw/pci-bridge/xio3130_downstream.c > index e706f36cb7..7d2f7629c1 100644 > --- a/hw/pci-bridge/xio3130_downstream.c > +++ b/hw/pci-bridge/xio3130_downstream.c > @@ -195,6 +195,10 @@ static const TypeInfo xio3130_downstream_info = { > .name = "xio3130-downstream", > .parent= TYPE_PCIE_SLOT, > .class_init= xio3130_downstream_class_init, > +.interfaces = (InterfaceInfo[]) { > +{ INTERFACE_PCIE_DEVICE }, > +{ } > +}, > }; > > static void xio3130_downstream_register_types(void) > diff --git a/hw/pci-bridge/xio3130_upstream.c > b/hw/pci-bridge/xio3130_upstream.c > index a052224bbf..227997ce46 100644 > --- a/hw/pci-bridge/xio3130_upstream.c > +++ b/hw/pci-bridge/xio3130_upstream.c > @@ -166,6 +166,10 @@ static const TypeInfo xio3130_upstream_info = { > .name = "x3130-upstream", > .parent= TYPE_PCIE_PORT, > .class_init= xio3130_upstream_class_init, > +.interfaces = (InterfaceInfo[]) { > +{ INTERFACE_PCIE_DEVICE }, > +{ } > +}, > }; > > static void xio3130_upstream_register_types(void) > diff --git a/hw/pci-host/xilinx-pcie.c b/hw/pci-host/xilinx-pcie.c > index 4613dda1d2..7659253090 100644 > --- a/hw/pci-host/xilinx-pcie.c > +++ b/hw/pci-host/xilinx-pcie.c > @@ -317,6 +317,10 @@ static const TypeInfo xilinx_pcie_root_info = { > .parent = TYPE_PCI_BRIDGE, > .instance_size = sizeof(XilinxPCIERoot), > .class_init = xilinx_pcie_root_class_init, > +.interfaces = (InterfaceInfo[]) { > +{ INTERFACE_PCIE_DEVICE }, > +{ } > +}, > }; > > static void xilinx_pcie_register(void) > diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c > index 0db68aacee..535ee267c3 100644 > --- a/hw/scsi/megasas.c > +++ b/hw/scsi/megasas.c > @@ -2451,6 +2451,7 @@ typedef struct MegasasInfo { > int osts; > const VMStateDescription *vmsd; > Property *props; > +InterfaceInfo *in
Re: [Qemu-block] [PATCH v2 4/5] pci: Add INTERFACE_CONVENTIONAL_PCI_DEVICE to Conventional PCI devices
On Wed, Sep 27, 2017 at 04:56:34PM -0300, Eduardo Habkost wrote: > Add INTERFACE_CONVENTIONAL_PCI_DEVICE to all direct subtypes of > TYPE_PCI_DEVICE, except: > > 1) The ones that already have INTERFACE_PCIE_DEVICE set: > > * base-xhci > * e1000e > * nvme > * pvscsi > * vfio-pci > * virtio-pci > * vmxnet3 > > 2) base-pci-bridge > > Not all PCI bridges are Conventional PCI devices, so > INTERFACE_CONVENTIONAL_PCI_DEVICE is added only to the subtypes > that are actually Conventional PCI: > > * dec-21154-p2p-bridge > * i82801b11-bridge > * pbm-bridge > * pci-bridge > > The direct subtypes of base-pci-bridge not touched by this patch > are: > > * xilinx-pcie-root: Already marked as PCIe-only. > * pcie-pci-bridge: Already marked as PCIe-only. > * pcie-port: all non-abstract subtypes of pcie-port are already > marked as PCIe-only devices. > > 3) megasas-base > > Not all megasas devices are Conventional PCI devices, so the > interface names are added to the subclasses registered by > megasas_register_types(), according to information in the > megasas_devices[] array. > > "megasas-gen2" already implements INTERFACE_PCIE_DEVICE, so add > INTERFACE_CONVENTIONAL_PCI_DEVICE only to "megasas". > > Acked-by: Alberto Garcia > Acked-by: John Snow > Acked-by: Anthony PERARD > Signed-off-by: Eduardo Habkost Reviewed-by: David Gibson and for the ppc devices Acked-by: David Gibson > --- > Changes v1 -> v2: > * s/legacy/conventional/ > * Suggested-by: Alex Williamson > * Note about pcie-pci-bridge on commit message. > * New devices: sungem, sunhme > > Cc: "Michael S. Tsirkin" > Cc: Igor Mammedov > Cc: Gerd Hoffmann > Cc: Paolo Bonzini > Cc: Richard Henderson > Cc: Eduardo Habkost > Cc: Stefano Stabellini > Cc: Anthony Perard > Cc: John Snow > Cc: Alberto Garcia > Cc: Aurelien Jarno > Cc: Yongbok Kim > Cc: Jiri Slaby > Cc: Alexander Graf > Cc: Marcel Apfelbaum > Cc: Jason Wang > Cc: Jiri Pirko > Cc: "Hervé Poussineau" > Cc: Peter Maydell > Cc: David Gibson > Cc: Hannes Reinecke > Cc: Mark Cave-Ayland > Cc: Artyom Tarasenko > Cc: Alex Williamson > Cc: qemu-de...@nongnu.org > Cc: xen-de...@lists.xenproject.org > Cc: qemu-block@nongnu.org > Cc: qemu-...@nongnu.org > Cc: qemu-...@nongnu.org > --- > hw/acpi/piix4.c | 1 + > hw/audio/ac97.c | 4 > hw/audio/es1370.c | 4 > hw/audio/intel-hda.c| 4 > hw/char/serial-pci.c| 12 > hw/display/cirrus_vga.c | 4 > hw/display/qxl.c| 4 > hw/display/sm501.c | 4 > hw/display/vga-pci.c| 4 > hw/display/vmware_vga.c | 4 > hw/i2c/smbus_ich9.c | 4 > hw/i386/amd_iommu.c | 4 > hw/i386/kvm/pci-assign.c| 4 > hw/i386/pc_piix.c | 4 > hw/i386/xen/xen_platform.c | 4 > hw/i386/xen/xen_pvdevice.c | 4 > hw/ide/ich.c| 4 > hw/ide/pci.c| 4 > hw/ipack/tpci200.c | 4 > hw/isa/i82378.c | 4 > hw/isa/lpc_ich9.c | 1 + > hw/isa/piix4.c | 4 > hw/isa/vt82c686.c | 16 > hw/mips/gt64xxx_pci.c | 4 > hw/misc/edu.c | 5 + > hw/misc/ivshmem.c | 4 > hw/misc/macio/macio.c | 4 > hw/misc/pci-testdev.c | 4 > hw/net/e1000.c | 4 > hw/net/eepro100.c | 4 > hw/net/ne2000.c | 4 > hw/net/pcnet-pci.c | 4 > hw/net/rocker/rocker.c | 4 > hw/net/rtl8139.c| 4 > hw/net/sungem.c | 4 > hw/net/sunhme.c | 4 > hw/pci-bridge/dec.c | 8 > hw/pci-bridge/i82801b11.c | 4 > hw/pci-bridge/pci_bridge_dev.c | 1 + > hw/pci-bridge/pci_expander_bridge.c | 8 > hw/pci-host/apb.c | 8 > hw/pci-host/bonito.c| 4 > hw/pci-host/gpex.c | 4 > hw/pci-host/grackle.c | 4 > hw/pci-host/piix.c | 8 > hw/pci-host/ppce500.c | 4 > hw/pci-host/prep.c | 4 > hw/pci-host/q35.c | 4 > hw/pci-host/uninorth.c | 16 > hw/pci-host/versatile.c | 4 > hw/ppc/ppc4xx_pci.c | 4 > hw/scsi/esp-pci.c | 4 > hw/scsi/lsi53c895a.c| 4 > hw/scsi/megasas.c | 4 > hw/scsi/mptsas.c| 4 > h
Re: [Qemu-block] [Qemu-arm] [Qemu-devel] [PATCH v1 1/8] Replace all occurances of __FUNCTION__ with __func__
On 26 September 2017 at 06:32, Eric Blake wrote: > On 09/25/2017 07:08 PM, Alistair Francis wrote: >> diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c >> index 58005b6619..32687afced 100644 >> --- a/hw/arm/nseries.c >> +++ b/hw/arm/nseries.c >> @@ -463,7 +463,7 @@ static uint32_t mipid_txrx(void *opaque, uint32_t cmd, >> int len) >> uint8_t ret; >> >> if (len > 9) { >> -hw_error("%s: FIXME: bad SPI word width %i\n", __FUNCTION__, len); >> +hw_error("%s: FIXME: bad SPI word width %i\n", __func__, len); > > Not this patch's problem, but it would probably be simpler if hw_error() > were a macro that automatically prefixed __func__, rather than making > every caller have to supply it themselves. I'm not sure there's a great deal of benefit to that change, because use of hw_error() in new code is rarely correct (it does an abort() so it should never be used for guest-triggered conditions, which is about the only time that you might be interested in a guest register dump rather than just asserting). Most of its existing uses are in crufty old device models. thanks -- PMM
Re: [Qemu-block] [Qemu-devel] [PATCH v1 2/8] tests: Replace fprintf(stderr, "*\n" with error_report()
On Mon, Sep 25, 2017 at 5:55 PM, Emilio G. Cota wrote: > On Mon, Sep 25, 2017 at 17:08:48 -0700, Alistair Francis wrote: >> diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c >> index caa1e8e689..41ba1600c0 100644 >> --- a/tests/atomic_add-bench.c >> +++ b/tests/atomic_add-bench.c >> @@ -29,7 +29,7 @@ static const char commands_string[] = >> static void usage_complete(char *argv[]) >> { >> fprintf(stderr, "Usage: %s [options]\n", argv[0]); >> -fprintf(stderr, "options:\n%s\n", commands_string); >> +fprintf(stderr, "options:\n%s", commands_string); >> } > > We do want that trailing \n, unless we move it to commands_string. Yeah, this was a mistake. It should have swapped to error_report(). > > Also, I think using error_report here would be confusing -- this is a > standalone > test program with as little QEMU-specific knowledge as possible (QemuThreads > are used for portability); using error_report here is confusing (this is not > an error). Do you mean in all tests/ sub directory or just a few certain cases? > >> diff --git a/tests/check-qlit b/tests/check-qlit >> new file mode 100755 >> index >> ..950429524e3eb07e6daed1fe01caad0f5d554809 >> GIT binary patch >> literal 272776 >> zcmeEvdwf*Ywf~vPB$){zGeCghJ;($So{10*LNEgfoIs*MKvBRDLV(l&F`3b5QKOS6 > > ? I don't know what this is, I don't seem to have this binary in my > checked out tree. Yeah, this shouldn't be here. Thanks, Alistair > > (snips thousands of lines) > >> diff --git a/tests/qht-bench.c b/tests/qht-bench.c >> index 11c1cec766..2637d601a9 100644 >> --- a/tests/qht-bench.c >> +++ b/tests/qht-bench.c >> @@ -5,6 +5,7 @@ >> * See the COPYING file in the top-level directory. >> */ >> #include "qemu/osdep.h" >> +#include "qemu/error-report.h" >> #include "qemu/processor.h" >> #include "qemu/atomic.h" >> #include "qemu/qht.h" >> @@ -89,7 +90,7 @@ static const char commands_string[] = >> static void usage_complete(int argc, char *argv[]) >> { >> fprintf(stderr, "Usage: %s [options]\n", argv[0]); >> -fprintf(stderr, "options:\n%s\n", commands_string); >> +fprintf(stderr, "options:\n%s", commands_string); > > Same as above: this removes the necessary trailing \n. > >> exit(-1); >> } >> >> @@ -328,7 +329,7 @@ static void htable_init(void) >> retries++; >> } >> } >> -fprintf(stderr, " populated after %zu retries\n", retries); >> +error_report(" populated after %zu retries", retries); >> } > > ditto -- I'd rather keep fprintf(stderr) here, it's less confusing. > > Thanks, > > Emilio
Re: [Qemu-block] [Qemu-devel] [PATCH v2] virtio: introduce `info virtio' hmp command
> OK but if it's useful as an hmp command, why not as a qmp command? > The command is designed for debugging and produces quite sightly output. For respective qmp command most of `info virtio' output would excessive and unneccesary. On 09/27/2017 10:43 PM, Michael S. Tsirkin wrote: On Wed, Sep 27, 2017 at 06:09:42PM +0300, Jan Dakinevich wrote: The command is intended for exposing device specific virtio feature bits and their negotiation status. It is convenient and useful for debug purpose. Names of features are taken from a devices via get_feature_name() within VirtioDeviceClass. If certain device doesn't implement it, the command will print only hexadecimal value of feature mask. Cc: Denis V. Lunev Signed-off-by: Jan Dakinevich OK but if it's useful as an hmp command, why not as a qmp command? --- v2: * Moved hmp_info_virtio and stuff to hmp.c to avoid build error * Added printing of device status * Listed common virtio features v1: http://lists.nongnu.org/archive/html/qemu-devel/2017-09/msg07247.html --- hmp-commands-info.hx| 14 + hmp.c | 148 hmp.h | 1 + hw/block/virtio-blk.c | 20 ++ hw/char/virtio-serial-bus.c | 15 + hw/display/virtio-gpu.c | 12 hw/net/virtio-net.c | 34 ++ hw/scsi/virtio-scsi.c | 15 + hw/virtio/virtio-balloon.c | 15 + hw/virtio/virtio-bus.c | 1 + include/hw/virtio/virtio.h | 2 + monitor.c | 1 + 12 files changed, 278 insertions(+) diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx index 4f1ece9..2550027 100644 --- a/hmp-commands-info.hx +++ b/hmp-commands-info.hx @@ -868,6 +868,20 @@ ETEXI }, STEXI +@item info virtio +@findex virtio +Display guest and host fetures for all virtio devices. +ETEXI + +{ +.name = "virtio", +.args_type = "", +.params = "", +.help = "show virtio info", +.cmd= hmp_info_virtio, +}, + +STEXI @end table ETEXI diff --git a/hmp.c b/hmp.c index ace729d..009d808 100644 --- a/hmp.c +++ b/hmp.c @@ -43,6 +43,7 @@ #include "hw/intc/intc.h" #include "migration/snapshot.h" #include "migration/misc.h" +#include "hw/virtio/virtio.h" #ifdef CONFIG_SPICE #include @@ -2894,3 +2895,150 @@ void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict) } hmp_handle_error(mon, &err); } + +#define HMP_INFO_VIRTIO_INDENT 2 +#define HMP_INFO_VIRTIO_FIELD 32 + +static void hmp_info_virtio_print_status(Monitor *mon, VirtIODevice *vdev) +{ +uint8_t status[] = { +VIRTIO_CONFIG_S_ACKNOWLEDGE, +VIRTIO_CONFIG_S_DRIVER, +VIRTIO_CONFIG_S_DRIVER_OK, +VIRTIO_CONFIG_S_FEATURES_OK, +VIRTIO_CONFIG_S_NEEDS_RESET, +VIRTIO_CONFIG_S_FAILED, +}; +const char *names[] = { +"acknowledge", +"driver", +"driver_ok", +"features_ok", +"needs_reset" +"failed", +}; +const char *comma = ""; +int idx; + +monitor_printf(mon, "%*sstatus: 0x%02"PRIx8" ", + HMP_INFO_VIRTIO_INDENT, "", vdev->status); + +for (idx = 0; idx < ARRAY_SIZE(status); idx++) { +if (!(vdev->status & status[idx])) { +continue; +} +monitor_printf(mon, "%s%s", comma, names[idx]); +if (names[idx]) { +comma = ","; +} +} +monitor_printf(mon, "\n"); +} + +static void hmp_info_virtio_print_common_features(Monitor *mon, + VirtIODevice *vdev) +{ +uint64_t features[] = { +VIRTIO_RING_F_INDIRECT_DESC, +VIRTIO_RING_F_EVENT_IDX, +VIRTIO_F_NOTIFY_ON_EMPTY, +VIRTIO_F_ANY_LAYOUT, +VIRTIO_F_IOMMU_PLATFORM, +VIRTIO_F_VERSION_1, +}; +const char *names[] = { +"indirect_desc", +"event_idx", +"notify_on_empty", +"any_layout", +"iommu_platform", +"version_1", +}; +int idx; + +monitor_printf(mon, "%*scommon features:\n", HMP_INFO_VIRTIO_INDENT, ""); + +for (idx = 0; idx < ARRAY_SIZE(features); idx++) { +const char *ack = vdev->guest_features & features[idx] ? "acked" : ""; + +if (!(vdev->host_features & features[idx])) { +continue; +} +monitor_printf(mon, "%*s%*s%*s\n", HMP_INFO_VIRTIO_INDENT, "", + HMP_INFO_VIRTIO_FIELD, names[idx], + HMP_INFO_VIRTIO_FIELD, ack); +} +} + +static void hmp_info_virtio_print_specific_features(Monitor *mon, +VirtIODevice *vdev) +{ +VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev); +int idx; + +if (!vdc->get_feature_name) { +return; +} + +monitor_printf(mon, "%*sspecific features:\n", HMP_INFO_VIRTIO_INDENT,
Re: [Qemu-block] [Qemu-devel] [PATCH v1 1/8] Replace all occurances of __FUNCTION__ with __func__
On Tue, Sep 26, 2017 at 6:32 AM, Eric Blake wrote: > On 09/25/2017 07:08 PM, Alistair Francis wrote: >> Replace all occurs of __FUNCTION__ except for the check in checkpatch >> with the non GCC specific __func__. >> >> One line in hcd-musb.c was manually tweaked to pass checkpatch. >> >> Signed-off-by: Alistair Francis >> Cc: Gerd Hoffmann >> Cc: Andrzej Zaborowski >> Cc: Stefano Stabellini >> Cc: Anthony Perard >> Cc: John Snow >> Cc: Aurelien Jarno >> Cc: Yongbok Kim >> Cc: Peter Crosthwaite >> Cc: Stefan Hajnoczi (supporter:Block >> Cc: Fam Zheng (supporter:Block > > That looks funny, with no closing ). Something's breaking down between > get_maintainers.pl and your eventual email, although it's not fatal. Yeah, that's a copy and paste error, will fix. > >> Cc: Juan Quintela >> Cc: "Dr. David Alan Gilbert" >> Cc: qemu-...@nongnu.org >> Cc: qemu-block@nongnu.org >> Cc: xen-de...@lists.xenproject.org >> --- >> > >> 65 files changed, 273 insertions(+), 273 deletions(-) > > Big but mechanical, so I'm okay without splitting it further. Phew! I did not want to split it. > >> >> diff --git a/audio/audio_int.h b/audio/audio_int.h >> index 5bcb1c60e1..543b1bd8d5 100644 >> --- a/audio/audio_int.h >> +++ b/audio/audio_int.h >> @@ -253,7 +253,7 @@ static inline int audio_ring_dist (int dst, int src, int >> len) >> #define AUDIO_STRINGIFY(n) AUDIO_STRINGIFY_(n) >> >> #if defined _MSC_VER || defined __GNUC__ >> -#define AUDIO_FUNC __FUNCTION__ >> +#define AUDIO_FUNC __func__ >> #else >> #define AUDIO_FUNC __FILE__ ":" AUDIO_STRINGIFY (__LINE__) >> #endif > > This can be further simplified. We really aren't using _MSC_VER as our > compiler (can anyone prove me wrong?), and we DO require a C99 compiler > (per C99 6.4.2.2, __func__ support is mandatory), so we don't really > need the #else branch (or, for that matter, we probably don't even need > AUDIO_FUNC). But to keep this patch mechanical, that can be a separate > followup. I have a second patch that removes AUDIO_FUNC > >> diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c >> index 58005b6619..32687afced 100644 >> --- a/hw/arm/nseries.c >> +++ b/hw/arm/nseries.c >> @@ -463,7 +463,7 @@ static uint32_t mipid_txrx(void *opaque, uint32_t cmd, >> int len) >> uint8_t ret; >> >> if (len > 9) { >> -hw_error("%s: FIXME: bad SPI word width %i\n", __FUNCTION__, len); >> +hw_error("%s: FIXME: bad SPI word width %i\n", __func__, len); > > Not this patch's problem, but it would probably be simpler if hw_error() > were a macro that automatically prefixed __func__, rather than making > every caller have to supply it themselves. I'm going to leave this for another day, but I think you are right. > >> +++ b/hw/arm/omap1.c > >> @@ -1716,7 +1716,7 @@ static void omap_clkm_write(void *opaque, hwaddr addr, >> case 0x18: /* ARM_SYSST */ >> if ((s->clkm.clocking_scheme ^ (value >> 11)) & 7) { >> s->clkm.clocking_scheme = (value >> 11) & 7; >> -printf("%s: clocking scheme set to %s\n", __FUNCTION__, >> +printf("%s: clocking scheme set to %s\n", __func__, >> clkschemename[s->clkm.clocking_scheme]); > > Worth fixing the indentation while you are here? Fixed > >> @@ -2473,7 +2473,7 @@ static void omap_pwt_write(void *opaque, hwaddr addr, >> case 0x04: /* VRC */ >> if ((value ^ s->vrc) & 1) { >> if (value & 1) >> -printf("%s: %iHz buzz on\n", __FUNCTION__, (int) >> +printf("%s: %iHz buzz on\n", __func__, (int) >> /* 1.5 MHz from a 12-MHz or 13-MHz PWT_CLK >> */ >> ((omap_clk_getrate(s->clk) >> 3) / >> /* Pre-multiplexer divider */ > > Likewise? This doesn't actually change the indention. It's all one argument to a function. > >> @@ -3330,13 +3330,13 @@ static void omap_mcbsp_writeh(void *opaque, hwaddr >> addr, >> s->mcr[1] = value & 0x03e3; >> if (value & 3) /* XMCM */ >> printf("%s: Tx channel selection mode enable attempt\n", >> -__FUNCTION__); >> +__func__); >> return; >> case 0x1a: /* MCR1 */ >> s->mcr[0] = value & 0x03e1; >> if (value & 1) /* RMCM */ >> printf("%s: Rx channel selection mode enable attempt\n", >> -__FUNCTION__); >> +__func__); > > and again Fixed. > > >> +++ b/hw/arm/omap2.c >> @@ -1312,7 +1312,7 @@ static void omap_prcm_apll_update(struct omap_prcm_s >> *s) >> >> if (mode[0] == 1 || mode[0] == 2 || mode[1] == 1 || mode[1] == 2) >> fprintf(stderr, "%s: bad EN_54M_PLL or bad EN_96M_PLL\n", >> -__FUNCTION__); >> +__func__); > > More of
Re: [Qemu-block] [Qemu-devel] [PATCH v1 3/8] hw: Replace fprintf(stderr, "*\n" with error_report()
On Mon, Sep 25, 2017 at 8:51 PM, Thomas Huth wrote: > On 26.09.2017 02:08, Alistair Francis wrote: >> Replace a large number of the fprintf(stderr, "*\n" calls with >> error_report(). The functions were renamed with these commands and then >> compiler issues where manually fixed. >> >> find ./* -type f -exec sed -i \ >> 'N;N;N;N;N;N;N;N;N;N;N;N; {s|fprintf(stderr, >> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \ >> {} + >> find ./* -type f -exec sed -i \ >> 'N;N;N;N;N;N;N;N;N;N;N; {s|fprintf(stderr, >> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \ >> {} + >> find ./* -type f -exec sed -i \ >> 'N;N;N;N;N;N;N;N;N; {s|fprintf(stderr, >> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \ >> {} + >> find ./* -type f -exec sed -i \ >> 'N;N;N;N;N;N;N;N; {s|fprintf(stderr, >> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \ >> {} + >> find ./* -type f -exec sed -i \ >> 'N;N;N;N;N;N;N; {s|fprintf(stderr, >> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \ >> {} + >> find ./* -type f -exec sed -i \ >> 'N;N;N;N;N;N; {s|fprintf(stderr, >> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \ >> {} + >> find ./* -type f -exec sed -i \ >> 'N;N;N;N;N; {s|fprintf(stderr, >> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \ >> {} + >> find ./* -type f -exec sed -i \ >> 'N;N;N;N; {s|fprintf(stderr, >> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \ >> {} + >> find ./* -type f -exec sed -i \ >> 'N;N;N; {s|fprintf(stderr, >> "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \ >> {} + >> find ./* -type f -exec sed -i \ >> 'N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \ >> {} + >> find ./* -type f -exec sed -i \ >> 'N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \ >> {} + >> >> Some lines where then manually tweaked to pass checkpatch. >> >> Signed-off-by: Alistair Francis >> Cc: Andrzej Zaborowski >> Cc: Jan Kiszka >> Cc: Stefan Hajnoczi >> Cc: Paolo Bonzini >> Cc: Thomas Huth >> Cc: Gerd Hoffmann >> Cc: "Michael S. Tsirkin" >> Cc: Richard Henderson >> Cc: Eduardo Habkost >> Cc: Stefano Stabellini >> Cc: Anthony Perard >> Cc: John Snow >> Cc: Christian Borntraeger >> Cc: Cornelia Huck >> Cc: Alexander Graf >> Cc: Michael Walle >> Cc: Paul Burton >> Cc: Aurelien Jarno >> Cc: Yongbok Kim >> Cc: "Hervé Poussineau" >> Cc: Anthony Green >> Cc: Jason Wang >> Cc: Chris Wulff >> Cc: Marek Vasut >> Cc: Jia Liu >> Cc: Stafford Horne >> Cc: Marcel Apfelbaum >> Cc: Magnus Damm >> Cc: Fabien Chouteau >> Cc: Mark Cave-Ayland >> Cc: Artyom Tarasenko >> Cc: qemu-...@nongnu.org >> Cc: qemu-block@nongnu.org >> Cc: xen-de...@lists.xenproject.org >> Cc: qemu-...@nongnu.org >> --- >> >> hw/arm/armv7m.c | 2 +- >> hw/arm/boot.c | 34 +-- >> hw/arm/gumstix.c| 13 >> hw/arm/mainstone.c | 7 ++-- >> hw/arm/musicpal.c | 2 +- >> hw/arm/omap1.c | 5 +-- >> hw/arm/omap2.c | 21 ++-- >> hw/arm/omap_sx1.c | 6 ++-- >> hw/arm/palm.c | 10 +++--- >> hw/arm/pxa2xx.c | 7 ++-- >> hw/arm/stellaris.c | 3 +- >> hw/arm/tosa.c | 17 +- >> hw/arm/versatilepb.c| 2 +- >> hw/arm/vexpress.c | 8 ++--- >> hw/arm/z2.c | 6 ++-- >> hw/block/dataplane/virtio-blk.c | 6 ++-- >> hw/block/onenand.c | 8 ++--- >> hw/block/tc58128.c | 44 - >> hw/bt/core.c| 15 + >> hw/bt/hci-csr.c | 17 +- >> hw/bt/hci.c | 30 - >> hw/bt/hid.c | 2 +- >> hw/bt/l2cap.c | 47 ++- >> hw/bt/sdp.c | 7 ++-- >> hw/char/exynos4210_uart.c | 6 ++-- >> hw/char/mcf_uart.c | 5 +-- >> hw/char/sh_serial.c | 9 +++--- >> hw/core/loader.c| 31 +- >> hw/core/ptimer.c| 7 ++-- >> hw/cris/axis_dev88.c| 3 +- >> hw/cris/boot.c | 5 +-- >> hw/display/blizzard.c | 20 ++-- >> hw/display/omap_dss.c | 14 >> hw/display/pl110.c | 2 +- >> hw/display/pxa2xx_lcd.c | 2 +- >> hw/display/qxl-render.c | 7 ++-- >> hw/display/qxl.c| 10 +++--- >> hw/display/tc6393xb.c | 36 - >> hw/display/virtio-gpu-3d.c | 4 +-- >> hw/display/vmware_vga.c | 22 ++--- >> hw/dma/omap_dma.c | 26 --- >> hw/dma/soc_dma.c| 37 ++--- >> hw/gpio/omap_gpio.c | 2 +- >> hw/i2c/omap_i2c.c | 10 +++--- >> hw/i386/kvm/apic.c | 9 ++
Re: [Qemu-block] [Qemu-devel] [PATCH v4 18/23] qemu-img: Change compare_sectors() to be byte-based
On 09/13/2017 12:03 PM, Eric Blake wrote: > In the continuing quest to make more things byte-based, change > compare_sectors(), renaming it to compare_buffers() in the > process. Note that one caller (qemu-img compare) only cares > about the first difference, while the other (qemu-img rebase) > cares about how many consecutive sectors have the same > equal/different status; however, this patch does not bother to > micro-optimize the compare case to avoid the comparisons of > sectors beyond the first mismatch. Both callers are always > passing valid buffers in, so the initial check for buffer size > can be turned into an assertion. > > Signed-off-by: Eric Blake > > --- > v3: new patch > --- > qemu-img.c | 55 +++ > 1 file changed, 27 insertions(+), 28 deletions(-) > > diff --git a/qemu-img.c b/qemu-img.c > index 2e05f92e85..034122eba5 100644 > --- a/qemu-img.c > +++ b/qemu-img.c > @@ -1155,31 +1155,28 @@ static int is_allocated_sectors_min(const uint8_t > *buf, int n, int *pnum, > } > > /* > - * Compares two buffers sector by sector. Returns 0 if the first sector of > both > - * buffers matches, non-zero otherwise. > + * Compares two buffers sector by sector. Returns 0 if the first > + * sector of each buffer matches, non-zero otherwise. > * > - * pnum is set to the number of sectors (including and immediately following > - * the first one) that are known to have the same comparison result > + * pnum is set to the sector-aligned size of the buffer prefix that > + * has the same matching status as the first sector. > */ > -static int compare_sectors(const uint8_t *buf1, const uint8_t *buf2, int n, > -int *pnum) > +static int compare_buffers(const uint8_t *buf1, const uint8_t *buf2, > + int64_t bytes, int64_t *pnum) > { > bool res; > -int i; > +int64_t i = MIN(bytes, BDRV_SECTOR_SIZE); > > -if (n <= 0) { > -*pnum = 0; > -return 0; > -} > +assert(bytes > 0); > > -res = !!memcmp(buf1, buf2, 512); > -for(i = 1; i < n; i++) { > -buf1 += 512; > -buf2 += 512; > +res = !!memcmp(buf1, buf2, i); It is temporarily confusing that 'i' is never again used for this particular parameter, because > +while (i < bytes) { This gives the brief impression that we might be looping in a way that changes the comparison size passed to memcmp, which isn't true. Just me being cranky, though. It's probably still the best way, because of how you have to prime the loop. Doing it the literal-minded way requires an extra i += len, so: Reviewed-by: John Snow
Re: [Qemu-block] [Qemu-devel] [PATCH v4 01/23] block: Allow NULL file for bdrv_get_block_status()
On 09/25/2017 05:43 PM, John Snow wrote: > > > On 09/13/2017 12:03 PM, Eric Blake wrote: >> Not all callers care about which BDS owns the mapping for a given >> range of the file. This patch merely simplifies the callers by >> consolidating the logic in the common call point, while guaranteeing >> a non-NULL file to all the driver callbacks, for no semantic change. >> The only caller that does not care about pnum is bdrv_is_allocated, >> as invoked by vvfat; we can likewise add assertions that the rest >> of the stack does not have to worry about a NULL pnum. >> >> Furthermore, this will also set the stage for a future cleanup: when >> a caller does not care about which BDS owns an offset, it would be >> nice to allow the driver to optimize things to not have to return >> BDRV_BLOCK_OFFSET_VALID in the first place. In the case of fragmented >> allocation (for example, it's fairly easy to create a qcow2 image >> where consecutive guest addresses are not at consecutive host >> addresses), the current contract requires bdrv_get_block_status() >> to clamp *pnum to the limit where host addresses are no longer >> consecutive, but allowing a NULL file means that *pnum could be >> set to the full length of known-allocated data. >> > > Function reads slightly worse for the wear now with all of the return > logic handled at various places within, but unifying it might be even > stranger, perhaps.. > > Let's see if I hate this more: > > out: > bdrv_dec_in_flight(bs); > bdrv_dec_in_flight(bs); > if (ret >= 0 && sector_num + *pnum == total_sectors) { > ret |= BDRV_BLOCK_EOF; > } > early_out: > if (file) { > *file = local_file; > } > return ret; > > > and then earlier in the function, we can just: > > if (total_sectors < 0) { > ret = total_sectors; > goto early_out; > } Seems reasonable enough, I'll work that in to v5, since there are other reasons to respin the series anyway. > > It's only shed paint, though: > > Reviewed-by: John Snow > > I'm looking at the rest of the series now, so please stand by. > -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [Qemu-devel] [PATCH v4 17/23] qemu-img: Change check_empty_sectors() to byte-based
On 09/13/2017 12:03 PM, Eric Blake wrote: > Continue on the quest to make more things byte-based instead of > sector-based. > > Signed-off-by: Eric Blake Reviewed-by: John Snow
Re: [Qemu-block] [Qemu-devel] [PATCH v4 16/23] qemu-img: Drop redundant error message in compare
On 09/13/2017 12:03 PM, Eric Blake wrote: > If a read error is encountered during 'qemu-img compare', we > were printing the "Error while reading offset ..." message twice. > Update the testsuite for the improved output. > > Further simplify the code by hoisting the error code conversion > into the helper function, rather than repeating it at the callers. > > Signed-off-by: Eric Blake > > --- > v3: new patch > --- > qemu-img.c | 19 +-- > tests/qemu-iotests/074.out | 2 -- > 2 files changed, 5 insertions(+), 16 deletions(-) > > diff --git a/qemu-img.c b/qemu-img.c > index dfccebe6bc..3e1e373e8f 100644 > --- a/qemu-img.c > +++ b/qemu-img.c > @@ -1196,8 +1196,10 @@ static int64_t sectors_to_bytes(int64_t sectors) > /* > * Check if passed sectors are empty (not allocated or contain only 0 bytes) > * > - * Returns 0 in case sectors are filled with 0, 1 if sectors contain non-zero > - * data and negative value on error. > + * Intended for use by 'qemu-img compare': Returns 0 in case sectors are > + * filled with 0, 1 if sectors contain non-zero data (this is a comparison > + * failure), and 4 on error (the exit status for read errors), after emitting > + * an error message. > * > * @param blk: BlockBackend for the image > * @param sect_num: Number of first sector to check > @@ -1218,7 +1220,7 @@ static int check_empty_sectors(BlockBackend *blk, > int64_t sect_num, > if (ret < 0) { > error_report("Error while reading offset %" PRId64 " of %s: %s", > sectors_to_bytes(sect_num), filename, strerror(-ret)); > -return ret; > +return 4; > } > idx = find_nonzero(buffer, sect_count * BDRV_SECTOR_SIZE); > if (idx >= 0) { > @@ -1473,11 +1475,6 @@ static int img_compare(int argc, char **argv) >filename2, buf1, quiet); > } > if (ret) { > -if (ret < 0) { > -error_report("Error while reading offset %" PRId64 ": > %s", > - sectors_to_bytes(sector_num), > strerror(-ret)); > -ret = 4; > -} > goto out; > } > } > @@ -1522,12 +1519,6 @@ static int img_compare(int argc, char **argv) > ret = check_empty_sectors(blk_over, sector_num, nb_sectors, >filename_over, buf1, quiet); > if (ret) { > -if (ret < 0) { > -error_report("Error while reading offset %" PRId64 > - " of %s: %s", > sectors_to_bytes(sector_num), > - filename_over, strerror(-ret)); > -ret = 4; > -} > goto out; > } > } > diff --git a/tests/qemu-iotests/074.out b/tests/qemu-iotests/074.out > index 8fba5aea9c..ede66c3f81 100644 > --- a/tests/qemu-iotests/074.out > +++ b/tests/qemu-iotests/074.out > @@ -4,7 +4,6 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 > wrote 512/512 bytes at offset 512 > 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > qemu-img: Error while reading offset 0 of > blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT: Input/output error > -qemu-img: Error while reading offset 0: Input/output error > 4 > Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 > Formatting 'TEST_DIR/t.IMGFMT.2', fmt=IMGFMT size=0 > @@ -12,7 +11,6 @@ Formatting 'TEST_DIR/t.IMGFMT.2', fmt=IMGFMT size=0 > wrote 512/512 bytes at offset 512 > 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > qemu-img: Error while reading offset 0 of > blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT: Input/output error > -qemu-img: Error while reading offset 0 of > blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT: Input/output error > Warning: Image size mismatch! > 4 > Cleanup > Hm, naively I might assume it's best for the caller to report the error and to leave the function a nicely self-contained helper, but I won't insist on it. Reviewed-by: John Snow
Re: [Qemu-block] [Qemu-devel] [PATCH] iotests: Fix 195 if IMGFMT is part of TEST_DIR
On 09/27/2017 04:13 PM, Max Reitz wrote: > do_run_qemu() in iotest 195 first applies _filter_imgfmt when printing > qemu's command line and _filter_testdir only afterwards. Therefore, if > the image format is part of the test directory path, _filter_testdir > will no longer apply and the actual output will differ from the > reference output even in case of success. > > For example, TEST_DIR might be "/tmp/test-qcow2", in which case > _filter_imgfmt first transforms this to "/tmp/test-IMGFMT" which is no > longer recognized as the TEST_DIR by _filter_testdir. > > Fix this by not applying _filter_imgfmt in do_run_qemu() but in > run_qemu() instead, and only after _filter_testdir. > > Signed-off-by: Max Reitz > --- > tests/qemu-iotests/195 | 7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [Qemu-devel] [PATCH v4 15/23] qemu-img: Add find_nonzero()
On 09/13/2017 12:03 PM, Eric Blake wrote: > During 'qemu-img compare', when we are checking that an allocated > portion of one file is all zeros, we don't need to waste time > computing how many additional sectors after the first non-zero > byte are also non-zero. Create a new helper find_nonzero() to do > the check for a first non-zero sector, and rebase > check_empty_sectors() to use it. > > The new interface intentionally uses bytes in its interface, even > though it still crawls the buffer a sector at a time; it is robust > to a partial sector at the end of the buffer. > > Signed-off-by: Eric Blake Reviewed-by: John Snow
[Qemu-block] [PATCH] iotests: Fix 195 if IMGFMT is part of TEST_DIR
do_run_qemu() in iotest 195 first applies _filter_imgfmt when printing qemu's command line and _filter_testdir only afterwards. Therefore, if the image format is part of the test directory path, _filter_testdir will no longer apply and the actual output will differ from the reference output even in case of success. For example, TEST_DIR might be "/tmp/test-qcow2", in which case _filter_imgfmt first transforms this to "/tmp/test-IMGFMT" which is no longer recognized as the TEST_DIR by _filter_testdir. Fix this by not applying _filter_imgfmt in do_run_qemu() but in run_qemu() instead, and only after _filter_testdir. Signed-off-by: Max Reitz --- tests/qemu-iotests/195 | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/tests/qemu-iotests/195 b/tests/qemu-iotests/195 index 05a239cbf5..e7a403ded2 100755 --- a/tests/qemu-iotests/195 +++ b/tests/qemu-iotests/195 @@ -44,15 +44,16 @@ _supported_os Linux function do_run_qemu() { -echo Testing: "$@" | _filter_imgfmt +echo Testing: "$@" $QEMU -nographic -qmp-pretty stdio -serial none "$@" echo } function run_qemu() { -do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qemu | _filter_qmp \ - | _filter_qemu_io | _filter_generated_node_ids +do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_imgfmt | _filter_qemu \ + | _filter_qmp | _filter_qemu_io \ + | _filter_generated_node_ids } size=64M -- 2.13.5
Re: [Qemu-block] [PATCH v4 0/6] Misc improvements to crypto block driver
On 2017-09-27 14:53, Daniel P. Berrange wrote: > This is a followup to > > v1: https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg00781.html > v2: https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg06464.html > v3: https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg02923.html > > This collection of patches first improves the performance of the > crypto block driver and then does various cleanups to improve ongoing > maint work. > > Changed in v4: > > - Drop intermediate patch that replaced '512' with a constant (Max) > - Use MIN() macro where needed (Max) > - Fix bounce buffer size at 1MB instead of varying per sector size (Max) > - Convert missing qcrypto_block_encrypt call to sectors in qcow.c (Max) > > Changed in v3: > > - Support passthrough of BDRV_REQ_FUA (Eric) > - Fix potential truncation of payload offset values (Eric) > - Use encryption scheme sector size instead of BDRV_SECTOR_SIZE (Kevin) > - Use QEMU_IS_ALIGNED where appropriate (Eric) > - Remove unused 'sector_num' variable (Eric) > - Fix whitespace alignment (Eric) > - Fix math error in qcow conversion (Eric) > > Daniel P. Berrange (6): > block: use 1 MB bounce buffers for crypto instead of 16KB > crypto: expose encryption sector size in APIs > block: fix data type casting for crypto payload offset > block: convert crypto driver to bdrv_co_preadv|pwritev > block: convert qcrypto_block_encrypt|decrypt to take bytes offset > block: support passthrough of BDRV_REQ_FUA in crypto driver > > block/crypto.c | 130 > ++--- > block/qcow.c | 11 +++-- > block/qcow2-cluster.c | 8 ++- > block/qcow2.c | 4 +- > crypto/block-luks.c| 18 --- > crypto/block-qcow.c| 13 +++-- > crypto/block.c | 26 +++--- > crypto/blockpriv.h | 5 +- > include/crypto/block.h | 29 --- > 9 files changed, 148 insertions(+), 96 deletions(-) Thanks; hoping that is OK with you, I've applied this series to my block branch: https://github.com/XanClic/qemu/commits/block Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [Qemu-devel] [PATCH v4 14/23] qemu-img: Speed up compare on pre-allocated larger file
On 09/13/2017 12:03 PM, Eric Blake wrote: > Compare the following images with all-zero contents: > $ truncate --size 1M A > $ qemu-img create -f qcow2 -o preallocation=off B 1G > $ qemu-img create -f qcow2 -o preallocation=metadata C 1G > > On my machine, the difference is noticeable for pre-patch speeds, > with more than an order of magnitude in difference caused by the > choice of preallocation in the qcow2 file: > > $ time ./qemu-img compare -f raw -F qcow2 A B > Warning: Image size mismatch! > Images are identical. > > real 0m0.014s > user 0m0.007s > sys 0m0.007s > > $ time ./qemu-img compare -f raw -F qcow2 A C > Warning: Image size mismatch! > Images are identical. > > real 0m0.341s > user 0m0.144s > sys 0m0.188s > > Why? Because bdrv_is_allocated() returns false for image B but > true for image C, throwing away the fact that both images know > via lseek(SEEK_HOLE) that the entire image still reads as zero. > From there, qemu-img ends up calling bdrv_pread() for every byte > of the tail, instead of quickly looking for the next allocation. > The solution: use block_status instead of is_allocated, giving: > > $ time ./qemu-img compare -f raw -F qcow2 A C > Warning: Image size mismatch! > Images are identical. > > real 0m0.014s > user 0m0.011s > sys 0m0.003s > > which is on par with the speeds for no pre-allocation. > > Signed-off-by: Eric Blake Makes good sense to me. Reviewed-by: John Snow
Re: [Qemu-block] [PATCH v4 5/6] block: convert qcrypto_block_encrypt|decrypt to take bytes offset
On 2017-09-27 14:53, Daniel P. Berrange wrote: > Instead of sector offset, take the bytes offset when encrypting > or decrypting data. > > Signed-off-by: Daniel P. Berrange > --- > block/crypto.c | 12 > block/qcow.c | 11 +++ > block/qcow2-cluster.c | 8 +++- > block/qcow2.c | 4 ++-- > crypto/block-luks.c| 12 > crypto/block-qcow.c| 12 > crypto/block.c | 20 ++-- > crypto/blockpriv.h | 4 ++-- > include/crypto/block.h | 14 -- > 9 files changed, 56 insertions(+), 41 deletions(-) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v4 4/6] block: convert crypto driver to bdrv_co_preadv|pwritev
On 2017-09-27 14:53, Daniel P. Berrange wrote: > Make the crypto driver implement the bdrv_co_preadv|pwritev > callbacks, and also use bdrv_co_preadv|pwritev for I/O > with the protocol driver beneath. This replaces sector based > I/O with byte based I/O, and allows us to stop assuming the > physical sector size matches the encryption sector size. > > Signed-off-by: Daniel P. Berrange > --- > block/crypto.c | 106 > + > 1 file changed, 54 insertions(+), 52 deletions(-) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v4 1/6] block: use 1 MB bounce buffers for crypto instead of 16KB
On 2017-09-27 14:53, Daniel P. Berrange wrote: > Using 16KB bounce buffers creates a significant performance > penalty for I/O to encrypted volumes on storage which high > I/O latency (rotating rust & network drives), because it > triggers lots of fairly small I/O operations. > > On tests with rotating rust, and cache=none|directsync, > write speed increased from 2MiB/s to 32MiB/s, on a par > with that achieved by the in-kernel luks driver. With > other cache modes the in-kernel driver is still notably > faster because it is able to report completion of the > I/O request before any encryption is done, while the > in-QEMU driver must encrypt the data before completion. > > Signed-off-by: Daniel P. Berrange > --- > block/crypto.c | 28 +++- > 1 file changed, 15 insertions(+), 13 deletions(-) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
[Qemu-block] [PATCH v2 4/5] pci: Add INTERFACE_CONVENTIONAL_PCI_DEVICE to Conventional PCI devices
Add INTERFACE_CONVENTIONAL_PCI_DEVICE to all direct subtypes of TYPE_PCI_DEVICE, except: 1) The ones that already have INTERFACE_PCIE_DEVICE set: * base-xhci * e1000e * nvme * pvscsi * vfio-pci * virtio-pci * vmxnet3 2) base-pci-bridge Not all PCI bridges are Conventional PCI devices, so INTERFACE_CONVENTIONAL_PCI_DEVICE is added only to the subtypes that are actually Conventional PCI: * dec-21154-p2p-bridge * i82801b11-bridge * pbm-bridge * pci-bridge The direct subtypes of base-pci-bridge not touched by this patch are: * xilinx-pcie-root: Already marked as PCIe-only. * pcie-pci-bridge: Already marked as PCIe-only. * pcie-port: all non-abstract subtypes of pcie-port are already marked as PCIe-only devices. 3) megasas-base Not all megasas devices are Conventional PCI devices, so the interface names are added to the subclasses registered by megasas_register_types(), according to information in the megasas_devices[] array. "megasas-gen2" already implements INTERFACE_PCIE_DEVICE, so add INTERFACE_CONVENTIONAL_PCI_DEVICE only to "megasas". Acked-by: Alberto Garcia Acked-by: John Snow Acked-by: Anthony PERARD Signed-off-by: Eduardo Habkost --- Changes v1 -> v2: * s/legacy/conventional/ * Suggested-by: Alex Williamson * Note about pcie-pci-bridge on commit message. * New devices: sungem, sunhme Cc: "Michael S. Tsirkin" Cc: Igor Mammedov Cc: Gerd Hoffmann Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: Stefano Stabellini Cc: Anthony Perard Cc: John Snow Cc: Alberto Garcia Cc: Aurelien Jarno Cc: Yongbok Kim Cc: Jiri Slaby Cc: Alexander Graf Cc: Marcel Apfelbaum Cc: Jason Wang Cc: Jiri Pirko Cc: "Hervé Poussineau" Cc: Peter Maydell Cc: David Gibson Cc: Hannes Reinecke Cc: Mark Cave-Ayland Cc: Artyom Tarasenko Cc: Alex Williamson Cc: qemu-de...@nongnu.org Cc: xen-de...@lists.xenproject.org Cc: qemu-block@nongnu.org Cc: qemu-...@nongnu.org Cc: qemu-...@nongnu.org --- hw/acpi/piix4.c | 1 + hw/audio/ac97.c | 4 hw/audio/es1370.c | 4 hw/audio/intel-hda.c| 4 hw/char/serial-pci.c| 12 hw/display/cirrus_vga.c | 4 hw/display/qxl.c| 4 hw/display/sm501.c | 4 hw/display/vga-pci.c| 4 hw/display/vmware_vga.c | 4 hw/i2c/smbus_ich9.c | 4 hw/i386/amd_iommu.c | 4 hw/i386/kvm/pci-assign.c| 4 hw/i386/pc_piix.c | 4 hw/i386/xen/xen_platform.c | 4 hw/i386/xen/xen_pvdevice.c | 4 hw/ide/ich.c| 4 hw/ide/pci.c| 4 hw/ipack/tpci200.c | 4 hw/isa/i82378.c | 4 hw/isa/lpc_ich9.c | 1 + hw/isa/piix4.c | 4 hw/isa/vt82c686.c | 16 hw/mips/gt64xxx_pci.c | 4 hw/misc/edu.c | 5 + hw/misc/ivshmem.c | 4 hw/misc/macio/macio.c | 4 hw/misc/pci-testdev.c | 4 hw/net/e1000.c | 4 hw/net/eepro100.c | 4 hw/net/ne2000.c | 4 hw/net/pcnet-pci.c | 4 hw/net/rocker/rocker.c | 4 hw/net/rtl8139.c| 4 hw/net/sungem.c | 4 hw/net/sunhme.c | 4 hw/pci-bridge/dec.c | 8 hw/pci-bridge/i82801b11.c | 4 hw/pci-bridge/pci_bridge_dev.c | 1 + hw/pci-bridge/pci_expander_bridge.c | 8 hw/pci-host/apb.c | 8 hw/pci-host/bonito.c| 4 hw/pci-host/gpex.c | 4 hw/pci-host/grackle.c | 4 hw/pci-host/piix.c | 8 hw/pci-host/ppce500.c | 4 hw/pci-host/prep.c | 4 hw/pci-host/q35.c | 4 hw/pci-host/uninorth.c | 16 hw/pci-host/versatile.c | 4 hw/ppc/ppc4xx_pci.c | 4 hw/scsi/esp-pci.c | 4 hw/scsi/lsi53c895a.c| 4 hw/scsi/megasas.c | 4 hw/scsi/mptsas.c| 4 hw/sd/sdhci.c | 4 hw/sh4/sh_pci.c | 4 hw/sparc64/sun4u.c | 4 hw/usb/hcd-ehci-pci.c | 4 hw/usb/hcd-ohci.c | 4 hw/usb/hcd-uhci.c | 4 hw/vfio/pci-quirks.c| 4 hw/watchdog/wdt_i6300esb.c | 4 hw/xen/xen_pt.c | 4
[Qemu-block] [PATCH v2 3/5] pci: Add INTERFACE_PCIE_DEVICE to all PCIe devices
Change all devices that set is_express=1 to implement INTERFACE_PCIE_DEVICE. Cc: Keith Busch Cc: Kevin Wolf Cc: Max Reitz Cc: Dmitry Fleytman Cc: Jason Wang Cc: "Michael S. Tsirkin" Cc: Marcel Apfelbaum Cc: Paul Burton Cc: Paolo Bonzini Cc: Hannes Reinecke Cc: qemu-block@nongnu.org Reviewed-by: Alistair Francis Signed-off-by: Eduardo Habkost --- Changes v1 -> v2: * base-xhci is marked as hybrid, now (in another patch) * Included pcie-pci-bridge --- hw/block/nvme.c| 4 hw/net/e1000e.c| 4 hw/pci-bridge/pcie_pci_bridge.c| 1 + hw/pci-bridge/pcie_root_port.c | 4 hw/pci-bridge/xio3130_downstream.c | 4 hw/pci-bridge/xio3130_upstream.c | 4 hw/pci-host/xilinx-pcie.c | 4 hw/scsi/megasas.c | 6 ++ 8 files changed, 31 insertions(+) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 9aa32692a3..441e21ed1f 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1110,6 +1110,10 @@ static const TypeInfo nvme_info = { .instance_size = sizeof(NvmeCtrl), .class_init= nvme_class_init, .instance_init = nvme_instance_init, +.interfaces = (InterfaceInfo[]) { +{ INTERFACE_PCIE_DEVICE }, +{ } +}, }; static void nvme_register_types(void) diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c index 6c42b4478c..81f7934a59 100644 --- a/hw/net/e1000e.c +++ b/hw/net/e1000e.c @@ -708,6 +708,10 @@ static const TypeInfo e1000e_info = { .instance_size = sizeof(E1000EState), .class_init = e1000e_class_init, .instance_init = e1000e_instance_init, +.interfaces = (InterfaceInfo[]) { +{ INTERFACE_PCIE_DEVICE }, +{ } +}, }; static void e1000e_register_types(void) diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c index 9aa5cc3e45..88db143633 100644 --- a/hw/pci-bridge/pcie_pci_bridge.c +++ b/hw/pci-bridge/pcie_pci_bridge.c @@ -180,6 +180,7 @@ static const TypeInfo pcie_pci_bridge_info = { .class_init = pcie_pci_bridge_class_init, .interfaces = (InterfaceInfo[]) { { TYPE_HOTPLUG_HANDLER }, +{ INTERFACE_PCIE_DEVICE }, { }, } }; diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c index 4d588cb22e..9b6e4ce512 100644 --- a/hw/pci-bridge/pcie_root_port.c +++ b/hw/pci-bridge/pcie_root_port.c @@ -161,6 +161,10 @@ static const TypeInfo rp_info = { .class_init= rp_class_init, .abstract = true, .class_size = sizeof(PCIERootPortClass), +.interfaces = (InterfaceInfo[]) { +{ INTERFACE_PCIE_DEVICE }, +{ } +}, }; static void rp_register_types(void) diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c index e706f36cb7..7d2f7629c1 100644 --- a/hw/pci-bridge/xio3130_downstream.c +++ b/hw/pci-bridge/xio3130_downstream.c @@ -195,6 +195,10 @@ static const TypeInfo xio3130_downstream_info = { .name = "xio3130-downstream", .parent= TYPE_PCIE_SLOT, .class_init= xio3130_downstream_class_init, +.interfaces = (InterfaceInfo[]) { +{ INTERFACE_PCIE_DEVICE }, +{ } +}, }; static void xio3130_downstream_register_types(void) diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c index a052224bbf..227997ce46 100644 --- a/hw/pci-bridge/xio3130_upstream.c +++ b/hw/pci-bridge/xio3130_upstream.c @@ -166,6 +166,10 @@ static const TypeInfo xio3130_upstream_info = { .name = "x3130-upstream", .parent= TYPE_PCIE_PORT, .class_init= xio3130_upstream_class_init, +.interfaces = (InterfaceInfo[]) { +{ INTERFACE_PCIE_DEVICE }, +{ } +}, }; static void xio3130_upstream_register_types(void) diff --git a/hw/pci-host/xilinx-pcie.c b/hw/pci-host/xilinx-pcie.c index 4613dda1d2..7659253090 100644 --- a/hw/pci-host/xilinx-pcie.c +++ b/hw/pci-host/xilinx-pcie.c @@ -317,6 +317,10 @@ static const TypeInfo xilinx_pcie_root_info = { .parent = TYPE_PCI_BRIDGE, .instance_size = sizeof(XilinxPCIERoot), .class_init = xilinx_pcie_root_class_init, +.interfaces = (InterfaceInfo[]) { +{ INTERFACE_PCIE_DEVICE }, +{ } +}, }; static void xilinx_pcie_register(void) diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c index 0db68aacee..535ee267c3 100644 --- a/hw/scsi/megasas.c +++ b/hw/scsi/megasas.c @@ -2451,6 +2451,7 @@ typedef struct MegasasInfo { int osts; const VMStateDescription *vmsd; Property *props; +InterfaceInfo *interfaces; } MegasasInfo; static struct MegasasInfo megasas_devices[] = { @@ -2480,6 +2481,10 @@ static struct MegasasInfo megasas_devices[] = { .is_express = true, .vmsd = &vmstate_megasas_gen2, .props = megasas_properties_gen2, +.interfaces = (InterfaceInfo[]) { +{ INTERFACE_PCIE_DEVICE }, +{ } +}, } };
Re: [Qemu-block] [Qemu-devel] [PATCH v2] virtio: introduce `info virtio' hmp command
On Wed, Sep 27, 2017 at 06:09:42PM +0300, Jan Dakinevich wrote: > The command is intended for exposing device specific virtio feature bits > and their negotiation status. It is convenient and useful for debug > purpose. > > Names of features are taken from a devices via get_feature_name() within > VirtioDeviceClass. If certain device doesn't implement it, the command > will print only hexadecimal value of feature mask. > > Cc: Denis V. Lunev > Signed-off-by: Jan Dakinevich OK but if it's useful as an hmp command, why not as a qmp command? > --- > v2: > * Moved hmp_info_virtio and stuff to hmp.c to avoid build error > * Added printing of device status > * Listed common virtio features > > v1: http://lists.nongnu.org/archive/html/qemu-devel/2017-09/msg07247.html > --- > hmp-commands-info.hx| 14 + > hmp.c | 148 > > hmp.h | 1 + > hw/block/virtio-blk.c | 20 ++ > hw/char/virtio-serial-bus.c | 15 + > hw/display/virtio-gpu.c | 12 > hw/net/virtio-net.c | 34 ++ > hw/scsi/virtio-scsi.c | 15 + > hw/virtio/virtio-balloon.c | 15 + > hw/virtio/virtio-bus.c | 1 + > include/hw/virtio/virtio.h | 2 + > monitor.c | 1 + > 12 files changed, 278 insertions(+) > > diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx > index 4f1ece9..2550027 100644 > --- a/hmp-commands-info.hx > +++ b/hmp-commands-info.hx > @@ -868,6 +868,20 @@ ETEXI > }, > > STEXI > +@item info virtio > +@findex virtio > +Display guest and host fetures for all virtio devices. > +ETEXI > + > +{ > +.name = "virtio", > +.args_type = "", > +.params = "", > +.help = "show virtio info", > +.cmd= hmp_info_virtio, > +}, > + > +STEXI > @end table > ETEXI > > diff --git a/hmp.c b/hmp.c > index ace729d..009d808 100644 > --- a/hmp.c > +++ b/hmp.c > @@ -43,6 +43,7 @@ > #include "hw/intc/intc.h" > #include "migration/snapshot.h" > #include "migration/misc.h" > +#include "hw/virtio/virtio.h" > > #ifdef CONFIG_SPICE > #include > @@ -2894,3 +2895,150 @@ void hmp_info_memory_size_summary(Monitor *mon, const > QDict *qdict) > } > hmp_handle_error(mon, &err); > } > + > +#define HMP_INFO_VIRTIO_INDENT 2 > +#define HMP_INFO_VIRTIO_FIELD 32 > + > +static void hmp_info_virtio_print_status(Monitor *mon, VirtIODevice *vdev) > +{ > +uint8_t status[] = { > +VIRTIO_CONFIG_S_ACKNOWLEDGE, > +VIRTIO_CONFIG_S_DRIVER, > +VIRTIO_CONFIG_S_DRIVER_OK, > +VIRTIO_CONFIG_S_FEATURES_OK, > +VIRTIO_CONFIG_S_NEEDS_RESET, > +VIRTIO_CONFIG_S_FAILED, > +}; > +const char *names[] = { > +"acknowledge", > +"driver", > +"driver_ok", > +"features_ok", > +"needs_reset" > +"failed", > +}; > +const char *comma = ""; > +int idx; > + > +monitor_printf(mon, "%*sstatus: 0x%02"PRIx8" ", > + HMP_INFO_VIRTIO_INDENT, "", vdev->status); > + > +for (idx = 0; idx < ARRAY_SIZE(status); idx++) { > +if (!(vdev->status & status[idx])) { > +continue; > +} > +monitor_printf(mon, "%s%s", comma, names[idx]); > +if (names[idx]) { > +comma = ","; > +} > +} > +monitor_printf(mon, "\n"); > +} > + > +static void hmp_info_virtio_print_common_features(Monitor *mon, > + VirtIODevice *vdev) > +{ > +uint64_t features[] = { > +VIRTIO_RING_F_INDIRECT_DESC, > +VIRTIO_RING_F_EVENT_IDX, > +VIRTIO_F_NOTIFY_ON_EMPTY, > +VIRTIO_F_ANY_LAYOUT, > +VIRTIO_F_IOMMU_PLATFORM, > +VIRTIO_F_VERSION_1, > +}; > +const char *names[] = { > +"indirect_desc", > +"event_idx", > +"notify_on_empty", > +"any_layout", > +"iommu_platform", > +"version_1", > +}; > +int idx; > + > +monitor_printf(mon, "%*scommon features:\n", HMP_INFO_VIRTIO_INDENT, ""); > + > +for (idx = 0; idx < ARRAY_SIZE(features); idx++) { > +const char *ack = vdev->guest_features & features[idx] ? "acked" : > ""; > + > +if (!(vdev->host_features & features[idx])) { > +continue; > +} > +monitor_printf(mon, "%*s%*s%*s\n", HMP_INFO_VIRTIO_INDENT, "", > + HMP_INFO_VIRTIO_FIELD, names[idx], > + HMP_INFO_VIRTIO_FIELD, ack); > +} > +} > + > +static void hmp_info_virtio_print_specific_features(Monitor *mon, > +VirtIODevice *vdev) > +{ > +VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev); > +int idx; > + > +if (!vdc->get_feature_name) { > +return; > +} > + > +monitor_printf(mon, "%*sspecific features:\n", HMP_INFO_VIRTIO_INDENT
Re: [Qemu-block] [Qemu-devel] [PATCH v4 12/23] block: Convert bdrv_get_block_status_above() to bytes
On 09/27/2017 02:57 PM, Eric Blake wrote: > On 09/27/2017 01:41 PM, John Snow wrote: >> >> >> On 09/13/2017 12:03 PM, Eric Blake wrote: >>> We are gradually moving away from sector-based interfaces, towards >>> byte-based. In the common case, allocation is unlikely to ever use >>> values that are not naturally sector-aligned, but it is possible >>> that byte-based values will let us be more precise about allocation >>> at the end of an unaligned file that can do byte-based access. >>> >>> Changing the name of the function from bdrv_get_block_status_above() >>> to bdrv_block_status_above() ensures that the compiler enforces that >>> all callers are updated. For now, the io.c layer still assert()s >>> that all callers are sector-aligned, but that can be relaxed when a >>> later patch implements byte-based block status in the drivers. >>> >>> For the most part this patch is just the addition of scaling at the >>> callers followed by inverse scaling at bdrv_block_status(). But some >>> code, particularly bdrv_block_status(), gets a lot simpler because >>> it no longer has to mess with sectors. Likewise, mirror code no >>> longer computes s->granularity >> BDRV_SECTOR_BITS, and can therefore >>> drop an assertion (fix a neighboring assertion to use is_power_of_2 >>> while there). >>> >> >> Huh, I suppose so, yeah. Do you have a test that covers what happens in >> this newly available use case? > > Not directly - the mirror code no longer requires sector alignment, but > is still unlikely to use sub-sector requests unless a particular driver > returns really small status information. I suppose we could tweak the > blkdebug driver to force status requests to be fragmented at > ridiculously small alignments, and then prove that mirroring still > occurs correctly, once all the series are in, but it's probably more > effort than it is worth to force sub-sector mirroring if we don't have a > real use case that will rely on it. > Hmm, yeah, the code probably can't be exercised currently but I do wonder if we're removing too many breadcrumbs for potential problem spots if someone decides to return sub-sector information in the future. Well, I suppose I haven't been too diligent about complaining about their removal elsewhere, so for consistency: Either with or without the assertion removed as you see fit: Reviewed-by: John Snow >> >>> For ease of review, bdrv_get_block_status() was tackled separately. >>> >>> Signed-off-by: Eric Blake >>> >> >> Looks mechanically correct, anyway. >> >> Reviewed-by: John Snow >> >
Re: [Qemu-block] [Qemu-devel] [PATCH v4 13/23] qemu-img: Simplify logic in img_compare()
On 09/27/2017 02:05 PM, John Snow wrote: > > > On 09/13/2017 12:03 PM, Eric Blake wrote: >> As long as we are querying the status for a chunk smaller than >> the known image size, we are guaranteed that a successful return >> will have set pnum to a non-zero size (pnum is zero only for >> queries beyond the end of the file). Use that to slightly >> simplify the calculation of the current chunk size being compared. >> Likewise, we don't have to shrink the amount of data operated on >> until we know we have to read the file, and therefore have to fit >> in the bounds of our buffer. Also, note that 'total_sectors_over' >> is equivalent to 'progress_base'. >> >> With these changes in place, sectors_to_process() is now dead code, >> and can be removed. >> >> Signed-off-by: Eric Blake >> >> @@ -1402,14 +1393,9 @@ static int img_compare(int argc, char **argv) >> goto out; >> } >> allocated2 = status2 & BDRV_BLOCK_ALLOCATED; >> -if (pnum1) { >> -nb_sectors = MIN(nb_sectors, >> - DIV_ROUND_UP(pnum1, BDRV_SECTOR_SIZE)); >> -} >> -if (pnum2) { >> -nb_sectors = MIN(nb_sectors, >> - DIV_ROUND_UP(pnum2, BDRV_SECTOR_SIZE)); >> -} >> + >> +assert(pnum1 && pnum2); >> +nb_sectors = DIV_ROUND_UP(MIN(pnum1, pnum2), BDRV_SECTOR_SIZE); > > In the apocalyptic future where non-sector sized returns are possible, > does this math make sense? > > e.g. say the return is zeroes, but it's not aligned anymore, so we > assume we have an extra half a sector's worth of zeroes here. Not introduced in this patch, but a good question for 12/23. We want to round up rather than down to ensure that we don't inf-loop on a partial sector response; but at the same time, you're right that if we got a report of a half-sector zero and we widen it, we can't guarantee that the second half is zero. On the bright side, this rounding goes away when later patches switch img_compare to be byte-based, later in this series. But you're right that it is probably smarter to have 12/23 assert that things are already aligned (and thus we don't need to round in the first place). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [Qemu-devel] [PATCH v4 13/23] qemu-img: Simplify logic in img_compare()
On 09/13/2017 12:03 PM, Eric Blake wrote: > As long as we are querying the status for a chunk smaller than > the known image size, we are guaranteed that a successful return > will have set pnum to a non-zero size (pnum is zero only for > queries beyond the end of the file). Use that to slightly > simplify the calculation of the current chunk size being compared. > Likewise, we don't have to shrink the amount of data operated on > until we know we have to read the file, and therefore have to fit > in the bounds of our buffer. Also, note that 'total_sectors_over' > is equivalent to 'progress_base'. > > With these changes in place, sectors_to_process() is now dead code, > and can be removed. > > Signed-off-by: Eric Blake > > --- > v3: new patch > --- > qemu-img.c | 40 +++- > 1 file changed, 11 insertions(+), 29 deletions(-) > > diff --git a/qemu-img.c b/qemu-img.c > index b91133b922..f8423e9b3f 100644 > --- a/qemu-img.c > +++ b/qemu-img.c > @@ -1171,11 +1171,6 @@ static int64_t sectors_to_bytes(int64_t sectors) > return sectors << BDRV_SECTOR_BITS; > } > > -static int64_t sectors_to_process(int64_t total, int64_t from) > -{ > -return MIN(total - from, IO_BUF_SIZE >> BDRV_SECTOR_BITS); > -} > - > /* > * Check if passed sectors are empty (not allocated or contain only 0 bytes) > * > @@ -1372,13 +1367,9 @@ static int img_compare(int argc, char **argv) > goto out; > } > > -for (;;) { > +while (sector_num < total_sectors) { > int64_t status1, status2; > > -nb_sectors = sectors_to_process(total_sectors, sector_num); > -if (nb_sectors <= 0) { > -break; > -} > status1 = bdrv_block_status_above(bs1, NULL, >sector_num * BDRV_SECTOR_SIZE, >(total_sectors1 - sector_num) * > @@ -1402,14 +1393,9 @@ static int img_compare(int argc, char **argv) > goto out; > } > allocated2 = status2 & BDRV_BLOCK_ALLOCATED; > -if (pnum1) { > -nb_sectors = MIN(nb_sectors, > - DIV_ROUND_UP(pnum1, BDRV_SECTOR_SIZE)); > -} > -if (pnum2) { > -nb_sectors = MIN(nb_sectors, > - DIV_ROUND_UP(pnum2, BDRV_SECTOR_SIZE)); > -} > + > +assert(pnum1 && pnum2); > +nb_sectors = DIV_ROUND_UP(MIN(pnum1, pnum2), BDRV_SECTOR_SIZE); In the apocalyptic future where non-sector sized returns are possible, does this math make sense? e.g. say the return is zeroes, but it's not aligned anymore, so we assume we have an extra half a sector's worth of zeroes here. > > if (strict) { > if ((status1 & ~BDRV_BLOCK_OFFSET_MASK) != > @@ -1422,9 +1408,10 @@ static int img_compare(int argc, char **argv) > } > } > if ((status1 & BDRV_BLOCK_ZERO) && (status2 & BDRV_BLOCK_ZERO)) { > -nb_sectors = DIV_ROUND_UP(MIN(pnum1, pnum2), BDRV_SECTOR_SIZE); > +/* nothing to do */ > } else if (allocated1 == allocated2) { > if (allocated1) { > +nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> > BDRV_SECTOR_BITS); > ret = blk_pread(blk1, sector_num << BDRV_SECTOR_BITS, buf1, > nb_sectors << BDRV_SECTOR_BITS); > if (ret < 0) { > @@ -1453,7 +1440,7 @@ static int img_compare(int argc, char **argv) > } > } > } else { > - > +nb_sectors = MIN(nb_sectors, IO_BUF_SIZE >> BDRV_SECTOR_BITS); > if (allocated1) { > ret = check_empty_sectors(blk1, sector_num, nb_sectors, >filename1, buf1, quiet); > @@ -1476,30 +1463,24 @@ static int img_compare(int argc, char **argv) > > if (total_sectors1 != total_sectors2) { > BlockBackend *blk_over; > -int64_t total_sectors_over; > const char *filename_over; > > qprintf(quiet, "Warning: Image size mismatch!\n"); > if (total_sectors1 > total_sectors2) { > -total_sectors_over = total_sectors1; > blk_over = blk1; > filename_over = filename1; > } else { > -total_sectors_over = total_sectors2; > blk_over = blk2; > filename_over = filename2; > } > > -for (;;) { > +while (sector_num < progress_base) { > int64_t count; > > -nb_sectors = sectors_to_process(total_sectors_over, sector_num); > -if (nb_sectors <= 0) { > -break; > -} > ret = bdrv_is_allocated_above(blk_bs(blk_over), NULL, >sector_num * BDRV_SECTOR_SIZE, > - nb_sectors * BDRV_SECTOR_SIZE, > +
Re: [Qemu-block] [Qemu-devel] [PATCH v4 12/23] block: Convert bdrv_get_block_status_above() to bytes
On 09/27/2017 01:41 PM, John Snow wrote: > > > On 09/13/2017 12:03 PM, Eric Blake wrote: >> We are gradually moving away from sector-based interfaces, towards >> byte-based. In the common case, allocation is unlikely to ever use >> values that are not naturally sector-aligned, but it is possible >> that byte-based values will let us be more precise about allocation >> at the end of an unaligned file that can do byte-based access. >> >> Changing the name of the function from bdrv_get_block_status_above() >> to bdrv_block_status_above() ensures that the compiler enforces that >> all callers are updated. For now, the io.c layer still assert()s >> that all callers are sector-aligned, but that can be relaxed when a >> later patch implements byte-based block status in the drivers. >> >> For the most part this patch is just the addition of scaling at the >> callers followed by inverse scaling at bdrv_block_status(). But some >> code, particularly bdrv_block_status(), gets a lot simpler because >> it no longer has to mess with sectors. Likewise, mirror code no >> longer computes s->granularity >> BDRV_SECTOR_BITS, and can therefore >> drop an assertion (fix a neighboring assertion to use is_power_of_2 >> while there). >> > > Huh, I suppose so, yeah. Do you have a test that covers what happens in > this newly available use case? Not directly - the mirror code no longer requires sector alignment, but is still unlikely to use sub-sector requests unless a particular driver returns really small status information. I suppose we could tweak the blkdebug driver to force status requests to be fragmented at ridiculously small alignments, and then prove that mirroring still occurs correctly, once all the series are in, but it's probably more effort than it is worth to force sub-sector mirroring if we don't have a real use case that will rely on it. > >> For ease of review, bdrv_get_block_status() was tackled separately. >> >> Signed-off-by: Eric Blake >> > > Looks mechanically correct, anyway. > > Reviewed-by: John Snow > -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [Qemu-devel] [PATCH v4 12/23] block: Convert bdrv_get_block_status_above() to bytes
On 09/13/2017 12:03 PM, Eric Blake wrote: > We are gradually moving away from sector-based interfaces, towards > byte-based. In the common case, allocation is unlikely to ever use > values that are not naturally sector-aligned, but it is possible > that byte-based values will let us be more precise about allocation > at the end of an unaligned file that can do byte-based access. > > Changing the name of the function from bdrv_get_block_status_above() > to bdrv_block_status_above() ensures that the compiler enforces that > all callers are updated. For now, the io.c layer still assert()s > that all callers are sector-aligned, but that can be relaxed when a > later patch implements byte-based block status in the drivers. > > For the most part this patch is just the addition of scaling at the > callers followed by inverse scaling at bdrv_block_status(). But some > code, particularly bdrv_block_status(), gets a lot simpler because > it no longer has to mess with sectors. Likewise, mirror code no > longer computes s->granularity >> BDRV_SECTOR_BITS, and can therefore > drop an assertion (fix a neighboring assertion to use is_power_of_2 > while there). > Huh, I suppose so, yeah. Do you have a test that covers what happens in this newly available use case? > For ease of review, bdrv_get_block_status() was tackled separately. > > Signed-off-by: Eric Blake > Looks mechanically correct, anyway. Reviewed-by: John Snow
Re: [Qemu-block] [Qemu-devel] [PATCH v4 11/23] block: Switch bdrv_co_get_block_status_above() to byte-based
On 09/13/2017 12:03 PM, Eric Blake wrote: > We are gradually converting to byte-based interfaces, as they are > easier to reason about than sector-based. Convert another internal > type (no semantic change), and rename it to match the corresponding > public function rename. > > Signed-off-by: Eric Blake > Reviewed-by: Fam Zheng Reviewed-by: John Snow
Re: [Qemu-block] [Qemu-devel] [PATCH v4 10/23] block: Switch bdrv_common_block_status_above() to byte-based
On 09/13/2017 12:03 PM, Eric Blake wrote: > We are gradually converting to byte-based interfaces, as they are > easier to reason about than sector-based. Convert another internal > function (no semantic change). > > Signed-off-by: Eric Blake > Reviewed-by: Fam Zheng Reviewed-by: John Snow
[Qemu-block] [Qemu-devel] [PATCH v3] virtio: introduce `info virtio' hmp command
The command is intended for exposing device specific virtio feature bits and their negotiation status. It is convenient and useful for debug purpose. Names of features are taken from a devices via get_feature_name() within VirtioDeviceClass. If certain device doesn't implement it, the command will print only hexadecimal value of feature mask. Cc: Denis V. Lunev Signed-off-by: Jan Dakinevich --- v3: * Avoid signed int * Use virtio_vdev_has_feature()/virtio_host_has_feature() v2: http://lists.nongnu.org/archive/html/qemu-devel/2017-09/msg07527.html v1: http://lists.nongnu.org/archive/html/qemu-devel/2017-09/msg07247.html --- hmp-commands-info.hx| 14 + hmp.c | 149 hmp.h | 1 + hw/block/virtio-blk.c | 21 +++ hw/char/virtio-serial-bus.c | 15 + hw/display/virtio-gpu.c | 13 hw/net/virtio-net.c | 35 +++ hw/scsi/virtio-scsi.c | 16 + hw/virtio/virtio-balloon.c | 15 + hw/virtio/virtio-bus.c | 1 + include/hw/virtio/virtio.h | 2 + monitor.c | 1 + 12 files changed, 283 insertions(+) diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx index 4f1ece9..2550027 100644 --- a/hmp-commands-info.hx +++ b/hmp-commands-info.hx @@ -868,6 +868,20 @@ ETEXI }, STEXI +@item info virtio +@findex virtio +Display guest and host fetures for all virtio devices. +ETEXI + +{ +.name = "virtio", +.args_type = "", +.params = "", +.help = "show virtio info", +.cmd= hmp_info_virtio, +}, + +STEXI @end table ETEXI diff --git a/hmp.c b/hmp.c index ace729d..f231395 100644 --- a/hmp.c +++ b/hmp.c @@ -43,6 +43,7 @@ #include "hw/intc/intc.h" #include "migration/snapshot.h" #include "migration/misc.h" +#include "hw/virtio/virtio.h" #ifdef CONFIG_SPICE #include @@ -2894,3 +2895,151 @@ void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict) } hmp_handle_error(mon, &err); } + +#define HMP_INFO_VIRTIO_INDENT 2 +#define HMP_INFO_VIRTIO_FIELD 32 + +static void hmp_info_virtio_print_status(Monitor *mon, VirtIODevice *vdev) +{ +uint8_t status[] = { +VIRTIO_CONFIG_S_ACKNOWLEDGE, +VIRTIO_CONFIG_S_DRIVER, +VIRTIO_CONFIG_S_DRIVER_OK, +VIRTIO_CONFIG_S_FEATURES_OK, +VIRTIO_CONFIG_S_NEEDS_RESET, +VIRTIO_CONFIG_S_FAILED, +}; +const char *names[] = { +"acknowledge", +"driver", +"driver_ok", +"features_ok", +"needs_reset" +"failed", +}; +const char *comma = ""; +unsigned idx; + +monitor_printf(mon, "%*sstatus: 0x%02"PRIx8" ", + HMP_INFO_VIRTIO_INDENT, "", vdev->status); + +for (idx = 0; idx < ARRAY_SIZE(status); idx++) { +if (!(vdev->status & status[idx])) { +continue; +} +monitor_printf(mon, "%s%s", comma, names[idx]); +if (names[idx]) { +comma = ","; +} +} +monitor_printf(mon, "\n"); +} + +static void hmp_info_virtio_print_common_features(Monitor *mon, + VirtIODevice *vdev) +{ +uint64_t features[] = { +VIRTIO_RING_F_INDIRECT_DESC, +VIRTIO_RING_F_EVENT_IDX, +VIRTIO_F_NOTIFY_ON_EMPTY, +VIRTIO_F_ANY_LAYOUT, +VIRTIO_F_IOMMU_PLATFORM, +VIRTIO_F_VERSION_1, +}; +const char *names[] = { +"indirect_desc", +"event_idx", +"notify_on_empty", +"any_layout", +"iommu_platform", +"version_1", +}; +unsigned idx; + +monitor_printf(mon, "%*scommon features:\n", HMP_INFO_VIRTIO_INDENT, ""); + +for (idx = 0; idx < ARRAY_SIZE(features); idx++) { +const char *ack = virtio_vdev_has_feature(vdev, features[idx]) +? "acked" : ""; + +if (!virtio_host_has_feature(vdev, features[idx])) { +continue; +} +monitor_printf(mon, "%*s%*s%*s\n", HMP_INFO_VIRTIO_INDENT, "", + HMP_INFO_VIRTIO_FIELD, names[idx], + HMP_INFO_VIRTIO_FIELD, ack); +} +} + +static void hmp_info_virtio_print_specific_features(Monitor *mon, +VirtIODevice *vdev) +{ +VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev); +unsigned idx; + +if (!vdc->get_feature_name) { +return; +} + +monitor_printf(mon, "%*sspecific features:\n", HMP_INFO_VIRTIO_INDENT, ""); + +for (idx = 0; idx < 64; idx++) { +const char *name = vdc->get_feature_name(vdev, idx); +const char *ack = virtio_vdev_has_feature(vdev, idx) ? "acked" : ""; + +if (!name) { +continue; +} +if (!virtio_host_has_feature(vdev, idx)) { +continue; +} +monitor_printf(mon, "%*s%*s%*s\n", HMP_INFO_VIRT
Re: [Qemu-block] [Qemu-devel] [PATCH v2] virtio: introduce `info virtio' hmp command
On 09/27/2017 06:53 PM, Dr. David Alan Gilbert wrote: > * Jan Dakinevich (jan.dakinev...@virtuozzo.com) wrote: >> The command is intended for exposing device specific virtio feature bits >> and their negotiation status. It is convenient and useful for debug >> purpose. >> >> Names of features are taken from a devices via get_feature_name() within >> VirtioDeviceClass. If certain device doesn't implement it, the command >> will print only hexadecimal value of feature mask. >> >> Cc: Denis V. Lunev >> Signed-off-by: Jan Dakinevich >> --- >> v2: >> * Moved hmp_info_virtio and stuff to hmp.c to avoid build error >> * Added printing of device status >> * Listed common virtio features >> >> v1: http://lists.nongnu.org/archive/html/qemu-devel/2017-09/msg07247.html >> --- >> hmp-commands-info.hx| 14 + >> hmp.c | 148 >> >> hmp.h | 1 + >> hw/block/virtio-blk.c | 20 ++ >> hw/char/virtio-serial-bus.c | 15 + >> hw/display/virtio-gpu.c | 12 >> hw/net/virtio-net.c | 34 ++ >> hw/scsi/virtio-scsi.c | 15 + >> hw/virtio/virtio-balloon.c | 15 + >> hw/virtio/virtio-bus.c | 1 + >> include/hw/virtio/virtio.h | 2 + >> monitor.c | 1 + >> 12 files changed, 278 insertions(+) >> >> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx >> index 4f1ece9..2550027 100644 >> --- a/hmp-commands-info.hx >> +++ b/hmp-commands-info.hx >> @@ -868,6 +868,20 @@ ETEXI >> }, >> >> STEXI >> +@item info virtio >> +@findex virtio >> +Display guest and host fetures for all virtio devices. >> +ETEXI >> + >> +{ >> +.name = "virtio", >> +.args_type = "", >> +.params = "", >> +.help = "show virtio info", >> +.cmd= hmp_info_virtio, >> +}, >> + >> +STEXI >> @end table >> ETEXI >> >> diff --git a/hmp.c b/hmp.c >> index ace729d..009d808 100644 >> --- a/hmp.c >> +++ b/hmp.c >> @@ -43,6 +43,7 @@ >> #include "hw/intc/intc.h" >> #include "migration/snapshot.h" >> #include "migration/misc.h" >> +#include "hw/virtio/virtio.h" >> >> #ifdef CONFIG_SPICE >> #include >> @@ -2894,3 +2895,150 @@ void hmp_info_memory_size_summary(Monitor *mon, >> const QDict *qdict) >> } >> hmp_handle_error(mon, &err); >> } >> + >> +#define HMP_INFO_VIRTIO_INDENT 2 >> +#define HMP_INFO_VIRTIO_FIELD 32 >> + >> +static void hmp_info_virtio_print_status(Monitor *mon, VirtIODevice *vdev) >> +{ >> +uint8_t status[] = { >> +VIRTIO_CONFIG_S_ACKNOWLEDGE, >> +VIRTIO_CONFIG_S_DRIVER, >> +VIRTIO_CONFIG_S_DRIVER_OK, >> +VIRTIO_CONFIG_S_FEATURES_OK, >> +VIRTIO_CONFIG_S_NEEDS_RESET, >> +VIRTIO_CONFIG_S_FAILED, >> +}; >> +const char *names[] = { >> +"acknowledge", >> +"driver", >> +"driver_ok", >> +"features_ok", >> +"needs_reset" >> +"failed", >> +}; >> +const char *comma = ""; >> +int idx; >> + >> +monitor_printf(mon, "%*sstatus: 0x%02"PRIx8" ", >> + HMP_INFO_VIRTIO_INDENT, "", vdev->status); >> + >> +for (idx = 0; idx < ARRAY_SIZE(status); idx++) { >> +if (!(vdev->status & status[idx])) { >> +continue; >> +} >> +monitor_printf(mon, "%s%s", comma, names[idx]); >> +if (names[idx]) { >> +comma = ","; >> +} >> +} >> +monitor_printf(mon, "\n"); >> +} >> + >> +static void hmp_info_virtio_print_common_features(Monitor *mon, >> + VirtIODevice *vdev) >> +{ >> +uint64_t features[] = { >> +VIRTIO_RING_F_INDIRECT_DESC, >> +VIRTIO_RING_F_EVENT_IDX, >> +VIRTIO_F_NOTIFY_ON_EMPTY, >> +VIRTIO_F_ANY_LAYOUT, >> +VIRTIO_F_IOMMU_PLATFORM, >> +VIRTIO_F_VERSION_1, >> +}; >> +const char *names[] = { >> +"indirect_desc", >> +"event_idx", >> +"notify_on_empty", >> +"any_layout", >> +"iommu_platform", >> +"version_1", >> +}; >> +int idx; >> + >> +monitor_printf(mon, "%*scommon features:\n", HMP_INFO_VIRTIO_INDENT, >> ""); >> + >> +for (idx = 0; idx < ARRAY_SIZE(features); idx++) { >> +const char *ack = vdev->guest_features & features[idx] ? "acked" : >> ""; >> + >> +if (!(vdev->host_features & features[idx])) { >> +continue; >> +} >> +monitor_printf(mon, "%*s%*s%*s\n", HMP_INFO_VIRTIO_INDENT, "", >> + HMP_INFO_VIRTIO_FIELD, names[idx], >> + HMP_INFO_VIRTIO_FIELD, ack); >> +} >> +} >> + >> +static void hmp_info_virtio_print_specific_features(Monitor *mon, >> +VirtIODevice *vdev) >> +{ >> +VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev); >> +int idx; >> + >> +
Re: [Qemu-block] [PULL 00/24] Block layer patches
On 26 September 2017 at 07:21, Kevin Wolf wrote: > The following changes since commit 1e3ee834083227f552179f6e43902cba5a866e6b: > > Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into > staging (2017-09-25 20:31:24 +0100) > > are available in the git repository at: > > git://repo.or.cz/qemu/kevin.git tags/for-upstream > > for you to fetch changes up to b156d51b62e6970753e1f9f36f7c4d5fdbf4c619: > > Merge remote-tracking branch 'mreitz/tags/pull-block-2017-09-26' into > queue-block (2017-09-26 15:03:02 +0200) > > > Block layer patches > > Applied, thanks. -- PMM
Re: [Qemu-block] [PATCH v2 2/2] qcow2: truncate the tail of the image file after shrinking the image
On 2017-09-27 18:27, Pavel Butsykin wrote: > On 27.09.2017 19:00, Max Reitz wrote: >> On 2017-09-22 11:39, Pavel Butsykin wrote: >>> Now after shrinking the image, at the end of the image file, there >>> might be a >>> tail that probably will never be used. So we can find the last used >>> cluster and >>> cut the tail. >>> >>> Signed-off-by: Pavel Butsykin >>> --- >>> block/qcow2-refcount.c | 22 ++ >>> block/qcow2.c | 23 +++ >>> block/qcow2.h | 1 + >>> 3 files changed, 46 insertions(+) >>> >>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c >>> index 88d5a3f1ad..aa3fd6cf17 100644 >>> --- a/block/qcow2-refcount.c >>> +++ b/block/qcow2-refcount.c >>> @@ -3181,3 +3181,25 @@ out: >>> g_free(reftable_tmp); >>> return ret; >>> } >>> + >>> +int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size) >>> +{ >>> + BDRVQcow2State *s = bs->opaque; >>> + int64_t i; >>> + >>> + for (i = size_to_clusters(s, size) - 1; i >= 0; i--) { >>> + uint64_t refcount; >>> + int ret = qcow2_get_refcount(bs, i, &refcount); >>> + if (ret < 0) { >>> + fprintf(stderr, "Can't get refcount for cluster %" >>> PRId64 ": %s\n", >>> + i, strerror(-ret)); >>> + return ret; >>> + } >>> + if (refcount > 0) { >>> + return i; >>> + } >>> + } >>> + qcow2_signal_corruption(bs, true, -1, -1, >>> + "There are no references in the refcount >>> table."); >>> + return -EIO; >>> +} >>> diff --git a/block/qcow2.c b/block/qcow2.c >>> index 8a4311d338..8dfb5393a7 100644 >>> --- a/block/qcow2.c >>> +++ b/block/qcow2.c >>> @@ -3106,6 +3106,7 @@ static int qcow2_truncate(BlockDriverState *bs, >>> int64_t offset, >>> new_l1_size = size_to_l1(s, offset); >>> if (offset < old_length) { >>> + int64_t last_cluster, old_file_size; >>> if (prealloc != PREALLOC_MODE_OFF) { >>> error_setg(errp, >>> "Preallocation can't be used for shrinking >>> an image"); >>> @@ -3134,6 +3135,28 @@ static int qcow2_truncate(BlockDriverState >>> *bs, int64_t offset, >>> "Failed to discard unused refblocks"); >>> return ret; >>> } >>> + >>> + old_file_size = bdrv_getlength(bs->file->bs); >>> + if (old_file_size < 0) { >>> + error_setg_errno(errp, -old_file_size, >>> + "Failed to inquire current file length"); >>> + return old_file_size; >>> + } >>> + last_cluster = qcow2_get_last_cluster(bs, old_file_size); >>> + if (last_cluster < 0) { >>> + error_setg_errno(errp, -last_cluster, >>> + "Failed to find the last cluster"); >>> + return last_cluster; >>> + } >> >> My idea was rather that you just wouldn't truncate the image file if >> something fails here. So in any of these new cases where you currently >> just report the whole truncate operation as having failed, you could >> just emit a warning and not do the truncation of bs->file. > > I'm not sure that's right. If the qcow2_get_last_cluster() returned an > error, probably with the image was a problem.. can we continue to work > with the image without risking to damage it even more? if something bad > happened with the reftable we usually mark the image as corrupted, it's > the same thing. Well, the only thing that's left to do is to write the new size into the image header, so I think that should work just fine... I won't disagree that bdrv_getlength() or qcow2_get_last_cluster() failing may be reasons to stop truncation (although I don't think they necessarily are at this point). But I could well imagine that the below bdrv_truncate() of bs->file fails for benign reasons, e.g. because the underlying protocol does not support shrinking of images or something. Then we probably should carry on. Max > Although if the shrink is almost complete, the truncation of bs->file > isn't so important thing and we could update qcow2 header. > >> I can live with the current version, though, so: >> >> Reviewed-by: Max Reitz >> >> But I'll wait for a response from you before merging this series. >> >> Max >> >>> + if ((last_cluster + 1) * s->cluster_size < old_file_size) { >>> + ret = bdrv_truncate(bs->file, (last_cluster + 1) * >>> s->cluster_size, >>> + PREALLOC_MODE_OFF, NULL); >>> + if (ret < 0) { >>> + error_setg_errno(errp, -ret, >>> + "Failed to truncate the tail of the >>> image"); >>> + return ret; >>> + } >>> + } >>> } else { >>> ret = qcow2_grow_l1_table(bs, new_l1_size, true); >>> if (ret < 0) { >>> diff --git a/block/qcow2.h b/block/qcow2.h >>> index 5a289a81e2..782a2
Re: [Qemu-block] blockdev-commit design
On 2017-09-26 19:59, Kevin Wolf wrote: [...] > * The old block-commit command decides between an "actual" commit job > and the mirror-based active commit based on whether top is the > active layer. > > We don't get an option for the active layer any more now, so this > isn't how things can work for blockdev-commit. We could probably > check whether top has a BlockBackend parent, but that's not really > what we're interested in anyway. Maybe the best we could do to > decide this automatically is to check whether there is any parent of > top that requires write permissions. If there is, we need active > commit, otherwise the "normal" one is good enough. > > However, who says that the intentions of the user stay as we deduce > them at the start of the block job? Who says that the user doesn't > want to add a writable disk as a user of the node while the block > job is running? > > The optimal solution to this would be that the commit filter node > responds to permission requests and switches between active and > "normal" commit mode. I'm not sure how hard this would be to > implement. > > As long as we don't have the automatic switch, do we have to allow > the user to specify explicitly which mode they want instead of > automatically choosing one? Probably a stupid question: What advantage does the commit job have over the mirror job? I.e. would it be possible to just always do a mirror? Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v2 2/2] qcow2: truncate the tail of the image file after shrinking the image
On 27.09.2017 19:00, Max Reitz wrote: On 2017-09-22 11:39, Pavel Butsykin wrote: Now after shrinking the image, at the end of the image file, there might be a tail that probably will never be used. So we can find the last used cluster and cut the tail. Signed-off-by: Pavel Butsykin --- block/qcow2-refcount.c | 22 ++ block/qcow2.c | 23 +++ block/qcow2.h | 1 + 3 files changed, 46 insertions(+) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index 88d5a3f1ad..aa3fd6cf17 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -3181,3 +3181,25 @@ out: g_free(reftable_tmp); return ret; } + +int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size) +{ +BDRVQcow2State *s = bs->opaque; +int64_t i; + +for (i = size_to_clusters(s, size) - 1; i >= 0; i--) { +uint64_t refcount; +int ret = qcow2_get_refcount(bs, i, &refcount); +if (ret < 0) { +fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": %s\n", +i, strerror(-ret)); +return ret; +} +if (refcount > 0) { +return i; +} +} +qcow2_signal_corruption(bs, true, -1, -1, +"There are no references in the refcount table."); +return -EIO; +} diff --git a/block/qcow2.c b/block/qcow2.c index 8a4311d338..8dfb5393a7 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -3106,6 +3106,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset, new_l1_size = size_to_l1(s, offset); if (offset < old_length) { +int64_t last_cluster, old_file_size; if (prealloc != PREALLOC_MODE_OFF) { error_setg(errp, "Preallocation can't be used for shrinking an image"); @@ -3134,6 +3135,28 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset, "Failed to discard unused refblocks"); return ret; } + +old_file_size = bdrv_getlength(bs->file->bs); +if (old_file_size < 0) { +error_setg_errno(errp, -old_file_size, + "Failed to inquire current file length"); +return old_file_size; +} +last_cluster = qcow2_get_last_cluster(bs, old_file_size); +if (last_cluster < 0) { +error_setg_errno(errp, -last_cluster, + "Failed to find the last cluster"); +return last_cluster; +} My idea was rather that you just wouldn't truncate the image file if something fails here. So in any of these new cases where you currently just report the whole truncate operation as having failed, you could just emit a warning and not do the truncation of bs->file. I'm not sure that's right. If the qcow2_get_last_cluster() returned an error, probably with the image was a problem.. can we continue to work with the image without risking to damage it even more? if something bad happened with the reftable we usually mark the image as corrupted, it's the same thing. Although if the shrink is almost complete, the truncation of bs->file isn't so important thing and we could update qcow2 header. I can live with the current version, though, so: Reviewed-by: Max Reitz But I'll wait for a response from you before merging this series. Max +if ((last_cluster + 1) * s->cluster_size < old_file_size) { +ret = bdrv_truncate(bs->file, (last_cluster + 1) * s->cluster_size, +PREALLOC_MODE_OFF, NULL); +if (ret < 0) { +error_setg_errno(errp, -ret, + "Failed to truncate the tail of the image"); +return ret; +} +} } else { ret = qcow2_grow_l1_table(bs, new_l1_size, true); if (ret < 0) { diff --git a/block/qcow2.h b/block/qcow2.h index 5a289a81e2..782a206ecb 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -597,6 +597,7 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order, BlockDriverAmendStatusCB *status_cb, void *cb_opaque, Error **errp); int qcow2_shrink_reftable(BlockDriverState *bs); +int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size); /* qcow2-cluster.c functions */ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
Re: [Qemu-block] [PATCH v2 2/2] qcow2: truncate the tail of the image file after shrinking the image
On 2017-09-22 11:39, Pavel Butsykin wrote: > Now after shrinking the image, at the end of the image file, there might be a > tail that probably will never be used. So we can find the last used cluster > and > cut the tail. > > Signed-off-by: Pavel Butsykin > --- > block/qcow2-refcount.c | 22 ++ > block/qcow2.c | 23 +++ > block/qcow2.h | 1 + > 3 files changed, 46 insertions(+) > > diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c > index 88d5a3f1ad..aa3fd6cf17 100644 > --- a/block/qcow2-refcount.c > +++ b/block/qcow2-refcount.c > @@ -3181,3 +3181,25 @@ out: > g_free(reftable_tmp); > return ret; > } > + > +int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size) > +{ > +BDRVQcow2State *s = bs->opaque; > +int64_t i; > + > +for (i = size_to_clusters(s, size) - 1; i >= 0; i--) { > +uint64_t refcount; > +int ret = qcow2_get_refcount(bs, i, &refcount); > +if (ret < 0) { > +fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": > %s\n", > +i, strerror(-ret)); > +return ret; > +} > +if (refcount > 0) { > +return i; > +} > +} > +qcow2_signal_corruption(bs, true, -1, -1, > +"There are no references in the refcount > table."); > +return -EIO; > +} > diff --git a/block/qcow2.c b/block/qcow2.c > index 8a4311d338..8dfb5393a7 100644 > --- a/block/qcow2.c > +++ b/block/qcow2.c > @@ -3106,6 +3106,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t > offset, > new_l1_size = size_to_l1(s, offset); > > if (offset < old_length) { > +int64_t last_cluster, old_file_size; > if (prealloc != PREALLOC_MODE_OFF) { > error_setg(errp, > "Preallocation can't be used for shrinking an image"); > @@ -3134,6 +3135,28 @@ static int qcow2_truncate(BlockDriverState *bs, > int64_t offset, > "Failed to discard unused refblocks"); > return ret; > } > + > +old_file_size = bdrv_getlength(bs->file->bs); > +if (old_file_size < 0) { > +error_setg_errno(errp, -old_file_size, > + "Failed to inquire current file length"); > +return old_file_size; > +} > +last_cluster = qcow2_get_last_cluster(bs, old_file_size); > +if (last_cluster < 0) { > +error_setg_errno(errp, -last_cluster, > + "Failed to find the last cluster"); > +return last_cluster; > +} My idea was rather that you just wouldn't truncate the image file if something fails here. So in any of these new cases where you currently just report the whole truncate operation as having failed, you could just emit a warning and not do the truncation of bs->file. I can live with the current version, though, so: Reviewed-by: Max Reitz But I'll wait for a response from you before merging this series. Max > +if ((last_cluster + 1) * s->cluster_size < old_file_size) { > +ret = bdrv_truncate(bs->file, (last_cluster + 1) * > s->cluster_size, > +PREALLOC_MODE_OFF, NULL); > +if (ret < 0) { > +error_setg_errno(errp, -ret, > + "Failed to truncate the tail of the image"); > +return ret; > +} > +} > } else { > ret = qcow2_grow_l1_table(bs, new_l1_size, true); > if (ret < 0) { > diff --git a/block/qcow2.h b/block/qcow2.h > index 5a289a81e2..782a206ecb 100644 > --- a/block/qcow2.h > +++ b/block/qcow2.h > @@ -597,6 +597,7 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int > refcount_order, > BlockDriverAmendStatusCB *status_cb, > void *cb_opaque, Error **errp); > int qcow2_shrink_reftable(BlockDriverState *bs); > +int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size); > > /* qcow2-cluster.c functions */ > int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size, > signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [Qemu-devel] [PATCH v2] virtio: introduce `info virtio' hmp command
* Jan Dakinevich (jan.dakinev...@virtuozzo.com) wrote: > The command is intended for exposing device specific virtio feature bits > and their negotiation status. It is convenient and useful for debug > purpose. > > Names of features are taken from a devices via get_feature_name() within > VirtioDeviceClass. If certain device doesn't implement it, the command > will print only hexadecimal value of feature mask. > > Cc: Denis V. Lunev > Signed-off-by: Jan Dakinevich > --- > v2: > * Moved hmp_info_virtio and stuff to hmp.c to avoid build error > * Added printing of device status > * Listed common virtio features > > v1: http://lists.nongnu.org/archive/html/qemu-devel/2017-09/msg07247.html > --- > hmp-commands-info.hx| 14 + > hmp.c | 148 > > hmp.h | 1 + > hw/block/virtio-blk.c | 20 ++ > hw/char/virtio-serial-bus.c | 15 + > hw/display/virtio-gpu.c | 12 > hw/net/virtio-net.c | 34 ++ > hw/scsi/virtio-scsi.c | 15 + > hw/virtio/virtio-balloon.c | 15 + > hw/virtio/virtio-bus.c | 1 + > include/hw/virtio/virtio.h | 2 + > monitor.c | 1 + > 12 files changed, 278 insertions(+) > > diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx > index 4f1ece9..2550027 100644 > --- a/hmp-commands-info.hx > +++ b/hmp-commands-info.hx > @@ -868,6 +868,20 @@ ETEXI > }, > > STEXI > +@item info virtio > +@findex virtio > +Display guest and host fetures for all virtio devices. > +ETEXI > + > +{ > +.name = "virtio", > +.args_type = "", > +.params = "", > +.help = "show virtio info", > +.cmd= hmp_info_virtio, > +}, > + > +STEXI > @end table > ETEXI > > diff --git a/hmp.c b/hmp.c > index ace729d..009d808 100644 > --- a/hmp.c > +++ b/hmp.c > @@ -43,6 +43,7 @@ > #include "hw/intc/intc.h" > #include "migration/snapshot.h" > #include "migration/misc.h" > +#include "hw/virtio/virtio.h" > > #ifdef CONFIG_SPICE > #include > @@ -2894,3 +2895,150 @@ void hmp_info_memory_size_summary(Monitor *mon, const > QDict *qdict) > } > hmp_handle_error(mon, &err); > } > + > +#define HMP_INFO_VIRTIO_INDENT 2 > +#define HMP_INFO_VIRTIO_FIELD 32 > + > +static void hmp_info_virtio_print_status(Monitor *mon, VirtIODevice *vdev) > +{ > +uint8_t status[] = { > +VIRTIO_CONFIG_S_ACKNOWLEDGE, > +VIRTIO_CONFIG_S_DRIVER, > +VIRTIO_CONFIG_S_DRIVER_OK, > +VIRTIO_CONFIG_S_FEATURES_OK, > +VIRTIO_CONFIG_S_NEEDS_RESET, > +VIRTIO_CONFIG_S_FAILED, > +}; > +const char *names[] = { > +"acknowledge", > +"driver", > +"driver_ok", > +"features_ok", > +"needs_reset" > +"failed", > +}; > +const char *comma = ""; > +int idx; > + > +monitor_printf(mon, "%*sstatus: 0x%02"PRIx8" ", > + HMP_INFO_VIRTIO_INDENT, "", vdev->status); > + > +for (idx = 0; idx < ARRAY_SIZE(status); idx++) { > +if (!(vdev->status & status[idx])) { > +continue; > +} > +monitor_printf(mon, "%s%s", comma, names[idx]); > +if (names[idx]) { > +comma = ","; > +} > +} > +monitor_printf(mon, "\n"); > +} > + > +static void hmp_info_virtio_print_common_features(Monitor *mon, > + VirtIODevice *vdev) > +{ > +uint64_t features[] = { > +VIRTIO_RING_F_INDIRECT_DESC, > +VIRTIO_RING_F_EVENT_IDX, > +VIRTIO_F_NOTIFY_ON_EMPTY, > +VIRTIO_F_ANY_LAYOUT, > +VIRTIO_F_IOMMU_PLATFORM, > +VIRTIO_F_VERSION_1, > +}; > +const char *names[] = { > +"indirect_desc", > +"event_idx", > +"notify_on_empty", > +"any_layout", > +"iommu_platform", > +"version_1", > +}; > +int idx; > + > +monitor_printf(mon, "%*scommon features:\n", HMP_INFO_VIRTIO_INDENT, ""); > + > +for (idx = 0; idx < ARRAY_SIZE(features); idx++) { > +const char *ack = vdev->guest_features & features[idx] ? "acked" : > ""; > + > +if (!(vdev->host_features & features[idx])) { > +continue; > +} > +monitor_printf(mon, "%*s%*s%*s\n", HMP_INFO_VIRTIO_INDENT, "", > + HMP_INFO_VIRTIO_FIELD, names[idx], > + HMP_INFO_VIRTIO_FIELD, ack); > +} > +} > + > +static void hmp_info_virtio_print_specific_features(Monitor *mon, > +VirtIODevice *vdev) > +{ > +VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev); > +int idx; > + > +if (!vdc->get_feature_name) { > +return; > +} > + > +monitor_printf(mon, "%*sspecific features:\n", HMP_INFO_VIRTIO_INDENT, > ""); > + > +for (idx = 0; idx < 64; idx++) { > +const char *n
[Qemu-block] [PATCH v1.1 DRAFT] nbd: Minimal structured read for client
Minimal implementation: drop most of additional error information. Signed-off-by: Vladimir Sementsov-Ogievskiy --- Hi! Here is a draft of how to refactor reply-payload receiving if you don't like my previous simple (but not flexible) try. Ofcourse, if we agree on this approach this patch should be splitted into several ones and many things (error handling) should be improved. The idea is: nbd_read_reply_entry reads only reply header through nbd/client.c code. Then, the payload is read through block/nbd-client-cmds.c: simple payload: generic per-command handler, however it should only exist for CMD_READ structured NONE: no payload, handle in nbd_co_receive_one_chunk structured error: read by nbd_handle_structured_error_payload defined in block/nbd-client-cmds.c structured success: read by per-[command X reply-type] handler defined in block/nbd-client-cmds.c For now nbd-client-cmds.c looks more like nbd-payload.c, but, may be, we should move command sending special-casing (CMD_WRITE) to it too.. Don't waste time on careful reviewing this patch, let's consider first the concept of nbd-client-cmds.c, block/nbd-client.h | 10 +++ include/block/nbd.h | 82 -- nbd/nbd-internal.h | 25 -- block/nbd-client-cmds.c | 220 block/nbd-client.c | 118 -- nbd/client.c| 128 block/Makefile.objs | 2 +- 7 files changed, 475 insertions(+), 110 deletions(-) create mode 100644 block/nbd-client-cmds.c diff --git a/block/nbd-client.h b/block/nbd-client.h index b435754b82..abb88e4ea5 100644 --- a/block/nbd-client.h +++ b/block/nbd-client.h @@ -35,6 +35,8 @@ typedef struct NBDClientSession { NBDClientRequest requests[MAX_NBD_REQUESTS]; NBDReply reply; bool quit; + +bool structured_reply; } NBDClientSession; NBDClientSession *nbd_get_client_session(BlockDriverState *bs); @@ -60,4 +62,12 @@ void nbd_client_detach_aio_context(BlockDriverState *bs); void nbd_client_attach_aio_context(BlockDriverState *bs, AioContext *new_context); +int nbd_handle_structured_payload(QIOChannel *ioc, int cmd, + NBDStructuredReplyChunk *reply, void *opaque); +int nbd_handle_simple_payload(QIOChannel *ioc, int cmd, void *opaque); + +int nbd_handle_structured_error_payload(QIOChannel *ioc, +NBDStructuredReplyChunk *reply, +int *request_ret); + #endif /* NBD_CLIENT_H */ diff --git a/include/block/nbd.h b/include/block/nbd.h index 314f2f9bbc..b9a4e1dfa9 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -57,12 +57,6 @@ struct NBDRequest { }; typedef struct NBDRequest NBDRequest; -struct NBDReply { -uint64_t handle; -uint32_t error; -}; -typedef struct NBDReply NBDReply; - typedef struct NBDSimpleReply { uint32_t magic; /* NBD_SIMPLE_REPLY_MAGIC */ uint32_t error; @@ -77,6 +71,24 @@ typedef struct NBDStructuredReplyChunk { uint32_t length; /* length of payload */ } QEMU_PACKED NBDStructuredReplyChunk; +typedef union NBDReply { +NBDSimpleReply simple; +NBDStructuredReplyChunk structured; +struct { +uint32_t magic; +uint32_t _skip; +uint64_t handle; +} QEMU_PACKED; +} NBDReply; + +#define NBD_SIMPLE_REPLY_MAGIC 0x67446698 +#define NBD_STRUCTURED_REPLY_MAGIC 0x668e33ef + +static inline bool nbd_reply_is_simple(NBDReply *reply) +{ +return reply->magic == NBD_SIMPLE_REPLY_MAGIC; +} + typedef struct NBDStructuredRead { NBDStructuredReplyChunk h; uint64_t offset; @@ -88,6 +100,11 @@ typedef struct NBDStructuredError { uint16_t message_length; } QEMU_PACKED NBDStructuredError; +typedef struct NBDPayloadOffsetHole { +uint64_t offset; +uint32_t hole_size; +} QEMU_PACKED NBDPayloadOffsetHole; + /* Transmission (export) flags: sent from server to client during handshake, but describe what will happen during transmission */ #define NBD_FLAG_HAS_FLAGS (1 << 0) /* Flags are there */ @@ -178,12 +195,54 @@ enum { #define NBD_SREP_TYPE_NONE 0 #define NBD_SREP_TYPE_OFFSET_DATA 1 +#define NBD_SREP_TYPE_OFFSET_HOLE 2 #define NBD_SREP_TYPE_ERROR NBD_SREP_ERR(1) +#define NBD_SREP_TYPE_ERROR_OFFSET NBD_SREP_ERR(2) + +static inline bool nbd_srep_type_is_error(int type) +{ +return type & (1 << 15); +} + +/* NBD errors are based on errno numbers, so there is a 1:1 mapping, + * but only a limited set of errno values is specified in the protocol. + * Everything else is squashed to EINVAL. + */ +#define NBD_SUCCESS0 +#define NBD_EPERM 1 +#define NBD_EIO5 +#define NBD_ENOMEM 12 +#define NBD_EINVAL 22 +#define NBD_ENOSPC 28 +#define NBD_ESHUTDOWN 108 + +static inline int nbd_errno_to_system_errno(int err) +{ +switch (err) { +case NBD_SUCCES
[Qemu-block] [Qemu-devel] [PATCH v2] virtio: introduce `info virtio' hmp command
The command is intended for exposing device specific virtio feature bits and their negotiation status. It is convenient and useful for debug purpose. Names of features are taken from a devices via get_feature_name() within VirtioDeviceClass. If certain device doesn't implement it, the command will print only hexadecimal value of feature mask. Cc: Denis V. Lunev Signed-off-by: Jan Dakinevich --- v2: * Moved hmp_info_virtio and stuff to hmp.c to avoid build error * Added printing of device status * Listed common virtio features v1: http://lists.nongnu.org/archive/html/qemu-devel/2017-09/msg07247.html --- hmp-commands-info.hx| 14 + hmp.c | 148 hmp.h | 1 + hw/block/virtio-blk.c | 20 ++ hw/char/virtio-serial-bus.c | 15 + hw/display/virtio-gpu.c | 12 hw/net/virtio-net.c | 34 ++ hw/scsi/virtio-scsi.c | 15 + hw/virtio/virtio-balloon.c | 15 + hw/virtio/virtio-bus.c | 1 + include/hw/virtio/virtio.h | 2 + monitor.c | 1 + 12 files changed, 278 insertions(+) diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx index 4f1ece9..2550027 100644 --- a/hmp-commands-info.hx +++ b/hmp-commands-info.hx @@ -868,6 +868,20 @@ ETEXI }, STEXI +@item info virtio +@findex virtio +Display guest and host fetures for all virtio devices. +ETEXI + +{ +.name = "virtio", +.args_type = "", +.params = "", +.help = "show virtio info", +.cmd= hmp_info_virtio, +}, + +STEXI @end table ETEXI diff --git a/hmp.c b/hmp.c index ace729d..009d808 100644 --- a/hmp.c +++ b/hmp.c @@ -43,6 +43,7 @@ #include "hw/intc/intc.h" #include "migration/snapshot.h" #include "migration/misc.h" +#include "hw/virtio/virtio.h" #ifdef CONFIG_SPICE #include @@ -2894,3 +2895,150 @@ void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict) } hmp_handle_error(mon, &err); } + +#define HMP_INFO_VIRTIO_INDENT 2 +#define HMP_INFO_VIRTIO_FIELD 32 + +static void hmp_info_virtio_print_status(Monitor *mon, VirtIODevice *vdev) +{ +uint8_t status[] = { +VIRTIO_CONFIG_S_ACKNOWLEDGE, +VIRTIO_CONFIG_S_DRIVER, +VIRTIO_CONFIG_S_DRIVER_OK, +VIRTIO_CONFIG_S_FEATURES_OK, +VIRTIO_CONFIG_S_NEEDS_RESET, +VIRTIO_CONFIG_S_FAILED, +}; +const char *names[] = { +"acknowledge", +"driver", +"driver_ok", +"features_ok", +"needs_reset" +"failed", +}; +const char *comma = ""; +int idx; + +monitor_printf(mon, "%*sstatus: 0x%02"PRIx8" ", + HMP_INFO_VIRTIO_INDENT, "", vdev->status); + +for (idx = 0; idx < ARRAY_SIZE(status); idx++) { +if (!(vdev->status & status[idx])) { +continue; +} +monitor_printf(mon, "%s%s", comma, names[idx]); +if (names[idx]) { +comma = ","; +} +} +monitor_printf(mon, "\n"); +} + +static void hmp_info_virtio_print_common_features(Monitor *mon, + VirtIODevice *vdev) +{ +uint64_t features[] = { +VIRTIO_RING_F_INDIRECT_DESC, +VIRTIO_RING_F_EVENT_IDX, +VIRTIO_F_NOTIFY_ON_EMPTY, +VIRTIO_F_ANY_LAYOUT, +VIRTIO_F_IOMMU_PLATFORM, +VIRTIO_F_VERSION_1, +}; +const char *names[] = { +"indirect_desc", +"event_idx", +"notify_on_empty", +"any_layout", +"iommu_platform", +"version_1", +}; +int idx; + +monitor_printf(mon, "%*scommon features:\n", HMP_INFO_VIRTIO_INDENT, ""); + +for (idx = 0; idx < ARRAY_SIZE(features); idx++) { +const char *ack = vdev->guest_features & features[idx] ? "acked" : ""; + +if (!(vdev->host_features & features[idx])) { +continue; +} +monitor_printf(mon, "%*s%*s%*s\n", HMP_INFO_VIRTIO_INDENT, "", + HMP_INFO_VIRTIO_FIELD, names[idx], + HMP_INFO_VIRTIO_FIELD, ack); +} +} + +static void hmp_info_virtio_print_specific_features(Monitor *mon, +VirtIODevice *vdev) +{ +VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev); +int idx; + +if (!vdc->get_feature_name) { +return; +} + +monitor_printf(mon, "%*sspecific features:\n", HMP_INFO_VIRTIO_INDENT, ""); + +for (idx = 0; idx < 64; idx++) { +const char *name = vdc->get_feature_name(vdev, idx); +const char *ack = vdev->guest_features & (1 << idx) ? "acked" : ""; + +if (!name) { +continue; +} +if (!(vdev->host_features & (1 << idx))) { +continue; +} +monitor_printf(mon, "%*s%*s%*s\n", HMP_INFO_VIRTIO_INDENT, "", + HMP_INFO_VIRTIO_FIELD, name, HMP_
Re: [Qemu-block] [PATCH v4 5/6] block: convert qcrypto_block_encrypt|decrypt to take bytes offset
On 09/27/2017 07:53 AM, Daniel P. Berrange wrote: > Instead of sector offset, take the bytes offset when encrypting > or decrypting data. > > Signed-off-by: Daniel P. Berrange > --- Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v4 4/6] block: convert crypto driver to bdrv_co_preadv|pwritev
On 09/27/2017 07:53 AM, Daniel P. Berrange wrote: > Make the crypto driver implement the bdrv_co_preadv|pwritev > callbacks, and also use bdrv_co_preadv|pwritev for I/O > with the protocol driver beneath. This replaces sector based > I/O with byte based I/O, and allows us to stop assuming the > physical sector size matches the encryption sector size. > > Signed-off-by: Daniel P. Berrange > --- > block/crypto.c | 106 > + > 1 file changed, 54 insertions(+), 52 deletions(-) > Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v4 1/6] block: use 1 MB bounce buffers for crypto instead of 16KB
On 09/27/2017 07:53 AM, Daniel P. Berrange wrote: > Using 16KB bounce buffers creates a significant performance > penalty for I/O to encrypted volumes on storage which high > I/O latency (rotating rust & network drives), because it > triggers lots of fairly small I/O operations. > > On tests with rotating rust, and cache=none|directsync, > write speed increased from 2MiB/s to 32MiB/s, on a par > with that achieved by the in-kernel luks driver. With > other cache modes the in-kernel driver is still notably > faster because it is able to report completion of the > I/O request before any encryption is done, while the > in-QEMU driver must encrypt the data before completion. > > Signed-off-by: Daniel P. Berrange > --- > block/crypto.c | 28 +++- > 1 file changed, 15 insertions(+), 13 deletions(-) Reviewed-by: Eric Blake > @@ -464,12 +467,11 @@ block_crypto_co_writev(BlockDriverState *bs, int64_t > sector_num, > > qemu_iovec_init(&hd_qiov, qiov->niov); > > -/* Bounce buffer so we have a linear mem region for > - * entire sector. XXX optimize so we avoid bounce > - * buffer in case that qiov->niov == 1 > +/* Bounce buffer because we're not permitted to touch > + * contents of qiov - it points to guest memory. > */ > cipher_data = > -qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512, > +qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_IO_SIZE, >qiov->size)); We allocate and free a bounce buffer for every write - is there anything we can do to reduce malloc() calls by reusing a bounce buffer associated with a coroutine (we have to be careful that parallel coroutines don't share bounce buffers, of course). But that's independent of this patch. Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [Qemu-devel] [PATCH 2/5] iotests: Set up Python logging
On Wed, Sep 27, 2017 at 10:03:36AM -0300, Eduardo Habkost wrote: > Set up Python logging module instead of relying on > QEMUMachine._debug to enable debugging messages. > > Cc: Kevin Wolf > Cc: Max Reitz > Cc: qemu-block@nongnu.org > Signed-off-by: Eduardo Habkost > --- > tests/qemu-iotests/iotests.py | 3 +++ > 1 file changed, 3 insertions(+) Reviewed-by: Daniel P. Berrange Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
[Qemu-block] [PATCH 2/5] iotests: Set up Python logging
Set up Python logging module instead of relying on QEMUMachine._debug to enable debugging messages. Cc: Kevin Wolf Cc: Max Reitz Cc: qemu-block@nongnu.org Signed-off-by: Eduardo Habkost --- tests/qemu-iotests/iotests.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 1af117e37d..36a7757aaf 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -28,6 +28,7 @@ import qtest import struct import json import signal +import logging # This will not work if arguments contain spaces but is necessary if we @@ -467,6 +468,8 @@ def main(supported_fmts=[], supported_oses=['linux']): else: output = StringIO.StringIO() +logging.basicConfig(level=(logging.DEBUG if debug else logging.WARN)) + class MyTestRunner(unittest.TextTestRunner): def __init__(self, stream=output, descriptions=True, verbosity=verbosity): unittest.TextTestRunner.__init__(self, stream, descriptions, verbosity) -- 2.13.5
[Qemu-block] [PATCH v4 5/6] block: convert qcrypto_block_encrypt|decrypt to take bytes offset
Instead of sector offset, take the bytes offset when encrypting or decrypting data. Signed-off-by: Daniel P. Berrange --- block/crypto.c | 12 block/qcow.c | 11 +++ block/qcow2-cluster.c | 8 +++- block/qcow2.c | 4 ++-- crypto/block-luks.c| 12 crypto/block-qcow.c| 12 crypto/block.c | 20 ++-- crypto/blockpriv.h | 4 ++-- include/crypto/block.h | 14 -- 9 files changed, 56 insertions(+), 41 deletions(-) diff --git a/block/crypto.c b/block/crypto.c index 965c173b01..edf53d49d1 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -398,7 +398,6 @@ block_crypto_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes, int ret = 0; uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block); uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block); -uint64_t sector_num = offset / sector_size; assert(!flags); assert(payload_offset < INT64_MAX); @@ -430,15 +429,14 @@ block_crypto_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes, goto cleanup; } -if (qcrypto_block_decrypt(crypto->block, sector_num, cipher_data, - cur_bytes, NULL) < 0) { +if (qcrypto_block_decrypt(crypto->block, offset + bytes_done, + cipher_data, cur_bytes, NULL) < 0) { ret = -EIO; goto cleanup; } qemu_iovec_from_buf(qiov, bytes_done, cipher_data, cur_bytes); -sector_num += cur_bytes / sector_size; bytes -= cur_bytes; bytes_done += cur_bytes; } @@ -463,7 +461,6 @@ block_crypto_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes, int ret = 0; uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block); uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block); -uint64_t sector_num = offset / sector_size; assert(!flags); assert(payload_offset < INT64_MAX); @@ -488,8 +485,8 @@ block_crypto_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes, qemu_iovec_to_buf(qiov, bytes_done, cipher_data, cur_bytes); -if (qcrypto_block_encrypt(crypto->block, sector_num, cipher_data, - cur_bytes, NULL) < 0) { +if (qcrypto_block_encrypt(crypto->block, offset + bytes_done, + cipher_data, cur_bytes, NULL) < 0) { ret = -EIO; goto cleanup; } @@ -503,7 +500,6 @@ block_crypto_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes, goto cleanup; } -sector_num += cur_bytes / sector_size; bytes -= cur_bytes; bytes_done += cur_bytes; } diff --git a/block/qcow.c b/block/qcow.c index f450b00cfc..9569deeaf0 100644 --- a/block/qcow.c +++ b/block/qcow.c @@ -478,7 +478,9 @@ static int get_cluster_offset(BlockDriverState *bs, for(i = 0; i < s->cluster_sectors; i++) { if (i < n_start || i >= n_end) { memset(s->cluster_data, 0x00, 512); -if (qcrypto_block_encrypt(s->crypto, start_sect + i, +if (qcrypto_block_encrypt(s->crypto, + (start_sect + i) * + BDRV_SECTOR_SIZE, s->cluster_data, BDRV_SECTOR_SIZE, NULL) < 0) { @@ -668,7 +670,8 @@ static coroutine_fn int qcow_co_readv(BlockDriverState *bs, int64_t sector_num, } if (bs->encrypted) { assert(s->crypto); -if (qcrypto_block_decrypt(s->crypto, sector_num, buf, +if (qcrypto_block_decrypt(s->crypto, + sector_num * BDRV_SECTOR_SIZE, buf, n * BDRV_SECTOR_SIZE, NULL) < 0) { ret = -EIO; break; @@ -740,8 +743,8 @@ static coroutine_fn int qcow_co_writev(BlockDriverState *bs, int64_t sector_num, } if (bs->encrypted) { assert(s->crypto); -if (qcrypto_block_encrypt(s->crypto, sector_num, buf, - n * BDRV_SECTOR_SIZE, NULL) < 0) { +if (qcrypto_block_encrypt(s->crypto, sector_num * BDRV_SECTOR_SIZE, + buf, n * BDRV_SECTOR_SIZE, NULL) < 0) { ret = -EIO; break; } diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index 0d4824993c..09ae0b6ecc 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -396,15
[Qemu-block] [PATCH v4 4/6] block: convert crypto driver to bdrv_co_preadv|pwritev
Make the crypto driver implement the bdrv_co_preadv|pwritev callbacks, and also use bdrv_co_preadv|pwritev for I/O with the protocol driver beneath. This replaces sector based I/O with byte based I/O, and allows us to stop assuming the physical sector size matches the encryption sector size. Signed-off-by: Daniel P. Berrange --- block/crypto.c | 106 + 1 file changed, 54 insertions(+), 52 deletions(-) diff --git a/block/crypto.c b/block/crypto.c index 61f5d77bc0..965c173b01 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -387,18 +387,23 @@ static void block_crypto_close(BlockDriverState *bs) #define BLOCK_CRYPTO_MAX_IO_SIZE (1024 * 1024) static coroutine_fn int -block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num, - int remaining_sectors, QEMUIOVector *qiov) +block_crypto_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes, + QEMUIOVector *qiov, int flags) { BlockCrypto *crypto = bs->opaque; -int cur_nr_sectors; /* number of sectors in current iteration */ +uint64_t cur_bytes; /* number of bytes in current iteration */ uint64_t bytes_done = 0; uint8_t *cipher_data = NULL; QEMUIOVector hd_qiov; int ret = 0; -uint64_t payload_offset = -qcrypto_block_get_payload_offset(crypto->block) / 512; -assert(payload_offset < (INT64_MAX / 512)); +uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block); +uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block); +uint64_t sector_num = offset / sector_size; + +assert(!flags); +assert(payload_offset < INT64_MAX); +assert(QEMU_IS_ALIGNED(offset, sector_size)); +assert(QEMU_IS_ALIGNED(bytes, sector_size)); qemu_iovec_init(&hd_qiov, qiov->niov); @@ -413,37 +418,29 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num, goto cleanup; } -while (remaining_sectors) { -cur_nr_sectors = remaining_sectors; - -if (cur_nr_sectors > (BLOCK_CRYPTO_MAX_IO_SIZE / 512)) { -cur_nr_sectors = (BLOCK_CRYPTO_MAX_IO_SIZE / 512); -} +while (bytes) { +cur_bytes = MIN(bytes, BLOCK_CRYPTO_MAX_IO_SIZE); qemu_iovec_reset(&hd_qiov); -qemu_iovec_add(&hd_qiov, cipher_data, cur_nr_sectors * 512); +qemu_iovec_add(&hd_qiov, cipher_data, cur_bytes); -ret = bdrv_co_readv(bs->file, -payload_offset + sector_num, -cur_nr_sectors, &hd_qiov); +ret = bdrv_co_preadv(bs->file, payload_offset + offset + bytes_done, + cur_bytes, &hd_qiov, 0); if (ret < 0) { goto cleanup; } -if (qcrypto_block_decrypt(crypto->block, - sector_num, - cipher_data, cur_nr_sectors * 512, - NULL) < 0) { +if (qcrypto_block_decrypt(crypto->block, sector_num, cipher_data, + cur_bytes, NULL) < 0) { ret = -EIO; goto cleanup; } -qemu_iovec_from_buf(qiov, bytes_done, -cipher_data, cur_nr_sectors * 512); +qemu_iovec_from_buf(qiov, bytes_done, cipher_data, cur_bytes); -remaining_sectors -= cur_nr_sectors; -sector_num += cur_nr_sectors; -bytes_done += cur_nr_sectors * 512; +sector_num += cur_bytes / sector_size; +bytes -= cur_bytes; +bytes_done += cur_bytes; } cleanup: @@ -455,18 +452,23 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num, static coroutine_fn int -block_crypto_co_writev(BlockDriverState *bs, int64_t sector_num, - int remaining_sectors, QEMUIOVector *qiov) +block_crypto_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes, +QEMUIOVector *qiov, int flags) { BlockCrypto *crypto = bs->opaque; -int cur_nr_sectors; /* number of sectors in current iteration */ +uint64_t cur_bytes; /* number of bytes in current iteration */ uint64_t bytes_done = 0; uint8_t *cipher_data = NULL; QEMUIOVector hd_qiov; int ret = 0; -uint64_t payload_offset = -qcrypto_block_get_payload_offset(crypto->block) / 512; -assert(payload_offset < (INT64_MAX / 512)); +uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block); +uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block); +uint64_t sector_num = offset / sector_size; + +assert(!flags); +assert(payload_offset < INT64_MAX); +assert(QEMU_IS_ALIGNED(offset, sector_size)); +assert(QEMU_IS_ALIGNED(bytes, sector_size)); qemu_iovec_init(&hd_qiov, qiov->niov); @@ -481,37 +483,29 @@ block_crypto_co_writev(BlockDriverState *bs, int64_t sector_num, goto cleanup;
[Qemu-block] [PATCH v4 2/6] crypto: expose encryption sector size in APIs
While current encryption schemes all have a fixed sector size of 512 bytes, this is not guaranteed to be the case in future. Expose the sector size in the APIs so the block layer can remove assumptions about fixed 512 byte sectors. Reviewed-by: Max Reitz Reviewed-by: Eric Blake Signed-off-by: Daniel P. Berrange --- crypto/block-luks.c| 6 -- crypto/block-qcow.c| 1 + crypto/block.c | 6 ++ crypto/blockpriv.h | 1 + include/crypto/block.h | 15 +++ 5 files changed, 27 insertions(+), 2 deletions(-) diff --git a/crypto/block-luks.c b/crypto/block-luks.c index 36bc856084..a9062bb0f2 100644 --- a/crypto/block-luks.c +++ b/crypto/block-luks.c @@ -846,8 +846,9 @@ qcrypto_block_luks_open(QCryptoBlock *block, } } +block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE; block->payload_offset = luks->header.payload_offset * -QCRYPTO_BLOCK_LUKS_SECTOR_SIZE; +block->sector_size; luks->cipher_alg = cipheralg; luks->cipher_mode = ciphermode; @@ -1240,8 +1241,9 @@ qcrypto_block_luks_create(QCryptoBlock *block, QCRYPTO_BLOCK_LUKS_SECTOR_SIZE)) * QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS); +block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE; block->payload_offset = luks->header.payload_offset * -QCRYPTO_BLOCK_LUKS_SECTOR_SIZE; +block->sector_size; /* Reserve header space to match payload offset */ initfunc(block, block->payload_offset, opaque, &local_err); diff --git a/crypto/block-qcow.c b/crypto/block-qcow.c index a456fe338b..4dd594a9ba 100644 --- a/crypto/block-qcow.c +++ b/crypto/block-qcow.c @@ -80,6 +80,7 @@ qcrypto_block_qcow_init(QCryptoBlock *block, goto fail; } +block->sector_size = QCRYPTO_BLOCK_QCOW_SECTOR_SIZE; block->payload_offset = 0; return 0; diff --git a/crypto/block.c b/crypto/block.c index c382393d9a..a7a9ad240e 100644 --- a/crypto/block.c +++ b/crypto/block.c @@ -170,6 +170,12 @@ uint64_t qcrypto_block_get_payload_offset(QCryptoBlock *block) } +uint64_t qcrypto_block_get_sector_size(QCryptoBlock *block) +{ +return block->sector_size; +} + + void qcrypto_block_free(QCryptoBlock *block) { if (!block) { diff --git a/crypto/blockpriv.h b/crypto/blockpriv.h index 0edb810e22..d227522d88 100644 --- a/crypto/blockpriv.h +++ b/crypto/blockpriv.h @@ -36,6 +36,7 @@ struct QCryptoBlock { QCryptoHashAlgorithm kdfhash; size_t niv; uint64_t payload_offset; /* In bytes */ +uint64_t sector_size; /* In bytes */ }; struct QCryptoBlockDriver { diff --git a/include/crypto/block.h b/include/crypto/block.h index f0e543bee1..13232b2472 100644 --- a/include/crypto/block.h +++ b/include/crypto/block.h @@ -241,6 +241,21 @@ QCryptoHashAlgorithm qcrypto_block_get_kdf_hash(QCryptoBlock *block); uint64_t qcrypto_block_get_payload_offset(QCryptoBlock *block); /** + * qcrypto_block_get_sector_size: + * @block: the block encryption object + * + * Get the size of sectors used for payload encryption. A new + * IV is used at the start of each sector. The encryption + * sector size is not required to match the sector size of the + * underlying storage. For example LUKS will always use a 512 + * byte sector size, even if the volume is on a disk with 4k + * sectors. + * + * Returns: the sector in bytes + */ +uint64_t qcrypto_block_get_sector_size(QCryptoBlock *block); + +/** * qcrypto_block_free: * @block: the block encryption object * -- 2.13.5
[Qemu-block] [PATCH v4 6/6] block: support passthrough of BDRV_REQ_FUA in crypto driver
The BDRV_REQ_FUA flag can trivially be allowed in the crypt driver as a passthrough to the underlying block driver. Reviewed-by: Max Reitz Reviewed-by: Eric Blake Signed-off-by: Daniel P. Berrange --- block/crypto.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/block/crypto.c b/block/crypto.c index edf53d49d1..60ddf8623e 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -279,6 +279,9 @@ static int block_crypto_open_generic(QCryptoBlockFormat format, return -EINVAL; } +bs->supported_write_flags = BDRV_REQ_FUA & +bs->file->bs->supported_write_flags; + opts = qemu_opts_create(opts_spec, NULL, 0, &error_abort); qemu_opts_absorb_qdict(opts, options, &local_err); if (local_err) { @@ -462,7 +465,7 @@ block_crypto_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes, uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block); uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block); -assert(!flags); +assert(!(flags & ~BDRV_REQ_FUA)); assert(payload_offset < INT64_MAX); assert(QEMU_IS_ALIGNED(offset, sector_size)); assert(QEMU_IS_ALIGNED(bytes, sector_size)); @@ -495,7 +498,7 @@ block_crypto_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes, qemu_iovec_add(&hd_qiov, cipher_data, cur_bytes); ret = bdrv_co_pwritev(bs->file, payload_offset + offset + bytes_done, - cur_bytes, &hd_qiov, 0); + cur_bytes, &hd_qiov, flags); if (ret < 0) { goto cleanup; } -- 2.13.5
[Qemu-block] [PATCH v4 3/6] block: fix data type casting for crypto payload offset
The crypto APIs report the offset of the data payload as an uint64_t type, but the block driver is casting to size_t or ssize_t which will potentially truncate. Most of the block APIs use int64_t for offsets meanwhile, so even if using uint64_t in the crypto block driver we are still at risk of truncation. Change the block crypto driver to use uint64_t, but add asserts that the value is less than INT64_MAX. Reviewed-by: Max Reitz Reviewed-by: Eric Blake Signed-off-by: Daniel P. Berrange --- block/crypto.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/block/crypto.c b/block/crypto.c index 684cabeaf8..61f5d77bc0 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -364,8 +364,9 @@ static int block_crypto_truncate(BlockDriverState *bs, int64_t offset, PreallocMode prealloc, Error **errp) { BlockCrypto *crypto = bs->opaque; -size_t payload_offset = +uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block); +assert(payload_offset < (INT64_MAX - offset)); offset += payload_offset; @@ -395,8 +396,9 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num, uint8_t *cipher_data = NULL; QEMUIOVector hd_qiov; int ret = 0; -size_t payload_offset = +uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block) / 512; +assert(payload_offset < (INT64_MAX / 512)); qemu_iovec_init(&hd_qiov, qiov->niov); @@ -462,8 +464,9 @@ block_crypto_co_writev(BlockDriverState *bs, int64_t sector_num, uint8_t *cipher_data = NULL; QEMUIOVector hd_qiov; int ret = 0; -size_t payload_offset = +uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block) / 512; +assert(payload_offset < (INT64_MAX / 512)); qemu_iovec_init(&hd_qiov, qiov->niov); @@ -524,7 +527,9 @@ static int64_t block_crypto_getlength(BlockDriverState *bs) BlockCrypto *crypto = bs->opaque; int64_t len = bdrv_getlength(bs->file->bs); -ssize_t offset = qcrypto_block_get_payload_offset(crypto->block); +uint64_t offset = qcrypto_block_get_payload_offset(crypto->block); +assert(offset < INT64_MAX); +assert(offset < len); len -= offset; -- 2.13.5
[Qemu-block] [PATCH v4 1/6] block: use 1 MB bounce buffers for crypto instead of 16KB
Using 16KB bounce buffers creates a significant performance penalty for I/O to encrypted volumes on storage which high I/O latency (rotating rust & network drives), because it triggers lots of fairly small I/O operations. On tests with rotating rust, and cache=none|directsync, write speed increased from 2MiB/s to 32MiB/s, on a par with that achieved by the in-kernel luks driver. With other cache modes the in-kernel driver is still notably faster because it is able to report completion of the I/O request before any encryption is done, while the in-QEMU driver must encrypt the data before completion. Signed-off-by: Daniel P. Berrange --- block/crypto.c | 28 +++- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/block/crypto.c b/block/crypto.c index 58ef6f2f52..684cabeaf8 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -379,7 +379,11 @@ static void block_crypto_close(BlockDriverState *bs) } -#define BLOCK_CRYPTO_MAX_SECTORS 32 +/* + * 1 MB bounce buffer gives good performance / memory tradeoff + * when using cache=none|directsync. + */ +#define BLOCK_CRYPTO_MAX_IO_SIZE (1024 * 1024) static coroutine_fn int block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num, @@ -396,12 +400,11 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num, qemu_iovec_init(&hd_qiov, qiov->niov); -/* Bounce buffer so we have a linear mem region for - * entire sector. XXX optimize so we avoid bounce - * buffer in case that qiov->niov == 1 +/* Bounce buffer because we don't wish to expose cipher text + * in qiov which points to guest memory. */ cipher_data = -qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512, +qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_IO_SIZE, qiov->size)); if (cipher_data == NULL) { ret = -ENOMEM; @@ -411,8 +414,8 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num, while (remaining_sectors) { cur_nr_sectors = remaining_sectors; -if (cur_nr_sectors > BLOCK_CRYPTO_MAX_SECTORS) { -cur_nr_sectors = BLOCK_CRYPTO_MAX_SECTORS; +if (cur_nr_sectors > (BLOCK_CRYPTO_MAX_IO_SIZE / 512)) { +cur_nr_sectors = (BLOCK_CRYPTO_MAX_IO_SIZE / 512); } qemu_iovec_reset(&hd_qiov); @@ -464,12 +467,11 @@ block_crypto_co_writev(BlockDriverState *bs, int64_t sector_num, qemu_iovec_init(&hd_qiov, qiov->niov); -/* Bounce buffer so we have a linear mem region for - * entire sector. XXX optimize so we avoid bounce - * buffer in case that qiov->niov == 1 +/* Bounce buffer because we're not permitted to touch + * contents of qiov - it points to guest memory. */ cipher_data = -qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512, +qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_IO_SIZE, qiov->size)); if (cipher_data == NULL) { ret = -ENOMEM; @@ -479,8 +481,8 @@ block_crypto_co_writev(BlockDriverState *bs, int64_t sector_num, while (remaining_sectors) { cur_nr_sectors = remaining_sectors; -if (cur_nr_sectors > BLOCK_CRYPTO_MAX_SECTORS) { -cur_nr_sectors = BLOCK_CRYPTO_MAX_SECTORS; +if (cur_nr_sectors > (BLOCK_CRYPTO_MAX_IO_SIZE / 512)) { +cur_nr_sectors = (BLOCK_CRYPTO_MAX_IO_SIZE / 512); } qemu_iovec_to_buf(qiov, bytes_done, -- 2.13.5
[Qemu-block] [PATCH v4 0/6] Misc improvements to crypto block driver
This is a followup to v1: https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg00781.html v2: https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg06464.html v3: https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg02923.html This collection of patches first improves the performance of the crypto block driver and then does various cleanups to improve ongoing maint work. Changed in v4: - Drop intermediate patch that replaced '512' with a constant (Max) - Use MIN() macro where needed (Max) - Fix bounce buffer size at 1MB instead of varying per sector size (Max) - Convert missing qcrypto_block_encrypt call to sectors in qcow.c (Max) Changed in v3: - Support passthrough of BDRV_REQ_FUA (Eric) - Fix potential truncation of payload offset values (Eric) - Use encryption scheme sector size instead of BDRV_SECTOR_SIZE (Kevin) - Use QEMU_IS_ALIGNED where appropriate (Eric) - Remove unused 'sector_num' variable (Eric) - Fix whitespace alignment (Eric) - Fix math error in qcow conversion (Eric) Daniel P. Berrange (6): block: use 1 MB bounce buffers for crypto instead of 16KB crypto: expose encryption sector size in APIs block: fix data type casting for crypto payload offset block: convert crypto driver to bdrv_co_preadv|pwritev block: convert qcrypto_block_encrypt|decrypt to take bytes offset block: support passthrough of BDRV_REQ_FUA in crypto driver block/crypto.c | 130 ++--- block/qcow.c | 11 +++-- block/qcow2-cluster.c | 8 ++- block/qcow2.c | 4 +- crypto/block-luks.c| 18 --- crypto/block-qcow.c| 13 +++-- crypto/block.c | 26 +++--- crypto/blockpriv.h | 5 +- include/crypto/block.h | 29 --- 9 files changed, 148 insertions(+), 96 deletions(-) -- 2.13.5
Re: [Qemu-block] [PATCH v3 5/7] block: convert crypto driver to bdrv_co_preadv|pwritev
On Sat, Sep 16, 2017 at 06:54:58PM +0200, Max Reitz wrote: > On 2017-09-12 13:28, Daniel P. Berrange wrote: > > Make the crypto driver implement the bdrv_co_preadv|pwritev > > callbacks, and also use bdrv_co_preadv|pwritev for I/O > > with the protocol driver beneath. > > > > Signed-off-by: Daniel P. Berrange > > --- > > block/crypto.c | 104 > > +++-- > > 1 file changed, 56 insertions(+), 48 deletions(-) > > Reviewed-by: Max Reitz > > Some notes below. > > > diff --git a/block/crypto.c b/block/crypto.c > > index 49d6d4c058..d004e9cef4 100644 > > --- a/block/crypto.c > > +++ b/block/crypto.c > > [...] > > > @@ -410,37 +414,33 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t > > sector_num, > > goto cleanup; > > } > > > > -while (remaining_sectors) { > > -cur_nr_sectors = remaining_sectors; > > +while (bytes) { > > +cur_bytes = bytes; > > > > -if (cur_nr_sectors > BLOCK_CRYPTO_MAX_SECTORS) { > > -cur_nr_sectors = BLOCK_CRYPTO_MAX_SECTORS; > > +if (cur_bytes > (BLOCK_CRYPTO_MAX_SECTORS * sector_size)) { > > +cur_bytes = BLOCK_CRYPTO_MAX_SECTORS * sector_size; > > It's a bit weird that now the bounce buffer's size is now no longer > fixed at 1 MB but variable depending on the crypto driver's block size. > It also doesn't seem too intentional when looking at the first patch's > commit message... > > In any case, that would be an issue in the previous patch, though. In > general, I'm wondering whether maybe you should squash this patch into > the previous one... Yes, that would make the for a larger patch, but it > wouldn't leave some not-quite-right state in between where sector_size > is generally assumed to be equal to BDRV_SECTOR_SIZE -- which it is in > practice, but not necessarily in theory. In the end i'm going with this approach - just dropping the previous patch entirely, since 99% of what it does is then removed in this patch and the next one. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
27.09.2017 13:05, Vladimir Sementsov-Ogievskiy wrote: 26.09.2017 01:19, Eric Blake wrote: On 09/25/2017 08:58 AM, Vladimir Sementsov-Ogievskiy wrote: Minimal implementation: drop most of additional error information. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd-client.h | 2 + include/block/nbd.h | 15 - block/nbd-client.c | 97 +- nbd/client.c | 169 +++- 4 files changed, 249 insertions(+), 34 deletions(-) +++ b/include/block/nbd.h @@ -57,11 +57,17 @@ struct NBDRequest { }; typedef struct NBDRequest NBDRequest; -struct NBDReply { +typedef struct NBDReply { + bool simple; uint64_t handle; uint32_t error; -}; -typedef struct NBDReply NBDReply; + + uint16_t flags; + uint16_t type; + uint32_t tail_length; + uint64_t offset; + uint32_t hole_size; +} NBDReply; This feels like it should be a discriminated union, rather than a struct containing fields that are only sometimes valid... No: simple, handle and error are always valid flags, type, tail_length are valid for all structured replies offset and hole_size are valid for structured hole reply so, we have one case when all fields are valid.. how do you see it with union, what is the real difference? I think it would be just a complication. #define NBD_SREP_TYPE_NONE 0 #define NBD_SREP_TYPE_OFFSET_DATA 1 +#define NBD_SREP_TYPE_OFFSET_HOLE 2 #define NBD_SREP_TYPE_ERROR NBD_SREP_ERR(1) +#define NBD_SREP_TYPE_ERROR_OFFSET NBD_SREP_ERR(2) ...especially since there is more than one type of SREP packet layout. I also wonder why you are defining constants in a piecemeal fashion, rather than all at once (even if your minimal server implementation doesn't send a particular constant, there's no harm in defining them all up front). hmm. just to not define unused constants. It doesn't matter, I can define them all if you prefer. +++ b/block/nbd-client.c @@ -179,9 +179,10 @@ err: return rc; } -static int nbd_co_receive_reply(NBDClientSession *s, - uint64_t handle, - QEMUIOVector *qiov) +static int nbd_co_receive_1_reply_or_chunk(NBDClientSession *s, Long name, and unusual to mix in "1" instead of "one". Would it be better to name it nbd_co_receive_chunk, where we declare that a simple reply is (roughly) the same amount of effort as a chunk in a structured reply? + uint64_t handle, + bool *cont, + QEMUIOVector *qiov) { No documentation of the function? int ret; int i = HANDLE_TO_INDEX(s, handle); @@ -191,29 +192,95 @@ static int nbd_co_receive_reply(NBDClientSession *s, qemu_coroutine_yield(); s->requests[i].receiving = false; if (!s->ioc || s->quit) { - ret = -EIO; - } else { - assert(s->reply.handle == handle); - ret = -s->reply.error; - if (qiov && s->reply.error == 0) { + *cont = false; + return -EIO; + } + + assert(s->reply.handle == handle); + *cont = !(s->reply.simple || (s->reply.flags & NBD_SREP_FLAG_DONE)); We need to validate that the server is not sending us SREP chunks unless we negotiated them. I'm thinking it might be better to do it here (maybe you did it somewhere else, but I haven't seen it yet; I'm reviewing the patch in textual order rather than the order in which things are called). No, I didn't. Will add (may be early, in reply_entry). + ret = -s->reply.error; + if (ret < 0) { + goto out; + } + + if (s->reply.simple) { + if (qiov) { if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov, - NULL) < 0) { - ret = -EIO; - s->quit = true; + NULL) < 0) + { + goto fatal; } } + goto out; + } - /* Tell the read handler to read another header. */ - s->reply.handle = 0; + /* here we deal with successful structured reply */ + switch (s->reply.type) { + QEMUIOVector sub_qiov; + case NBD_SREP_TYPE_OFFSET_DATA: This is putting a LOT of smarts directly into the receive routine. Here's where I was previously wondering (and I think Paolo as well) whether it might be better to split the efforts: the generic function reads off the chunk information and any payload, but a per-command Hmm. it was my idea to move all reading into one coroutine (in my refactoring series, but Paolo was against). Or you mean to read a payload as raw? It would lead to double copying it to destination qiov, which I dislike.. callback function then parses the chunks. per-command? Then callback for CMD_READ would have all of these "smarts", so the whole code would not be simpler.. (However it will simpl
Re: [Qemu-block] [PATCH 8/8] nbd: Minimal structured read for client
26.09.2017 01:19, Eric Blake wrote: On 09/25/2017 08:58 AM, Vladimir Sementsov-Ogievskiy wrote: Minimal implementation: drop most of additional error information. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd-client.h | 2 + include/block/nbd.h | 15 - block/nbd-client.c | 97 +- nbd/client.c| 169 +++- 4 files changed, 249 insertions(+), 34 deletions(-) +++ b/include/block/nbd.h @@ -57,11 +57,17 @@ struct NBDRequest { }; typedef struct NBDRequest NBDRequest; -struct NBDReply { +typedef struct NBDReply { +bool simple; uint64_t handle; uint32_t error; -}; -typedef struct NBDReply NBDReply; + +uint16_t flags; +uint16_t type; +uint32_t tail_length; +uint64_t offset; +uint32_t hole_size; +} NBDReply; This feels like it should be a discriminated union, rather than a struct containing fields that are only sometimes valid... No: simple, handle and error are always valid flags, type, tail_length are valid for all structured replies offset and hole_size are valid for structured hole reply so, we have one case when all fields are valid.. how do you see it with union, what is the real difference? I think it would be just a complication. #define NBD_SREP_TYPE_NONE 0 #define NBD_SREP_TYPE_OFFSET_DATA 1 +#define NBD_SREP_TYPE_OFFSET_HOLE 2 #define NBD_SREP_TYPE_ERROR NBD_SREP_ERR(1) +#define NBD_SREP_TYPE_ERROR_OFFSET NBD_SREP_ERR(2) ...especially since there is more than one type of SREP packet layout. I also wonder why you are defining constants in a piecemeal fashion, rather than all at once (even if your minimal server implementation doesn't send a particular constant, there's no harm in defining them all up front). hmm. just to not define unused constants. It doesn't matter, I can define them all if you prefer. +++ b/block/nbd-client.c @@ -179,9 +179,10 @@ err: return rc; } -static int nbd_co_receive_reply(NBDClientSession *s, -uint64_t handle, -QEMUIOVector *qiov) +static int nbd_co_receive_1_reply_or_chunk(NBDClientSession *s, Long name, and unusual to mix in "1" instead of "one". Would it be better to name it nbd_co_receive_chunk, where we declare that a simple reply is (roughly) the same amount of effort as a chunk in a structured reply? + uint64_t handle, + bool *cont, + QEMUIOVector *qiov) { No documentation of the function? int ret; int i = HANDLE_TO_INDEX(s, handle); @@ -191,29 +192,95 @@ static int nbd_co_receive_reply(NBDClientSession *s, qemu_coroutine_yield(); s->requests[i].receiving = false; if (!s->ioc || s->quit) { -ret = -EIO; -} else { -assert(s->reply.handle == handle); -ret = -s->reply.error; -if (qiov && s->reply.error == 0) { +*cont = false; +return -EIO; +} + +assert(s->reply.handle == handle); +*cont = !(s->reply.simple || (s->reply.flags & NBD_SREP_FLAG_DONE)); We need to validate that the server is not sending us SREP chunks unless we negotiated them. I'm thinking it might be better to do it here (maybe you did it somewhere else, but I haven't seen it yet; I'm reviewing the patch in textual order rather than the order in which things are called). No, I didn't. Will add (may be early, in reply_entry). +ret = -s->reply.error; +if (ret < 0) { +goto out; +} + +if (s->reply.simple) { +if (qiov) { if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov, - NULL) < 0) { -ret = -EIO; -s->quit = true; + NULL) < 0) +{ +goto fatal; } } +goto out; +} -/* Tell the read handler to read another header. */ -s->reply.handle = 0; +/* here we deal with successful structured reply */ +switch (s->reply.type) { +QEMUIOVector sub_qiov; +case NBD_SREP_TYPE_OFFSET_DATA: This is putting a LOT of smarts directly into the receive routine. Here's where I was previously wondering (and I think Paolo as well) whether it might be better to split the efforts: the generic function reads off the chunk information and any payload, but a per-command Hmm. it was my idea to move all reading into one coroutine (in my refactoring series, but Paolo was against). Or you mean to read a payload as raw? It would lead to double copying it to destination qiov, which I dislike.. callback function then parses the chunks. per-command? Then callback for CMD_READ would have all of these "smarts", so the whole code would not be simpler.. (However it will simplif
Re: [Qemu-block] [Qemu-devel] [PATCH] virtio: introduce `info virtio' hmp command
On Tue, 26 Sep 2017 19:20:20 +0200 Kevin Wolf wrote: > Am 26.09.2017 um 18:13 hat Jan Dakinevich geschrieben: > > The command is intended for exposing device specific virtio feature bits > > and their negotiation status. It is convenient and useful for debug > > purpose. > > > > Names of features are taken from a devices via get_feature_name() within > > VirtioDeviceClass. If certain device doesn't implement it, the command > > will print only hexadecimal value of feature mask. > > > > Cc: Denis V. Lunev > > Signed-off-by: Jan Dakinevich > > --- > > The patch suggests an another approach for this: > > > > "virtio: show guest features in 'info qtree'" > > http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg01609.html > > Actually, I think the original approach is much nicer because it uses > generic infrastructure rather than introducing a one-off monitor command > that is specific to a certain class of devices. Probably unsurprisingly, I like this approach much better. > > But at the same time I can see Cornelia's point that this would create a > whole lot of properties and if we did the same thing consistently for > all devices, the output of 'info qtree' would be completely cluttered > up. > > Maybe it would make sense to have properties that can be queried with > QOM, but that don't show up in 'info qtree'? If we then add some HMP > 'qom-get' command that allows dumping a whole device as a QOM object > instead of having to query property by property, it should be quite > convenient to use. It's not just that you have to add a property for every feature bit. They would also need either more state (to cover host offered and guest accepted bits), or they would need to be duplicated for host and guest bits. Additionally, we need to consider that feature negotiation is only final for virtio-1 devices if the feature accepted status has been set -- that's why I think providing the current status makes sense as well. I do like your idea of queryable properties and HMP-driven qom-get, though.