Re: [Qemu-devel] [PATCH v6 0/8] Add Qemu to SeaBIOS LCHS interface

2019-09-11 Thread Sam Eiderman via Qemu-devel
Gentle ping

On Tue, Aug 27, 2019, 11:24 Sam Eiderman  wrote:

> v1:
>
> Non-standard logical geometries break under QEMU.
>
> A virtual disk which contains an operating system which depends on
> logical geometries (consistent values being reported from BIOS INT13
> AH=08) will most likely break under QEMU/SeaBIOS if it has non-standard
> logical geometries - for example 56 SPT (sectors per track).
> No matter what QEMU guesses, SeaBIOS will, for large enough disks,
> use LBA translation, which reports 63 SPT instead.
>
> In addition we cannot force SeaBIOS to rely on physical geometries at
> all. A virtio-blk-pci virtual disk with 255 physical heads cannot
> report more than 16 physical heads when moved to an IDE controller, since
> the ATA spec allows a maximum of 16 heads - this is an artifact of
> virtualization.
>
> By supplying the logical geometries directly we are able to support such
> "exotic" disks.
>
> We will use fw_cfg to do just that.
>
> v2:
>
> Fix missing parenthesis check in
> "hd-geo-test: Add tests for lchs override"
>
> v3:
>
> * Rename fw_cfg key to "bios-geometry".
> * Remove "extendible" interface.
> * Add cpu_to_le32 fix as Laszlo suggested for big endian hosts
> * Fix last qtest commit - automatic docker tester for some reason does not
> have qemu-img set
>
> v4:
>
> * Change fw_cfg interface from mixed textual/binary to textual only
>
> v5:
>
> * Fix line > 80 chars in tests/hd-geo-test.c
>
> v6:
>
> * Small fixes for issues pointed out by Max
> * (&conf->conf)->lcyls to conf->conf.lcyls and so on
> * Remove scsi_unrealize from everything other than scsi-hd
> * Add proper include to sysemu.h
> * scsi_device_unrealize() after scsi_device_purge_requests()
>
> Sam Eiderman (8):
>   block: Refactor macros - fix tabbing
>   block: Support providing LCHS from user
>   bootdevice: Add interface to gather LCHS
>   scsi: Propagate unrealize() callback to scsi-hd
>   bootdevice: Gather LCHS from all relevant devices
>   bootdevice: Refactor get_boot_devices_list
>   bootdevice: FW_CFG interface for LCHS values
>   hd-geo-test: Add tests for lchs override
>
>  bootdevice.c | 148 --
>  hw/block/virtio-blk.c|   6 +
>  hw/ide/qdev.c|   7 +-
>  hw/nvram/fw_cfg.c|  14 +-
>  hw/scsi/scsi-bus.c   |  16 ++
>  hw/scsi/scsi-disk.c  |  12 +
>  include/hw/block/block.h |  22 +-
>  include/hw/scsi/scsi.h   |   1 +
>  include/sysemu/sysemu.h  |   4 +
>  tests/Makefile.include   |   2 +-
>  tests/hd-geo-test.c  | 582 +++
>  11 files changed, 773 insertions(+), 41 deletions(-)
>
> --
> 2.23.0.187.g17f5b7556c-goog
>
>
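
To illustrate why a guest installed with 56 SPT breaks once the BIOS reports 63 SPT, here is a minimal sketch in C of the standard CHS/LBA conversion. The struct name, the function names and the example geometry (16 heads) are illustrative assumptions and are not taken from the patch series.

```c
#include <stdint.h>

/* Logical geometry as a guest would see it from INT13 AH=08. */
struct lchs {
    uint32_t cyls;
    uint32_t heads;
    uint32_t secs;   /* sectors per track (SPT) */
};

/* Standard CHS -> LBA mapping; sector numbers start at 1. */
static uint64_t chs_to_lba(const struct lchs *g,
                           uint32_t c, uint32_t h, uint32_t s)
{
    return ((uint64_t)c * g->heads + h) * g->secs + (s - 1);
}

/* Inverse mapping: the CHS address that reaches a given LBA. */
static void lba_to_chs(const struct lchs *g, uint64_t lba,
                       uint32_t *c, uint32_t *h, uint32_t *s)
{
    *s = (uint32_t)(lba % g->secs) + 1;
    *h = (uint32_t)((lba / g->secs) % g->heads);
    *c = (uint32_t)(lba / ((uint64_t)g->secs * g->heads));
}
```

With 16 heads, CHS (1, 0, 1) is LBA 896 under 56 SPT but LBA 1008 under 63 SPT, so a boot loader that computed its addresses with 56 SPT ends up reading a different sector once the BIOS reports 63.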


Re: [Qemu-devel] [PATCH v4 13/54] plugin: add user-facing API

2019-09-10 Thread Aaron Lindsay OS via Qemu-devel
On Sep 06 20:31, Alex Bennée wrote:
> Aaron Lindsay OS  writes:
> 
> > One thing I would find useful is the ability to access register values
> > during an execution-time callback. I think the easiest way to do that
> > generically would be to expose them via the gdb functionality (like
> > Pavel's earlier patchset did [1]), though that (currently) limits you to
> > the general-purpose registers. Ideally it would be nice to be able to
> > access other registers (i.e. floating-point, or maybe even system
> > registers), though those are more difficult to do generically.
> 
> ARM already has system register support via the gdbstub XML interface so
> it's certainly doable. The trick is how we do that in a probable way
> without leaking the gdb remote protocol into plugins (which is just very
> ugly).

What do you mean by "in a probable way"?

I agree that simply exposing the gdb interface does not seem like a
clean solution.

> > Perhaps if we added some sort of architectural-support checking for
> > individual plugins like I mentioned in another response to this
> > patchset, we could allow some limited architecture-specific
> > functionality in this vein? I confess I haven't thought through all the
> > ramifications of that yet, though.
> 
> I was wondering if exposing the Elf Type would be enough? It's portable
> enough that plugins should be able to work with it without defining our
> own architecture enumeration.

I can't think of a reason that wouldn't work, assuming you're referring
to exposing a value corresponding to the `e_machine` ELF header.

-Aaron
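
For reference, a minimal sketch in C of what checking the `e_machine` value could look like on the plugin side. It assumes a Linux host with `<elf.h>`; `elf_machine()` and `plugin_supports_machine()` are hypothetical helpers invented for this illustration and are not part of the plugin API proposed in the series.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the e_machine value of an ELF image held in memory, or -1 if
 * the buffer is too short, lacks the ELF magic, or has an unknown byte
 * order. e_machine sits at offset 18 in both ELF32 and ELF64 headers
 * (after the 16-byte e_ident and the 2-byte e_type).
 */
static int elf_machine(const unsigned char *buf, size_t len)
{
    if (len < 20 || memcmp(buf, ELFMAG, SELFMAG) != 0) {
        return -1;
    }
    if (buf[EI_DATA] == ELFDATA2LSB) {
        return buf[18] | (buf[19] << 8);
    }
    if (buf[EI_DATA] == ELFDATA2MSB) {
        return (buf[18] << 8) | buf[19];
    }
    return -1;
}

/* Hypothetical plugin-side check: this plugin only understands AArch64. */
static int plugin_supports_machine(int e_machine)
{
    return e_machine == EM_AARCH64;
}
```

Reusing the ELF constants means a plugin can compare against `EM_*` values it already knows, instead of a QEMU-specific architecture enumeration.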



Re: [Qemu-devel] [PATCH v4 00/54] plugins for TCG

2019-09-10 Thread Aaron Lindsay OS via Qemu-devel
On Sep 06 20:52, Alex Bennée wrote:
> 
> Markus Armbruster  writes:
> > Please advise why TCG plugins don't undermine the GPL.  Any proposal to
> > add a plugin interface needs to do that.
> 
> I'm not sure what we can say about this apart from "ask your lawyer".
> I'm certainly not proposing we add any sort of language about what
> should and shouldn't be allowed to use the plugin interface. I find it
> hard to see how anyone could argue code written to interface with the
> plugin API couldn't be considered a derived work.

I am not a lawyer, but I would not have expected software merely using a
well-defined API to be considered a derivative work of the software
defining it. Unless, of course, it is a derivative work of another
plugin using the same interface in a way that is not necessitated by the
structure of the API.

What's your reasoning for why it would be a derivative work? Is your
belief that the plugin API is complex enough that anything using it has
to be a derivative work, or something else?

That said, I'm not sure I understand in what way adding a plugin
interface would undermine the GPL, so maybe I'm missing the point.

-Aaron



Re: [Qemu-devel] [PATCH] ahci: enable pci bus master MemoryRegion before loading ahci engines

2019-09-10 Thread Andy via Qemu-devel
(0x7fcc4e19b4a0)[0]: cmd done
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 
0x
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 
0x0001
ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 
0x0001
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 
0x8000

handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS:
0x00: 27 80 ef 02 00 00 00 a0 00 00 00 00 00 00 00 00
0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 
0x
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 
0x0001
ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 
0x0001
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 
0x0001

handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS:
0x00: 27 80 ec 00 00 00 00 a0 00 00 00 00 00 00 00 00
0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ahci_populate_sglist ahci(0x7fcc4e19b4a0)[0]
ahci_dma_prepare_buf ahci(0x7fcc4e19b4a0)[0]: prepare buf limit=512 
prepared=512
ahci_start_transfer ahci(0x7fcc4e19b4a0)[0]: reading 512 bytes on ata w/ 
sglist

ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 
0x
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 
0x0001
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 
0x0002
ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 
0x0001

---


--
Best regards,
Andy Chiu

On 2019/9/10 2:13 AM, John Snow wrote:
>
> On 9/9/19 1:18 PM, andychiu via Qemu-devel wrote:
> > If Windows 10 guests have enabled 'turn off hard disk after idle'
> > option in power settings, and the guest has a SATA disk plugged in,
> > the SATA disk will be turned off after a specified idle time.
> > If the guest is live migrated or saved/loaded with its SATA disk
> > turned off, the following error will occur:
> >
> > qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address
> > qemu-system-x86_64: Failed to load ich9_ahci:ahci
> > qemu-system-x86_64: error while loading state for instance 0x0 of device ':00:1a.0/ich9_ahci'
> > qemu-system-x86_64: load of migration failed: Operation not permitted
> >
>
> Oof. That can't have been fun to discover.
>
> > Observation from trace logs shows that a while after Windows 10 turns off
> > a SATA disk (IDE disks don't have the following behavior),
> > it will disable the PCI_COMMAND_MASTER flag of the pci device containing
> > the ahci device. When the disk is turning back on,
> > the PCI_COMMAND_MASTER flag will be restored first.
> > But if the guest is migrated or saved/loaded while the disk is off,
> > the post_load callback of ahci device, ahci_state_post_load(), will fail
> > at ahci_cond_start_engines() if the MemoryRegion
> > pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing
> > to the PCIDevice struct containing the ahci device.
> >
> > This patch enables pci_dev->bus_master_enable_region before calling
> > ahci_cond_start_engines() in ahci_state_post_load(), and restores the
> > MemoryRegion to its original state afterwards.
>
> This looks good to me from an AHCI perspective, but I'm not as clear on
> the implications of toggling the MemoryRegion, so I have some doubts.
>
> MST, can you chime in and clear my confusion?
>
> I suppose when the PCI_COMMAND_MASTER bit is turned off, we disable the
> memory region, as a guest would be unable to establish a new mapping in
> this time, so it makes sense that the attempt to map it fails.
>
> What's less clear to me is what happens to existing mappings when a
> region is disabled. Are they invalidated? If so, does it make sense that
> we are trying to establish a mapping here at all? Maybe it's absolutely
> correct that this fails.
>
> (I suppose, though, that the simple toggling of the region won't be a
> guest-visible event, so it's probably safe to do. Right?)
>
> What I find weird for AHCI is this: We try to engage the CLB mapping
> before the FIS mapping, but we fail at the FIS mapping. So why is
> PORT_CMD_FIS_RX set while PORT_CMD_START is unset?
>
> It

Re: [Qemu-devel] [PATCH] ahci: enable pci bus master MemoryRegion before loading ahci engines

2019-09-09 Thread andychiu via Qemu-devel
0 00 00 00 00
0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0x
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x0001
ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0001
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x8000

handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS:
0x00: 27 80 ef 02 00 00 00 a0 00 00 00 00 00 00 00 00
0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0x
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x0001
ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0001
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x0001

handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS:
0x00: 27 80 ec 00 00 00 00 a0 00 00 00 00 00 00 00 00
0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ahci_populate_sglist ahci(0x7fcc4e19b4a0)[0]
ahci_dma_prepare_buf ahci(0x7fcc4e19b4a0)[0]: prepare buf limit=512 prepared=512
ahci_start_transfer ahci(0x7fcc4e19b4a0)[0]: reading 512 bytes on ata w/ sglist

ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0x
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x0001
ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x0002
ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0001

-

--
Best regards,
Andy Chiu

John Snow wrote on 2019-09-10 02:13:
>
> On 9/9/19 1:18 PM, andychiu via Qemu-devel wrote:
> > If Windows 10 guests have enabled 'turn off hard disk after idle'
> > option in power settings, and the guest has a SATA disk plugged in,
> > the SATA disk will be turned off after a specified idle time.
> > If the guest is live migrated or saved/loaded with its SATA disk
> > turned off, the following error will occur:
> >
> > qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address
> > qemu-system-x86_64: Failed to load ich9_ahci:ahci
> > qemu-system-x86_64: error while loading state for instance 0x0 of device ':00:1a.0/ich9_ahci'
> > qemu-system-x86_64: load of migration failed: Operation not permitted
> >
>
> Oof. That can't have been fun to discover.
>
> > Observation from trace logs shows that a while after Windows 10 turns off
> > a SATA disk (IDE disks don't have the following behavior),
> > it will disable the PCI_COMMAND_MASTER flag of the pci device containing
> > the ahci device. When the disk is turning back on,
> > the PCI_COMMAND_MASTER flag will be restored first.
> > But if the guest is migrated or saved/loaded while the disk is off,
> > the post_load callback of ahci device, ahci_state_post_load(), will fail
> > at ahci_cond_start_engines() if the MemoryRegion
> > pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing
> > to the PCIDevice struct containing the ahci device.
> >
> > This patch enables pci_dev->bus_master_enable_region before calling
> > ahci_cond_start_engines() in ahci_state_post_load(), and restores the
> > MemoryRegion to its original state afterwards.
>
> This looks good to me from an AHCI perspective, but I'm not as clear on
> the implications of toggling the MemoryRegion, so I have some doubts.
>
> MST, can you chime in and clear my confusion?
>
> I suppose when the PCI_COMMAND_MASTER bit is turned off, we disable the
> memory region, as a guest would be unable to establish a new mapping in
> this time, so it makes sense that the attempt to map it fails.
>
> What's less clear to me is what happens to existing mappings when a
> region is disabled. Are th

[Qemu-devel] [PATCH] ahci: enable pci bus master MemoryRegion before loading ahci engines

2019-09-09 Thread andychiu via Qemu-devel
If Windows 10 guests have enabled 'turn off hard disk after idle'
option in power settings, and the guest has a SATA disk plugged in,
the SATA disk will be turned off after a specified idle time.
If the guest is live migrated or saved/loaded with its SATA disk
turned off, the following error will occur:

qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive 
buffer address
qemu-system-x86_64: Failed to load ich9_ahci:ahci
qemu-system-x86_64: error while loading state for instance 0x0 of device 
':00:1a.0/ich9_ahci'
qemu-system-x86_64: load of migration failed: Operation not permitted

Observation from trace logs shows that a while after Windows 10 turns off
a SATA disk (IDE disks don't have the following behavior),
it will disable the PCI_COMMAND_MASTER flag of the pci device containing
the ahci device. When the disk is turning back on,
the PCI_COMMAND_MASTER flag will be restored first.
But if the guest is migrated or saved/loaded while the disk is off,
the post_load callback of ahci device, ahci_state_post_load(), will fail
at ahci_cond_start_engines() if the MemoryRegion
pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing
to the PCIDevice struct containing the ahci device.

This patch enables pci_dev->bus_master_enable_region before calling
ahci_cond_start_engines() in ahci_state_post_load(), and restores the
MemoryRegion to its original state afterwards.

Signed-off-by: andychiu 
---
 hw/ide/ahci.c | 53 -
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index d45393c..83f8c30 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1649,33 +1649,52 @@ static const VMStateDescription vmstate_ahci_device = {
     },
 };
 
+static int ahci_state_load_engines(AHCIState *s, AHCIDevice *ad)
+{
+    AHCIPortRegs *pr = &ad->port_regs;
+    DeviceState *dev_state = s->container;
+    PCIDevice *pci_dev = (PCIDevice *) object_dynamic_cast(OBJECT(dev_state),
+                                                           TYPE_PCI_DEVICE);
+    bool pci_bus_master_enabled = pci_dev->bus_master_enable_region.enabled;
+
+    if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) {
+        error_report("AHCI: DMA engine should be off, but status bit "
+                     "indicates it is still running.");
+        return -1;
+    }
+    if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) {
+        error_report("AHCI: FIS RX engine should be off, but status bit "
+                     "indicates it is still running.");
+        return -1;
+    }
+
+    memory_region_set_enabled(&pci_dev->bus_master_enable_region, true);
+
+    /*
+     * After a migrate, the DMA/FIS engines are "off" and
+     * need to be conditionally restarted
+     */
+    pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON);
+    if (ahci_cond_start_engines(ad) != 0) {
+        return -1;
+    }
+    memory_region_set_enabled(&pci_dev->bus_master_enable_region,
+                              pci_bus_master_enabled);
+
+    return 0;
+}
+
 static int ahci_state_post_load(void *opaque, int version_id)
 {
     int i, j;
     struct AHCIDevice *ad;
     NCQTransferState *ncq_tfs;
-    AHCIPortRegs *pr;
     AHCIState *s = opaque;
 
     for (i = 0; i < s->ports; i++) {
         ad = &s->dev[i];
-        pr = &ad->port_regs;
-
-        if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) {
-            error_report("AHCI: DMA engine should be off, but status bit "
-                         "indicates it is still running.");
-            return -1;
-        }
-        if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) {
-            error_report("AHCI: FIS RX engine should be off, but status bit "
-                         "indicates it is still running.");
-            return -1;
-        }
 
-        /* After a migrate, the DMA/FIS engines are "off" and
-         * need to be conditionally restarted */
-        pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON);
-        if (ahci_cond_start_engines(ad) != 0) {
+        if (ahci_state_load_engines(s, ad)) {
             return -1;
         }
 
-- 
2.7.4
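
The "enable, run, restore" pattern described in the commit message can be sketched in isolation. The following is a minimal model in C, not QEMU code: `struct region`, `region_op`, `with_region_enabled()` and the two sample operations are hypothetical names made up for this illustration. One detail it makes explicit is that the saved state is put back on the failure path too, whereas in the posted hunk the early `return -1` after ahci_cond_start_engines() skips the second memory_region_set_enabled() call. Whether that matters when the load fails anyway is a question for review.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory region with an "enabled" flag. */
struct region {
    bool enabled;
};

/* Hypothetical operation that may depend on the region being enabled. */
typedef int (*region_op)(struct region *r, void *opaque);

/*
 * Run op() with the region forced on, then put the flag back to what it
 * was, on the failure path as well as on the success path.
 */
static int with_region_enabled(struct region *r, region_op op, void *opaque)
{
    bool was_enabled = r->enabled;
    int ret;

    r->enabled = true;
    ret = op(r, opaque);
    r->enabled = was_enabled;

    return ret;
}

/* Sample operation: "map a buffer", which needs an enabled region. */
static int try_map(struct region *r, void *opaque)
{
    (void)opaque;
    return r->enabled ? 0 : -1;
}

/* Sample operation that fails regardless of the region state. */
static int always_fail(struct region *r, void *opaque)
{
    (void)r;
    (void)opaque;
    return -1;
}
```

Calling `try_map()` directly on a disabled region fails, while `with_region_enabled(&r, try_map, NULL)` succeeds and leaves `r.enabled` as it found it.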




Re: [Qemu-devel] [PATCH 0/2] Adding some setsockopt() options

2019-09-06 Thread Shu-Chun Weng via Qemu-devel
Ping. Patchwork links:

http://patchwork.ozlabs.org/patch/1151884/
http://patchwork.ozlabs.org/patch/1151883/

On Thu, Aug 22, 2019 at 4:14 PM Shu-Chun Weng  wrote:

> Shu-Chun Weng (2):
>   linux-user: add missing UDP and IPv6 setsockopt options
>   linux-user: time stamping options for setsockopt()
>
>  linux-user/generic/sockbits.h |  4 
>  linux-user/mips/sockbits.h|  4 
>  linux-user/syscall.c  | 16 +---
>  3 files changed, 21 insertions(+), 3 deletions(-)
>
> --
> 2.23.0.187.g17f5b7556c-goog
>
>




Re: [Qemu-devel] [PATCH] linux-user: hijack open() for thread directories

2019-09-06 Thread Shu-Chun Weng via Qemu-devel
Ping. Any comments on this? Patchwork:
http://patchwork.ozlabs.org/patch/1151167/

On Wed, Aug 21, 2019 at 1:19 PM Shu-Chun Weng  wrote:

> Besides /proc/self|<pid>, files under /proc/thread-self and
> /proc/self|<pid>/task/<tid> also expose host information to the guest
> program. This patch adds them to the hijack infrastructure. Note that
> is_proc_myself() does not check if the <tid> matches the current thread
> and is thus only suitable for procfs files that are identical for all
> threads in the same process.
>
> Behavior verified with guest program:
>
> long main_thread_tid;
>
> long gettid() {
>   return syscall(SYS_gettid);
> }
>
> void print_info(const char* cxt, const char* dir) {
>   char buf[1024];
>   FILE* fp;
>
>   snprintf(buf, sizeof(buf), "%s/cmdline", dir);
>   fp = fopen(buf, "r");
>
>   if (fp == NULL) {
> printf("%s: can't open %s\n", cxt, buf);
>   } else {
> fgets(buf, sizeof(buf), fp);
> printf("%s %s cmd: %s\n", cxt, dir, buf);
> fclose(fp);
>   }
>
>   snprintf(buf, sizeof(buf), "%s/maps", dir);
>   fp = fopen(buf, "r");
>
>   if (fp == NULL) {
> printf("%s: can't open %s\n", cxt, buf);
>   } else {
> char seen[128][128];
> int n = 0, is_new = 0;
> while(fgets(buf, sizeof(buf), fp) != NULL) {
>   const char* p = strrchr(buf, ' ');
>   if (p == NULL || *(p + 1) == '\n') {
> continue;
>   }
>   ++p;
>   is_new = 1;
>   for (int i = 0; i < n; ++i) {
> if (strncmp(p, seen[i], sizeof(seen[i])) == 0) {
>   is_new = 0;
>   break;
> }
>   }
>   if (is_new) {
> printf("%s %s map: %s", cxt, dir, p);
> if (n < 128) {
>   strncpy(seen[n], p, sizeof(seen[n]));
>   seen[n][sizeof(seen[n]) - 1] = '\0';
>   ++n;
> }
>   }
> }
> fclose(fp);
>   }
> }
>
> void* thread_main(void* _) {
>   char buf[1024];
>
>   print_info("Child", "/proc/thread-self");
>
>   snprintf(buf, sizeof(buf), "/proc/%ld/task/%ld", (long) getpid(),
> main_thread_tid);
>   print_info("Child", buf);
>
>   snprintf(buf, sizeof(buf), "/proc/%ld/task/%ld", (long) getpid(), (long)
> gettid());
>   print_info("Child", buf);
>
>   return NULL;
> }
>
> int main() {
>   char buf[1024];
>   pthread_t thread;
>   int ret;
>
>   print_info("Main", "/proc/thread-self");
>   print_info("Main", "/proc/self");
>
>   snprintf(buf, sizeof(buf), "/proc/%ld", (long) getpid());
>   print_info("Main", buf);
>
>   main_thread_tid = gettid();
>   snprintf(buf, sizeof(buf), "/proc/self/task/%ld", main_thread_tid);
>   print_info("Main", buf);
>
>   snprintf(buf, sizeof(buf), "/proc/%ld/task/%ld", (long) getpid(),
> main_thread_tid);
>   print_info("Main", buf);
>
>   if ((ret = pthread_create(&thread, NULL, &thread_main, NULL)) != 0) {
>     printf("pthread_create failed: %s (%d)\n", strerror(ret), ret);
>   }
>
>   pthread_join(thread, NULL);
>   return 0;
> }
>
> Signed-off-by: Shu-Chun Weng 
> ---
>  linux-user/syscall.c | 40 
>  1 file changed, 40 insertions(+)
>
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 8367cb138d..73fe82bcc7 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -6968,17 +6968,57 @@ static int open_self_auxv(void *cpu_env, int fd)
>      return 0;
>  }
>
> +static int consume_task_directories(const char **filename)
> +{
> +    if (!strncmp(*filename, "task/", strlen("task/"))) {
> +        *filename += strlen("task/");
> +        if (**filename < '1' || **filename > '9') {
> +            return 0;
> +        }
> +        /*
> +         * Don't care about the exact tid.
> +         * XXX: this allows opening files under /proc/self|<pid>/task/<tid> where
> +         *      <tid> is not a valid thread id. Consider checking if the file
> +         *      actually exists.
> +         */
> +        const char *p = *filename + 1;
> +        while (*p >= '0' && *p <= '9') {
> +            ++p;
> +        }
> +        if (*p == '/') {
> +            *filename = p + 1;
> +            return 1;
> +        } else {
> +            return 0;
> +        }
> +    }
> +    return 1;
> +}
> +
> +/*
> + * Determines if filename refers to a procfs file for the current process or any
> + * thread within the current process. This function should only be used to check
> + * for files that have identical contents in all threads, e.g. exec, maps, etc.
> + */
>  static int is_proc_myself(const char *filename, const char *entry)
>  {
>      if (!strncmp(filename, "/proc/", strlen("/proc/"))) {
>          filename += strlen("/proc/");
>          if (!strncmp(filename, "self/", strlen("self/"))) {
>              filename += strlen("self/");
> +            if (!consume_task_directories(&filename)) {
> +                return 0;
> +            }
> +        } else if (!strncmp(filename, "thread-self/", strlen("thread-self/"))) {
> +            filename += strlen("thread-self/");
>          } else if (*filename >= '1' && *filename <= '9') {
>  
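
The "task/<tid>/" prefix handling can be exercised outside of QEMU. The following is a standalone sketch in C of the same idea, not the patch itself: it keeps the function name for readability, but unlike the posted hunk it leaves `*filename` untouched whenever it rejects the path.

```c
#include <string.h>

/*
 * If *filename starts with "task/<digits>/", advance past that prefix
 * and return 1. Return 1 without touching *filename if there is no
 * "task/" prefix at all, and 0 if "task/" is followed by something
 * that does not look like a thread id directory.
 */
static int consume_task_directories(const char **filename)
{
    static const char prefix[] = "task/";
    const size_t plen = sizeof(prefix) - 1;
    const char *p;

    if (strncmp(*filename, prefix, plen) != 0) {
        return 1;
    }

    p = *filename + plen;
    if (*p < '1' || *p > '9') {
        return 0;               /* a tid never starts with '0' */
    }
    ++p;
    while (*p >= '0' && *p <= '9') {
        ++p;
    }
    if (*p != '/') {
        return 0;               /* "task/123" without a trailing entry */
    }

    *filename = p + 1;
    return 1;
}
```

For example, "task/1234/maps" is reduced to "maps", plain "maps" passes through unchanged, and "task/0/maps" is rejected. As in the patch, the tid is only checked syntactically; nothing verifies that such a thread exists.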

Re: [Qemu-devel] [PATCH v6 0/4] 9p: Fix file ID collisions

2019-09-05 Thread Christian Schoenebeck via Qemu-devel
On Mittwoch, 4. September 2019 15:02:30 CEST Christian Schoenebeck wrote:
> > > Well, mailman is handling this correctly. It replaces the "From:" field
> > > with a placeholder and instead adds my actual email address as
> > > "Reply-To:" field. That's the common way to handle this on mailing
> > > lists,
> > > as also mentioned here:
> > > https://en.wikipedia.org/wiki/DMARC#From:_rewriting
> > > 
> > > So IMO patchew should automatically use the value of "Reply-To:" in that
> > > case as author of patches instead.
> > > 
> > > Reducing security cannot be the solution.
> > 
> > No, there's no need to reduce security.  Just change your local git
> > > configuration to produce a 'From:' line in the commit body.
> 
> Got it. :)
> 
> > >> How are you sending patches ? With git send-email ? If so, maybe you
> > >> can
> > >> pass something like --from='"Christian Schoenebeck"
> > >> '. Since this is a different string, git will
> > >> assume you're sending someone else's patch : it will automatically add
> > >> an
> > >> extra From: made out of the commit Author as recorded in the git tree.
> > 
> > I think it is probably as simple as a 'git config' command to tell git
> > to always put a 'From:' in the body of self-authored patches when using
> > git format-patch; however, as I don't suffer from munged emails, I
> > haven't actually tested what that setting would be.

Well, I tried that Eric. The expected solution would be enabling this git 
setting:

git config [--global] format.from true
https://git-scm.com/docs/git-config#Documentation/git-config.txt-formatfrom

But as you can already read from the manual, the overall behaviour of git 
regarding a separate "From:" line in the email body was intended solely for 
the use case sender != author. So in practice (at least in my git version) git 
always makes a raw string comparison between sender (name and email) string 
and author string and only adds the separate From: line to the body if they 
differ.

Hence also "git format-patch --from=" only works here if you use a different 
author string (name and email) there, otherwise on a perfect string match it 
is simply ignored and you end up with only one "From:" in the email header.

So eventually I added one extra character in my name for now and removed it 
manually in the dumped emails subsequently (see today's
"[PATCH v7 0/3] 9p: Fix file ID collisions").

Besides that direct string comparison restriction, git also seems to have a
bug here: even if sender != author, git wrongly uses the author as
sender of the cover letter, whereas the emails of the individual
patches are encoded correctly.

Best regards,
Christian Schoenebeck





[Qemu-devel] [PATCH v7 2/3] 9p: stat_to_qid: implement slow path

2019-09-05 Thread Christian Schoenebeck via Qemu-devel
From: Christian Schoenebeck 

stat_to_qid attempts via qid_path_prefixmap to map unique files (which are
identified by 64 bit inode nr and 32 bit device id) to a 64 bit QID path value.
However this implementation makes some assumptions about inode number
generation on the host.

If qid_path_prefixmap fails, we still have 48 bits available in the QID
path to fall back to a less memory efficient full mapping.

Signed-off-by: Antonios Motakis 
[CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next
   (SHA1 7fc4c49e91).
 - Updated hash calls to new xxhash API.
 - Removed unnecessary parentheses in qpf_lookup_func().
 - Removed unnecessary g_malloc0() result checks.
 - Log error message when running out of prefixes in
   qid_path_fullmap().
 - Log warning message about potential degraded performance in
   qid_path_prefixmap().
 - Wrapped qpf_table initialization to dedicated qpf_table_init()
   function.
 - Fixed typo in comment. ]
Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 74 +++-
 hw/9pfs/9p.h |  9 +++
 2 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 8eb89c5c7d..d9be2d45d3 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -579,23 +579,34 @@ static uint32_t qpp_hash(QppEntry e)
     return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0);
 }
 
+static uint32_t qpf_hash(QpfEntry e)
+{
+    return qemu_xxhash7(e.ino, e.dev, 0, 0, 0);
+}
+
 static bool qpp_lookup_func(const void *obj, const void *userp)
 {
     const QppEntry *e1 = obj, *e2 = userp;
     return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix;
 }
 
-static void qpp_table_remove(void *p, uint32_t h, void *up)
+static bool qpf_lookup_func(const void *obj, const void *userp)
+{
+    const QpfEntry *e1 = obj, *e2 = userp;
+    return e1->dev == e2->dev && e1->ino == e2->ino;
+}
+
+static void qp_table_remove(void *p, uint32_t h, void *up)
 {
     g_free(p);
 }
 
-static void qpp_table_destroy(struct qht *ht)
+static void qp_table_destroy(struct qht *ht)
 {
     if (!ht || !ht->map) {
         return;
     }
-    qht_iter(ht, qpp_table_remove, NULL);
+    qht_iter(ht, qp_table_remove, NULL);
     qht_destroy(ht);
 }
 
@@ -604,6 +615,50 @@ static void qpp_table_init(struct qht *ht)
     qht_init(ht, qpp_lookup_func, 1, QHT_MODE_AUTO_RESIZE);
 }
 
+static void qpf_table_init(struct qht *ht)
+{
+    qht_init(ht, qpf_lookup_func, 1 << 16, QHT_MODE_AUTO_RESIZE);
+}
+
+static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf,
+                            uint64_t *path)
+{
+    QpfEntry lookup = {
+        .dev = stbuf->st_dev,
+        .ino = stbuf->st_ino
+    }, *val;
+    uint32_t hash = qpf_hash(lookup);
+
+    /* most users won't need the fullmap, so init the table lazily */
+    if (!pdu->s->qpf_table.map) {
+        qpf_table_init(&pdu->s->qpf_table);
+    }
+
+    val = qht_lookup(&pdu->s->qpf_table, &lookup, hash);
+
+    if (!val) {
+        if (pdu->s->qp_fullpath_next == 0) {
+            /* no more files can be mapped :'( */
+            error_report_once(
+                "9p: No more prefixes available for remapping inodes from "
+                "host to guest."
+            );
+            return -ENFILE;
+        }
+
+        val = g_malloc0(sizeof(QppEntry));
+        *val = lookup;
+
+        /* new unique inode and device combo */
+        val->path = pdu->s->qp_fullpath_next++;
+        pdu->s->qp_fullpath_next &= QPATH_INO_MASK;
+        qht_insert(&pdu->s->qpf_table, val, hash, NULL);
+    }
+
+    *path = val->path;
+    return 0;
+}
+
 /*
  * stat_to_qid needs to map inode number (64 bits) and device id (32 bits)
  * to a unique QID path (64 bits). To avoid having to map and keep track
@@ -629,9 +684,8 @@ static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf,
     if (!val) {
         if (pdu->s->qp_prefix_next == 0) {
             /* we ran out of prefixes */
-            error_report_once(
-                "9p: No more prefixes available for remapping inodes from "
-                "host to guest."
+            warn_report_once(
+                "9p: Potential degraded performance of inode remapping"
             );
             return -ENFILE;
         }
@@ -656,6 +710,10 @@ static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp)
     if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) {
         /* map inode+device to qid path (fast path) */
         err = qid_path_prefixmap(pdu, stbuf, &qidp->path);
+        if (err == -ENFILE) {
+            /* fast path didn't work, fall back to full map */
+            err = qid_path_fullmap(pdu, stbuf, &qidp->path);
+        }
         if (err) {
 retu
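
The fast path / slow path split can be modelled in a few lines of plain C. This is a toy sketch under stated assumptions, not the QEMU implementation: it uses small linear tables instead of qht, and it assumes (from the 16 bit prefix and 48 bit figures in this series) that the fast path replaces the top 16 bits of the inode number with a per-(device, inode prefix) number and keeps the low 48 bits. All names (`struct mapper`, `map_prefix()`, `map_full()`, `map_qid_path()`) are made up for this illustration.

```c
#include <errno.h>
#include <stdint.h>

#define INO_BITS    48
#define INO_MASK    ((UINT64_C(1) << INO_BITS) - 1)
#define MAX_ENTRIES 8   /* toy table size */

struct prefix_entry { uint32_t dev; uint16_t ino_prefix; uint16_t qp_prefix; };
struct full_entry   { uint32_t dev; uint64_t ino; uint64_t path; };

struct mapper {
    struct prefix_entry prefixes[MAX_ENTRIES];
    int n_prefixes;
    int max_prefixes;   /* at most MAX_ENTRIES; lower it to force the slow path */
    struct full_entry full[MAX_ENTRIES];
    int n_full;
    uint64_t full_next; /* start at 1 */
};

/* Fast path: one table entry per (device, top 16 inode bits) pair. */
static int map_prefix(struct mapper *m, uint32_t dev, uint64_t ino,
                      uint64_t *path)
{
    uint16_t ino_prefix = (uint16_t)(ino >> INO_BITS);
    int i;

    for (i = 0; i < m->n_prefixes; i++) {
        if (m->prefixes[i].dev == dev &&
            m->prefixes[i].ino_prefix == ino_prefix) {
            break;
        }
    }
    if (i == m->n_prefixes) {
        if (m->n_prefixes == m->max_prefixes) {
            return -ENFILE;     /* out of prefixes */
        }
        /* prefix 0 is never handed out, see map_full() */
        m->prefixes[i] = (struct prefix_entry){ dev, ino_prefix,
                                                (uint16_t)(i + 1) };
        m->n_prefixes++;
    }
    *path = ((uint64_t)m->prefixes[i].qp_prefix << INO_BITS) | (ino & INO_MASK);
    return 0;
}

/* Slow path: one table entry per file; values stay below 2^48. */
static int map_full(struct mapper *m, uint32_t dev, uint64_t ino,
                    uint64_t *path)
{
    int i;

    for (i = 0; i < m->n_full; i++) {
        if (m->full[i].dev == dev && m->full[i].ino == ino) {
            *path = m->full[i].path;
            return 0;
        }
    }
    if (m->n_full == MAX_ENTRIES) {
        return -ENFILE;
    }
    m->full[i] = (struct full_entry){ dev, ino, m->full_next++ };
    m->n_full++;
    *path = m->full[i].path;
    return 0;
}

/* Try the fast path first and fall back to the full map on -ENFILE. */
static int map_qid_path(struct mapper *m, uint32_t dev, uint64_t ino,
                        uint64_t *path)
{
    int err = map_prefix(m, dev, ino, path);

    if (err == -ENFILE) {
        err = map_full(m, dev, ino, path);
    }
    return err;
}
```

In this toy model the two paths cannot collide: fast-path values always have a non-zero top 16 bits, while slow-path values are small counters with those bits clear. The cost of the slow path is one table entry per file rather than one per (device, prefix) pair, which is the "less memory efficient" trade-off the commit message refers to.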

[Qemu-devel] [PATCH v7 0/3] 9p: Fix file ID collisions

2019-09-05 Thread Christian Schoenebeck via Qemu-devel
This is v7 of a proposed patch set for fixing file ID collisions with 9pfs.

v6->v7:

  * Rebased to https://github.com/gkurz/qemu/commits/9p-next
(SHA1 7fc4c49e91).

  * Be pedantic and abort with error on wrong value for new command line
argument 'multidevs'.

  * Adjusted patches to qemu code style guidelines.

  * Fixed potential crash in qp_table_destroy() on error path.

  * Use dedicated hash table init functions (qpd_table_init(),
qpf_table_init(), qpp_table_init()):
https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg00144.html

  * Use warn_report_once() instead of error_report_once() for issues
interpreted merely as being warnings, not errors.

  * Simplified hash table destruction (due to simplified error path
introduced by SHA1 7fc4c49e91).

  * Dropped capturing root_ino for now:
https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg00146.html

  * Don't update proxy docs, since new command line argument 'multidevs' is
limited to the local backend for now.

  * Mention in docs that readdir() is currently not blocked by
'multidevs=forbid'.

  * Rename qid_path_prefixmap() -> qid_path_suffixmap() in patch 3
(due to the semantic change of that function by that patch).

Christian Schoenebeck (3):
  9p: Added virtfs option 'multidevs=remap|forbid|warn'
  9p: stat_to_qid: implement slow path
  9p: Use variable length suffixes for inode remapping

 fsdev/file-op-9p.h  |   5 +
 fsdev/qemu-fsdev-opts.c |   7 +-
 fsdev/qemu-fsdev.c  |  17 ++
 hw/9pfs/9p.c| 456 ++--
 hw/9pfs/9p.h|  59 ++
 qemu-options.hx |  26 ++-
 vl.c|   7 +-
 7 files changed, 552 insertions(+), 25 deletions(-)

-- 
2.20.1




[Qemu-devel] [PATCH v7 3/3] 9p: Use variable length suffixes for inode remapping

2019-09-05 Thread Christian Schoenebeck via Qemu-devel
From: Christian Schoenebeck 

Use variable length suffixes for inode remapping instead of the fixed
16 bit size prefixes before. With this change the inode numbers on guest
will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48
with the previous fixed size inode remapping).

Additionally this solution is more efficient, since inode numbers in
practice can take almost their entire 64 bit range on guest as well, so
there is less likely a need for generating and tracking additional suffixes,
which might also be beneficial for nested virtualization where each level of
virtualization would shift up the inode bits and increase the chance of
expensive remapping actions.

The "Exponential Golomb" algorithm is used as basis for generating the
variable length suffixes. The algorithm has a parameter k which controls the
distribution of bits on increasing indices (minimum bits at low index vs.
maximum bits at high index). With k=0 the generated suffixes look like:

Index Dec/Bin -> Generated Suffix Bin
1 [1] -> [1] (1 bits)
2 [10] -> [010] (3 bits)
3 [11] -> [110] (3 bits)
4 [100] -> [00100] (5 bits)
5 [101] -> [10100] (5 bits)
6 [110] -> [01100] (5 bits)
7 [111] -> [11100] (5 bits)
8 [1000] -> [0001000] (7 bits)
9 [1001] -> [1001000] (7 bits)
10 [1010] -> [0101000] (7 bits)
11 [1011] -> [1101000] (7 bits)
12 [1100] -> [0011000] (7 bits)
...
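65533 [1111111111111101] -> [1011111111111111000000000000000] (31 bits)
65534 [1111111111111110] -> [0111111111111111000000000000000] (31 bits)
65535 [1111111111111111] -> [1111111111111111000000000000000] (31 bits)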
Hence minBits=1 maxBits=31

And with k=5 they would look like:

Index Dec/Bin -> Generated Suffix Bin
1 [1] -> [000001] (6 bits)
2 [10] -> [100001] (6 bits)
3 [11] -> [010001] (6 bits)
4 [100] -> [110001] (6 bits)
5 [101] -> [001001] (6 bits)
6 [110] -> [101001] (6 bits)
7 [111] -> [011001] (6 bits)
8 [1000] -> [111001] (6 bits)
9 [1001] -> [000101] (6 bits)
10 [1010] -> [100101] (6 bits)
11 [1011] -> [010101] (6 bits)
12 [1100] -> [110101] (6 bits)
...
65533 [1111111111111101] -> [0011100000000000100000000000] (28 bits)
65534 [1111111111111110] -> [1011100000000000100000000000] (28 bits)
65535 [1111111111111111] -> [0111100000000000100000000000] (28 bits)
Hence minBits=6 maxBits=28
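
The two tables above can be reproduced with a small standalone sketch. This
is not the code from this patch: the type Affix and the helpers bit_length(),
exp_golomb_prefix() and prefix_to_suffix() are names made up for illustration,
and the sketch assumes that a suffix is simply the bit-mirrored Exp. Golomb
code word of order k.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: an affix value plus its length in bits. */
typedef struct {
    uint64_t value;
    int bits;
} Affix;

/* Number of significant bits in v (0 for v == 0). */
static int bit_length(uint64_t v)
{
    int n = 0;
    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Exp. Golomb code word of order k for index n >= 1, read as a prefix. */
static Affix exp_golomb_prefix(uint64_t n, int k)
{
    assert(n >= 1 && k >= 0 && k < 32);
    uint64_t value = n + (1ULL << k) - 1;
    int bits = bit_length(value);
    int zeros = bits - 1 - k;       /* leading zero bits of the code word */
    Affix a = { value, bits + zeros };
    return a;
}

/* Mirror the lowest 'bits' bits, turning a prefix into a suffix. */
static Affix prefix_to_suffix(Affix p)
{
    Affix s = { 0, p.bits };
    for (int i = 0; i < p.bits; i++) {
        if (p.value & (1ULL << i)) {
            s.value |= 1ULL << (p.bits - 1 - i);
        }
    }
    return s;
}
```

For index 5 with k=0 this gives the 5 bit suffix 10100, and for index 3 with
k=5 the 6 bit suffix 010001, matching the rows listed above.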

Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 268 +--
 hw/9pfs/9p.h |  44 -
 2 files changed, 279 insertions(+), 33 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index d9be2d45d3..37abcdb71e 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -27,6 +27,7 @@
 #include "migration/blocker.h"
 #include "sysemu/qtest.h"
 #include "qemu/xxhash.h"
+#include <math.h>
 
 int open_fd_hw;
 int total_open_fd;
@@ -573,6 +574,116 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
 P9_STAT_MODE_NAMED_PIPE |   \
 P9_STAT_MODE_SOCKET)
 
+/* Mirrors all bits of a byte. So e.g. binary 10100000 would become 00000101. */
+static inline uint8_t mirror8bit(uint8_t byte)
+{
+    return (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023;
+}
+
+/* Same as mirror8bit() just for a 64 bit data type instead of a byte. */
+static inline uint64_t mirror64bit(uint64_t value)
+{
+    return ((uint64_t)mirror8bit(value & 0xff) << 56) |
+           ((uint64_t)mirror8bit((value >> 8)  & 0xff) << 48) |
+           ((uint64_t)mirror8bit((value >> 16) & 0xff) << 40) |
+           ((uint64_t)mirror8bit((value >> 24) & 0xff) << 32) |
+           ((uint64_t)mirror8bit((value >> 32) & 0xff) << 24) |
+           ((uint64_t)mirror8bit((value >> 40) & 0xff) << 16) |
+           ((uint64_t)mirror8bit((value >> 48) & 0xff) << 8)  |
+           ((uint64_t)mirror8bit((value >> 56) & 0xff));
+}
+
+/**
+ * @brief Parameter k for the Exponential Golomb algorithm to be used.
+ *
+ * The smaller this value, the smaller the minimum bit count for the Exp.
+ * Golomb generated affixes will be (at lowest index) however for the
+ * price of having higher maximum bit count of generated affixes (at highest
+ * index). Likewise increasing this parameter yields a smaller maximum bit
+ * count for the price of having higher minimum bit count.
+ *
+ * In practice that means: a good value for k depends on the expected amount
+ * of devices to be exposed by one export. For a small amount of devices k
+ * should be small, for a large amount of devices k might be increased
+ * instead. The default of k=0 should be fine for most users though.
+ *
+ * @b IMPORTANT: In case this ever becomes a runtime parameter; the value of
+ * k should not change as long as guest is still running! Because that would
+ * cause completely different inode

[Qemu-devel] [PATCH v7 1/3] 9p: Added virtfs option 'multidevs=remap|forbid|warn'

2019-09-05 Thread Christian Schoenebeck via Qemu-devel
From: Christian Schoenebeck 

'warn' (default): Only log an error message (once) on host if more than one
device is shared by the same export; apart from that, this config error is
ignored. This is the default behaviour so as not to break existing
installations, on the assumption that their users really know what they are
doing.

'forbid': Like 'warn', but instead of just logging an error this also denies
the guest access to additional devices.

'remap': Allows sharing more than one device per export by remapping
inodes from host to guest appropriately. To support multiple devices on the
9p share, and avoid qid path collisions we take the device id as input to
generate a unique QID path. The lowest 48 bits of the path will be set
equal to the file inode, and the top bits will be uniquely assigned based
on the top 16 bits of the inode and the device id.
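
As a rough illustration of the 'remap' scheme described above, here is a
minimal standalone sketch. It is not the code of this patch: PrefixMap,
prefix_map_init() and remap_qid_path() are hypothetical names, and a small
linear table stands in for the hash table used by the real implementation.
Only QPATH_INO_MASK and the idea of reserving prefix 0 to detect overflow are
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QPATH_INO_MASK ((1ULL << 48) - 1)   /* low 48 bits: host inode */
#define MAX_PREFIXES   16                   /* tiny, for this sketch only */

typedef struct {
    uint64_t dev;         /* host device id */
    uint16_t ino_prefix;  /* top 16 bits of the host inode */
    uint16_t qp_prefix;   /* prefix handed out for this combination */
} PrefixEntry;

typedef struct {
    PrefixEntry entries[MAX_PREFIXES];
    size_t count;
    uint16_t next;        /* next free prefix; 0 means "ran out" */
} PrefixMap;

static void prefix_map_init(PrefixMap *m)
{
    m->count = 0;
    m->next = 1;          /* reserve 0 to detect overflow */
}

/* Returns 0 and stores the guest QID path, or -1 if no prefix is left. */
static int remap_qid_path(PrefixMap *m, uint64_t dev, uint64_t ino,
                          uint64_t *path)
{
    uint16_t ino_prefix = (uint16_t)(ino >> 48);
    size_t i;

    assert(m && path);
    for (i = 0; i < m->count; i++) {
        if (m->entries[i].dev == dev &&
            m->entries[i].ino_prefix == ino_prefix) {
            break;
        }
    }
    if (i == m->count) {
        if (m->next == 0 || m->count == MAX_PREFIXES) {
            return -1;
        }
        /* new unique (device, inode prefix) combination */
        m->entries[i].dev = dev;
        m->entries[i].ino_prefix = ino_prefix;
        m->entries[i].qp_prefix = m->next++;
        m->count++;
    }
    *path = ((uint64_t)m->entries[i].qp_prefix << 48) |
            (ino & QPATH_INO_MASK);
    return 0;
}
```

With this sketch, inode 42 on two different host devices ends up with two
different QID paths, while the low 48 bits of both paths still equal 42.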

Signed-off-by: Antonios Motakis 
[CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next
   (SHA1 7fc4c49e91).
 - Added virtfs option 'multidevs', original patch simply did the inode
   remapping without being asked.
 - Updated hash calls to new xxhash API.
 - Updated docs for new option 'multidevs'.
 - Fixed v9fs_do_readdir() not having remapped inodes.
 - Log error message when running out of prefixes in
   qid_path_prefixmap().
 - Fixed definition of QPATH_INO_MASK.
 - Wrapped qpp_table initialization to dedicated qpp_table_init()
   function.
 - Dropped unnecessary parentheses in qpp_lookup_func().
 - Dropped unnecessary g_malloc0() result checks. ]
Signed-off-by: Christian Schoenebeck 
---
 fsdev/file-op-9p.h  |   5 ++
 fsdev/qemu-fsdev-opts.c |   7 +-
 fsdev/qemu-fsdev.c  |  17 
 hw/9pfs/9p.c| 188 +++-
 hw/9pfs/9p.h|  12 +++
 qemu-options.hx |  26 +-
 vl.c|   7 +-
 7 files changed, 237 insertions(+), 25 deletions(-)

diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
index c757c8099f..f2f7772c86 100644
--- a/fsdev/file-op-9p.h
+++ b/fsdev/file-op-9p.h
@@ -59,6 +59,11 @@ typedef struct ExtendedOps {
 #define V9FS_RDONLY                 0x00000040
 #define V9FS_PROXY_SOCK_FD          0x00000080
 #define V9FS_PROXY_SOCK_NAME        0x00000100
+/*
+ * multidevs option (either one of the two applies exclusively)
+ */
+#define V9FS_REMAP_INODES           0x00000200
+#define V9FS_FORBID_MULTIDEVS       0x00000400
 
 #define V9FS_SEC_MASK               0x0000003C
 
diff --git a/fsdev/qemu-fsdev-opts.c b/fsdev/qemu-fsdev-opts.c
index 7c31af..07a18c6e48 100644
--- a/fsdev/qemu-fsdev-opts.c
+++ b/fsdev/qemu-fsdev-opts.c
@@ -31,7 +31,9 @@ static QemuOptsList qemu_fsdev_opts = {
 }, {
 .name = "readonly",
 .type = QEMU_OPT_BOOL,
-
+}, {
+.name = "multidevs",
+.type = QEMU_OPT_STRING,
 }, {
 .name = "socket",
 .type = QEMU_OPT_STRING,
@@ -75,6 +77,9 @@ static QemuOptsList qemu_virtfs_opts = {
 }, {
 .name = "readonly",
 .type = QEMU_OPT_BOOL,
+}, {
+.name = "multidevs",
+.type = QEMU_OPT_STRING,
 }, {
     .name = "socket",
 .type = QEMU_OPT_STRING,
diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index 077a8c4e2b..e407716177 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -58,6 +58,7 @@ static FsDriverTable FsDrivers[] = {
 "writeout",
 "fmode",
 "dmode",
+"multidevs",
 "throttling.bps-total",
 "throttling.bps-read",
 "throttling.bps-write",
@@ -121,6 +122,7 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp)
 const char *fsdev_id = qemu_opts_id(opts);
 const char *fsdriver = qemu_opt_get(opts, "fsdriver");
 const char *writeout = qemu_opt_get(opts, "writeout");
+const char *multidevs = qemu_opt_get(opts, "multidevs");
 bool ro = qemu_opt_get_bool(opts, "readonly", 0);
 
 if (!fsdev_id) {
@@ -161,6 +163,21 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp)
     } else {
         fsle->fse.export_flags &= ~V9FS_RDONLY;
     }
+    if (multidevs) {
+        if (!strcmp(multidevs, "remap")) {
+            fsle->fse.export_flags &= ~V9FS_FORBID_MULTIDEVS;
+            fsle->fse.export_flags |= V9FS_REMAP_INODES;
+        } else if (!strcmp(multidevs, "forbid")) {
+            fsle->fse.export_flags &= ~V9FS_REMAP_INODES;
+            fsle->fse.export_flags |= V9FS_FORBID_MULTIDEVS;
+        } else if (!strcmp(multidevs, "warn")) {
+            fsle->fse.export_flags &= ~V9FS_FORBID_MULTIDEVS;
+fsle->fse.exp

Re: [Qemu-devel] [PATCH v6 0/4] 9p: Fix file ID collisions

2019-09-04 Thread Christian Schoenebeck via Qemu-devel
On Tuesday, 3 September 2019 21:38:15 CEST Eric Blake wrote:
> On 9/2/19 5:29 PM, Christian Schoenebeck via Qemu-devel wrote:
> >>>>> === OUTPUT BEGIN ===
> >>>>> 1/4 Checking commit bb69de63f788 (9p: Treat multiple devices on one
> >>>>> export
> >>>>> as an error) ERROR: Author email address is mangled by the mailing
> >>>>> list
> >>>>> #2:
> >>>>> Author: Christian Schoenebeck via Qemu-devel 
> >>>> 
> >>>> This is problematic since it ends up in the Author: field in git.
> >>>> Please
> >>>> find a way to fix that.
> >>> 
> >>> Like in which way do you imagine that? And where is the actual practical
> >>> problem? I mean every patch still has my signed-off-by tag with the
> >>> correct
> >>> email address ending up in git history.
> >> 
> >> Yes, this only breaks Author: if the patch is applied from the list.
> 
> Except that many maintainers DO apply mail from the list (thanks to 'git
> am').  Fixing patchew to unmunge things is an appealing idea, but would
> not fix the problem for maintainers not cloning from patchew, so even if
> patchew avoids the problem locally, it should still continue to warn
> about the problem.
> 
> >>> The cause for this issue is that the domain is configured to require
> >>> DKIM
> >>> signatures for all outgoing emails. That's why mailman replaces my
> >>> address
> >>> by "Christian Schoenebeck via Qemu-devel "
> >>> placeholder since it could not provide a valid signature.
> 
> And when you know that mailman is going to munge your address, the fix
> is to configure git to output 'From: correct name '
> as the first line of the BODY of the message, since 'git am' favors the
> unmunged From: from the body over the munged From: from the headers.

Ah I see, I will try that with the next 9p patch set round (probably 
tomorrow). Thanks for the hint!

I actually had a quick look at the patchew sources yesterday to see if 
there was some undocumented alternative header like "X-git-author:" or 
something like that, but could not find one.

> > Well, mailman is handling this correctly. It replaces the "From:" field
> > with a placeholder and instead adds my actual email address as
> > "Reply-To:" field. That's the common way to handle this on mailing lists,
> > as also mentioned here:
> > https://en.wikipedia.org/wiki/DMARC#From:_rewriting
> > 
> > So IMO patchew should automatically use the value of "Reply-To:" in that
> > case as author of patches instead.
> > 
> > Reducing security cannot be the solution.
> 
> No, there's no need to reduce security.  Just change your local git
> configuration to produce a 'From:' line in the commit body.

Got it. :)

> >> How are you sending patches ? With git send-email ? If so, maybe you can
> >> pass something like --from='"Christian Schoenebeck"
> >> '. Since this is a different string, git will
> >> assume you're sending someone else's patch : it will automatically add an
> >> extra From: made out of the commit Author as recorded in the git tree.
> 
> I think it is probably as simple as a 'git config' command to tell git
> to always put a 'From:' in the body of self-authored patches when using
> git format-patch; however, as I don't suffer from munged emails, I
> haven't actually tested what that setting would be.
> 
> > I use "git format-patch ..." to dump the individual emails as raw email
> > sources and then I'll send those raw emails from the command line. So I
> > have even more control of what is exactly sent out and could of course
> > also add custom email header fields if required, if that would solve the
> > situation somehow, i.e. manually as first test and later in automated
> > way. That's not the issue here.
> Working around the problem does not require munging email headers, but
> adding a line to the email body.






Re: [Qemu-devel] [PATCH] linux-user: Support gdb 'qOffsets' query for ELF

2019-09-03 Thread Josh Kunz via Qemu-devel
The `Data` and `Code` flags in `qOffsets` are actually section offsets
rather than segment offsets. GDB relocates the symbols in those sections
relative to their location in the binary. So we have to use `load_bias`.

See here for a more detailed description:
https://sourceware.org/gdb/onlinedocs/gdb/General-Query-Packets.html#General-Query-Packets
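
For illustration only, a minimal sketch of what such a reply could look like
when both offsets are taken from the load bias. The helper name
format_qoffsets() is hypothetical and this is not the code of the patch; only
the "Text=xxx;Data=yyy" reply layout follows the GDB documentation linked
above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build a qOffsets reply in which the text and data sections are both
 * relocated by the same load bias (assumption: a PIE ELF binary that was
 * loaded as a whole at load_bias). Returns the snprintf() result.
 */
static int format_qoffsets(char *buf, size_t len, uint64_t load_bias)
{
    assert(buf && len > 0);
    return snprintf(buf, len, "Text=%" PRIx64 ";Data=%" PRIx64,
                    load_bias, load_bias);
}
```

For a non-PIE binary the load bias would be 0, so the reply degenerates to
"Text=0;Data=0" and GDB does not move any symbols.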

On Mon, Aug 26, 2019 at 1:29 AM Laurent Vivier  wrote:

> On 17/08/2019 at 01:34, Josh Kunz via Qemu-devel wrote:
> > This is needed to support debugging PIE ELF binaries running under QEMU
> > user mode. Currently, `code_offset` and `data_offset` remain unset for
> > all ELF binaries, so GDB is unable to correctly locate the position of
> > the binary's text and data.
> >
> > The fields `code_offset`, and `data_offset` were originally added way
> > back in 2006 to support debugging of bFMT executables (978efd6aac6),
> > and support was just never added for ELF. Since non-PIE binaries are
> > loaded at exactly the address specified in the binary, GDB does not need
> > to relocate any symbols, so the buggy behavior is not normally observed.
> >
> > Buglink: https://bugs.launchpad.net/qemu/+bug/1528239
> > Signed-off-by: Josh Kunz 
> > ---
> >  linux-user/elfload.c | 2 ++
> >  1 file changed, 2 insertions(+)
>
> As it seems they are text and data segment offsets, why it's not based
> on info->start_code and info->start_data?
>
> Thanks,
> Laurent
>


Re: [Qemu-devel] [PATCH v6 0/4] 9p: Fix file ID collisions

2019-09-02 Thread Christian Schoenebeck via Qemu-devel
On Monday, 2 September 2019 17:34:32 CEST Greg Kurz wrote:
> On Sun, 01 Sep 2019 21:28:45 +0200
> 
> Christian Schoenebeck  wrote:
> > On Thursday, 29 August 2019 19:02:34 CEST Greg Kurz wrote:
> > > On Thu, 22 Aug 2019 15:18:54 -0700 (PDT)
> > > 
> > > no-re...@patchew.org wrote:
> > > > Patchew URL:
> > > > https://patchew.org/QEMU/cover.1566503584.git.qemu_...@crudebyte.com/
> > > > 
> > > > 
> > > > 
> > > > Hi,
> > > > 
> > > > This series seems to have some coding style problems. See output below
> > > > for
> > 
> > > > more information:
> > [snip]
> > 
> > > > === OUTPUT BEGIN ===
> > > > 1/4 Checking commit bb69de63f788 (9p: Treat multiple devices on one
> > > > export
> > > > as an error) ERROR: Author email address is mangled by the mailing
> > > > list
> > > > #2:
> > > > Author: Christian Schoenebeck via Qemu-devel 
> > > 
> > > This is problematic since it ends up in the Author: field in git. Please
> > > find a way to fix that.
> > 
> > Like in which way do you imagine that? And where is the actual practical
> > problem? I mean every patch still has my signed-off-by tag with the
> > correct
> > email address ending up in git history.
> 
> Yes, this only breaks Author: if the patch is applied from the list.
> 
> > The cause for this issue is that the domain is configured to require DKIM
> > signatures for all outgoing emails. That's why mailman replaces my address
> > by "Christian Schoenebeck via Qemu-devel "
> > placeholder since it could not provide a valid signature.
> > 
> > There were good reasons for enabling DKIM and it is a good thing for all
> > domains in general, since that ensures that (i.e. foreign) email addresses
> > cannot be used as sender address if the actual sender is not authorized
> > for
> > sending emails with that address.
> 
> Don't know much about DKIM but google seems to confirm what you say.

When you view the source of my emails you'll always see a "DKIM-Signature:" 
field in the email header, which is a signature of the email's body and the 
specific email header fields listed in the "DKIM-Signature:" block, so the 
original server can choose by itself which header fields to include ("h=...") 
for signing, but the standard requires the From: field must always be 
included.

A receiving server then obtains the public key from the domain's DNS records 
and checks if the DKIM signature of the email was correct:
https://en.wikipedia.org/wiki/DomainKeys_Identified_Mail

Additionally the receiving server obtains the so called "DMARC" entry from the 
domain's DNS records. The "DMARC" DNS entry contains the domain's general 
policies how receiving email servers shall handle verification of emails of 
this domain. For instance the DMARC entry may list SMTP servers (e.g. IPs, DNS 
names) eligible to send emails on behalf of the domain at all, and it defines 
what receiving email servers shall do with emails which were identified as not 
being from an eligible source (e.g. sender IP not listed in DMARC entry or 
missing or wrong DKIM signature in email header). For instance if the policy 
is "quarantine" in the DMARC entry then receiving servers are advised to 
simply drop invalid emails:  https://en.wikipedia.org/wiki/DMARC

> So, this means that patchew will complain each time you post if we can't
> find a proper way to address that... :-\

Well, mailman is handling this correctly. It replaces the "From:" field with a 
placeholder and instead adds my actual email address as "Reply-To:" field. 
That's the common way to handle this on mailing lists, as also mentioned here:
https://en.wikipedia.org/wiki/DMARC#From:_rewriting

So IMO patchew should automatically use the value of "Reply-To:" in that case 
as author of patches instead.

Reducing security cannot be the solution.

> > What I changed in the meantime though is that you should get all my
> > patches
> > directly to your personal address, not only from the list. Or did you
> > receive v6 again just from the list?
> 
> I've received the patches in my mailbox but I prefer to use the patchwork's
> pwclient CLI to apply patches... and patchwork captures the patches from
> the list, so I end up having to patch the authorship manually anyway.
> 
> How are you sending patches ? With git send-email ? If so, maybe you can
> pass something like --from='"Christian Schoenebeck"
> '. Since this is a differ

Re: [Qemu-devel] [PATCH v6 2/4] 9p: Added virtfs option 'multidevs=remap|forbid|warn'

2019-09-02 Thread Christian Schoenebeck via Qemu-devel
On Monday, 2 September 2019 13:49:34 CEST Greg Kurz wrote:
> On Sun, 01 Sep 2019 20:56:16 +0200
> 
> Christian Schoenebeck  wrote:
> > On Friday, 30 August 2019 14:22:38 CEST Greg Kurz wrote:
> > > Some more comments below.
> > 
> > [snip]
> > 
> > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > > > index 8cc65c2c67..c96ea51116 100644
> > > > --- a/hw/9pfs/9p.c
> > > > +++ b/hw/9pfs/9p.c
> > > > @@ -25,6 +25,7 @@
> > > > 
> > > >  #include "trace.h"
> > > >  #include "migration/blocker.h"
> > > >  #include "sysemu/qtest.h"
> > 
> > [snip]
> > 
> > > > @@ -3672,8 +3807,13 @@ int v9fs_device_realize_common(V9fsState *s,
> > > > const
> > > > V9fsTransport *t,
> > > > 
> > > >  goto out;
> > > >  
> > > >  }
> > > > 
> > > > +s->root_ino = stat.st_ino;
> > > 
> > > This isn't used anywhere. It looks like a leftover of the readdir fix
> > > in v5.
> > 
> > Yes, both correct. I intentionally left it though, since I found it a
> > natural complement always capturing the root inode along to the root
> > device.
> Fair enough. The local backend opens an fd to the root directory, to be used
> by any access to the 9p share. I think root_dev/root_ino should be obtained
> with fstat() on this fd, to be sure they are consistent. Maybe add an extra
> struct stat * argument to the init function ? I'd rather see this done as a
> preparatory "backend to cache 9p root device/inode during init" patch.

Convinced. I'll drop root_ino from this patch set for now.





Re: [Qemu-devel] [PATCH v6 2/4] 9p: Added virtfs option 'multidevs=remap|forbid|warn'

2019-09-02 Thread Christian Schoenebeck via Qemu-devel
On Monday, 2 September 2019 12:16:26 CEST Greg Kurz wrote:
> > > > @@ -571,22 +572,109 @@ static void coroutine_fn virtfs_reset(V9fsPDU
> > > > *pdu)
> > > > 
> > > >  P9_STAT_MODE_NAMED_PIPE |   \
> > > >  P9_STAT_MODE_SOCKET)
> > > > 
> > > > -/* This is the algorithm from ufs in spfs */
> > > > +
> > > > +/* creative abuse of tb_hash_func7, which is based on xxhash */
> > > > +static uint32_t qpp_hash(QppEntry e)
> > > > +{
> > > > +return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0);
> > > > +}
> > > > +
> > > > +static bool qpp_lookup_func(const void *obj, const void *userp)
> > > > +{
> > > > +const QppEntry *e1 = obj, *e2 = userp;
> > > > +return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix;
> > > > +}
> > > > +
> > > > +static void qpp_table_remove(void *p, uint32_t h, void *up)
> > > > +{
> > > > +g_free(p);
> > > > +}
> > > > +
> > > > +static void qpp_table_destroy(struct qht *ht)
> > > > +{
> > > > +qht_iter(ht, qpp_table_remove, NULL);
> > > > +qht_destroy(ht);
> > > > +}
> > > 
> > > Ok to have a function for this instead of open-coding but I'd
> > > like to see qpp_table_init() for consistency.
> > 
> > Well, these are just qht_init() one-liners, but if you really want to have
> > dedicated, local init functions for them, okay.
> 
> Yeah, even if it's a one-liner, I prefer consistency. Alternatively, with
> an idempotent v9fs_device_unrealize_common() like in [1], you'd have
> only one user for qpp_table_destroy() and you can open-code it. This
> would address my consistency concern even better :)
> 
> [1]
> https://github.com/gkurz/qemu/commit/7fc4c49e910df2e155b36bf0a05de9209bd92d

I'd rather add qpp_table_init() then, because grouping the two calls 
qht_iter() and qht_destroy() together to a dedicated function 
qpp_table_destroy() still makes sense semantically IMO.





Re: [Qemu-devel] [PATCH v5 3/5] 9p: Added virtfs option 'remap_inodes'

2019-09-01 Thread Christian Schoenebeck via Qemu-devel
On Friday, 30 August 2019 13:48:27 CEST Greg Kurz wrote:
> > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > > index 8cc65c2c67..39c6c2a894 100644
> > > --- a/hw/9pfs/9p.c
> > > +++ b/hw/9pfs/9p.c
> > 
> > [snip]
> > 
> > > @@ -1940,6 +2041,19 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU
> > > *pdu, V9fsFidState *fidp, int32_t count = 0;
> > > 
> > >  off_t saved_dir_pos;
> > >  struct dirent *dent;
> > > 
> > > +struct stat stbuf;
> > > +bool fidIsExportRoot;
> > > +
> > > +/*
> > > + * determine if fidp is the export root, which is required for safe
> > > + * handling of ".." below
> > > + */
> > > +err = v9fs_co_lstat(pdu, &fidp->path, &stbuf);
> > > +if (err < 0) {
> > > +return err;
> > > +}
> > > +fidIsExportRoot = pdu->s->dev_id == stbuf.st_dev &&
> > > +  pdu->s->root_ino == stbuf.st_ino;
> > > 
> > >  /* save the directory position */
> > >  saved_dir_pos = v9fs_co_telldir(pdu, fidp);
> > > 
> > > @@ -1964,16 +2078,51 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU
> > > *pdu, V9fsFidState *fidp, v9fs_string_free(&name);
> > > 
> > >  return count;
> > >  
> > >  }
> > > 
> > > -/*
> > > - * Fill up just the path field of qid because the client uses
> > > - * only that. To fill the entire qid structure we will have
> > > - * to stat each dirent found, which is expensive
> > > - */
> > > -size = MIN(sizeof(dent->d_ino), sizeof(qid.path));
> > > -memcpy(&qid.path, &dent->d_ino, size);
> > > -/* Fill the other fields with dummy values */
> > > -qid.type = 0;
> > > -qid.version = 0;
> > > +
> > > +if (fidIsExportRoot && !strcmp("..", dent->d_name)) {
> > > +/*
> > > + * if "." is export root, then return qid of export root
> > > for
> > > + * ".." to avoid exposing anything outside the export
> > > + */
> > > +err = fid_to_qid(pdu, fidp, &qid);
> > > +if (err < 0) {
> > > +v9fs_readdir_unlock(&fidp->fs.dir);
> > > +v9fs_co_seekdir(pdu, fidp, saved_dir_pos);
> > > +v9fs_string_free(&name);
> > > +return err;
> > > +}
> > 
> > Hmm, I start to wonder whether I should postpone that particular bug fix
> > and not make it part of that QID fix patch series (not even as separate
> > patch there). Because that fix needs some more adjustments. E.g. I should
> > adjust dent->d_type here as well; but more notably it should also
> > distinguish between the case where the export root is mounted as / on
> > guest or not and that's where this fix could become ugly and grow in
> > size.
> > 
> > To make the case clear:  calling on guest
> > 
> > readdir(pathOfSome9pExportRootOnGuest);
> > 
> > currently always returns for its ".." result entry the inode number and
> > d_type of the export root's parent directory on host, so it exposes
> > information of host outside the 9p export.
> > 
> > I don't see that as security issue, since the information revealed is
> > limited to the inode number and d_type, but it is definitely incorrect
> > behaviour.
> Definitely. This should be fixed independently of this series. Maybe follow
> the same approach as commit 56f101ecce0e "9pfs: handle walk of ".." in the
> root directory", ie. basically make /.. an alias of /.

That's actually what the suggested fix already did in v5 here (see diff above). 
However I was worried whether I thought about all edge cases. So I also need 
some more testing and hence clearly decided to postpone this fix for now.

Best regards,
Christian Schoenebeck




Re: [Qemu-devel] [PATCH v6 0/4] 9p: Fix file ID collisions

2019-09-01 Thread Christian Schoenebeck via Qemu-devel
On Thursday, 29 August 2019 19:02:34 CEST Greg Kurz wrote:
> On Thu, 22 Aug 2019 15:18:54 -0700 (PDT)
> 
> no-re...@patchew.org wrote:
> > Patchew URL:
> > https://patchew.org/QEMU/cover.1566503584.git.qemu_...@crudebyte.com/
> > 
> > 
> > 
> > Hi,
> > 
> > This series seems to have some coding style problems. See output below for
> > more information:
[snip]
> > 
> > === OUTPUT BEGIN ===
> > 1/4 Checking commit bb69de63f788 (9p: Treat multiple devices on one export
> > as an error) ERROR: Author email address is mangled by the mailing list
> > #2:
> > Author: Christian Schoenebeck via Qemu-devel 
> 
> This is problematic since it ends up in the Author: field in git. Please
> find a way to fix that.

Like in which way do you imagine that? And where is the actual practical 
problem? I mean every patch still has my signed-off-by tag with the correct 
email address ending up in git history.

The cause for this issue is that the domain is configured to require DKIM 
signatures for all outgoing emails. That's why mailman replaces my address by
"Christian Schoenebeck via Qemu-devel " placeholder 
since it could not provide a valid signature.

There were good reasons for enabling DKIM and it is a good thing for all 
domains in general, since that ensures that (i.e. foreign) email addresses 
cannot be used as sender address if the actual sender is not authorized for 
sending emails with that address.

What I changed in the meantime though is that you should get all my patches 
directly to your personal address, not only from the list. Or did you receive 
v6 again just from the list?

> Other warnings/errors should also be fixed but they look trivial.

Yeah, they are trivial. *But* there is one thing ...

> > Author: Christian Schoenebeck via Qemu-devel 
> > 
> > ERROR: space prohibited after that open parenthesis '('
> > #92: FILE: hw/9pfs/9p.c:586:
> > +return ((uint64_t)mirror8bit( value& 0xff) << 56) |
> > 
> > ERROR: space prohibited before that close parenthesis ')'
> > #98: FILE: hw/9pfs/9p.c:592:
> > +   ((uint64_t)mirror8bit((value >> 48) & 0xff) << 8 ) |
> > 
> > ERROR: space prohibited before that close parenthesis ')'
> > #99: FILE: hw/9pfs/9p.c:593:
> > +   ((uint64_t)mirror8bit((value >> 56) & 0xff)  ) ;

... I would like to ignore this specific bot whining, because that particular 
function looks much more readable the way it is (in that patch) right now.

> > WARNING: Block comments use a leading /* on a separate line
> > #102: FILE: hw/9pfs/9p.c:596:
> > +/** @brief Parameter k for the Exponential Golomb algorihm to be used.
> > 
> > WARNING: Block comments use a leading /* on a separate line
> > #121: FILE: hw/9pfs/9p.c:615:
> > +/** @brief Exponential Golomb algorithm for arbitrary k (including k=0).
> > 
> > WARNING: Block comments use a leading /* on a separate line
> > #148: FILE: hw/9pfs/9p.c:642:
> > +/** @brief Converts a suffix into a prefix, or a prefix into a suffix.



Re: [Qemu-devel] [PATCH v6 2/4] 9p: Added virtfs option 'multidevs=remap|forbid|warn'

2019-09-01 Thread Christian Schoenebeck via Qemu-devel
On Friday, 30 August 2019 14:22:38 CEST Greg Kurz wrote:
> Some more comments below.
[snip]
> > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > index 8cc65c2c67..c96ea51116 100644
> > --- a/hw/9pfs/9p.c
> > +++ b/hw/9pfs/9p.c
> > @@ -25,6 +25,7 @@
> > 
> >  #include "trace.h"
> >  #include "migration/blocker.h"
> >  #include "sysemu/qtest.h"
[snip]
> > @@ -3672,8 +3807,13 @@ int v9fs_device_realize_common(V9fsState *s, const
> > V9fsTransport *t,
> >  goto out;
> >  
> >  }
> > 
> > +s->root_ino = stat.st_ino;
> 
> This isn't used anywhere. It looks like a leftover of the readdir fix
> in v5.

Yes, both correct. I intentionally left it though, since I found it a natural 
complement to always capture the root inode along with the root device.

> >  s->dev_id = stat.st_dev;
> > 
> > +/* QID path hash table. 1 entry ought to be enough for anybody ;) */
> > +qht_init(&s->qpp_table, qpp_lookup_func, 1, QHT_MODE_AUTO_RESIZE);
> > +s->qp_prefix_next = 1; /* reserve 0 to detect overflow */
> > +
> > 
> >  s->ctx.fst = &fse->fst;
> >  fsdev_throttle_init(s->ctx.fst);
> > 
> > @@ -3687,6 +3827,7 @@ out:
> >  }
> >  g_free(s->tag);
> >  g_free(s->ctx.fs_root);
> > 
> > +qpp_table_destroy(&s->qpp_table);
> > 
> >  v9fs_path_free(&path);
> >  
> >  }
> >  return rc;
> > 
> > @@ -3699,6 +3840,7 @@ void v9fs_device_unrealize_common(V9fsState *s,
> > Error **errp)
> >  }
> >  fsdev_throttle_cleanup(s->ctx.fst);
> >  g_free(s->tag);
> > 
> > +    qpp_table_destroy(&s->qpp_table);
> > 
> >  g_free(s->ctx.fs_root);
> >  
> >  }
> > 
> > diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
> > index 5e316178d5..a283b0193e 100644
> > --- a/hw/9pfs/9p.h
> > +++ b/hw/9pfs/9p.h
> > @@ -8,6 +8,7 @@
> > 
> >  #include "fsdev/9p-iov-marshal.h"
> >  #include "qemu/thread.h"
> >  #include "qemu/coroutine.h"
> > 
> > +#include "qemu/qht.h"
> > 
> >  enum {
> >  
> >  P9_TLERROR = 6,
> > 
> > @@ -235,6 +236,15 @@ struct V9fsFidState
> > 
> >  V9fsFidState *rclm_lst;
> >  
> >  };
> > 
> > +#define QPATH_INO_MASK        ((1ULL << 48) - 1)
> > +
> > +/* QID path prefix entry, see stat_to_qid */
> > +typedef struct {
> > +dev_t dev;
> > +uint16_t ino_prefix;
> > +uint16_t qp_prefix;
> > +} QppEntry;
> > +
> > 
> >  struct V9fsState
> >  {
> >  
> >  QLIST_HEAD(, V9fsPDU) free_list;
> > 
> > @@ -256,7 +266,10 @@ struct V9fsState
> > 
> >  Error *migration_blocker;
> >  V9fsConf fsconf;
> >  V9fsQID root_qid;
> > 
> > +ino_t root_ino;
> 
> Thinking again, I'm wondering if root_ino and dev_id could be used
> instead of root_qid in v9fs_walk()... Would you have a look at that
> if you decide to fix the readdir issue ?

I'll keep it in mind when looking at the postponed readdir() issue again.

> >  dev_t dev_id;
> > 
> > +struct qht qpp_table;
> > +uint16_t qp_prefix_next;
> > 
> >  };
> >  
> >  /* 9p2000.L open flags */
> > 
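
For readers following the QppEntry discussion, here is a minimal sketch of how a (device, inode) pair could be remapped into a single 64-bit QID path. It is not the code from the patch: the fixed-size array stands in for the qht hash table, and QppTable, QPP_MAX, qpp_table_init() and remap_qid_path() are hypothetical names. It assumes the layout suggested by QPATH_INO_MASK: the low 48 bits of the inode are kept, and the top 16 bits are replaced by a prefix allocated per (dev, ino_prefix) pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QPATH_INO_MASK ((1ULL << 48) - 1)
#define QPP_MAX 16 /* hypothetical fixed capacity, stand-in for the qht */

typedef struct {
    uint64_t dev;        /* host device ID (dev_t in the patch) */
    uint16_t ino_prefix; /* top 16 bits of the host inode number */
    uint16_t qp_prefix;  /* prefix handed out for this (dev, ino_prefix) */
} QppEntry;

typedef struct {
    QppEntry entries[QPP_MAX];
    size_t used;
    uint16_t qp_prefix_next; /* starts at 1; 0 means the counter wrapped */
} QppTable;

static void qpp_table_init(QppTable *t)
{
    t->used = 0;
    t->qp_prefix_next = 1; /* reserve 0 to detect overflow */
}

/* Returns 0 and stores the remapped path, or -1 when no prefix is left. */
static int remap_qid_path(QppTable *t, uint64_t dev, uint64_t ino,
                          uint64_t *path)
{
    uint16_t ino_prefix = (uint16_t)(ino >> 48);
    size_t i;

    assert(t != NULL && path != NULL);

    /* Linear search; the patch uses a hash table lookup instead. */
    for (i = 0; i < t->used; i++) {
        if (t->entries[i].dev == dev &&
            t->entries[i].ino_prefix == ino_prefix) {
            break;
        }
    }
    if (i == t->used) {
        if (t->qp_prefix_next == 0 || t->used == QPP_MAX) {
            return -1; /* prefixes exhausted or table full */
        }
        t->entries[i].dev = dev;
        t->entries[i].ino_prefix = ino_prefix;
        t->entries[i].qp_prefix = t->qp_prefix_next++;
        t->used++;
    }
    *path = ((uint64_t)t->entries[i].qp_prefix << 48) |
            (ino & QPATH_INO_MASK);
    return 0;
}
```

With this layout, inode 42 on two different devices maps to two distinct paths, while repeated lookups for the same device reuse the prefix that was allocated first.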




Re: [Qemu-devel] [PATCH v6 2/4] 9p: Added virtfs option 'multidevs=remap|forbid|warn'

2019-09-01 Thread Christian Schoenebeck via Qemu-devel
On Donnerstag, 29. August 2019 18:55:28 CEST Greg Kurz wrote:
> > diff --git a/fsdev/qemu-fsdev-opts.c b/fsdev/qemu-fsdev-opts.c
> > index 7c31af..07a18c6e48 100644
> > --- a/fsdev/qemu-fsdev-opts.c
> > +++ b/fsdev/qemu-fsdev-opts.c
> > @@ -31,7 +31,9 @@ static QemuOptsList qemu_fsdev_opts = {
> > 
> >  }, {
> >  
> >  .name = "readonly",
> >  .type = QEMU_OPT_BOOL,
> > 
> > -
> > +}, {
> > +.name = "multidevs",
> > +.type = QEMU_OPT_STRING,
> > 
> >  }, {
> >  
> >  .name = "socket",
> >  .type = QEMU_OPT_STRING,
> > 
> > @@ -76,6 +78,9 @@ static QemuOptsList qemu_virtfs_opts = {
> > 
> >  .name = "readonly",
> >  .type = QEMU_OPT_BOOL,
> >  
> >  }, {
> > 
> > +.name = "multidevs",
> > +.type = QEMU_OPT_STRING,
> > +}, {
> > 
> >  .name = "socket",
> >  .type = QEMU_OPT_STRING,
> >  
> >  }, {
> > 
> > diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
> > index 077a8c4e2b..ed03d559a9 100644
> > --- a/fsdev/qemu-fsdev.c
> > +++ b/fsdev/qemu-fsdev.c
> > @@ -58,6 +58,7 @@ static FsDriverTable FsDrivers[] = {
> > 
> >  "writeout",
> >  "fmode",
> >  "dmode",
> > 
> > +"multidevs",
> 
> So we only allow this for the "local" backend. Any reason not to
> add this to "proxy" as well ?
> 
> I didn't do it for the "throttling" options because it is a
> feature I didn't care to support much, but "multidevs" is more
> a fix than a fancy feature.

Well, to be honest I haven't cared about the proxy backend at all. Like I 
mentioned before, I am a bit sceptical that the proxy feature is actually used 
by people in real life at all (at least in its current implementation). So 
personally I don't see much sense in investing time for adding & testing new 
things on this backend ATM.

> > +if (multidevs) {
> > +if (!strcmp(multidevs, "remap")) {
> > +fsle->fse.export_flags &= ~V9FS_FORBID_MULTIDEVS;
> > +fsle->fse.export_flags |= V9FS_REMAP_INODES;
> > +} else if (!strcmp(multidevs, "forbid")) {
> > +fsle->fse.export_flags &= ~V9FS_REMAP_INODES;
> > +fsle->fse.export_flags |= V9FS_FORBID_MULTIDEVS;
> > +}
> 
> And...
> 
> } else if (!strcmp(multidevs, "warn")) {
> fsle->fse.export_flags &= ~V9FS_FORBID_MULTIDEVS;
> fsle->fse.export_flags &= ~V9FS_REMAP_INODES;
> } else {
> error_setg(errp, "invalid multidevs property '%s'", multidevs);
> return -1;
> }
> 
> ... because we're a bit pedantic for command line options :)

And I thought I had adapted to the relaxed handling of CLI arguments. See the 
existing option "writeout".  :)
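
A self-contained sketch of the strict variant Greg suggests, for illustration only. The flag names follow the quoted patch, but the bit values, the parse_multidevs() helper, and the plain int return in place of error_setg() are assumptions made to keep the example standalone.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit values; only the names come from the quoted patch. */
#define V9FS_REMAP_INODES     0x1u
#define V9FS_FORBID_MULTIDEVS 0x2u

/*
 * Map the multidevs option string to export flags.
 * Returns 0 on success, -1 on an unknown value (flags are left untouched).
 * A NULL string means the option was not given and changes nothing.
 */
static int parse_multidevs(const char *multidevs, unsigned *export_flags)
{
    assert(export_flags != NULL);

    if (multidevs == NULL) {
        return 0;
    }
    if (!strcmp(multidevs, "remap")) {
        *export_flags &= ~V9FS_FORBID_MULTIDEVS;
        *export_flags |= V9FS_REMAP_INODES;
    } else if (!strcmp(multidevs, "forbid")) {
        *export_flags &= ~V9FS_REMAP_INODES;
        *export_flags |= V9FS_FORBID_MULTIDEVS;
    } else if (!strcmp(multidevs, "warn")) {
        *export_flags &= ~(V9FS_REMAP_INODES | V9FS_FORBID_MULTIDEVS);
    } else {
        return -1; /* the real code would report the invalid value */
    }
    return 0;
}
```

Rejecting unknown values up front means a typo such as multidevs=rempa fails at startup instead of silently falling back to the default behaviour.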

> 
> > +}
> > 
> >  if (fsle->fse.ops->parse_opts) {
> >  
> >  if (fsle->fse.ops->parse_opts(opts, &fsle->fse, errp)) {
> > 
> > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > index 8cc65c2c67..c96ea51116 100644
> > --- a/hw/9pfs/9p.c
> > +++ b/hw/9pfs/9p.c
> > @@ -25,6 +25,7 @@
> > 
> >  #include "trace.h"
> >  #include "migration/blocker.h"
> >  #include "sysemu/qtest.h"
> > 
> > +#include "qemu/xxhash.h"
> > 
> >  int open_fd_hw;
> >  int total_open_fd;
> > 
> > @@ -571,22 +572,109 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
> > 
> >  P9_STAT_MODE_NAMED_PIPE |   \
> >  P9_STAT_MODE_SOCKET)
> > 
> > -/* This is the algorithm from ufs in spfs */
> > +
> > +/* creative abuse of tb_hash_func7, which is based on xxhash */
> > +static uint32_t qpp_hash(QppEntry e)
> > +{
> > +return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0);
> > +}
> > +
> > +static bool qpp_lookup_func(const void *obj, const void *userp)
> > +{
> > +const QppEntry *e1 = obj, *e2 = userp;
> > +return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix

Re: [Qemu-devel] [PATCH v6 1/4] 9p: Treat multiple devices on one export as an error

2019-09-01 Thread Christian Schoenebeck via Qemu-devel
On Donnerstag, 29. August 2019 18:27:30 CEST Greg Kurz wrote:
> > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > index 586a6dccba..8cc65c2c67 100644
> > --- a/hw/9pfs/9p.c
> > +++ b/hw/9pfs/9p.c
> > @@ -572,10 +572,18 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
> > 
> >  P9_STAT_MODE_SOCKET)
> >  
> >  /* This is the algorithm from ufs in spfs */
> > 
> > -static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp)
> > +static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp)
> >  {
> >  
> >  size_t size;
> > 
> > +if (pdu->s->dev_id != stbuf->st_dev) {
> > +error_report_once(
> > +"9p: Multiple devices detected in same VirtFS export. "
> > +"You must use a separate export for each device."
> > +);
> > +return -ENODEV;
> 
> As explained in the v5 review, we don't necessarily want to break existing
> cross-device setups that just happen to work. Moreover, the next patch
> re-allows them since remap_inodes=ignore is the default. I would thus
> only do:
> 
> warn_report_once(
> "9p: Multiple devices detected in same VirtFS export, "
> "which might lead to file ID collisions and severe "
> "misbehaviours on guest! You should use a separate "
> "export for each device shared from host."
> );
> 
> So I've just changed that and applied to 9p-next since it is
> a valuable improvement. Note that I've kept the signature change
> of stat_to_qid() for simplicity even if it isn't needed before
> the next patch.

I'm fine with your changes.

Dropping "return -ENODEV" in this patch was below my usual threshold for 
detail, since it is really just a fine-grained bit of git history tweaking. :) 
As you already pointed out, not aborting (as the default behaviour) would have 
been addressed by the subsequent patch anyway.



Re: [Qemu-devel] [EXTERNAL]Re: patch to swap SIGRTMIN + 1 and SIGRTMAX - 1

2019-08-30 Thread Josh Kunz via Qemu-devel
I can take over the series. I'll rebase the patch set, and update it to
address the SIGRTMIN - 1 issue. I should have an update sometime next week.

On Wed, Aug 28, 2019 at 10:31 AM Aleksandar Markovic 
wrote:

> > From: Laurent Vivier 
> > Sent: Wednesday, August 28, 2019 10:51 AM
> > To: Josh Kunz; Aleksandar Markovic; milos.stojano...@rt-rk.com
> > Cc: marlies.r...@gmail.com; qemu-devel@nongnu.org; riku.voi...@iki.fi;
> > qemu-triv...@nongnu.org; Peter Maydell; Shu-Chun Weng; Aleksandar
> Markovic
> > Subject: [EXTERNAL]Re: [Qemu-devel] patch to swap SIGRTMIN + 1 and
> SIGRTMAX - 1
> >
> > Le 26/08/2019 à 23:10, Josh Kunz a écrit :
> > > On Wed, Aug 21, 2019 at 2:28 AM Laurent Vivier <laur...@vivier.eu> wrote:
> > >
> > > Le 19/08/2019 à 23:46, Josh Kunz via Qemu-devel a écrit :
> > > > Hi all,
> > > >
> > > > I have also experienced issues with SIGRTMIN + 1, and am
> interested in
> > > > moving this patch forwards. Anything I can do here to help?
> Would the
> > > > maintainers prefer myself or Marli re-submit the patch?
> > > >
> > > > The Go issue here seems particularly sticky. Even if we update
> the Go
> > > > runtime, users may try and run older binaries built with older
> > > versions of
> > > > Go for quite some time (months? years?). Would it be better to
> > > hide this
> > > > behind some kind of build-time flag
> > > (`--enable-sigrtmin-plus-one-proxy` or
> > > > something), so that some users can opt-in, but older binaries
> > > still work as
> > > > expected?
> > > >
> > > > Also, here is a link to the original thread this message is in
> > > reply to
> > > > in-case my mail-client doesn't set up the reply properly:
> > > >
> https://lists.nongnu.org/archive/html/qemu-devel/2019-07/msg01303.html
> > >
> > > The problem here is we break something to fix something else.
> > >
> > > I'm wondering if the series from Aleksandar Markovic, "linux-user:
> > > Support signal passing for targets having more signals than host"
> [1]
> > > can fix the problem in a better way?
> > >
> > >
> > > That patch[1] (which I'll refer to as the MUX patch to avoid confusion)
> > > does not directly fix the issue addressed by this patch (re-wiring
> > > SIGRTMIN+1), but since it basically implements generic signal
> > > multiplexing, it could be re-worked to address this case as well. The
> > > way it handles `si_code` spooks me a little bit. It could easily be
> > > broken by a kernel version change, and such a breakage could be hard to
> > > detect or lead to surprising results. Other than that, it looks like a
> > > reasonable implementation.
> > >
> > > That said, overall, fixing the SIGRTMIN+1 issue using a more-generic
> > > signal-multiplexing mechanism doesn't seem *that* much better to me. It
> > > adds a lot of complexity, and only saves a single signal (assuming
> glibc
> > > doesn't add more reserved signals). The "big win" is additional
> > > emulation features, like those introduced in MUX patch (being able to
> > > utilize signals outside of the host range). If having those features in
> > > QEMU warrants the additional complexity, then re-working this patch
> > > on-top of that infrastructure seems like a good idea.
> > >
> > > If the maintainers want to go down that route, then I would be happy to
> > > re-work this patch utilizing the infrastructure from the MUX patch.
> > > Unfortunately it will require non-trivial changes, so it may be best to
> > > wait until that patch is merged. I could also provide a patch "on top
> > > of" the MUX patch, if that's desired/more convenient.
> > >
> > > Just one last note, if you do decide to merge the MUX patch, then it
> > > would be best to use SIGRTMAX (instead of SIGRTMAX-1) as the
> > > multiplexing signal if possible, to avoid breaking go binaries.
> > >
> >
> > Personally, I prefer a solution that breaks nothing.
> >
> > Aleksandar, Milos,
> >
> > do you have an updated version of your series "Support signal passing for
> > targets having more signals than host"?
> >
>
> Milos is unfortunately working on an entirely different project now, and
> can't spare enough time to finish the series. I am also busy with other
> issues, even though I would very much like this or an equivalent solution
> to be integrated. Alternatively, someone on the team may have time later
> this year, but I do not know that yet - perhaps somebody else (Josh) can
> take over the series?
>
> Sincerely,
> Aleksandar
>
>
> > Thanks,
> > Laurent
> >
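
To make the idea in the subject line concrete, here is a rough sketch of a signal translation table in which SIGRTMIN + 1 and SIGRTMAX - 1 trade places. It is not taken from the linux-user code: the constants below are hypothetical stand-ins for the real signal numbers, and the host-to-target direction of the table is an assumption.

```c
#include <assert.h>

/* Hypothetical stand-ins for the real signal numbers. */
#define NSIG_SKETCH  65 /* signals 1..64, index 0 unused */
#define RTMIN_SKETCH 34
#define RTMAX_SKETCH 64

/* Identity mapping, except that RTMIN + 1 and RTMAX - 1 are swapped. */
static void init_signal_table(int host_to_target[NSIG_SKETCH])
{
    int i;

    for (i = 0; i < NSIG_SKETCH; i++) {
        host_to_target[i] = i;
    }
    host_to_target[RTMIN_SKETCH + 1] = RTMAX_SKETCH - 1;
    host_to_target[RTMAX_SKETCH - 1] = RTMIN_SKETCH + 1;
}
```

Because the swap is its own inverse, the same table shape works for the reverse direction. As the thread notes, the cost is that whichever signal ends up reserved is no longer usable by guest programs that expect it.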


[Qemu-devel] [PATCH v6 6/8] bootdevice: Refactor get_boot_devices_list

2019-08-27 Thread Sam Eiderman via Qemu-devel
From: Sam Eiderman 

Move device name construction to a separate function.

We will reuse this function in the following commit to pass logical CHS
parameters through fw_cfg much like we currently pass bootindex.

Reviewed-by: Karl Heubaum 
Reviewed-by: Arbel Moshe 
Signed-off-by: Sam Eiderman 
---
 bootdevice.c | 61 +---
 1 file changed, 34 insertions(+), 27 deletions(-)

diff --git a/bootdevice.c b/bootdevice.c
index bc5e1c2de4..2b12fb85a4 100644
--- a/bootdevice.c
+++ b/bootdevice.c
@@ -202,6 +202,39 @@ DeviceState *get_boot_device(uint32_t position)
 return res;
 }
 
+static char *get_boot_device_path(DeviceState *dev, bool ignore_suffixes,
+  char *suffix)
+{
+char *devpath = NULL, *s = NULL, *d, *bootpath;
+
+if (dev) {
+devpath = qdev_get_fw_dev_path(dev);
+assert(devpath);
+}
+
+if (!ignore_suffixes) {
+if (dev) {
+d = qdev_get_own_fw_dev_path_from_handler(dev->parent_bus, dev);
+if (d) {
+assert(!suffix);
+s = d;
+} else {
+s = g_strdup(suffix);
+}
+} else {
+s = g_strdup(suffix);
+}
+}
+
+bootpath = g_strdup_printf("%s%s",
+   devpath ? devpath : "",
+   s ? s : "");
+g_free(devpath);
+g_free(s);
+
+return bootpath;
+}
+
 /*
  * This function returns null terminated string that consist of new line
  * separated device paths.
@@ -218,36 +251,10 @@ char *get_boot_devices_list(size_t *size)
 bool ignore_suffixes = mc->ignore_boot_device_suffixes;
 
 QTAILQ_FOREACH(i, &fw_boot_order, link) {
-char *devpath = NULL,  *suffix = NULL;
 char *bootpath;
-char *d;
 size_t len;
 
-if (i->dev) {
-devpath = qdev_get_fw_dev_path(i->dev);
-assert(devpath);
-}
-
-if (!ignore_suffixes) {
-if (i->dev) {
-d = qdev_get_own_fw_dev_path_from_handler(i->dev->parent_bus,
-  i->dev);
-if (d) {
-assert(!i->suffix);
-suffix = d;
-} else {
-suffix = g_strdup(i->suffix);
-}
-} else {
-suffix = g_strdup(i->suffix);
-}
-}
-
-bootpath = g_strdup_printf("%s%s",
-   devpath ? devpath : "",
-   suffix ? suffix : "");
-g_free(devpath);
-g_free(suffix);
+bootpath = get_boot_device_path(i->dev, ignore_suffixes, i->suffix);
 
 if (total) {
 list[total-1] = '\n';
-- 
2.23.0.187.g17f5b7556c-goog




[Qemu-devel] [PATCH v6 2/8] block: Support providing LCHS from user

2019-08-27 Thread Sam Eiderman via Qemu-devel
From: Sam Eiderman 

Add logical geometry variables to BlockConf.

A user can now supply "lcyls", "lheads" & "lsecs" for any HD device
that supports CHS ("cyls", "heads", "secs").

These devices include:
* ide-hd
* scsi-hd
* virtio-blk-pci

In future commits we will use the provided LCHS and pass it to the BIOS
through fw_cfg to be supplied using INT13 routines.

Reviewed-by: Karl Heubaum 
Reviewed-by: Arbel Moshe 
Signed-off-by: Sam Eiderman 
---
 include/hw/block/block.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index fd55a30bca..d7246f3862 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -26,6 +26,7 @@ typedef struct BlockConf {
 uint32_t discard_granularity;
 /* geometry, not all devices use this */
 uint32_t cyls, heads, secs;
+uint32_t lcyls, lheads, lsecs;
 OnOffAuto wce;
 bool share_rw;
 BlockdevOnError rerror;
@@ -65,7 +66,10 @@ static inline unsigned int get_physical_block_exp(BlockConf 
*conf)
 #define DEFINE_BLOCK_CHS_PROPERTIES(_state, _conf)  \
 DEFINE_PROP_UINT32("cyls", _state, _conf.cyls, 0),  \
 DEFINE_PROP_UINT32("heads", _state, _conf.heads, 0),\
-DEFINE_PROP_UINT32("secs", _state, _conf.secs, 0)
+DEFINE_PROP_UINT32("secs", _state, _conf.secs, 0),  \
+DEFINE_PROP_UINT32("lcyls", _state, _conf.lcyls, 0),\
+DEFINE_PROP_UINT32("lheads", _state, _conf.lheads, 0),  \
+DEFINE_PROP_UINT32("lsecs", _state, _conf.lsecs, 0)
 
 #define DEFINE_BLOCK_ERROR_PROPERTIES(_state, _conf)\
 DEFINE_PROP_BLOCKDEV_ON_ERROR("rerror", _state, _conf.rerror,   \
-- 
2.23.0.187.g17f5b7556c-goog
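
As background on why the reported logical geometry matters to a guest, the sketch below applies the standard CHS-to-LBA conversion. The helper name chs_to_lba() and the sample geometries are illustrative assumptions, not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Standard CHS to LBA conversion:
 *   LBA = (cylinder * heads + head) * sectors_per_track + (sector - 1)
 * Sectors are numbered from 1; cylinders and heads from 0.
 */
static uint64_t chs_to_lba(uint32_t c, uint32_t h, uint32_t s,
                           uint32_t heads, uint32_t spt)
{
    assert(s >= 1 && s <= spt && h < heads);
    return ((uint64_t)c * heads + h) * spt + (s - 1);
}
```

A boot loader that stored CHS addresses computed with 56 sectors per track reads the wrong blocks once the BIOS reports 63: C/H/S 0/1/1 is LBA 56 under the first geometry but LBA 63 under the second.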




[Qemu-devel] [PATCH v6 8/8] hd-geo-test: Add tests for lchs override

2019-08-27 Thread Sam Eiderman via Qemu-devel
From: Sam Eiderman 

Add QTest tests to check the logical geometry override option.

The tests in hd-geo-test are out of date - they only test IDE and do not
test interesting MBRs.

I added a few helper functions which will make adding more tests easier.

QTest's fw_cfg helper functions support only legacy fw_cfg, so I had to
read the new fw_cfg layout on my own.

Creating qcow2 disks with specific size and MBR layout is currently
unused - we only use a default empty MBR.

Reviewed-by: Karl Heubaum 
Reviewed-by: Arbel Moshe 
Signed-off-by: Sam Eiderman 
---
 tests/Makefile.include |   2 +-
 tests/hd-geo-test.c| 582 +
 2 files changed, 583 insertions(+), 1 deletion(-)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 39bed753b3..bd385e2150 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -781,7 +781,7 @@ tests/ide-test$(EXESUF): tests/ide-test.o $(libqos-pc-obj-y)
 tests/ahci-test$(EXESUF): tests/ahci-test.o $(libqos-pc-obj-y) 
qemu-img$(EXESUF)
 tests/ipmi-kcs-test$(EXESUF): tests/ipmi-kcs-test.o
 tests/ipmi-bt-test$(EXESUF): tests/ipmi-bt-test.o
-tests/hd-geo-test$(EXESUF): tests/hd-geo-test.o
+tests/hd-geo-test$(EXESUF): tests/hd-geo-test.o $(libqos-obj-y)
 tests/boot-order-test$(EXESUF): tests/boot-order-test.o $(libqos-obj-y)
 tests/boot-serial-test$(EXESUF): tests/boot-serial-test.o $(libqos-obj-y)
 tests/bios-tables-test$(EXESUF): tests/bios-tables-test.o \
diff --git a/tests/hd-geo-test.c b/tests/hd-geo-test.c
index 62eb624726..002f5c4a43 100644
--- a/tests/hd-geo-test.c
+++ b/tests/hd-geo-test.c
@@ -17,7 +17,12 @@
 
 #include "qemu/osdep.h"
 #include "qemu-common.h"
+#include "qemu/bswap.h"
+#include "qapi/qmp/qlist.h"
 #include "libqtest.h"
+#include "libqos/fw_cfg.h"
+#include "libqos/libqos.h"
+#include "standard-headers/linux/qemu_fw_cfg.h"
 
 #define ARGV_SIZE 256
 
@@ -388,6 +393,568 @@ static void test_ide_drive_cd_0(void)
 qtest_quit(qts);
 }
 
+typedef struct {
+bool active;
+uint32_t head;
+uint32_t sector;
+uint32_t cyl;
+uint32_t end_head;
+uint32_t end_sector;
+uint32_t end_cyl;
+uint32_t start_sect;
+uint32_t nr_sects;
+} MBRpartitions[4];
+
+static MBRpartitions empty_mbr = { {false, 0, 0, 0, 0, 0, 0, 0, 0},
+   {false, 0, 0, 0, 0, 0, 0, 0, 0},
+   {false, 0, 0, 0, 0, 0, 0, 0, 0},
+   {false, 0, 0, 0, 0, 0, 0, 0, 0} };
+
+static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors)
+{
+const char *template = "/tmp/qtest.XXXXXX";
+char *raw_path = strdup(template);
+char *qcow2_path = strdup(template);
+char cmd[100 + 2 * PATH_MAX];
+uint8_t buf[512];
+int i, ret, fd, offset;
+uint64_t qcow2_size = sectors * 512;
+uint8_t status, parttype, head, sector, cyl;
+char *qemu_img_path;
+char *qemu_img_abs_path;
+
+offset = 0xbe;
+
+for (i = 0; i < 4; i++) {
+status = mbr[i].active ? 0x80 : 0x00;
+g_assert(mbr[i].head < 256);
+g_assert(mbr[i].sector < 64);
+g_assert(mbr[i].cyl < 1024);
+head = mbr[i].head;
+sector = mbr[i].sector + ((mbr[i].cyl & 0x300) >> 2);
+cyl = mbr[i].cyl & 0xff;
+
+buf[offset + 0x0] = status;
+buf[offset + 0x1] = head;
+buf[offset + 0x2] = sector;
+buf[offset + 0x3] = cyl;
+
+parttype = 0;
+g_assert(mbr[i].end_head < 256);
+g_assert(mbr[i].end_sector < 64);
+g_assert(mbr[i].end_cyl < 1024);
+head = mbr[i].end_head;
+sector = mbr[i].end_sector + ((mbr[i].end_cyl & 0x300) >> 2);
+cyl = mbr[i].end_cyl & 0xff;
+
+buf[offset + 0x4] = parttype;
+buf[offset + 0x5] = head;
+buf[offset + 0x6] = sector;
+buf[offset + 0x7] = cyl;
+
+(*(uint32_t *)&buf[offset + 0x8]) = cpu_to_le32(mbr[i].start_sect);
+(*(uint32_t *)&buf[offset + 0xc]) = cpu_to_le32(mbr[i].nr_sects);
+
+offset += 0x10;
+}
+
+fd = mkstemp(raw_path);
+g_assert(fd >= 0);
+close(fd);
+
+fd = open(raw_path, O_WRONLY);
+g_assert(fd >= 0);
+ret = write(fd, buf, sizeof(buf));
+g_assert(ret == sizeof(buf));
+close(fd);
+
+fd = mkstemp(qcow2_path);
+g_assert(fd >= 0);
+close(fd);
+
+qemu_img_path = getenv("QTEST_QEMU_IMG");
+g_assert(qemu_img_path);
+qemu_img_abs_path = realpath(qemu_img_path, NULL);
+g_assert(qemu_img_abs_path);
+
+ret = snprintf(cmd, sizeof(cmd),
+   "%s convert -f raw -O qcow2 %s %s > /dev/null",
+   qemu_img_abs_path,
+   raw_path, qcow2_path);
+g_assert((0 < ret) && (ret < sizeof(cmd)));
+ret = system(cmd);
+g_assert(

[Qemu-devel] [PATCH v6 4/8] scsi: Propagate unrealize() callback to scsi-hd

2019-08-27 Thread Sam Eiderman via Qemu-devel
From: Sam Eiderman 

We will need to add LCHS removal logic to scsi-hd's unrealize() in the
next commit.

Signed-off-by: Sam Eiderman 
Reviewed-by: Karl Heubaum 
Reviewed-by: Arbel Moshe 
Signed-off-by: Sam Eiderman 
---
 hw/scsi/scsi-bus.c | 16 
 include/hw/scsi/scsi.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index bccb7cc4c6..359d50d6d0 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -59,6 +59,14 @@ static void scsi_device_realize(SCSIDevice *s, Error **errp)
 }
 }
 
+static void scsi_device_unrealize(SCSIDevice *s, Error **errp)
+{
+SCSIDeviceClass *sc = SCSI_DEVICE_GET_CLASS(s);
+if (sc->unrealize) {
+sc->unrealize(s, errp);
+}
+}
+
 int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
void *hba_private)
 {
@@ -217,12 +225,20 @@ static void scsi_qdev_realize(DeviceState *qdev, Error 
**errp)
 static void scsi_qdev_unrealize(DeviceState *qdev, Error **errp)
 {
 SCSIDevice *dev = SCSI_DEVICE(qdev);
+Error *local_err = NULL;
 
 if (dev->vmsentry) {
 qemu_del_vm_change_state_handler(dev->vmsentry);
 }
 
 scsi_device_purge_requests(dev, SENSE_CODE(NO_SENSE));
+
+scsi_device_unrealize(dev, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
 blockdev_mark_auto_del(dev->conf.blk);
 }
 
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index d77a92361b..332ef602f4 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -59,6 +59,7 @@ struct SCSIRequest {
 typedef struct SCSIDeviceClass {
 DeviceClass parent_class;
 void (*realize)(SCSIDevice *dev, Error **errp);
+void (*unrealize)(SCSIDevice *dev, Error **errp);
 int (*parse_cdb)(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
  void *hba_private);
 SCSIRequest *(*alloc_req)(SCSIDevice *s, uint32_t tag, uint32_t lun,
-- 
2.23.0.187.g17f5b7556c-goog




[Qemu-devel] [PATCH v6 1/8] block: Refactor macros - fix tabbing

2019-08-27 Thread Sam Eiderman via Qemu-devel
From: Sam Eiderman 

Fixing tabbing in block related macros.

Reviewed-by: Karl Heubaum 
Reviewed-by: Arbel Moshe 
Signed-off-by: Sam Eiderman 
---
 hw/ide/qdev.c|  2 +-
 include/hw/block/block.h | 16 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 6fba6b62b8..6dd219944f 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -290,7 +290,7 @@ static void ide_drive_realize(IDEDevice *dev, Error **errp)
 DEFINE_BLOCK_PROPERTIES(IDEDrive, dev.conf),\
 DEFINE_BLOCK_ERROR_PROPERTIES(IDEDrive, dev.conf),  \
 DEFINE_PROP_STRING("ver",  IDEDrive, dev.version),  \
-DEFINE_PROP_UINT64("wwn",  IDEDrive, dev.wwn, 0),\
+DEFINE_PROP_UINT64("wwn",  IDEDrive, dev.wwn, 0),   \
 DEFINE_PROP_STRING("serial",  IDEDrive, dev.serial),\
 DEFINE_PROP_STRING("model", IDEDrive, dev.model)
 
diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index 607539057a..fd55a30bca 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -50,21 +50,21 @@ static inline unsigned int get_physical_block_exp(BlockConf 
*conf)
   _conf.logical_block_size),\
 DEFINE_PROP_BLOCKSIZE("physical_block_size", _state,\
   _conf.physical_block_size),   \
-DEFINE_PROP_UINT16("min_io_size", _state, _conf.min_io_size, 0),  \
+DEFINE_PROP_UINT16("min_io_size", _state, _conf.min_io_size, 0),\
 DEFINE_PROP_UINT32("opt_io_size", _state, _conf.opt_io_size, 0),\
-DEFINE_PROP_UINT32("discard_granularity", _state, \
-   _conf.discard_granularity, -1), \
-DEFINE_PROP_ON_OFF_AUTO("write-cache", _state, _conf.wce, \
-ON_OFF_AUTO_AUTO), \
+DEFINE_PROP_UINT32("discard_granularity", _state,   \
+   _conf.discard_granularity, -1),  \
+DEFINE_PROP_ON_OFF_AUTO("write-cache", _state, _conf.wce,   \
+ON_OFF_AUTO_AUTO),  \
 DEFINE_PROP_BOOL("share-rw", _state, _conf.share_rw, false)
 
 #define DEFINE_BLOCK_PROPERTIES(_state, _conf)  \
 DEFINE_PROP_DRIVE("drive", _state, _conf.blk),  \
 DEFINE_BLOCK_PROPERTIES_BASE(_state, _conf)
 
-#define DEFINE_BLOCK_CHS_PROPERTIES(_state, _conf)  \
-DEFINE_PROP_UINT32("cyls", _state, _conf.cyls, 0),  \
-DEFINE_PROP_UINT32("heads", _state, _conf.heads, 0), \
+#define DEFINE_BLOCK_CHS_PROPERTIES(_state, _conf)  \
+DEFINE_PROP_UINT32("cyls", _state, _conf.cyls, 0),  \
+DEFINE_PROP_UINT32("heads", _state, _conf.heads, 0),\
 DEFINE_PROP_UINT32("secs", _state, _conf.secs, 0)
 
 #define DEFINE_BLOCK_ERROR_PROPERTIES(_state, _conf)\
-- 
2.23.0.187.g17f5b7556c-goog




[Qemu-devel] [PATCH v6 3/8] bootdevice: Add interface to gather LCHS

2019-08-27 Thread Sam Eiderman via Qemu-devel
From: Sam Eiderman 

Add an interface to provide direct logical CHS values for boot devices.
We will use this interface in the next commits.

Reviewed-by: Karl Heubaum 
Reviewed-by: Arbel Moshe 
Signed-off-by: Sam Eiderman 
---
 bootdevice.c| 55 +
 include/sysemu/sysemu.h |  3 +++
 2 files changed, 58 insertions(+)

diff --git a/bootdevice.c b/bootdevice.c
index 1d225202f9..bc5e1c2de4 100644
--- a/bootdevice.c
+++ b/bootdevice.c
@@ -343,3 +343,58 @@ void device_add_bootindex_property(Object *obj, int32_t 
*bootindex,
 /* initialize devices' bootindex property to -1 */
 object_property_set_int(obj, -1, name, NULL);
 }
+
+typedef struct FWLCHSEntry FWLCHSEntry;
+
+struct FWLCHSEntry {
+QTAILQ_ENTRY(FWLCHSEntry) link;
+DeviceState *dev;
+char *suffix;
+uint32_t lcyls;
+uint32_t lheads;
+uint32_t lsecs;
+};
+
+static QTAILQ_HEAD(, FWLCHSEntry) fw_lchs =
+QTAILQ_HEAD_INITIALIZER(fw_lchs);
+
+void add_boot_device_lchs(DeviceState *dev, const char *suffix,
+  uint32_t lcyls, uint32_t lheads, uint32_t lsecs)
+{
+FWLCHSEntry *node;
+
+if (!lcyls && !lheads && !lsecs) {
+return;
+}
+
+assert(dev != NULL || suffix != NULL);
+
+node = g_malloc0(sizeof(FWLCHSEntry));
+node->suffix = g_strdup(suffix);
+node->dev = dev;
+node->lcyls = lcyls;
+node->lheads = lheads;
+node->lsecs = lsecs;
+
+QTAILQ_INSERT_TAIL(&fw_lchs, node, link);
+}
+
+void del_boot_device_lchs(DeviceState *dev, const char *suffix)
+{
+FWLCHSEntry *i;
+
+if (dev == NULL) {
+return;
+}
+
+QTAILQ_FOREACH(i, &fw_lchs, link) {
+if ((!suffix || !g_strcmp0(i->suffix, suffix)) &&
+ i->dev == dev) {
+QTAILQ_REMOVE(&fw_lchs, i, link);
+g_free(i->suffix);
+g_free(i);
+
+break;
+}
+}
+}
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index d2c38f611a..1a33f25a5a 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -105,6 +105,9 @@ void device_add_bootindex_property(Object *obj, int32_t 
*bootindex,
DeviceState *dev, Error **errp);
 void restore_boot_order(void *opaque);
 void validate_bootdevices(const char *devices, Error **errp);
+void add_boot_device_lchs(DeviceState *dev, const char *suffix,
+  uint32_t lcyls, uint32_t lheads, uint32_t lsecs);
+void del_boot_device_lchs(DeviceState *dev, const char *suffix);
 
 /* handler to set the boot_device order for a specific type of MachineClass */
 typedef void QEMUBootSetHandler(void *opaque, const char *boot_order,
-- 
2.23.0.187.g17f5b7556c-goog




[Qemu-devel] [PATCH v6 7/8] bootdevice: FW_CFG interface for LCHS values

2019-08-27 Thread Sam Eiderman via Qemu-devel
From: Sam Eiderman 

Using fw_cfg, supply logical CHS values directly from QEMU to the BIOS.

Non-standard logical geometries break under QEMU.

A virtual disk which contains an operating system which depends on
logical geometries (consistent values being reported from BIOS INT13
AH=08) will most likely break under QEMU/SeaBIOS if it has non-standard
logical geometries - for example 56 SPT (sectors per track).
No matter what QEMU will report - SeaBIOS, for large enough disks - will
use LBA translation, which will report 63 SPT instead.

In addition we cannot force SeaBIOS to rely on physical geometries at
all. A virtio-blk-pci virtual disk with 255 physical heads cannot
report more than 16 physical heads when moved to an IDE controller,
since the ATA spec allows a maximum of 16 heads - this is an artifact of
virtualization.

By supplying the logical geometries directly we are able to support such
"exotic" disks.

We serialize this information in a similar way to the "bootorder"
interface.
The new fw_cfg entry is "bios-geometry".

Reviewed-by: Karl Heubaum 
Reviewed-by: Arbel Moshe 
Signed-off-by: Sam Eiderman 
---
 bootdevice.c| 32 
 hw/nvram/fw_cfg.c   | 14 +++---
 include/sysemu/sysemu.h |  1 +
 3 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/bootdevice.c b/bootdevice.c
index 2b12fb85a4..b034ad7bdc 100644
--- a/bootdevice.c
+++ b/bootdevice.c
@@ -405,3 +405,35 @@ void del_boot_device_lchs(DeviceState *dev, const char 
*suffix)
 }
 }
 }
+
+/* Serialized as newline-separated "<device path> <lcyls> <lheads> <lsecs>" */
+char *get_boot_devices_lchs_list(size_t *size)
+{
+FWLCHSEntry *i;
+size_t total = 0;
+char *list = NULL;
+
+QTAILQ_FOREACH(i, &fw_lchs, link) {
+char *bootpath;
+char *chs_string;
+size_t len;
+
+bootpath = get_boot_device_path(i->dev, false, i->suffix);
+chs_string = g_strdup_printf("%s %" PRIu32 " %" PRIu32 " %" PRIu32,
+ bootpath, i->lcyls, i->lheads, i->lsecs);
+
+if (total) {
+list[total - 1] = '\n';
+}
+len = strlen(chs_string) + 1;
+list = g_realloc(list, total + len);
+memcpy(&list[total], chs_string, len);
+total += len;
+g_free(chs_string);
+g_free(bootpath);
+}
+
+*size = total;
+
+return list;
+}
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index 7dc3ac378e..18aff658c0 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -920,13 +920,21 @@ void *fw_cfg_modify_file(FWCfgState *s, const char 
*filename,
 
 static void fw_cfg_machine_reset(void *opaque)
 {
+MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
+FWCfgState *s = opaque;
 void *ptr;
 size_t len;
-FWCfgState *s = opaque;
-char *bootindex = get_boot_devices_list(&len);
+char *buf;
 
-ptr = fw_cfg_modify_file(s, "bootorder", (uint8_t *)bootindex, len);
+buf = get_boot_devices_list(&len);
+ptr = fw_cfg_modify_file(s, "bootorder", (uint8_t *)buf, len);
 g_free(ptr);
+
+if (!mc->legacy_fw_cfg_order) {
+buf = get_boot_devices_lchs_list(&len);
+ptr = fw_cfg_modify_file(s, "bios-geometry", (uint8_t *)buf, len);
+g_free(ptr);
+}
 }
 
 static void fw_cfg_machine_ready(struct Notifier *n, void *data)
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 1a33f25a5a..150fe8c0e2 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -108,6 +108,7 @@ void validate_bootdevices(const char *devices, Error 
**errp);
 void add_boot_device_lchs(DeviceState *dev, const char *suffix,
   uint32_t lcyls, uint32_t lheads, uint32_t lsecs);
 void del_boot_device_lchs(DeviceState *dev, const char *suffix);
+char *get_boot_devices_lchs_list(size_t *size);
 
 /* handler to set the boot_device order for a specific type of MachineClass */
 typedef void QEMUBootSetHandler(void *opaque, const char *boot_order,
-- 
2.23.0.187.g17f5b7556c-goog
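
The textual "bios-geometry" format produced by get_boot_devices_lchs_list() can be illustrated with a small standalone sketch. The formatter mirrors the "%s %u %u %u" layout used in the patch; the parser is a hypothetical consumer-side helper (it is not SeaBIOS code) and assumes device paths contain no spaces. The device path used in the usage note is only an example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one entry as "<path> <lcyls> <lheads> <lsecs>" (no newline).
 * Returns the string length, or -1 if the buffer is too small.
 */
static int format_lchs_entry(char *out, size_t outlen, const char *path,
                             uint32_t cyls, uint32_t heads, uint32_t secs)
{
    int n = snprintf(out, outlen, "%s %" PRIu32 " %" PRIu32 " %" PRIu32,
                     path, cyls, heads, secs);

    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}

/* Hypothetical parser for one entry; returns 0 on success, -1 on error. */
static int parse_lchs_entry(const char *line, char *path, size_t pathlen,
                            uint32_t *cyls, uint32_t *heads, uint32_t *secs)
{
    const char *sp = strchr(line, ' ');
    size_t n;

    if (sp == NULL) {
        return -1;
    }
    n = (size_t)(sp - line);
    if (n == 0 || n >= pathlen) {
        return -1;
    }
    memcpy(path, line, n);
    path[n] = '\0';
    if (sscanf(sp + 1, "%" SCNu32 " %" SCNu32 " %" SCNu32,
               cyls, heads, secs) != 3) {
        return -1;
    }
    return 0;
}
```

For example, formatting the path "/pci@i0cf8/ide@1,1/drive@0/disk@0" with geometry 8000/16/56 yields a single line that the parser splits back into the same four fields. In the patch, entries are joined with '\n' and the final entry keeps its terminating NUL.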




[Qemu-devel] [PATCH v6 5/8] bootdevice: Gather LCHS from all relevant devices

2019-08-27 Thread Sam Eiderman via Qemu-devel
From: Sam Eiderman 

Relevant devices are:
* ide-hd (and ide-cd, ide-drive)
* scsi-hd (and scsi-cd, scsi-disk, scsi-block)
* virtio-blk-pci

We do not call del_boot_device_lchs() for ide-* since we don't need to -
IDE block devices do not support unplugging.

Signed-off-by: Sam Eiderman 
Reviewed-by: Karl Heubaum 
Reviewed-by: Arbel Moshe 
Signed-off-by: Sam Eiderman 
---
 hw/block/virtio-blk.c |  6 ++
 hw/ide/qdev.c |  5 +
 hw/scsi/scsi-disk.c   | 12 
 3 files changed, 23 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 18851601cb..6d8ff34a16 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1186,6 +1186,11 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 blk_set_guest_block_size(s->blk, s->conf.conf.logical_block_size);
 
 blk_iostatus_enable(s->blk);
+
+add_boot_device_lchs(dev, "/disk@0,0",
+ conf->conf.lcyls,
+ conf->conf.lheads,
+ conf->conf.lsecs);
 }
 
 static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
@@ -1193,6 +1198,7 @@ static void virtio_blk_device_unrealize(DeviceState *dev, 
Error **errp)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIOBlock *s = VIRTIO_BLK(dev);
 
+del_boot_device_lchs(dev, "/disk@0,0");
 virtio_blk_data_plane_destroy(s->dataplane);
 s->dataplane = NULL;
 qemu_del_vm_change_state_handler(s->change);
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 6dd219944f..2ffd387a73 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -220,6 +220,11 @@ static void ide_dev_initfn(IDEDevice *dev, IDEDriveKind 
kind, Error **errp)
 
 add_boot_device_path(dev->conf.bootindex, &dev->qdev,
  dev->unit ? "/disk@1" : "/disk@0");
+
+add_boot_device_lchs(&dev->qdev, dev->unit ? "/disk@1" : "/disk@0",
+ dev->conf.lcyls,
+ dev->conf.lheads,
+ dev->conf.lsecs);
 }
 
 static void ide_dev_get_bootindex(Object *obj, Visitor *v, const char *name,
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 915641a0f1..d19896fe4d 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -35,6 +35,7 @@
 #include "hw/block/block.h"
 #include "hw/qdev-properties.h"
 #include "sysemu/dma.h"
+#include "sysemu/sysemu.h"
 #include "qemu/cutils.h"
 #include "trace.h"
 
@@ -2402,6 +2403,16 @@ static void scsi_realize(SCSIDevice *dev, Error **errp)
 blk_set_guest_block_size(s->qdev.conf.blk, s->qdev.blocksize);
 
 blk_iostatus_enable(s->qdev.conf.blk);
+
+add_boot_device_lchs(&dev->qdev, NULL,
+ dev->conf.lcyls,
+ dev->conf.lheads,
+ dev->conf.lsecs);
+}
+
+static void scsi_unrealize(SCSIDevice *dev, Error **errp)
+{
+del_boot_device_lchs(&dev->qdev, NULL);
 }
 
 static void scsi_hd_realize(SCSIDevice *dev, Error **errp)
@@ -3006,6 +3017,7 @@ static void scsi_hd_class_initfn(ObjectClass *klass, void 
*data)
 SCSIDeviceClass *sc = SCSI_DEVICE_CLASS(klass);
 
 sc->realize  = scsi_hd_realize;
+sc->unrealize= scsi_unrealize;
 sc->alloc_req= scsi_new_request;
 sc->unit_attention_reported = scsi_disk_unit_attention_reported;
 dc->desc = "virtual SCSI disk";
-- 
2.23.0.187.g17f5b7556c-goog




[Qemu-devel] [PATCH v6 0/8] Add Qemu to SeaBIOS LCHS interface

2019-08-27 Thread Sam Eiderman via Qemu-devel
v1:

Non-standard logical geometries break under QEMU.

A virtual disk which contains an operating system which depends on
logical geometries (consistent values being reported from BIOS INT13
AH=08) will most likely break under QEMU/SeaBIOS if it has non-standard
logical geometries - for example 56 SPT (sectors per track).
No matter what QEMU will guess - SeaBIOS, for large enough disks - will
use LBA translation, which will report 63 SPT instead.

In addition we cannot force SeaBIOS to rely on physical geometries at
all. A virtio-blk-pci virtual disk with 255 physical heads cannot
report more than 16 physical heads when moved to an IDE controller, since
the ATA spec allows a maximum of 16 heads - this is an artifact of
virtualization.

By supplying the logical geometries directly we are able to support such
"exotic" disks.

We will use fw_cfg to do just that.
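
To see why a guest that depends on 56 SPT breaks when the BIOS reports 63, the classic CHS-to-LBA conversion is enough. The helper below is an illustrative sketch, not code from this series; chs_to_lba() is a hypothetical name and the 16-head geometry is an arbitrary example.

```c
#include <stdint.h>

/*
 * Classic CHS-to-LBA conversion. Cylinders and heads are 0-based,
 * sector numbers are 1-based. `heads` and `spt` are the logical
 * geometry the BIOS reports through INT13 AH=08.
 */
static uint64_t chs_to_lba(uint32_t c, uint32_t h, uint32_t s,
                           uint32_t heads, uint32_t spt)
{
    return ((uint64_t)c * heads + h) * spt + (s - 1);
}
```

With 16 heads, the address C=1, H=0, S=1 resolves to LBA 896 under 56 SPT but to LBA 1008 under 63 SPT, so software that stored CHS addresses under one geometry reads the wrong sectors under the other.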

v2:

Fix missing parenthesis check in
"hd-geo-test: Add tests for lchs override"

v3:

* Rename fw_cfg key to "bios-geometry".
* Remove "extendible" interface.
* Add cpu_to_le32 fix as Laszlo suggested for big endian hosts
* Fix last qtest commit - automatic docker tester for some reason does not have 
qemu-img set

v4:

* Change fw_cfg interface from mixed textual/binary to textual only

v5:

* Fix line > 80 chars in tests/hd-geo-test.c

v6:

* Small fixes for issues pointed by Max
* (&conf->conf)->lcyls to conf->conf.lcyls and so on
* Remove scsi_unrealize from everything other than scsi-hd
* Add proper include to sysemu.h
* scsi_device_unrealize() after scsi_device_purge_requests()

Sam Eiderman (8):
  block: Refactor macros - fix tabbing
  block: Support providing LCHS from user
  bootdevice: Add interface to gather LCHS
  scsi: Propagate unrealize() callback to scsi-hd
  bootdevice: Gather LCHS from all relevant devices
  bootdevice: Refactor get_boot_devices_list
  bootdevice: FW_CFG interface for LCHS values
  hd-geo-test: Add tests for lchs override

 bootdevice.c | 148 --
 hw/block/virtio-blk.c|   6 +
 hw/ide/qdev.c|   7 +-
 hw/nvram/fw_cfg.c|  14 +-
 hw/scsi/scsi-bus.c   |  16 ++
 hw/scsi/scsi-disk.c  |  12 +
 include/hw/block/block.h |  22 +-
 include/hw/scsi/scsi.h   |   1 +
 include/sysemu/sysemu.h  |   4 +
 tests/Makefile.include   |   2 +-
 tests/hd-geo-test.c  | 582 +++
 11 files changed, 773 insertions(+), 41 deletions(-)

-- 
2.23.0.187.g17f5b7556c-goog




Re: [Qemu-devel] patch to swap SIGRTMIN + 1 and SIGRTMAX - 1

2019-08-26 Thread Josh Kunz via Qemu-devel
On Wed, Aug 21, 2019 at 2:28 AM Laurent Vivier  wrote:

> Le 19/08/2019 à 23:46, Josh Kunz via Qemu-devel a écrit :
> > Hi all,
> >
> > I have also experienced issues with SIGRTMIN + 1, and am interested in
> > moving this patch forwards. Anything I can do here to help? Would the
> > maintainers prefer myself or Marli re-submit the patch?
> >
> > The Go issue here seems particularly sticky. Even if we update the Go
> > runtime, users may try and run older binaries built with older versions
> of
> > Go for quite some time (months? years?). Would it be better to hide this
> > behind some kind of build-time flag (`--enable-sigrtmin-plus-one-proxy`
> or
> > something), so that some users can opt-in, but older binaries still work
> as
> > expected?
> >
> > Also, here is a link to the original thread this message is in reply to
> > in-case my mail-client doesn't set up the reply properly:
> > https://lists.nongnu.org/archive/html/qemu-devel/2019-07/msg01303.html
>
> The problem here is we break something to fix something else.
>
> I'm wondering if the series from Aleksandar Markovic, "linux-user:
> Support signal passing for targets having more signals than host" [1]
> can fix the problem in a better way?
>

That patch[1] (which I'll refer to as the MUX patch to avoid confusion)
does not directly fix the issue addressed by this patch (re-wiring
SIGRTMIN+1), but since it basically implements generic signal multiplexing,
it could be re-worked to address this case as well. The way it handles
`si_code` spooks me a little bit. It could easily be broken by a kernel
version change, and such a breakage could be hard to detect or lead to
surprising results. Other than that, it looks like a reasonable
implementation.

That said, overall, fixing the SIGRTMIN+1 issue using a more-generic
signal-multiplexing mechanism doesn't seem *that* much better to me. It
adds a lot of complexity, and only saves a single signal (assuming glibc
doesn't add more reserved signals). The "big win" is additional emulation
features, like those introduced in MUX patch (being able to utilize signals
outside of the host range). If having those features in QEMU warrants the
additional complexity, then re-working this patch on-top of that
infrastructure seems like a good idea.

If the maintainers want to go down that route, then I would be happy to
re-work this patch utilizing the infrastructure from the MUX patch.
Unfortunately it will require non-trivial changes, so it may be best to
wait until that patch is merged. I could also provide a patch "on top of"
the MUX patch, if that's desired/more convenient.

Just one last note, if you do decide to merge the MUX patch, then it would
be best to use SIGRTMAX (instead of SIGRTMAX-1) as the multiplexing signal
if possible, to avoid breaking Go binaries.

Thanks again for taking a look at this issue.

Cheers,
Josh Kunz

[1] http://patchwork.ozlabs.org/cover/1103565/


Re: [Qemu-devel] [QEMU] [PATCH v5 5/8] bootdevice: Gather LCHS from all relevant devices

2019-08-25 Thread Sam Eiderman via Qemu-devel
> Only scsi-hd has the lchs properties, though, so what’s the purpose of
> defining the unrealize function for all other classes?
>
> Max

- shmuel.eider...@oracle.com
+ sam...@google.com

The only purpose is to already have them mapped to the correct existing
function, in case it will be used later on.
I can resubmit without the unrealize for the other classes, WDYT?

Sam




Re: [Qemu-devel] [QEMU] [PATCH v5 4/8] scsi: Propagate unrealize() callback to scsi-hd

2019-08-25 Thread Sam Eiderman via Qemu-devel
> @@ -213,11 +221,18 @@ static void scsi_qdev_realize(DeviceState *qdev, Error 
> **errp)
>  static void scsi_qdev_unrealize(DeviceState *qdev, Error **errp)
>  {
>  SCSIDevice *dev = SCSI_DEVICE(qdev);
> +Error *local_err = NULL;
>
>  if (dev->vmsentry) {
>  qemu_del_vm_change_state_handler(dev->vmsentry);
>  }
>
> +scsi_device_unrealize(dev, &local_err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return;
> +}
> +
>  scsi_device_purge_requests(dev, SENSE_CODE(NO_SENSE));

(I see this code for the first time, but) I suppose I’d put the
scsi_device_unrealize() after scsi_device_purge_requests().

Max

>  blockdev_mark_auto_del(dev->conf.blk);
>  }

- shmuel.eider...@oracle.com
+ sam...@google.com

Sure, I'll resubmit

Sam




Re: [Qemu-devel] [Qemu-block] [PATCH 1/2] vhost-user-blk: prevent using uninitialized vqs

2019-08-22 Thread yuchenlin via Qemu-devel
Raphael Norwitz wrote on 2019-08-23 04:16:
>
> Same rationale as: e6cc11d64fc998c11a4dfcde8fda3fc33a74d844
>
> Of the 3 virtqueues, seabios only sets cmd, leaving ctrl
> and event without a physical address. This can cause
> vhost_verify_ring_part_mapping to return ENOMEM, causing
> the following logs:
>
> qemu-system-x86_64: Unable to map available ring for ring 0
> qemu-system-x86_64: Verify ring failure on region 0
>
> This has already been fixed for vhost scsi devices and was
> recently fixed for vhost-user scsi devices. This commit fixes it for
> vhost-user-blk devices.
>
> Suggested-by: Phillippe Mathieu-Daude
> Signed-off-by: Raphael Norwitz

Reviewed-by: yuchenlin

Thanks.

>
>
> ---
> hw/block/vhost-user-blk.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 0b8c5df..63da9bb 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -421,7 +421,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> }
>
> s->inflight = g_new0(struct vhost_inflight, 1);
> - s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
> + s->vqs = g_new0(struct vhost_virtqueue, s->num_queues);
> s->watch = 0;
> s->connected = false;
>
> --
> 1.9.4
>
>
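
The one-word change from g_new to g_new0 matters because only the latter zero-fills the array. The sketch below shows the same idea in plain C, with calloc() standing in for g_new0(). DemoQueue and both helpers are hypothetical names for illustration, and the "0 means never configured" convention is an assumption of this sketch rather than something taken from the vhost code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-virtqueue record. */
typedef struct DemoQueue {
    uint64_t desc_phys;   /* assumed: 0 means the guest never set this ring up */
    uint32_t num;
} DemoQueue;

/* calloc() zero-fills, like g_new0(); malloc() or g_new() would not. */
static DemoQueue *alloc_queues(size_t n)
{
    return calloc(n, sizeof(DemoQueue));
}

/* Count the queues that carry a non-zero ring address. */
static size_t configured_queues(const DemoQueue *q, size_t n)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (q[i].desc_phys != 0) {
            used++;
        }
    }
    return used;
}
```

With a non-zeroing allocator the unused entries would hold indeterminate values, so a check like configured_queues() could not tell a ring the guest skipped from one it configured.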


[Qemu-devel] [PATCH 1/2] linux-user: add missing UDP and IPv6 setsockopt options

2019-08-22 Thread Shu-Chun Weng via Qemu-devel
UDP: SOL_UDP manipulates options at the UDP level. All six options currently
defined in the Linux source file include/uapi/linux/udp.h take integer values.

IPv6: IPV6_ADDR_PREFERENCES (RFC5014: Source address selection) was not
supported.

Signed-off-by: Shu-Chun Weng 
---
 linux-user/syscall.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8367cb138d..8dc4255f12 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -49,8 +49,10 @@
 #include 
 #include 
 //#include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1837,7 +1839,8 @@ static abi_long do_setsockopt(int sockfd, int level, int 
optname,
 
 switch(level) {
 case SOL_TCP:
-/* TCP options all take an 'int' value.  */
+case SOL_UDP:
+/* TCP and UDP options all take an 'int' value.  */
 if (optlen < sizeof(uint32_t))
 return -TARGET_EINVAL;
 
@@ -2488,6 +2491,7 @@ static abi_long do_getsockopt(int sockfd, int level, int 
optname,
 case IPV6_RECVDSTOPTS:
 case IPV6_2292DSTOPTS:
 case IPV6_TCLASS:
+case IPV6_ADDR_PREFERENCES:
 #ifdef IPV6_RECVPATHMTU
 case IPV6_RECVPATHMTU:
 #endif
-- 
2.23.0.187.g17f5b7556c-goog




[Qemu-devel] [PATCH 2/2] linux-user: time stamping options for setsockopt()

2019-08-22 Thread Shu-Chun Weng via Qemu-devel
This change supports SO_TIMESTAMPNS and SO_TIMESTAMPING for
setsockopt() with SOL_SOCKET.
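
The translation step this patch extends can be pictured as a plain lookup from target option number to host option number. The sketch below assumes a Linux host whose <sys/socket.h> exposes SO_TIMESTAMPNS and SO_TIMESTAMPING; the DEMO_ names and demo_target_to_host_sockopt() are hypothetical, and the numbers 29, 35 and 37 are the generic values from the sockbits.h hunk in this patch.

```c
#include <sys/socket.h>

/* Target-side option numbers, copied from the generic sockbits.h hunk. */
#define DEMO_TARGET_SO_TIMESTAMP    29
#define DEMO_TARGET_SO_TIMESTAMPNS  35
#define DEMO_TARGET_SO_TIMESTAMPING 37

/*
 * Map a target SOL_SOCKET option number to the host's number.
 * Returns -1 for options this sketch does not know about.
 */
static int demo_target_to_host_sockopt(int target_optname)
{
    switch (target_optname) {
    case DEMO_TARGET_SO_TIMESTAMP:
        return SO_TIMESTAMP;
    case DEMO_TARGET_SO_TIMESTAMPNS:
        return SO_TIMESTAMPNS;
    case DEMO_TARGET_SO_TIMESTAMPING:
        return SO_TIMESTAMPING;
    default:
        return -1;
    }
}
```

Returning the host's own constants, rather than assuming target and host numbers coincide, is what keeps the mapping correct for targets whose socket option numbering differs from the host's.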

The TARGET_SO_TIMESTAMP{NS,ING} constants are already defined for
alpha, hppa, and sparc. In include/uapi/asm-generic/socket.h:

In arch/mips/include/uapi/asm/socket.h:

Signed-off-by: Shu-Chun Weng 
---
 linux-user/generic/sockbits.h |  4 
 linux-user/mips/sockbits.h|  4 
 linux-user/syscall.c  | 10 --
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/linux-user/generic/sockbits.h b/linux-user/generic/sockbits.h
index e44733c601..5cbafdb49b 100644
--- a/linux-user/generic/sockbits.h
+++ b/linux-user/generic/sockbits.h
@@ -51,6 +51,10 @@
 #define TARGET_SO_PEERNAME 28
 #define TARGET_SO_TIMESTAMP29
 #define TARGET_SCM_TIMESTAMP   TARGET_SO_TIMESTAMP
+#define TARGET_SO_TIMESTAMPNS  35
+#define TARGET_SCM_TIMESTAMPNS TARGET_SO_TIMESTAMPNS
+#define TARGET_SO_TIMESTAMPING 37
+#define TARGET_SCM_TIMESTAMPINGTARGET_SO_TIMESTAMPING
 
 #define TARGET_SO_ACCEPTCONN   30
 
diff --git a/linux-user/mips/sockbits.h b/linux-user/mips/sockbits.h
index 0f022cd598..1246b7d988 100644
--- a/linux-user/mips/sockbits.h
+++ b/linux-user/mips/sockbits.h
@@ -63,6 +63,10 @@
 #define TARGET_SO_PEERNAME 28
 #define TARGET_SO_TIMESTAMP29
 #define SCM_TIMESTAMP  SO_TIMESTAMP
+#define TARGET_SO_TIMESTAMPNS  35
+#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
+#define TARGET_SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPINGSO_TIMESTAMPING
 
 #define TARGET_SO_PEERSEC  30
 #define TARGET_SO_SNDBUFFORCE  31
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8dc4255f12..bac00d3fd4 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -2230,8 +2230,14 @@ set_timeout:
 optname = SO_PASSSEC;
 break;
 case TARGET_SO_TIMESTAMP:
-   optname = SO_TIMESTAMP;
-   break;
+optname = SO_TIMESTAMP;
+break;
+case TARGET_SO_TIMESTAMPNS:
+optname = SO_TIMESTAMPNS;
+break;
+case TARGET_SO_TIMESTAMPING:
+optname = SO_TIMESTAMPING;
+break;
 case TARGET_SO_RCVLOWAT:
optname = SO_RCVLOWAT;
break;
-- 
2.23.0.187.g17f5b7556c-goog




[Qemu-devel] [PATCH 0/2] Adding some setsockopt() options

2019-08-22 Thread Shu-Chun Weng via Qemu-devel
Shu-Chun Weng (2):
  linux-user: add missing UDP and IPv6 setsockopt options
  linux-user: time stamping options for setsockopt()

 linux-user/generic/sockbits.h |  4 
 linux-user/mips/sockbits.h|  4 
 linux-user/syscall.c  | 16 +---
 3 files changed, 21 insertions(+), 3 deletions(-)

-- 
2.23.0.187.g17f5b7556c-goog




[Qemu-devel] [PATCH v6 4/4] 9p: Use variable length suffixes for inode remapping

2019-08-22 Thread Christian Schoenebeck via Qemu-devel
Use variable length suffixes for inode remapping instead of the fixed
16 bit size prefixes used before. With this change the inode numbers on guest
will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48
with the previous fixed size inode remapping).

Additionally this solution is more efficient, since inode numbers in
practice can take almost their entire 64 bit range on guest as well, so
there is less likely a need for generating and tracking additional suffixes,
which might also be beneficial for nested virtualization where each level of
virtualization would shift up the inode bits and increase the chance of
expensive remapping actions.

The "Exponential Golomb" algorithm is used as basis for generating the
variable length suffixes. The algorithm has a parameter k which controls the
distribution of bits on increasing indices (minimum bits at low index vs.
maximum bits at high index). With k=0 the generated suffixes look like:

Index Dec/Bin -> Generated Suffix Bin
1 [1] -> [1] (1 bits)
2 [10] -> [010] (3 bits)
3 [11] -> [110] (3 bits)
4 [100] -> [00100] (5 bits)
5 [101] -> [10100] (5 bits)
6 [110] -> [01100] (5 bits)
7 [111] -> [11100] (5 bits)
8 [1000] -> [0001000] (7 bits)
9 [1001] -> [1001000] (7 bits)
10 [1010] -> [0101000] (7 bits)
11 [1011] -> [1101000] (7 bits)
12 [1100] -> [0011000] (7 bits)
...
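65533 [1111111111111101] -> [1011111111111111000000000000000] (31 bits)
65534 [1111111111111110] -> [0111111111111111000000000000000] (31 bits)
65535 [1111111111111111] -> [1111111111111111000000000000000] (31 bits)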
Hence minBits=1 maxBits=31

And with k=5 they would look like:

Index Dec/Bin -> Generated Suffix Bin
1 [1] -> [000001] (6 bits)
2 [10] -> [100001] (6 bits)
3 [11] -> [010001] (6 bits)
4 [100] -> [110001] (6 bits)
5 [101] -> [001001] (6 bits)
6 [110] -> [101001] (6 bits)
7 [111] -> [011001] (6 bits)
8 [1000] -> [111001] (6 bits)
9 [1001] -> [000101] (6 bits)
10 [1010] -> [100101] (6 bits)
11 [1011] -> [010101] (6 bits)
12 [1100] -> [110101] (6 bits)
...
65533 [1111111111111101] -> [0011100000000000100000000000] (28 bits)
65534 [1111111111111110] -> [1011100000000000100000000000] (28 bits)
65535 [1111111111111111] -> [0111100000000000100000000000] (28 bits)
Hence minBits=6 maxBits=28
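
The tables above are consistent with taking the order-k Exponential Golomb code of (index - 1) and mirroring its bits. The following standalone sketch reproduces them under that assumption; mirror_bits() and expgolomb_suffix() are hypothetical names for illustration and are not the functions added to hw/9pfs/9p.c by this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the lowest `bits` bits of `value`. */
static uint64_t mirror_bits(uint64_t value, unsigned bits)
{
    uint64_t out = 0;
    for (unsigned i = 0; i < bits; i++) {
        out = (out << 1) | ((value >> i) & 1);
    }
    return out;
}

/*
 * Order-k Exp-Golomb code of (index - 1), mirrored so that it reads like
 * the suffixes in the tables. Stores the code length in *bits and returns
 * the mirrored code as an integer.
 */
static uint64_t expgolomb_suffix(uint32_t index, unsigned k, unsigned *bits)
{
    assert(index >= 1 && k < 16);

    uint64_t x = (uint64_t)index - 1;       /* tables start at index 1 */
    uint64_t q = (x >> k) + 1;              /* quotient part, biased by 1 */
    uint64_t r = x & ((1ULL << k) - 1);     /* k-bit remainder */

    unsigned qbits = 0;                     /* significant bits of q */
    for (uint64_t t = q; t; t >>= 1) {
        qbits++;
    }

    /* (qbits - 1) leading zeros, then q itself, then k remainder bits */
    *bits = (qbits - 1) + qbits + k;
    uint64_t code = (q << k) | r;

    return mirror_bits(code, *bits);
}
```

For example, index 5 with k=0 yields binary 10100 in 5 bits, and index 3 with k=5 yields binary 010001 in 6 bits, matching the rows above. Larger k spends more bits on small indices but grows more slowly, which is the trade-off between minBits and maxBits.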

Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 247 ---
 hw/9pfs/9p.h |  34 +++-
 2 files changed, 251 insertions(+), 30 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 728641fb7f..0359469cfa 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -26,6 +26,7 @@
 #include "migration/blocker.h"
 #include "sysemu/qtest.h"
 #include "qemu/xxhash.h"
+#include 
 
 int open_fd_hw;
 int total_open_fd;
@@ -572,6 +573,107 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
 P9_STAT_MODE_NAMED_PIPE |   \
 P9_STAT_MODE_SOCKET)
 
+/* Mirrors all bits of a byte. So e.g. binary 10100000 would become 00000101. */
+static inline uint8_t mirror8bit(uint8_t byte)
+{
+return (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023;
+}
+
+/* Same as mirror8bit() just for a 64 bit data type instead for a byte. */
+static inline uint64_t mirror64bit(uint64_t value)
+{
+return ((uint64_t)mirror8bit( value& 0xff) << 56) |
+   ((uint64_t)mirror8bit((value >> 8)  & 0xff) << 48) |
+   ((uint64_t)mirror8bit((value >> 16) & 0xff) << 40) |
+   ((uint64_t)mirror8bit((value >> 24) & 0xff) << 32) |
+   ((uint64_t)mirror8bit((value >> 32) & 0xff) << 24) |
+   ((uint64_t)mirror8bit((value >> 40) & 0xff) << 16) |
+   ((uint64_t)mirror8bit((value >> 48) & 0xff) << 8 ) |
+   ((uint64_t)mirror8bit((value >> 56) & 0xff)  ) ;
+}
+
+/** @brief Parameter k for the Exponential Golomb algorithm to be used.
+ *
+ * The smaller this value, the smaller the minimum bit count for the Exp.
+ * Golomb generated affixes will be (at lowest index) however for the
+ * price of having higher maximum bit count of generated affixes (at highest
+ * index). Likewise increasing this parameter yields in smaller maximum bit
+ * count for the price of having higher minimum bit count.
+ *
+ * In practice that means: a good value for k depends on the expected amount
+ * of devices to be exposed by one export. For a small amount of devices k
+ * should be small, for a large amount of devices k might be increased
+ * instead. The default of k=0 should be fine for most users though.
+ *
+ * @b IMPORTANT: In case this ever becomes a runtime parameter; the value of
+ * k should not change as long as guest is still running! Because that would
+ * cause completely different inode numbers to be genera

[Qemu-devel] [PATCH v6 0/4] 9p: Fix file ID collisions

2019-08-22 Thread Christian Schoenebeck via Qemu-devel
This is v6 of a proposed patch set for fixing file ID collisions with 9pfs.

v5->v6:

  * Rebased to https://github.com/gkurz/qemu/commits/9p-next
(SHA1 177fd3b6a8).

  * Replaced previous boolean option 'remap_inodes' by tertiary option
'multidevs=remap|forbid|warn', where 'warn' is the new/old default
behaviour for not breaking existing installations:
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg07098.html

  * Dropped incomplete fix in v9fs_do_readdir() which aimed to prevent
exposing info outside export root with '..' entry. Postponed this
fix for now for the reasons described:
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01862.html

Christian Schoenebeck (4):
  9p: Treat multiple devices on one export as an error
  9p: Added virtfs option 'multidevs=remap|forbid|warn'
  9p: stat_to_qid: implement slow path
  9p: Use variable length suffixes for inode remapping

 fsdev/file-op-9p.h  |   5 +
 fsdev/qemu-fsdev-opts.c |   7 +-
 fsdev/qemu-fsdev.c  |  11 ++
 hw/9pfs/9p.c| 488 +++++---
 hw/9pfs/9p.h|  51 +
 qemu-options.hx |  33 +++-
 vl.c|   6 +-
 7 files changed, 565 insertions(+), 36 deletions(-)

-- 
2.11.0




[Qemu-devel] [PATCH v6 1/4] 9p: Treat multiple devices on one export as an error

2019-08-22 Thread Christian Schoenebeck via Qemu-devel
The QID path should uniquely identify a file. However, the
inode of a file is currently used as the QID path, which
on its own only uniquely identifies files within a device.
Here we track the device hosting the 9pfs share, in order
to prevent security issues with QID path collisions from
other devices.
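
The check this patch introduces boils down to comparing the st_dev of every file against the device of the export root. A minimal standalone sketch follows; demo_check_same_device() is a hypothetical name, while the -ENODEV return value follows the change list below.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Returns 0 if the file described by `st` lives on the same device as
 * the export root, or -ENODEV if it belongs to a different device.
 */
static int demo_check_same_device(dev_t export_dev, const struct stat *st)
{
    return st->st_dev == export_dev ? 0 : -ENODEV;
}
```

Inode numbers are only unique per device, so two files on different devices can share an inode number; rejecting foreign devices is the simplest way to keep an inode-based QID path unique.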

Signed-off-by: Antonios Motakis 
[CS: - Assign dev_id to export root's device already in
   v9fs_device_realize_common(), not postponed in
   stat_to_qid().
 - error_report_once() if more than one device was
   shared by export.
 - Return -ENODEV instead of -ENOSYS in stat_to_qid().
 - Fixed typo in log comment. ]
Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 69 
 hw/9pfs/9p.h |  1 +
 2 files changed, 56 insertions(+), 14 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 586a6dccba..8cc65c2c67 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -572,10 +572,18 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
 P9_STAT_MODE_SOCKET)
 
 /* This is the algorithm from ufs in spfs */
-static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp)
+static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp)
 {
 size_t size;
 
+if (pdu->s->dev_id != stbuf->st_dev) {
+error_report_once(
+"9p: Multiple devices detected in same VirtFS export. "
+"You must use a separate export for each device."
+);
+return -ENODEV;
+}
+
 memset(&qidp->path, 0, sizeof(qidp->path));
 size = MIN(sizeof(stbuf->st_ino), sizeof(qidp->path));
 memcpy(&qidp->path, &stbuf->st_ino, size);
@@ -587,6 +595,8 @@ static void stat_to_qid(const struct stat *stbuf, V9fsQID 
*qidp)
 if (S_ISLNK(stbuf->st_mode)) {
 qidp->type |= P9_QID_TYPE_SYMLINK;
 }
+
+return 0;
 }
 
 static int coroutine_fn fid_to_qid(V9fsPDU *pdu, V9fsFidState *fidp,
@@ -599,7 +609,10 @@ static int coroutine_fn fid_to_qid(V9fsPDU *pdu, 
V9fsFidState *fidp,
 if (err < 0) {
 return err;
 }
-stat_to_qid(&stbuf, qidp);
+err = stat_to_qid(pdu, &stbuf, qidp);
+if (err < 0) {
+return err;
+}
 return 0;
 }
 
@@ -830,7 +843,10 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, 
V9fsPath *path,
 
 memset(v9stat, 0, sizeof(*v9stat));
 
-stat_to_qid(stbuf, &v9stat->qid);
+err = stat_to_qid(pdu, stbuf, &v9stat->qid);
+if (err < 0) {
+return err;
+}
 v9stat->mode = stat_to_v9mode(stbuf);
 v9stat->atime = stbuf->st_atime;
 v9stat->mtime = stbuf->st_mtime;
@@ -891,7 +907,7 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, 
V9fsPath *path,
 #define P9_STATS_ALL   0x3fffULL /* Mask for All fields above */
 
 
-static void stat_to_v9stat_dotl(V9fsState *s, const struct stat *stbuf,
+static int stat_to_v9stat_dotl(V9fsPDU *pdu, const struct stat *stbuf,
 V9fsStatDotl *v9lstat)
 {
 memset(v9lstat, 0, sizeof(*v9lstat));
@@ -913,7 +929,7 @@ static void stat_to_v9stat_dotl(V9fsState *s, const struct 
stat *stbuf,
 /* Currently we only support BASIC fields in stat */
 v9lstat->st_result_mask = P9_STATS_BASIC;
 
-stat_to_qid(stbuf, &v9lstat->qid);
+return stat_to_qid(pdu, stbuf, &v9lstat->qid);
 }
 
 static void print_sg(struct iovec *sg, int cnt)
@@ -1115,7 +1131,6 @@ static void coroutine_fn v9fs_getattr(void *opaque)
 uint64_t request_mask;
 V9fsStatDotl v9stat_dotl;
 V9fsPDU *pdu = opaque;
-V9fsState *s = pdu->s;
 
 retval = pdu_unmarshal(pdu, offset, "dq", &fid, &request_mask);
 if (retval < 0) {
@@ -1136,7 +1151,10 @@ static void coroutine_fn v9fs_getattr(void *opaque)
 if (retval < 0) {
 goto out;
 }
-stat_to_v9stat_dotl(s, &stbuf, &v9stat_dotl);
+retval = stat_to_v9stat_dotl(pdu, &stbuf, &v9stat_dotl);
+if (retval < 0) {
+goto out;
+}
 
 /*  fill st_gen if requested and supported by underlying fs */
 if (request_mask & P9_STATS_GEN) {
@@ -1381,7 +1399,10 @@ static void coroutine_fn v9fs_walk(void *opaque)
 if (err < 0) {
 goto out;
 }
-stat_to_qid(&stbuf, &qid);
+err = stat_to_qid(pdu, &stbuf, &qid);
+if (err < 0) {
+goto out;
+}
 v9fs_path_copy(&dpath, &path);
 }
 memcpy(&qids[name_idx], &qid, sizeof(qid));
@@ -1483,7 +1504,10 @@ static void coroutine_fn v9fs_open(void *opaque)
 if (err < 0) {
 goto out;
 }
-stat_to_qid(&stbuf, &qid);
+err = stat_to_qid(pdu, &stbuf, &qid);
+if (err < 0) {
+goto out;
+}
 if (S_ISDIR(stbuf.st_mode)) {
 err = v9fs_co_opendir(pdu, fidp);
 if (err < 0) {
@@ -1593,7 +1617,10 @@ static void coroutine_fn v9fs_lcreate(void *opaque)
 fidp->flags |= FID_NON_RECLAIMABLE;
 }
 iounit =  get_iounit(pdu

[Qemu-devel] [PATCH v6 3/4] 9p: stat_to_qid: implement slow path

2019-08-22 Thread Christian Schoenebeck via Qemu-devel
stat_to_qid attempts via qid_path_prefixmap to map unique files (which are
identified by 64 bit inode nr and 32 bit device id) to a 64 bit QID path value.
However this implementation makes some assumptions about inode number
generation on the host.

If qid_path_prefixmap fails, we still have 48 bits available in the QID
path to fall back to a less memory efficient full mapping.
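
The 48-bit fallback space can be handed out with a simple counter that reserves 0 as the "exhausted" marker. The sketch below illustrates that pattern only; demo_alloc_path() and the DEMO_ macros are hypothetical names, with the 48-bit width taken from the paragraph above.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_PATH_BITS 48
#define DEMO_PATH_MASK ((1ULL << DEMO_PATH_BITS) - 1)

/*
 * Hand out the next free path value. *next must start at 1; once the
 * counter wraps past the 48-bit mask it becomes 0, which is reserved
 * to mean that no values are left.
 */
static int demo_alloc_path(uint64_t *next, uint64_t *out)
{
    if (*next == 0) {
        return -ENFILE;             /* every value has been used */
    }
    *out = (*next)++;
    *next &= DEMO_PATH_MASK;        /* wrap to 0 after the last value */
    return 0;
}
```

Because 0 is never handed out, a wrapped counter is distinguishable from a fresh one without keeping a separate "full" flag.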

Signed-off-by: Antonios Motakis 
[CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next
   (SHA1 177fd3b6a8).
 - Updated hash calls to new xxhash API.
 - Removed unnecessary parentheses in qpf_lookup_func().
 - Removed unnecessary g_malloc0() result checks.
 - Log error message when running out of prefixes in
   qid_path_fullmap().
 - Log error message about potential degraded performance in
   qid_path_prefixmap().
 - Fixed typo in comment. ]
Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 70 ++--
 hw/9pfs/9p.h |  9 
 2 files changed, 72 insertions(+), 7 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index c96ea51116..728641fb7f 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -579,23 +579,73 @@ static uint32_t qpp_hash(QppEntry e)
 return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0);
 }
 
+static uint32_t qpf_hash(QpfEntry e)
+{
+return qemu_xxhash7(e.ino, e.dev, 0, 0, 0);
+}
+
 static bool qpp_lookup_func(const void *obj, const void *userp)
 {
 const QppEntry *e1 = obj, *e2 = userp;
 return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix;
 }
 
-static void qpp_table_remove(void *p, uint32_t h, void *up)
+static bool qpf_lookup_func(const void *obj, const void *userp)
+{
+const QpfEntry *e1 = obj, *e2 = userp;
+return e1->dev == e2->dev && e1->ino == e2->ino;
+}
+
+static void qp_table_remove(void *p, uint32_t h, void *up)
 {
 g_free(p);
 }
 
-static void qpp_table_destroy(struct qht *ht)
+static void qp_table_destroy(struct qht *ht)
 {
-qht_iter(ht, qpp_table_remove, NULL);
+qht_iter(ht, qp_table_remove, NULL);
 qht_destroy(ht);
 }
 
+static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf,
+uint64_t *path)
+{
+QpfEntry lookup = {
+.dev = stbuf->st_dev,
+.ino = stbuf->st_ino
+}, *val;
+uint32_t hash = qpf_hash(lookup);
+
+/* most users won't need the fullmap, so init the table lazily */
+if (!pdu->s->qpf_table.map) {
+qht_init(&pdu->s->qpf_table, qpf_lookup_func, 1 << 16, 
QHT_MODE_AUTO_RESIZE);
+}
+
+val = qht_lookup(&pdu->s->qpf_table, &lookup, hash);
+
+if (!val) {
+if (pdu->s->qp_fullpath_next == 0) {
+/* no more files can be mapped :'( */
+error_report_once(
+"9p: No more prefixes available for remapping inodes from "
+"host to guest."
+);
+return -ENFILE;
+}
+
+val = g_malloc0(sizeof(QpfEntry));
+*val = lookup;
+
+/* new unique inode and device combo */
+val->path = pdu->s->qp_fullpath_next++;
+pdu->s->qp_fullpath_next &= QPATH_INO_MASK;
+qht_insert(&pdu->s->qpf_table, val, hash, NULL);
+}
+
+*path = val->path;
+return 0;
+}
+
 /* stat_to_qid needs to map inode number (64 bits) and device id (32 bits)
  * to a unique QID path (64 bits). To avoid having to map and keep track
  * of up to 2^64 objects, we map only the 16 highest bits of the inode plus
@@ -621,8 +671,7 @@ static int qid_path_prefixmap(V9fsPDU *pdu, const struct 
stat *stbuf,
 if (pdu->s->qp_prefix_next == 0) {
 /* we ran out of prefixes */
 error_report_once(
-"9p: No more prefixes available for remapping inodes from "
-"host to guest."
+"9p: Potential degraded performance of inode remapping"
 );
 return -ENFILE;
 }
@@ -647,6 +696,10 @@ static int stat_to_qid(V9fsPDU *pdu, const struct stat 
*stbuf, V9fsQID *qidp)
 if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) {
 /* map inode+device to qid path (fast path) */
 err = qid_path_prefixmap(pdu, stbuf, &qidp->path);
+if (err == -ENFILE) {
+/* fast path didn't work, fall back to full map */
+err = qid_path_fullmap(pdu, stbuf, &qidp->path);
+}
 if (err) {
 return err;
 }
@@ -3813,6 +3866,7 @@ int v9fs_device_realize_common(V9fsState *s, const 
V9fsTransport *t,
 /* QID path hash table. 1 entry ought to be enough for anybody ;) */
 qht_init(&s->qpp_table, qpp_lookup_func, 1, QHT_MODE_AUTO_RESIZE);
 s->qp_prefix_next = 1; /* reserve 0 to detect overflow */
+s->

[Qemu-devel] [PATCH v6 2/4] 9p: Added virtfs option 'multidevs=remap|forbid|warn'

2019-08-22 Thread Christian Schoenebeck via Qemu-devel
'warn' (default): Only log an error message (once) on host if more than one
device is shared by the same export; apart from that, this config error is
ignored. This is the default behaviour for not breaking existing
installations, on the assumption that they really know what they are doing.

'forbid': Like 'warn', but instead of just logging an error this
also denies access of guest to additional devices.

'remap': Allows sharing more than one device per export by remapping
inodes from host to guest appropriately. To support multiple devices on the
9p share, and avoid qid path collisions we take the device id as input to
generate a unique QID path. The lowest 48 bits of the path will be set
equal to the file inode, and the top bits will be uniquely assigned based
on the top 16 bits of the inode and the device id.
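
The bit layout described for 'remap' can be written down in a few lines. This is a sketch of the layout only: the DEMO_ macros and both helpers are hypothetical names, and how a prefix gets assigned to each (device, top inode bits) pair is left out.

```c
#include <stdint.h>

#define DEMO_INO_BITS 48
#define DEMO_INO_MASK ((1ULL << DEMO_INO_BITS) - 1)

/* The top 16 bits of the host inode, which do not fit in the low 48 bits. */
static uint16_t demo_ino_prefix(uint64_t ino)
{
    return (uint16_t)(ino >> DEMO_INO_BITS);
}

/* Combine an assigned 16-bit prefix with the low 48 bits of the inode. */
static uint64_t demo_qid_path(uint16_t assigned_prefix, uint64_t ino)
{
    return ((uint64_t)assigned_prefix << DEMO_INO_BITS) |
           (ino & DEMO_INO_MASK);
}
```

Two files with the same inode number on different devices receive different assigned prefixes, so their QID paths differ even though the low 48 bits are identical.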

Signed-off-by: Antonios Motakis 
[CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next
   (SHA1 177fd3b6a8).
 - Updated hash calls to new xxhash API.
 - Added virtfs option 'multidevs', original patch simply did the inode
   remapping without being asked.
 - Updated docs for new option 'multidevs'.
 - Capture root_ino in v9fs_device_realize_common() as well, not just
   the device id.
 - Fixed v9fs_do_readdir() not having remapped inodes.
 - Log error message when running out of prefixes in
   qid_path_prefixmap().
 - Fixed definition of QPATH_INO_MASK.
 - Dropped unnecessary parentheses in qpp_lookup_func().
 - Dropped unnecessary g_malloc0() result checks. ]
Signed-off-by: Christian Schoenebeck 
---
 fsdev/file-op-9p.h  |   5 ++
 fsdev/qemu-fsdev-opts.c |   7 +-
 fsdev/qemu-fsdev.c  |  11 +++
 hw/9pfs/9p.c| 182 ++--
 hw/9pfs/9p.h|  13 
 qemu-options.hx |  33 +++--
 vl.c|   6 +-
 7 files changed, 229 insertions(+), 28 deletions(-)

diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
index c757c8099f..f2f7772c86 100644
--- a/fsdev/file-op-9p.h
+++ b/fsdev/file-op-9p.h
@@ -59,6 +59,11 @@ typedef struct ExtendedOps {
 #define V9FS_RDONLY 0x0040
 #define V9FS_PROXY_SOCK_FD  0x0080
 #define V9FS_PROXY_SOCK_NAME0x0100
+/*
+ * multidevs option (either one of the two applies exclusively)
+ */
+#define V9FS_REMAP_INODES   0x0200
+#define V9FS_FORBID_MULTIDEVS   0x0400
 
 #define V9FS_SEC_MASK   0x003C
 
diff --git a/fsdev/qemu-fsdev-opts.c b/fsdev/qemu-fsdev-opts.c
index 7c31af..07a18c6e48 100644
--- a/fsdev/qemu-fsdev-opts.c
+++ b/fsdev/qemu-fsdev-opts.c
@@ -31,7 +31,9 @@ static QemuOptsList qemu_fsdev_opts = {
 }, {
 .name = "readonly",
 .type = QEMU_OPT_BOOL,
-
+}, {
+.name = "multidevs",
+.type = QEMU_OPT_STRING,
 }, {
 .name = "socket",
 .type = QEMU_OPT_STRING,
@@ -76,6 +78,9 @@ static QemuOptsList qemu_virtfs_opts = {
 .name = "readonly",
 .type = QEMU_OPT_BOOL,
 }, {
+.name = "multidevs",
+.type = QEMU_OPT_STRING,
+    }, {
 .name = "socket",
 .type = QEMU_OPT_STRING,
 }, {
diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index 077a8c4e2b..ed03d559a9 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -58,6 +58,7 @@ static FsDriverTable FsDrivers[] = {
 "writeout",
 "fmode",
 "dmode",
+"multidevs",
 "throttling.bps-total",
 "throttling.bps-read",
 "throttling.bps-write",
@@ -121,6 +122,7 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp)
 const char *fsdev_id = qemu_opts_id(opts);
 const char *fsdriver = qemu_opt_get(opts, "fsdriver");
 const char *writeout = qemu_opt_get(opts, "writeout");
+const char *multidevs = qemu_opt_get(opts, "multidevs");
 bool ro = qemu_opt_get_bool(opts, "readonly", 0);
 
 if (!fsdev_id) {
@@ -161,6 +163,15 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp)
 } else {
 fsle->fse.export_flags &= ~V9FS_RDONLY;
 }
+if (multidevs) {
+if (!strcmp(multidevs, "remap")) {
+fsle->fse.export_flags &= ~V9FS_FORBID_MULTIDEVS;
+fsle->fse.export_flags |= V9FS_REMAP_INODES;
+} else if (!strcmp(multidevs, "forbid")) {
+fsle->fse.export_flags &= ~V9FS_REMAP_INODES;
+fsle->fse.export_flags |= V9FS_FORBID_MULTIDEVS;
+}
+}
 
 if (fsle->fse.ops->parse_opts) {
     if (fsle->fse.ops->parse_opts(opts, &fsle->fse, errp)) {
diff --git a/hw
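
The multidevs handling in the qemu_fsdev_add() hunk above can be tried in isolation. The following is only a sketch: apply_multidevs() is a hypothetical helper, not a QEMU function, and the two flag values are copied from the file-op-9p.h hunk. It shows that each recognised string clears the opposite bit before setting its own, so the two bits are never set together, and that an unrecognised string leaves the flags alone.

```c
#include <stddef.h>
#include <string.h>

/* Flag values copied from the fsdev/file-op-9p.h hunk. */
#define V9FS_REMAP_INODES     0x0200
#define V9FS_FORBID_MULTIDEVS 0x0400

/*
 * Hypothetical helper (not part of QEMU): apply a multidevs string to an
 * export_flags word so that at most one of the two bits ends up set.
 * NULL or unknown strings leave the flags untouched, as in the hunk.
 */
int apply_multidevs(int export_flags, const char *multidevs)
{
    if (multidevs == NULL) {
        return export_flags;
    }
    if (!strcmp(multidevs, "remap")) {
        export_flags &= ~V9FS_FORBID_MULTIDEVS;
        export_flags |= V9FS_REMAP_INODES;
    } else if (!strcmp(multidevs, "forbid")) {
        export_flags &= ~V9FS_REMAP_INODES;
        export_flags |= V9FS_FORBID_MULTIDEVS;
    }
    return export_flags;
}
```

Calling apply_multidevs(flags, "forbid") on a word that already has the remap bit set switches it over rather than setting both.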

Re: [Qemu-devel] [QEMU] [PATCH v5 3/8] bootdevice: Add interface to gather LCHS

2019-08-22 Thread Sam Eiderman via Qemu-devel
> I’ve got a couple of “undelivered mail returned to sender” mails for Sam
> recently, but anyway...

- shmuel.eider...@oracle.com
+ sam...@google.com

> It doesn’t look like any caller actually passes a NULL @dev, so why not
> drop the @suffix part?

Just copied it from the bootindex implementation.
I think the suffix part there was implemented specifically for fdc since
the same device can have two suffixes (A and B).
This is not relevant here, but I think we still need the suffix to
create the device name for seabios to find.

Sam




[Qemu-devel] [PATCH] vfio: fix a typo

2019-08-21 Thread Chen Zhang via Qemu-devel
Signed-off-by: Chen Zhang 
---
 hw/vfio/pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index dc3479c..c5e6fe6 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -44,7 +44,7 @@
 #define TYPE_VFIO_PCI "vfio-pci"
 #define PCI_VFIO(obj) OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
 
-#define TYPE_VIFO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
+#define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
@@ -3199,7 +3199,7 @@ static void vfio_pci_nohotplug_dev_class_init(ObjectClass *klass, void *data)
 }
 
 static const TypeInfo vfio_pci_nohotplug_dev_info = { 
-.name = TYPE_VIFO_PCI_NOHOTPLUG,
+.name = TYPE_VFIO_PCI_NOHOTPLUG,
 .parent = TYPE_VFIO_PCI,
 .instance_size = sizeof(VFIOPCIDevice),
 .class_init = vfio_pci_nohotplug_dev_class_init,
-- 
2.7.4





[Qemu-devel] [PATCH] linux-user: hijack open() for thread directories

2019-08-21 Thread Shu-Chun Weng via Qemu-devel
Besides /proc/self|<pid>, files under /proc/thread-self and
/proc/self|<pid>/task/<tid> also expose host information to the guest
program. This patch adds them to the hijack infrastructure. Note that
is_proc_myself() does not check if the <tid> matches the current thread
and is thus only suitable for procfs files that are identical for all
threads in the same process.

Behavior verified with guest program:

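#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

long main_thread_tid;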

long gettid() {
  return syscall(SYS_gettid);
}

void print_info(const char* cxt, const char* dir) {
  char buf[1024];
  FILE* fp;

  snprintf(buf, sizeof(buf), "%s/cmdline", dir);
  fp = fopen(buf, "r");

  if (fp == NULL) {
printf("%s: can't open %s\n", cxt, buf);
  } else {
fgets(buf, sizeof(buf), fp);
printf("%s %s cmd: %s\n", cxt, dir, buf);
fclose(fp);
  }

  snprintf(buf, sizeof(buf), "%s/maps", dir);
  fp = fopen(buf, "r");

  if (fp == NULL) {
printf("%s: can't open %s\n", cxt, buf);
  } else {
char seen[128][128];
int n = 0, is_new = 0;
while(fgets(buf, sizeof(buf), fp) != NULL) {
  const char* p = strrchr(buf, ' ');
  if (p == NULL || *(p + 1) == '\n') {
continue;
  }
  ++p;
  is_new = 1;
  for (int i = 0; i < n; ++i) {
if (strncmp(p, seen[i], sizeof(seen[i])) == 0) {
  is_new = 0;
  break;
}
  }
  if (is_new) {
printf("%s %s map: %s", cxt, dir, p);
if (n < 128) {
  strncpy(seen[n], p, sizeof(seen[n]));
  seen[n][sizeof(seen[n]) - 1] = '\0';
  ++n;
}
  }
}
fclose(fp);
  }
}

void* thread_main(void* _) {
  char buf[1024];

  print_info("Child", "/proc/thread-self");

  snprintf(buf, sizeof(buf), "/proc/%ld/task/%ld", (long) getpid(), 
main_thread_tid);
  print_info("Child", buf);

  snprintf(buf, sizeof(buf), "/proc/%ld/task/%ld", (long) getpid(), (long) 
gettid());
  print_info("Child", buf);

  return NULL;
}

int main() {
  char buf[1024];
  pthread_t thread;
  int ret;

  print_info("Main", "/proc/thread-self");
  print_info("Main", "/proc/self");

  snprintf(buf, sizeof(buf), "/proc/%ld", (long) getpid());
  print_info("Main", buf);

  main_thread_tid = gettid();
  snprintf(buf, sizeof(buf), "/proc/self/task/%ld", main_thread_tid);
  print_info("Main", buf);

  snprintf(buf, sizeof(buf), "/proc/%ld/task/%ld", (long) getpid(), 
main_thread_tid);
  print_info("Main", buf);

  /* pthread_create returns a positive error number on failure, not -1. */
  if ((ret = pthread_create(&thread, NULL, &thread_main, NULL)) != 0) {
    printf("pthread_create failed: %s (%d)\n", strerror(ret), ret);
    return 1;
  }

  pthread_join(thread, NULL);
  return 0;
}

Signed-off-by: Shu-Chun Weng 
---
 linux-user/syscall.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8367cb138d..73fe82bcc7 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6968,17 +6968,57 @@ static int open_self_auxv(void *cpu_env, int fd)
 return 0;
 }
 
+static int consume_task_directories(const char **filename)
+{
+if (!strncmp(*filename, "task/", strlen("task/"))) {
+*filename += strlen("task/");
+if (**filename < '1' || **filename > '9') {
+return 0;
+}
+/*
+ * Don't care about the exact tid.
+ * XXX: this allows opening files under /proc/self|<pid>/task/<tid> where
+ *  <tid> is not a valid thread id. Consider checking if the file
+ *  actually exists.
+ */
+const char *p = *filename + 1;
+while (*p >= '0' && *p <= '9') {
+++p;
+}
+if (*p == '/') {
+*filename = p + 1;
+return 1;
+} else {
+return 0;
+}
+}
+return 1;
+}
+
+/*
+ * Determines if filename refers to a procfs file for the current process or any
+ * thread within the current process. This function should only be used to check
+ * for files that have identical contents in all threads, e.g. exec, maps, etc.
+ */
 static int is_proc_myself(const char *filename, const char *entry)
 {
 if (!strncmp(filename, "/proc/", strlen("/proc/"))) {
 filename += strlen("/proc/");
 if (!strncmp(filename, "self/", strlen("self/"))) {
 filename += strlen("self/");
+if (!consume_task_directories(&filename)) {
+return 0;
+}
+} else if (!strncmp(filename, "thread-self/", strlen("thread-self/"))) {
+filename += strlen("thread-self/");
 } else if (*filename >= '1' && *filename <= '9') {
 char myself[80];
 snprintf(myself, sizeof(myself), "%d/", getpid());
 if (!strncmp(filename, myself, strlen(myself))) {
 filename += strlen(myself);
+if (!consume_task_directories(&filename)) {
+return 0;
+}
 } else {
 return 0;
 }
-- 
2.23.0.rc1.153.gdeed80330f-goog
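
The path matching added by this patch can be exercised outside QEMU. What follows is a simplified sketch rather than the patched function itself: skip_task_dir() and names_own_proc_entry() are made-up names, and the final strcmp() against the entry name is an assumption, since the tail of is_proc_myself() is not part of the hunk. As in the patch, the thread id is only checked for shape (a digit string not starting with 0), not for validity.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Skip an optional "task/<digits>/" component; returns 0 if malformed. */
int skip_task_dir(const char **path)
{
    const char *p = *path;

    if (strncmp(p, "task/", 5) != 0) {
        return 1;               /* no task component, nothing to skip */
    }
    p += 5;
    if (*p < '1' || *p > '9') {
        return 0;               /* a tid must start with a non-zero digit */
    }
    while (*p >= '0' && *p <= '9') {
        ++p;
    }
    if (*p != '/') {
        return 0;
    }
    *path = p + 1;
    return 1;
}

/* Returns 1 if path names `entry` for this process or one of its threads. */
int names_own_proc_entry(const char *path, const char *entry)
{
    char self[32];

    if (strncmp(path, "/proc/", 6) != 0) {
        return 0;
    }
    path += 6;
    snprintf(self, sizeof(self), "%ld/", (long)getpid());

    if (!strncmp(path, "thread-self/", 12)) {
        path += 12;
    } else if (!strncmp(path, "self/", 5)) {
        path += 5;
        if (!skip_task_dir(&path)) {
            return 0;
        }
    } else if (!strncmp(path, self, strlen(self))) {
        path += strlen(self);
        if (!skip_task_dir(&path)) {
            return 0;
        }
    } else {
        return 0;
    }
    /* Assumed final step: the remainder must be exactly the entry name. */
    return strcmp(path, entry) == 0;
}
```

With this sketch, "/proc/self/task/123/maps" matches the entry "maps" whether or not thread 123 exists, which is the limitation the XXX comment in the patch points out.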




Re: [Qemu-devel] patch to swap SIGRTMIN + 1 and SIGRTMAX - 1

2019-08-19 Thread Josh Kunz via Qemu-devel
Hi all,

I have also experienced issues with SIGRTMIN + 1, and am interested in
moving this patch forwards. Anything I can do here to help? Would the
maintainers prefer that Marli or I re-submit the patch?

The Go issue here seems particularly sticky. Even if we update the Go
runtime, users may try and run older binaries built with older versions of
Go for quite some time (months? years?). Would it be better to hide this
behind some kind of build-time flag (`--enable-sigrtmin-plus-one-proxy` or
something), so that some users can opt-in, but older binaries still work as
expected?

Also, here is a link to the original thread this message is in reply to
in case my mail-client doesn't set up the reply properly:
https://lists.nongnu.org/archive/html/qemu-devel/2019-07/msg01303.html

Thanks,
Josh Kunz


[Qemu-devel] [PATCH] linux-user: erroneous fd_trans_unregister call

2019-08-19 Thread Shu-Chun Weng via Qemu-devel
timer_getoverrun returns the "overrun count" for the timer, which is not
a file descriptor, so fd_trans_unregister should not be called on it.

Signed-off-by: Shu-Chun Weng 
---
 linux-user/syscall.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8367cb138d..012d79f8c1 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -11846,7 +11846,6 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 timer_t htimer = g_posix_timers[timerid];
 ret = get_errno(timer_getoverrun(htimer));
 }
-fd_trans_unregister(ret);
 return ret;
 }
 #endif
-- 
2.23.0.rc1.153.gdeed80330f-goog
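
To see why the removed call matters, here is a toy model. It assumes that fd_trans_unregister() drops whatever per-descriptor state is recorded under the number it is given; the table and all sketch_* names below are hypothetical stand-ins, not QEMU code. Passing an overrun count that happens to equal a live descriptor number then discards that descriptor's state.

```c
#include <stddef.h>

#define SKETCH_MAX_FDS 16

/* Hypothetical stand-in for a per-fd translator table. */
const char *sketch_fd_trans[SKETCH_MAX_FDS];

void sketch_register(int fd, const char *name)
{
    if (fd >= 0 && fd < SKETCH_MAX_FDS) {
        sketch_fd_trans[fd] = name;
    }
}

void sketch_unregister(int fd)
{
    if (fd >= 0 && fd < SKETCH_MAX_FDS) {
        sketch_fd_trans[fd] = NULL;
    }
}

/*
 * Models the removed line: an overrun count is handed to the unregister
 * routine as if it were a descriptor. Returns 1 if the entry for
 * `live_fd` was lost as a result, 0 otherwise.
 */
int sketch_overrun_clobbers(int live_fd, int overrun_count)
{
    sketch_register(live_fd, "live translator");
    sketch_unregister(overrun_count);      /* the erroneous call */
    return sketch_fd_trans[live_fd] == NULL;
}
```

In this model the damage only shows up when the count collides with an open descriptor, which would make the original bug intermittent and hard to spot.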




[Qemu-devel] [PATCH v3] linux-user: add memfd_create

2019-08-19 Thread Shu-Chun Weng via Qemu-devel
Add support for the memfd_create syscall. If the host does not have the
libc wrapper, translate to a direct syscall with NC-macro.

Buglink: https://bugs.launchpad.net/qemu/+bug/1734792
Signed-off-by: Shu-Chun Weng 
---
 include/qemu/memfd.h |  4 
 linux-user/syscall.c | 12 
 util/memfd.c |  2 +-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/qemu/memfd.h b/include/qemu/memfd.h
index d551c28b68..975b6bdb77 100644
--- a/include/qemu/memfd.h
+++ b/include/qemu/memfd.h
@@ -32,6 +32,10 @@
 #define MFD_HUGE_SHIFT 26
 #endif
 
+#if defined CONFIG_LINUX && !defined CONFIG_MEMFD
+int memfd_create(const char *name, unsigned int flags);
+#endif
+
 int qemu_memfd_create(const char *name, size_t size, bool hugetlb,
   uint64_t hugetlbsize, unsigned int seals, Error **errp);
 bool qemu_memfd_alloc_check(void);
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8367cb138d..f3f9311e9c 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
 #include "qemu/path.h"
+#include "qemu/memfd.h"
 #include 
 #include 
 #include 
@@ -11938,6 +11939,17 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 /* PowerPC specific.  */
 return do_swapcontext(cpu_env, arg1, arg2, arg3);
 #endif
+#ifdef TARGET_NR_memfd_create
+case TARGET_NR_memfd_create:
+p = lock_user_string(arg1);
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(memfd_create(p, arg2));
+fd_trans_unregister(ret);
+unlock_user(p, arg1, 0);
+return ret;
+#endif
 
 default:
 qemu_log_mask(LOG_UNIMP, "Unsupported syscall: %d\n", num);
diff --git a/util/memfd.c b/util/memfd.c
index 00334e5b21..4a3c07e0be 100644
--- a/util/memfd.c
+++ b/util/memfd.c
@@ -35,7 +35,7 @@
 #include 
 #include 
 
-static int memfd_create(const char *name, unsigned int flags)
+int memfd_create(const char *name, unsigned int flags)
 {
 #ifdef __NR_memfd_create
 return syscall(__NR_memfd_create, name, flags);
-- 
2.23.0.rc1.153.gdeed80330f-goog
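
For readers unfamiliar with the fallback mentioned in the commit message, the sketch below creates a memfd through the raw syscall number, as the util/memfd.c hunk does, and round-trips a few bytes through it. The sketch_* names are hypothetical, and the ENOSYS branch is an assumption about what a host without the syscall number would do; that branch is not shown in the patch.

```c
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: direct syscall when the number is known. */
int sketch_memfd_create(const char *name, unsigned int flags)
{
#ifdef __NR_memfd_create
    return (int)syscall(__NR_memfd_create, name, flags);
#else
    /* Assumed behaviour when the host headers lack the syscall number. */
    (void)name;
    (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}

/* Write `msg` into a fresh memfd and read it back; returns 0 on success. */
int sketch_memfd_roundtrip(const char *msg)
{
    char buf[64];
    size_t len = strlen(msg);
    int fd, ok;

    if (len >= sizeof(buf)) {
        return -1;
    }
    fd = sketch_memfd_create("sketch", 0);
    if (fd < 0) {
        return -1;              /* errno is left as set by the syscall */
    }
    ok = write(fd, msg, len) == (ssize_t)len
         && pread(fd, buf, len, 0) == (ssize_t)len
         && memcmp(buf, msg, len) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

The descriptor returned here is an ordinary file descriptor with no special translation needs, which fits the v3 change of calling fd_trans_unregister() on the result.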




[Qemu-devel] [PATCH v2] linux-user: add memfd_create

2019-08-19 Thread Shu-Chun Weng via Qemu-devel
Add support for the memfd_create syscall. If the host does not have the
libc wrapper, translate to a direct syscall with NC-macro.

Buglink: https://bugs.launchpad.net/qemu/+bug/1734792
Signed-off-by: Shu-Chun Weng 
---
 include/qemu/memfd.h |  4 
 linux-user/syscall.c | 11 +++
 util/memfd.c |  2 +-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/qemu/memfd.h b/include/qemu/memfd.h
index d551c28b68..975b6bdb77 100644
--- a/include/qemu/memfd.h
+++ b/include/qemu/memfd.h
@@ -32,6 +32,10 @@
 #define MFD_HUGE_SHIFT 26
 #endif
 
+#if defined CONFIG_LINUX && !defined CONFIG_MEMFD
+int memfd_create(const char *name, unsigned int flags);
+#endif
+
 int qemu_memfd_create(const char *name, size_t size, bool hugetlb,
   uint64_t hugetlbsize, unsigned int seals, Error **errp);
 bool qemu_memfd_alloc_check(void);
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8367cb138d..b506c1f40e 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
 #include "qemu/path.h"
+#include "qemu/memfd.h"
 #include 
 #include 
 #include 
@@ -11938,6 +11939,16 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 /* PowerPC specific.  */
 return do_swapcontext(cpu_env, arg1, arg2, arg3);
 #endif
+#ifdef TARGET_NR_memfd_create
+case TARGET_NR_memfd_create:
+p = lock_user_string(arg1);
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(memfd_create(p, arg2));
+unlock_user(p, arg1, 0);
+return ret;
+#endif
 
 default:
 qemu_log_mask(LOG_UNIMP, "Unsupported syscall: %d\n", num);
diff --git a/util/memfd.c b/util/memfd.c
index 00334e5b21..4a3c07e0be 100644
--- a/util/memfd.c
+++ b/util/memfd.c
@@ -35,7 +35,7 @@
 #include 
 #include 
 
-static int memfd_create(const char *name, unsigned int flags)
+int memfd_create(const char *name, unsigned int flags)
 {
 #ifdef __NR_memfd_create
 return syscall(__NR_memfd_create, name, flags);
-- 
2.23.0.rc1.153.gdeed80330f-goog




[Qemu-devel] [Bug 1772165] Re: arm raspi2/raspi3 emulation has no USB support

2019-08-18 Thread Weber Kai via Qemu-devel
Hi!

I've googled: "usb" "designware" "otg" "datasheet"

I think this is the kernel driver for this device:
https://github.com/torvalds/linux/tree/master/drivers/usb/dwc3

Maybe it should be possible to use this as a reference? Maybe try to
redirect the proprietary driver's system calls? I don't know...

I've also found these docs, which explain the device a little bit:
http://www.infradead.org/~mchehab/kernel_docs_pdf/driver-api.pdf
https://media.digikey.com/pdf/Data%20Sheets/Austriamicrosystems%20PDFs/AS3524.pdf
https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/arria-10/a10_54018.pdf

Thanks.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1772165

Title:
  arm raspi2/raspi3 emulation has no USB support

Status in QEMU:
  Confirmed

Bug description:
  Using Qemu 2.12.0 on ArchLinux.

  Trying to emulate arm device with `qemu-system-arm` and attach usb
  device for input using

  ` -usb -device usb-host,bus=001,vendorid=0x1d6b,productid=0x0002 `

  # lsusb returns

  Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
  Bus 001 Device 014: ID 13d3:3487 IMC Networks 
  Bus 001 Device 004: ID 0457:11af Silicon Integrated Systems Corp. 
  Bus 001 Device 003: ID 0bda:57e6 Realtek Semiconductor Corp. 
  Bus 001 Device 002: ID 0bda:0129 Realtek Semiconductor Corp. RTS5129 Card Reader Controller
  Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

  # qemu returns
  qemu-system-arm: -device usb-host,bus=001,vendorid=0x1d6b,productid=0x0002: Bus '001' not found

  
  Tried with connecting external usb keyboard but that didn't seem to work either.




Re: [Qemu-devel] [PATCH] linux-user: Support gdb 'qOffsets' query for ELF

2019-08-16 Thread Josh Kunz via Qemu-devel
+cc: riku.voi...@iki.fi, I typoed the email on the first go.

On Fri, Aug 16, 2019 at 4:34 PM Josh Kunz  wrote:

> This is needed to support debugging PIE ELF binaries running under QEMU
> user mode. Currently, `code_offset` and `data_offset` remain unset for
> all ELF binaries, so GDB is unable to correctly locate the position of
> the binary's text and data.
>
> The fields `code_offset`, and `data_offset` were originally added way
> back in 2006 to support debugging of bFMT executables (978efd6aac6),
> and support was just never added for ELF. Since non-PIE binaries are
> loaded at exactly the address specified in the binary, GDB does not need
> to relocate any symbols, so the buggy behavior is not normally observed.
>
> Buglink: https://bugs.launchpad.net/qemu/+bug/1528239
> Signed-off-by: Josh Kunz 
> ---
>  linux-user/elfload.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index 3365e192eb..ceac035208 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -2380,6 +2380,8 @@ static void load_elf_image(const char *image_name,
> int image_fd,
>  }
>
>  info->load_bias = load_bias;
> +info->code_offset = load_bias;
> +info->data_offset = load_bias;
>  info->load_addr = load_addr;
>  info->entry = ehdr->e_entry + load_bias;
>  info->start_code = -1;
> --
> 2.23.0.rc1.153.gdeed80330f-goog
>
>


[Qemu-devel] [PATCH] linux-user: Support gdb 'qOffsets' query for ELF

2019-08-16 Thread Josh Kunz via Qemu-devel
This is needed to support debugging PIE ELF binaries running under QEMU
user mode. Currently, `code_offset` and `data_offset` remain unset for
all ELF binaries, so GDB is unable to correctly locate the position of
the binary's text and data.

The fields `code_offset`, and `data_offset` were originally added way
back in 2006 to support debugging of bFMT executables (978efd6aac6),
and support was just never added for ELF. Since non-PIE binaries are
loaded at exactly the address specified in the binary, GDB does not need
to relocate any symbols, so the buggy behavior is not normally observed.

Buglink: https://bugs.launchpad.net/qemu/+bug/1528239
Signed-off-by: Josh Kunz 
---
 linux-user/elfload.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 3365e192eb..ceac035208 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2380,6 +2380,8 @@ static void load_elf_image(const char *image_name, int image_fd,
 }
 
 info->load_bias = load_bias;
+info->code_offset = load_bias;
+info->data_offset = load_bias;
 info->load_addr = load_addr;
 info->entry = ehdr->e_entry + load_bias;
 info->start_code = -1;
-- 
2.23.0.rc1.153.gdeed80330f-goog
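
The effect of the two added assignments can be illustrated with a small model. The struct and function names below are hypothetical simplifications, not QEMU's image info or gdbstub code. The idea is that a debugger adds the reported offset to each link-time address, so a PIE loaded at a non-zero bias needs the bias reported, while a non-PIE binary (bias 0) looks correct either way.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the loader's image info. */
struct sketch_image_info {
    uint64_t load_bias;
    uint64_t code_offset;
    uint64_t data_offset;
};

/* Mirror the two added assignments: both offsets equal the load bias. */
void sketch_set_offsets(struct sketch_image_info *info, uint64_t load_bias)
{
    info->load_bias = load_bias;
    info->code_offset = load_bias;
    info->data_offset = load_bias;
}

/* What a debugger does with the reported offset for a text symbol. */
uint64_t sketch_runtime_addr(const struct sketch_image_info *info,
                             uint64_t link_addr)
{
    return link_addr + info->code_offset;
}
```

With a bias of 0 the runtime address equals the link-time address, which matches the observation in the commit message that the missing offsets go unnoticed for non-PIE binaries.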




[Qemu-devel] [PATCH v2] linux-user: Add support for SIOCETHTOOL ioctl

2019-08-16 Thread Shu-Chun Weng via Qemu-devel
The ioctl numeric values are platform-independent and determined by
the file include/uapi/linux/sockios.h in Linux kernel source code:

  #define SIOCETHTOOL   0x8946

These ioctls get (or set) the field ifr_data of type char* in the
structure ifreq. Such functionality is achieved in QEMU by using
MK_STRUCT() and MK_PTR() macros with an appropriate argument, as
it was done for existing similar cases.

Signed-off-by: Shu-Chun Weng 
---
 linux-user/ioctls.h   | 1 +
 linux-user/syscall_defs.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index 3281c97ca2..9d231df665 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -208,6 +208,7 @@
   IOCTL(SIOCGIFINDEX, IOC_W | IOC_R, MK_PTR(MK_STRUCT(STRUCT_int_ifreq)))
   IOCTL(SIOCSIFPFLAGS, IOC_W, MK_PTR(MK_STRUCT(STRUCT_short_ifreq)))
   IOCTL(SIOCGIFPFLAGS, IOC_W | IOC_R, MK_PTR(MK_STRUCT(STRUCT_short_ifreq)))
+  IOCTL(SIOCETHTOOL, IOC_R | IOC_W, MK_PTR(MK_STRUCT(STRUCT_ptr_ifreq)))
   IOCTL(SIOCSIFLINK, 0, TYPE_NULL)
   IOCTL_SPECIAL(SIOCGIFCONF, IOC_W | IOC_R, do_ioctl_ifconf,
 MK_PTR(MK_STRUCT(STRUCT_ifconf)))
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 0662270300..276f96039f 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -819,6 +819,8 @@ struct target_pollfd {
 #define TARGET_SIOCGIFTXQLEN   0x8942  /* Get the tx queue length  */
 #define TARGET_SIOCSIFTXQLEN   0x8943  /* Set the tx queue length  */
 
+#define TARGET_SIOCETHTOOL 0x8946  /* Ethtool interface */
+
 /* ARP cache control calls. */
 #define TARGET_OLD_SIOCDARP 0x8950  /* old delete ARP table entry   */
 #define TARGET_OLD_SIOCGARP 0x8951  /* old get ARP table entry  */
2.23.0.rc1.153.gdeed80330f-goog




Re: [Qemu-devel] [PATCH] Add support for ethtool via TARGET_SIOCETHTOOL ioctls.

2019-08-16 Thread Shu-Chun Weng via Qemu-devel
Thank you Aleksandar,

I've updated the patch description and will send out v2 soon.

As for the length of the line: all lines in file syscall_defs.h are of
length 81 with a fixed width comment at the end. I'm not sure if making the
one line I add 80-character-wide is the right choice.

Shu-Chun

On Fri, Aug 16, 2019 at 3:37 PM Aleksandar Markovic <
aleksandar.m.m...@gmail.com> wrote:

>
> 16.08.2019. 23.28, "Shu-Chun Weng via Qemu-devel" 
> је написао/ла:
> >
> > The ioctl numeric values are platform-independent and determined by
> > the file include/uapi/linux/sockios.h in Linux kernel source code:
> >
> >   #define SIOCETHTOOL   0x8946
> >
> > These ioctls get (or set) the field ifr_data of type char* in the
> > structure ifreq. Such functionality is achieved in QEMU by using
> > MK_STRUCT() and MK_PTR() macros with an appropriate argument, as
> > it was done for existing similar cases.
> >
> > Signed-off-by: Shu-Chun Weng 
> > ---
>
> Shu-Chun, hi, and welcome!
>
> Just a couple of cosmetic things:
>
>   - by convention, the title of this patch should start with
> "linux-user:", since this patch affects linux user QEMU module;
>
>   - the patch title is too long (and has some minor mistakes) -
> "linux-user: Add support for SIOCETHTOOL ioctl" should be good enough;
>
>   - the length of the code lines that you add or modify must not be
> greater than 80.
>
> Sincerely,
> Aleksandar
>
> >  linux-user/ioctls.h   | 1 +
> >  linux-user/syscall_defs.h | 2 ++
> >  2 files changed, 3 insertions(+)
> >
> > diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
> > index 3281c97ca2..9d231df665 100644
> > --- a/linux-user/ioctls.h
> > +++ b/linux-user/ioctls.h
> > @@ -208,6 +208,7 @@
> >IOCTL(SIOCGIFINDEX, IOC_W | IOC_R,
> MK_PTR(MK_STRUCT(STRUCT_int_ifreq)))
> >IOCTL(SIOCSIFPFLAGS, IOC_W, MK_PTR(MK_STRUCT(STRUCT_short_ifreq)))
> >IOCTL(SIOCGIFPFLAGS, IOC_W | IOC_R,
> MK_PTR(MK_STRUCT(STRUCT_short_ifreq)))
> > +  IOCTL(SIOCETHTOOL, IOC_R | IOC_W, MK_PTR(MK_STRUCT(STRUCT_ptr_ifreq)))
> >IOCTL(SIOCSIFLINK, 0, TYPE_NULL)
> >IOCTL_SPECIAL(SIOCGIFCONF, IOC_W | IOC_R, do_ioctl_ifconf,
> >  MK_PTR(MK_STRUCT(STRUCT_ifconf)))
> > diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
> > index 0662270300..276f96039f 100644
> > --- a/linux-user/syscall_defs.h
> > +++ b/linux-user/syscall_defs.h
> > @@ -819,6 +819,8 @@ struct target_pollfd {
> >  #define TARGET_SIOCGIFTXQLEN   0x8942  /* Get the tx queue
> length  */
> >  #define TARGET_SIOCSIFTXQLEN   0x8943  /* Set the tx queue
> length  */
> >
> > +#define TARGET_SIOCETHTOOL 0x8946  /* Ethtool interface
> */
> > +
> >  /* ARP cache control calls. */
> >  #define TARGET_OLD_SIOCDARP0x8950  /* old delete ARP table
> entry   */
> >  #define TARGET_OLD_SIOCGARP0x8951  /* old get ARP table
> entry  */
> > --
> > 2.23.0.rc1.153.gdeed80330f-goog
> >
> >
>




[Qemu-devel] [PATCH] linux-user: add memfd_create

2019-08-16 Thread Shu-Chun Weng via Qemu-devel
Add support for the memfd_create syscall. If the host does not have the
libc wrapper, translate to a direct syscall with NC-macro.

Buglink: https://bugs.launchpad.net/qemu/+bug/1734792
Signed-off-by: Shu-Chun Weng 
---
 include/qemu/memfd.h |  4 
 linux-user/syscall.c | 11 +++
 util/memfd.c |  2 +-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/qemu/memfd.h b/include/qemu/memfd.h
index d551c28b68..975b6bdb77 100644
--- a/include/qemu/memfd.h
+++ b/include/qemu/memfd.h
@@ -32,6 +32,10 @@
 #define MFD_HUGE_SHIFT 26
 #endif
 
+#if defined CONFIG_LINUX && !defined CONFIG_MEMFD
+int memfd_create(const char *name, unsigned int flags);
+#endif
+
 int qemu_memfd_create(const char *name, size_t size, bool hugetlb,
   uint64_t hugetlbsize, unsigned int seals, Error **errp);
 bool qemu_memfd_alloc_check(void);
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8367cb138d..b506c1f40e 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
 #include "qemu/path.h"
+#include "qemu/memfd.h"
 #include 
 #include 
 #include 
@@ -11938,6 +11939,16 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 /* PowerPC specific.  */
 return do_swapcontext(cpu_env, arg1, arg2, arg3);
 #endif
+#ifdef TARGET_NR_memfd_create
+case TARGET_NR_memfd_create:
+p = lock_user_string(arg1);
+if (!p) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(memfd_create(p, arg2));
+unlock_user(p, arg1, 0);
+return ret;
+#endif
 
 default:
 qemu_log_mask(LOG_UNIMP, "Unsupported syscall: %d\n", num);
diff --git a/util/memfd.c b/util/memfd.c
index 00334e5b21..4a3c07e0be 100644
--- a/util/memfd.c
+++ b/util/memfd.c
@@ -35,7 +35,7 @@
 #include 
 #include 
 
-static int memfd_create(const char *name, unsigned int flags)
+int memfd_create(const char *name, unsigned int flags)
 {
 #ifdef __NR_memfd_create
 return syscall(__NR_memfd_create, name, flags);
-- 
2.23.0.rc1.153.gdeed80330f-goog




[Qemu-devel] [PATCH] Add support for ethtool via TARGET_SIOCETHTOOL ioctls.

2019-08-16 Thread Shu-Chun Weng via Qemu-devel
The ioctl numeric values are platform-independent and determined by
the file include/uapi/linux/sockios.h in Linux kernel source code:

  #define SIOCETHTOOL   0x8946

These ioctls get (or set) the field ifr_data of type char* in the
structure ifreq. Such functionality is achieved in QEMU by using
MK_STRUCT() and MK_PTR() macros with an appropriate argument, as
it was done for existing similar cases.

Signed-off-by: Shu-Chun Weng 
---
 linux-user/ioctls.h   | 1 +
 linux-user/syscall_defs.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index 3281c97ca2..9d231df665 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -208,6 +208,7 @@
   IOCTL(SIOCGIFINDEX, IOC_W | IOC_R, MK_PTR(MK_STRUCT(STRUCT_int_ifreq)))
   IOCTL(SIOCSIFPFLAGS, IOC_W, MK_PTR(MK_STRUCT(STRUCT_short_ifreq)))
   IOCTL(SIOCGIFPFLAGS, IOC_W | IOC_R, MK_PTR(MK_STRUCT(STRUCT_short_ifreq)))
+  IOCTL(SIOCETHTOOL, IOC_R | IOC_W, MK_PTR(MK_STRUCT(STRUCT_ptr_ifreq)))
   IOCTL(SIOCSIFLINK, 0, TYPE_NULL)
   IOCTL_SPECIAL(SIOCGIFCONF, IOC_W | IOC_R, do_ioctl_ifconf,
 MK_PTR(MK_STRUCT(STRUCT_ifconf)))
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 0662270300..276f96039f 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -819,6 +819,8 @@ struct target_pollfd {
 #define TARGET_SIOCGIFTXQLEN   0x8942  /* Get the tx queue length  */
 #define TARGET_SIOCSIFTXQLEN   0x8943  /* Set the tx queue length  */
 
+#define TARGET_SIOCETHTOOL 0x8946  /* Ethtool interface */
+
 /* ARP cache control calls. */
 #define TARGET_OLD_SIOCDARP 0x8950  /* old delete ARP table entry   */
 #define TARGET_OLD_SIOCGARP 0x8951  /* old get ARP table entry  */
-- 
2.23.0.rc1.153.gdeed80330f-goog




Re: [Qemu-devel] [Qemu-arm] [PATCH] elf: Allow loading AArch64 ELF files

2019-08-12 Thread Aaron Lindsay OS via Qemu-devel
On Aug 12 16:02, Peter Maydell wrote:
> On Mon, 12 Aug 2019 at 15:46, Aaron Lindsay OS via Qemu-arm
>  wrote:
> >
> > Treat EM_AARCH64 as a valid value when checking the ELF's machine-type
> > header.
> >
> > Signed-off-by: Aaron Lindsay 
> > ---
> >  include/hw/elf_ops.h | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h
> > index 690f9238c8..f12faa90a1 100644
> > --- a/include/hw/elf_ops.h
> > +++ b/include/hw/elf_ops.h
> > @@ -381,6 +381,12 @@ static int glue(load_elf, SZ)(const char *name, int fd,
> >  goto fail;
> >  }
> >  break;
> > +case EM_AARCH64:
> > +if (ehdr.e_machine != EM_AARCH64) {
> > +ret = ELF_LOAD_WRONG_ARCH;
> > +goto fail;
> > +}
> > +break;
> >  default:
> >  if (elf_machine != ehdr.e_machine) {
> >  ret = ELF_LOAD_WRONG_ARCH;
> > --
> > 2.17.1
> 
> What problem are we trying to solve here ? If I'm reading your patch
> correctly then it makes no difference to execution, because we're
> switching on 'elf_machine', and so this extra case is saying
> "if elf_machine is EM_AARCH64, insist that ehdr.e_machine
> is also EM_AARCH64", which is exactly what the default
> case would do anyway. The only reason to add extra cases in
> this switch is to handle the situation where a target's board/loader
> code says "I can handle elf files of type X" but actually this
> implicitly means it can handle both X and Y (so for EM_X86_64 we
> need to accept both EM_X86_64 and EM_386, for EM_PPC64 we need to
> accept both EM_PPC64 and EM_PPC, and so on). We don't have a
> case like that for aarch64/arm because the boot loader code has
> to deal with the 32 bit and 64 bit code separately anyway, so
> we can pass in the correct value depending on whether the CPU
> is 32-bit or 64-bit.
> 
> The code in hw/arm/boot.c passes in an elf_machine value of
> EM_AARCH64 when it wants to load an AArch64 ELF file, so
> for that the default case is OK. The code in hw/core/generic-loader.c
> passes in 0 (EM_NONE) which will be handled by the check just
> before this switch statement, so the default case should
> work there too. Presumably there's some other code path
> for ELF file loading that doesn't work that you're trying to fix?

Please disregard this patch.

I'm sorry, upon closer inspection you are obviously correct... and I
have no earthly idea what happened here. I hit the "goto fail" in the
"default" case when debugging why I couldn't load an ELF on AArch64 last
week. I was in a hurry and saw that there were other architectures in
the switch/case and threw this code in there quickly without much
thought, and after re-compiling, it worked!

...But after your email this morning, I'm completely unable to reproduce
the failure case. I must have had another local issue which was
coincidentally resolved at the same time, unbeknownst to me.

-Aaron
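
The switch logic Peter describes can be summarised in a few lines. This is a sketch with a hypothetical function name, using the EM_* constants from <elf.h>; it leaves out the EM_NONE check that, per the reply, happens before the switch. It shows why a dedicated EM_AARCH64 case adds nothing: the default branch already demands an exact match.

```c
#include <elf.h>

/*
 * Hypothetical helper modelled on the switch discussed above: `wanted`
 * is the machine type the loader asked for, `found` is ehdr.e_machine.
 * Returns 1 when the file is acceptable, 0 otherwise.
 */
int sketch_machine_ok(int wanted, int found)
{
    switch (wanted) {
    case EM_X86_64:
        /* A 64-bit x86 loader also accepts 32-bit x86 files. */
        return found == EM_X86_64 || found == EM_386;
    case EM_PPC64:
        /* Likewise for PowerPC. */
        return found == EM_PPC64 || found == EM_PPC;
    default:
        /* Covers EM_AARCH64 too: an exact match is all that is needed. */
        return wanted == found;
    }
}
```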



[Qemu-devel] [PATCH] elf: Allow loading AArch64 ELF files

2019-08-12 Thread Aaron Lindsay OS via Qemu-devel
Treat EM_AARCH64 as a valid value when checking the ELF's machine-type
header.

Signed-off-by: Aaron Lindsay 
---
 include/hw/elf_ops.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h
index 690f9238c8..f12faa90a1 100644
--- a/include/hw/elf_ops.h
+++ b/include/hw/elf_ops.h
@@ -381,6 +381,12 @@ static int glue(load_elf, SZ)(const char *name, int fd,
 goto fail;
 }
 break;
+case EM_AARCH64:
+if (ehdr.e_machine != EM_AARCH64) {
+ret = ELF_LOAD_WRONG_ARCH;
+goto fail;
+}
+break;
 default:
 if (elf_machine != ehdr.e_machine) {
 ret = ELF_LOAD_WRONG_ARCH;
-- 
2.17.1




Re: [Qemu-devel] [PATCH v9 07/17] blockdev: adds bdrv_parse_aio to use io_uring

2019-08-07 Thread Julia Suvorova via Qemu-devel
On Wed, Aug 7, 2019 at 2:06 PM Aarushi Mehta  wrote:
>
>
>
> On Wed, 7 Aug, 2019, 17:15 Julia Suvorova,  wrote:
>>
>> On Fri, Aug 2, 2019 at 1:41 AM Aarushi Mehta  wrote:
>> > +int bdrv_parse_aio(const char *mode, int *flags)
>> > +{
>> > +if (!strcmp(mode, "threads")) {
>> > +/* do nothing, default */
>> > +} else if (!strcmp(mode, "native")) {
>> > +*flags |= BDRV_O_NATIVE_AIO;
>>
>> This 'if' should be covered with CONFIG_LINUX_AIO.
>
>
> The aio=native definition is shared with Windows hosts' native aio and will 
> break if it was covered.
>
> file-posix handles the case.

Fair enough. Then you can remove all ifdefs for io_uring from
raw_open_common in file-posix.c as this case was already checked here.

Best regards, Julia Suvorova.

>> > +#ifdef CONFIG_LINUX_IO_URING
>> > +} else if (!strcmp(mode, "io_uring")) {
>> > +*flags |= BDRV_O_IO_URING;
>> > +#endif
>> > +} else {
>> > +return -1;
>> > +}
>> > +
>> > +return 0;
>> > +}
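
The quoted function can be tried on its own with placeholder definitions. In the sketch below the flag values and the SKETCH_HAVE_IO_URING switch are made up for illustration and stand in for BDRV_O_NATIVE_AIO, BDRV_O_IO_URING and CONFIG_LINUX_IO_URING. It keeps "native" unguarded, following the point made above that the option is shared with Windows hosts.

```c
#include <string.h>

/* Placeholder bit values for illustration only; not QEMU's definitions. */
#define SKETCH_O_NATIVE_AIO 0x1
#define SKETCH_O_IO_URING   0x2

/* Defined here to model a build with io_uring support compiled in. */
#define SKETCH_HAVE_IO_URING 1

/* Returns 0 and updates *flags for a known mode, -1 for an unknown one. */
int sketch_parse_aio(const char *mode, int *flags)
{
    if (!strcmp(mode, "threads")) {
        /* default, nothing to set */
    } else if (!strcmp(mode, "native")) {
        *flags |= SKETCH_O_NATIVE_AIO;
#ifdef SKETCH_HAVE_IO_URING
    } else if (!strcmp(mode, "io_uring")) {
        *flags |= SKETCH_O_IO_URING;
#endif
    } else {
        return -1;
    }
    return 0;
}
```

Removing the SKETCH_HAVE_IO_URING definition makes "io_uring" fall through to the error branch, which is why later code would not need to repeat the same check.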



Re: [Qemu-devel] [PATCH v9 07/17] blockdev: adds bdrv_parse_aio to use io_uring

2019-08-07 Thread Julia Suvorova via Qemu-devel
On Fri, Aug 2, 2019 at 1:41 AM Aarushi Mehta  wrote:
> +int bdrv_parse_aio(const char *mode, int *flags)
> +{
> +if (!strcmp(mode, "threads")) {
> +/* do nothing, default */
> +} else if (!strcmp(mode, "native")) {
> +*flags |= BDRV_O_NATIVE_AIO;

This 'if' should be covered with CONFIG_LINUX_AIO.

Best regards, Julia Suvorova.

> +#ifdef CONFIG_LINUX_IO_URING
> +} else if (!strcmp(mode, "io_uring")) {
> +*flags |= BDRV_O_IO_URING;
> +#endif
> +} else {
> +return -1;
> +}
> +
> +return 0;
> +}



Re: [Qemu-devel] [PATCH v9 04/17] block/io_uring: implements interfaces for io_uring

2019-08-07 Thread Julia Suvorova via Qemu-devel
On Fri, Aug 2, 2019 at 1:41 AM Aarushi Mehta  wrote:
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d6de200453..be688fcd5e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2530,6 +2530,13 @@ F: block/file-posix.c
>  F: block/file-win32.c
>  F: block/win32-aio.c
>
> +Linux io_uring
> +M: Aarushi Mehta 
> +R: Stefan Hajnoczi 

s/ste...@redhat.com/stefa...@redhat.com

> diff --git a/block/io_uring.c b/block/io_uring.c
> new file mode 100644
> index 00..902b106954
> --- /dev/null
> +++ b/block/io_uring.c
> @@ -0,0 +1,409 @@
> +/*
> + * Linux io_uring support.
> + *
> + * Copyright (C) 2009 IBM, Corp.
> + * Copyright (C) 2009 Red Hat, Inc.
> + * Copyright (C) 2019 Aarushi Mehta
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "qemu/osdep.h"
> +#include 
> +#include "qemu-common.h"
> +#include "block/aio.h"
> +#include "qemu/queue.h"
> +#include "block/block.h"
> +#include "block/raw-aio.h"
> +#include "qemu/coroutine.h"
> +#include "qapi/error.h"
> +
> +#define MAX_EVENTS 128

This is called 'entries' in the liburing documentation, so MAX_ENTRIES
will be a better choice.

> +typedef struct LuringState {
> +AioContext *aio_context;
> +
> +struct io_uring ring;
> +
> +/* io queue for submit at batch.  Protected by AioContext lock. */
> +LuringQueue io_q;
> +
> +/* I/O completion processing.  Only runs in I/O thread.  */
> +QEMUBH *completion_bh;
> +} LuringState;
> +
> +/**
> + * ioq_submit:
> + * @s: AIO state
> + *
> + * Queues pending sqes and submits them
> + *
> + */
> +static int ioq_submit(LuringState *s);

Now you can remove this declaration by moving the function before
luring_process_completions_and_submit()

> +LuringState *luring_init(Error **errp)
> +{
> +int rc;
> +LuringState *s;
> +s = g_new0(LuringState, 1);
> +struct io_uring *ring = &s->ring;

Please rewrite it with declarations at the beginning of the block.
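
A minimal sketch of the requested layout, with all declarations grouped at the top of the block. The `Sketch*` types and `sketch_init()` are hypothetical stand-ins for `LuringState`, `struct io_uring` and `luring_init()`; nothing here is QEMU or liburing code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real types come from QEMU and liburing. */
struct sketch_ring {
    unsigned int entries;
};

typedef struct SketchState {
    struct sketch_ring ring;
} SketchState;

#define SKETCH_MAX_ENTRIES 128

SketchState *sketch_init(unsigned int entries)
{
    /* Declarations first ... */
    SketchState *s;
    struct sketch_ring *ring;

    /* ... then statements. */
    assert(entries > 0);
    s = calloc(1, sizeof(*s));
    if (s == NULL) {
        return NULL;
    }
    ring = &s->ring;
    ring->entries = entries;
    return s;
}
```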

Best regards, Julia Suvorova.



Re: [Qemu-devel] [PATCH v4 13/54] plugin: add user-facing API

2019-08-02 Thread Aaron Lindsay OS via Qemu-devel
One thing I would find useful is the ability to access register values
during an execution-time callback. I think the easiest way to do that
generically would be to expose them via the gdb functionality (like
Pavel's earlier patchset did [1]), though that (currently) limits you to
the general-purpose registers. Ideally it would be nice to be able to
access other registers (i.e. floating-point, or maybe even system
registers), though those are more difficult to do generically.

Perhaps if we added some sort of architectural-support checking for
individual plugins like I mentioned in another response to this
patchset, we could allow some limited architecture-specific
functionality in this vein? I confess I haven't thought through all the
ramifications of that yet, though. 

-Aaron

[1] - See qemulib_read_register() at
  https://patchwork.ozlabs.org/patch/925393/



Re: [Qemu-devel] [PATCH v4 50/54] tests/plugin: add instruction execution breakdown

2019-08-01 Thread Aaron Lindsay OS via Qemu-devel
On Jul 31 17:07, Alex Bennée wrote:
> + * Attempt to measure the amount of vectorisation that has been done
> + * on some code by counting classes of instruction. This is very much
> + * ARM specific.

I suspect some of my plugins will also be architecture-specific. Does it
make sense to have a plugin specify to QEMU which architectures or
running modes (i.e. softmmu vs. linux user) it supports? Or
alternatively to have QEMU expose this information to the plugin so that
it can cleanly exit if its needs are not met?

-Aaron



Re: [Qemu-devel] [PATCH v4 24/54] plugins: implement helpers for resolving hwaddr

2019-08-01 Thread Aaron Lindsay OS via Qemu-devel
On Jul 31 17:06, Alex Bennée wrote:
> We need to keep a local per-cpu copy of the data as other threads may
> be running. We use an automatically growing array and re-use the space
> for subsequent queries.

[...]

> +bool tlb_plugin_lookup(CPUState *cpu, target_ulong addr, int mmu_idx,
> +   bool is_store, struct qemu_plugin_hwaddr *data)
> +{
> +CPUArchState *env = cpu->env_ptr;
> +CPUTLBEntry *tlbe = tlb_entry(env, mmu_idx, addr);
> +target_ulong tlb_addr = is_store ? tlb_addr_write(tlbe) : tlbe->addr_read;
> +
> +if (tlb_hit(tlb_addr, addr)) {
> +if (tlb_addr & TLB_MMIO) {
> +data->hostaddr = 0;
> +data->is_io = true;
> +/* XXX: lookup device */
> +} else {
> +data->hostaddr = addr + tlbe->addend;
> +data->is_io = false;
> +}
> +return true;
> +}
> +return false;
> +}

In what cases do you expect tlb_hit() should not evaluate to true here?
Will returns of false only be in error cases, or do you expect it can
occur during normal operation? In particular, I'm interested in ensuring
this is as reliable as possible, since some plugins may require physical
addresses.

> +struct qemu_plugin_hwaddr *qemu_plugin_get_hwaddr(qemu_plugin_meminfo_t info,
> +  uint64_t vaddr)
> +{
> +CPUState *cpu = current_cpu;
> +unsigned int mmu_idx = info >> TRACE_MEM_MMU_SHIFT;
> +struct qemu_plugin_hwaddr *hwaddr;
> +
> +/* Ensure we have memory allocated for this work */
> +if (!hwaddr_refs) {
> +hwaddr_refs = g_array_sized_new(false, true,
> +sizeof(struct qemu_plugin_hwaddr),
> +cpu->cpu_index + 1);
> +} else if (cpu->cpu_index >= hwaddr_refs->len) {
> +hwaddr_refs = g_array_set_size(hwaddr_refs, cpu->cpu_index + 1);
> +}

Are there one or more race conditions with the allocations here? If so,
could they be solved by doing the allocations at plugin initialization
and when the number of online cpu's changes, instead of lazily?
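
The eager alternative suggested here could look roughly like the sketch below. It assumes an upper bound on the number of CPUs is known at plugin initialization and that initialization runs before any vCPU thread issues a query; all `sketch_*` names are hypothetical and do not exist in QEMU. Whether the lazy version actually races is the open question above; this only shows a layout in which the hot path never allocates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-CPU result slot, modelled on struct qemu_plugin_hwaddr. */
struct sketch_hwaddr {
    uint64_t hostaddr;
    bool is_io;
};

static struct sketch_hwaddr *sketch_slots;
static unsigned int sketch_nslots;

/* Run once, before any vCPU thread can issue a query. */
int sketch_hwaddr_init(unsigned int max_cpus)
{
    assert(sketch_slots == NULL && max_cpus > 0);

    sketch_slots = calloc(max_cpus, sizeof(*sketch_slots));
    if (sketch_slots == NULL) {
        return -1;
    }
    sketch_nslots = max_cpus;
    return 0;
}

/* Hot path: no allocation; each CPU only ever touches its own slot. */
struct sketch_hwaddr *sketch_hwaddr_slot(unsigned int cpu_index)
{
    if (cpu_index >= sketch_nslots) {
        return NULL;
    }
    return &sketch_slots[cpu_index];
}
```

Handling CPU hotplug beyond the initial bound would need a resize step that runs while no vCPU holds a slot pointer, which this sketch deliberately leaves out.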

>  uint64_t qemu_plugin_hwaddr_to_raddr(const struct qemu_plugin_hwaddr *haddr)

I was at first confused about the utility of this function until I
(re-?)discovered you had to convert first to hwaddr and then raddr to
get a "true" physical address. Perhaps that could be added to a comment
here or in the API definition in the main plugin header file.

-Aaron



Re: [Qemu-devel] [PATCH v4 04/54] target/arm: remove run time semihosting checks

2019-08-01 Thread Aaron Lindsay OS via Qemu-devel
On Jul 31 17:06, Alex Bennée wrote:
> Now we do all our checking and use a common EXCP_SEMIHOST for
> semihosting operations we can make helper code a lot simpler.
> 
> Signed-off-by: Alex Bennée 
> 
> ---
> v2
>   - fix re-base conflicts
>   - hoist EXCP_SEMIHOST check
>   - comment cleanups
> ---
>  target/arm/helper.c | 90 +
>  1 file changed, 18 insertions(+), 72 deletions(-)
> 
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index b74c23a9bc0..c5b90a83d36 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -8259,86 +8259,30 @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
>new_el, env->pc, pstate_read(env));
>  }
>  
> -static inline bool check_for_semihosting(CPUState *cs)
> +/*
> + * Do semihosting call and set the appropriate return value. All the
> + * permission and validity checks have been done at translate time.
> + *
> + * We only see semihosting exceptions in TCG only as they are not
> + * trapped to the hypervisor in KVM.
> + */
> +static void handle_semihosting(CPUState *cs)
>  {
>  #ifdef CONFIG_TCG
> -/* Check whether this exception is a semihosting call; if so
> - * then handle it and return true; otherwise return false.
> - */
>  ARMCPU *cpu = ARM_CPU(cs);
>  CPUARMState *env = &cpu->env;
>  
>  if (is_a64(env)) {
> -if (cs->exception_index == EXCP_SEMIHOST) {
> -/* This is always the 64-bit semihosting exception.
> - * The "is this usermode" and "is semihosting enabled"
> - * checks have been done at translate time.
> - */
> -qemu_log_mask(CPU_LOG_INT,
> -  "...handling as semihosting call 0x%" PRIx64 "\n",
> -  env->xregs[0]);
> -env->xregs[0] = do_arm_semihosting(env);
> -return true;
> -}
> -return false;
> +qemu_log_mask(CPU_LOG_INT,
> +  "...handling as semihosting call 0x%" PRIx64 "\n",
> +  env->xregs[0]);
> +env->xregs[0] = do_arm_semihosting(env);
>  } else {
> -uint32_t imm;
> -
> -/* Only intercept calls from privileged modes, to provide some
> - * semblance of security.
> - */
> -if (cs->exception_index != EXCP_SEMIHOST &&
> -(!semihosting_enabled() ||
> - ((env->uncached_cpsr & CPSR_M) == ARM_CPU_MODE_USR))) {
> -return false;
> -}
> -
> -switch (cs->exception_index) {
> -case EXCP_SEMIHOST:
> -/* This is always a semihosting call; the "is this usermode"
> - * and "is semihosting enabled" checks have been done at
> - * translate time.
> - */
> -break;
> -case EXCP_SWI:
> -/* Check for semihosting interrupt.  */
> -if (env->thumb) {
> -imm = arm_lduw_code(env, env->regs[15] - 2, arm_sctlr_b(env))
> -& 0xff;
> -if (imm == 0xab) {
> -break;
> -}
> -} else {
> -imm = arm_ldl_code(env, env->regs[15] - 4, arm_sctlr_b(env))
> -& 0xff;
> -if (imm == 0x123456) {
> -break;
> -}
> -}
> -return false;
> -case EXCP_BKPT:
> -/* See if this is a semihosting syscall.  */
> -if (env->thumb) {
> -imm = arm_lduw_code(env, env->regs[15], arm_sctlr_b(env))
> -& 0xff;
> -if (imm == 0xab) {
> -env->regs[15] += 2;
> -break;
> -}
> -}
> -return false;
> -default:
> -return false;
> -}
> -
>  qemu_log_mask(CPU_LOG_INT,
>"...handling as semihosting call 0x%x\n",
>env->regs[0]);
>  env->regs[0] = do_arm_semihosting(env);
> -return true;
>  }
> -#else
> -return false;
>  #endif
>  }
>  
> @@ -8371,11 +8315,13 @@ void arm_cpu_do_interrupt(CPUState *cs)
>  return;
>  }
>  
> -/* Semihosting semantics depend on the register width of the
> - * code that caused the exception, not the target exception level,
> - * so must be handled here.
> +/*
> + * Semihosting semantics depend on the register width of the code
> + * that caused the exception, not the target exception level, so
> + * must be handled here.
>   */
> -if (check_for_semihosting(cs)) {
> +if (cs->exception_index == EXCP_SEMIHOST) {
> +handle_semihosting(cs);
>  return;
>  }

Previously, this code would never return here if CONFIG_TCG was not
defined because check_for_semihosting() always returned false in that
case. Is it now true that `cs->exception_index` will never hold

[Qemu-devel] [Bug 1838569] Re: virtio-balloon change breaks post 4.0 upgrade

2019-07-31 Thread Bjoern Teipel via Qemu-devel
** Description changed:

- We upgraded the libvirt UCA packages from 3.6 to 4.0 as part of a queens 
upgrade and noticed that
+ We upgraded the libvirt UCA packages from 3.6 to 4.0 and qemu 2.10 to 2.11  
as part of a queens upgrade and noticed that
  virtio-balloon is broken when instances live migrate (started with a prior 3.6 
version) with:
- 
  
  2019-07-24T06:46:49.487109Z qemu-system-x86_64: warning: Unknown firmware 
file in legacy mode: etc/msr_feature_control
  2019-07-24T06:47:22.187749Z qemu-system-x86_64: VQ 2 size 0x80 < 
last_avail_idx 0xb57 - used_idx 0xb59
  2019-07-24T06:47:22.187768Z qemu-system-x86_64: Failed to load 
virtio-balloon:virtio
  2019-07-24T06:47:22.187771Z qemu-system-x86_64: error while loading state for 
instance 0x0 of device ':00:05.0/virtio-balloon'
  2019-07-24T06:47:22.188194Z qemu-system-x86_64: load of migration failed: 
Operation not permitted
  2019-07-24 06:47:22.430+: shutting down, reason=failed
  
  This seems to be the exact problem as reported by
  https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02228.html
  
  Listed the packages which changed:
  
  Start-Date: 2019-07-06  06:40:55
  Commandline: /usr/bin/apt-get -y -o Dpkg::Options::=--force-confdef -o 
Dpkg::Options::=--force-confold install libvirt-bin python-libvirt qemu 
qemu-utils qemu-system qemu-system-arm qemu-system-mips qemu-system-ppc 
qemu-system-sparc qemu-system-x86 qemu-system-misc qemu-block-extra qemu-utils 
qemu-user qemu-kvm
  Install: librdmacm1:amd64 (17.1-1ubuntu0.1~cloud0, automatic), 
libvirt-daemon-driver-storage-rbd:amd64 (4.0.0-1ubuntu8.10~cloud0, automatic), 
ipxe-qemu-256k-compat-efi-roms:amd64 
(1.0.0+git-20150424.a25a16d-0ubuntu2~cloud0, automatic)
  Upgrade: qemu-system-mips:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-system-misc:amd64 
(1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), 
qemu-system-ppc:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), python-libvirt:amd64 (3.5.0-1build1~cloud0, 
4.0.0-1~cloud0), qemu-system-x86:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-clients:amd64 
(3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), qemu-user:amd64 
(1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), 
libvirt-bin:amd64 (3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), 
qemu:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), 
qemu-utils:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-daemon-system:amd64 
(3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), qemu-system-sparc:amd64 
(1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), 
qemu-user-binfmt:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-kvm:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt0:amd64 (3.6.0-1ubuntu6.8~cloud0, 
4.0.0-1ubuntu8.10~cloud0), qemu-system-arm:amd64 
(1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), 
qemu-block-extra:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-system-common:amd64 
(1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), 
qemu-system:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-daemon:amd64 (3.6.0-1ubuntu6.8~cloud0, 
4.0.0-1ubuntu8.10~cloud0)
  End-Date: 2019-07-06  06:41:08
  
  At this point the instances would have to be hard rebooted or
  stopped/started to fix the issue for future live migration attempts

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1838569

Title:
  virtio-balloon change breaks post 4.0 upgrade

Status in QEMU:
  New


[Qemu-devel] [Bug 1838569] [NEW] virtio-balloon change breaks post 4.0 upgrade

2019-07-31 Thread Bjoern Teipel via Qemu-devel
Public bug reported:

We upgraded the libvirt UCA packages from 3.6 to 4.0 as part of a queens 
upgrade and noticed that
virtio-balloon is broken when instances live migrate (started with a prior 3.6 
version) with:


2019-07-24T06:46:49.487109Z qemu-system-x86_64: warning: Unknown firmware file 
in legacy mode: etc/msr_feature_control
2019-07-24T06:47:22.187749Z qemu-system-x86_64: VQ 2 size 0x80 < last_avail_idx 
0xb57 - used_idx 0xb59
2019-07-24T06:47:22.187768Z qemu-system-x86_64: Failed to load 
virtio-balloon:virtio
2019-07-24T06:47:22.187771Z qemu-system-x86_64: error while loading state for 
instance 0x0 of device ':00:05.0/virtio-balloon'
2019-07-24T06:47:22.188194Z qemu-system-x86_64: load of migration failed: 
Operation not permitted
2019-07-24 06:47:22.430+: shutting down, reason=failed

This seems to be the exact problem as reported by
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02228.html

Listed the packages which changed:

Start-Date: 2019-07-06  06:40:55
Commandline: /usr/bin/apt-get -y -o Dpkg::Options::=--force-confdef -o 
Dpkg::Options::=--force-confold install libvirt-bin python-libvirt qemu 
qemu-utils qemu-system qemu-system-arm qemu-system-mips qemu-system-ppc 
qemu-system-sparc qemu-system-x86 qemu-system-misc qemu-block-extra qemu-utils 
qemu-user qemu-kvm
Install: librdmacm1:amd64 (17.1-1ubuntu0.1~cloud0, automatic), 
libvirt-daemon-driver-storage-rbd:amd64 (4.0.0-1ubuntu8.10~cloud0, automatic), 
ipxe-qemu-256k-compat-efi-roms:amd64 
(1.0.0+git-20150424.a25a16d-0ubuntu2~cloud0, automatic)
Upgrade: qemu-system-mips:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-system-misc:amd64 
(1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), 
qemu-system-ppc:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), python-libvirt:amd64 (3.5.0-1build1~cloud0, 
4.0.0-1~cloud0), qemu-system-x86:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-clients:amd64 
(3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), qemu-user:amd64 
(1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), 
libvirt-bin:amd64 (3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), 
qemu:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), 
qemu-utils:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-daemon-system:amd64 
(3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), qemu-system-sparc:amd64 
(1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), 
qemu-user-binfmt:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-kvm:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt0:amd64 (3.6.0-1ubuntu6.8~cloud0, 
4.0.0-1ubuntu8.10~cloud0), qemu-system-arm:amd64 
(1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), 
qemu-block-extra:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-system-common:amd64 
(1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), 
qemu-system:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 
1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-daemon:amd64 (3.6.0-1ubuntu6.8~cloud0, 
4.0.0-1ubuntu8.10~cloud0)
End-Date: 2019-07-06  06:41:08

At this point the instances would have to be hard rebooted or
stopped/started to fix the issue for future live migration attempts

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1838569

Title:
  virtio-balloon change breaks post 4.0 upgrade

Status in QEMU:
  New


Re: [Qemu-devel] RFC raspberry pi sd-card reset

2019-07-31 Thread Andrew Baumann via Qemu-devel
Hi,

(Sorry for top-posting, just wanted to give you some quick context.)

The Pi-specific quirk here is that there are two different SD controllers on
the board, both accessing the same card, where only one can be used at a time.
IIRC Clement Deschamps added this reparenting logic to accomplish that when he
implemented the second SD controller. I can’t give you a concrete suggestion,
but “initialize the platform with the sd-card at the right initial place” is
not really viable given that the right place changes depending on GPIO
programming by the guest.

Andrew

From: Damien Hedde 
Sent: Wednesday, July 31, 2019 7:21:02 AM
To: QEMU Developers 
Cc: Peter Maydell ; Andrew Baumann 
; f4...@amsat.org ; qemu-arm 

Subject: RFC raspberry pi sd-card reset

Hi,

Concerning the reset on the raspi2/3 machine, I ran into an issue with
the sd-card.

Here follows a subset of the qbus/qdev tree of the raspi2&3 machine as
it is initialized:
 + System
   * bcm2835_gpio
     + sd-bus
       * sd-card
   * bcm2835-sdhost
     + bcm2835-sdhost-bus
   * generic-sdhci
     + sdhci-bus

raspi_init does 2 things:
 + it creates the soc
 + then it explicitly creates the sd-card on the bcm2835_gpio's sd-bus

As it happens, the reset moves the sd-card to another bus: the
sdhci-bus. More precisely, the bcm2835_gpio reset method does it (the
sd-card can be mapped on bcm2835-sdhost-bus or sdhci-bus depending on
bcm2835_gpio registers; the reset configuration corresponds to the sdhci-bus).

Reset call order is the following (it follows children-before-parent order):
 1 sd-card
 2 sd-bus
 3 bcm2835_gpio -> move the sd-card
 4 bcm2835-sdhost-bus
 5 bcm2835-sdhost
 6 sd-card  (again)
 7 sdhci-bus
 8 generic-sdhci

In the end, the sd-card is reset twice, which is no big deal in itself.
But it depends only on the bcm2835_gpio tree being reset before the
generic-sdhci (which probably depends on the machine creation code).
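
The ordering dependency can be reproduced with a toy model of the tree. This is only a sketch: the `sketch_*` structures are hypothetical and far simpler than qdev/qbus, but the children-before-parent walk and the reparenting in the GPIO reset hook follow the description above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CHILDREN 4

struct sketch_node {
    struct sketch_node *children[SKETCH_MAX_CHILDREN];
    size_t nchildren;
    int reset_count;
    void (*on_reset)(void *opaque);   /* optional device-specific hook */
    void *opaque;
};

struct sketch_board {
    struct sketch_node sys, gpio, sd_bus, card;
    struct sketch_node sdhost, sdhost_bus, sdhci, sdhci_bus;
};

static void sketch_attach(struct sketch_node *parent, struct sketch_node *child)
{
    assert(parent->nchildren < SKETCH_MAX_CHILDREN);
    parent->children[parent->nchildren++] = child;
}

/* Children-before-parent reset walk. */
static void sketch_reset(struct sketch_node *n)
{
    size_t i;

    for (i = 0; i < n->nchildren; i++) {
        sketch_reset(n->children[i]);
    }
    n->reset_count++;
    if (n->on_reset) {
        n->on_reset(n->opaque);
    }
}

/* GPIO reset hook: move the card from sd-bus to sdhci-bus. */
static void sketch_gpio_reset(void *opaque)
{
    struct sketch_board *b = opaque;

    if (b->sd_bus.nchildren > 0) {
        sketch_attach(&b->sdhci_bus, b->sd_bus.children[--b->sd_bus.nchildren]);
    }
}

/* Build the tree, reset it once, return how often the card was reset. */
int sketch_card_resets(int gpio_first)
{
    struct sketch_board b = {0};

    b.gpio.on_reset = sketch_gpio_reset;
    b.gpio.opaque = &b;
    sketch_attach(&b.gpio, &b.sd_bus);
    sketch_attach(&b.sd_bus, &b.card);
    sketch_attach(&b.sdhost, &b.sdhost_bus);
    sketch_attach(&b.sdhci, &b.sdhci_bus);

    if (gpio_first) {
        sketch_attach(&b.sys, &b.gpio);
    }
    sketch_attach(&b.sys, &b.sdhost);
    sketch_attach(&b.sys, &b.sdhci);
    if (!gpio_first) {
        sketch_attach(&b.sys, &b.gpio);
    }

    sketch_reset(&b.sys);
    return b.card.reset_count;
}
```

In this model `sketch_card_resets(1)` walks the GPIO subtree first and resets the card twice, while `sketch_card_resets(0)` resets it once, so the behaviour hinges purely on sibling order.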

I checked and it seems this is the only platform where such things
happen during master reset.

IMO this is a bit hazardous because reset is based on the qdev/qbus
tree (and with the multi-phase I'm working on, even if it still works,
it's worse).
I'm not sure what we should do to avoid this (or whether there is
anything to be done).

The easiest solution would be to initialize the platform with the
sd-card at the right initial place (I do not really understand why it is
created in the bcm2835_gpio in the first place since we move it just
after). But it won't solve the issue if a reset is done afterwards.

Or maybe we could move the sd-card to the proper bus in the machine
reset code so that it's in the right place when we do the sysbus tree
reset just after.

What do you think ?

--
Damien


Re: [Qemu-devel] [Qemu-ppc] [PATCH] Allow bit 15 to be set to 1 on slbmfee and slbmfev

2019-07-19 Thread Ivan Warren via Qemu-devel


On 7/19/2019 at 3:34 AM, David Gibson wrote:

On Thu, Jul 18, 2019 at 10:15:52PM +0200, Ivan Warren wrote:

On 7/18/2019 at 7:19 PM, Greg Kurz wrote:

We usually mention the subsystem name in the subject, ie.

target/ppc: Allow bit 15 to be set to 1 on slbmfee and slbmfev

Gotcha ! Still learning the process as I go. Next time I submit something,
I'll follow the guidelines more accurately.

On Thu, 18 Jul 2019 14:44:49 +0200
Ivan Warren  wrote:


Allow bit 15 to be 1 in the slbmfee and slbmfev in TCG
as per Power ISA 3.0B (Power 9) Book III pages 1029 and 1030.
Per this specification, bit 15 is implementation specific
so it may be 1, but can probably ne safely ignored.

Another typo from me !

s/ne safely/be safely/


Power ISA 2.07B (Power 7/Power 8) indicates the bit is
reserved but some none Linux operating systems do set

s/none Linux/non-Linux

Thanks ! Sorry for the typo !

this bit to 1 when entering the debugger.
So it is likely it is implemented on those systems
but wasn't yet documented.


ISA describes things that are common to several processor types,
but each implementation may do some extra stuff... like giving
a special meaning to an invalid instruction form for example (see
commit fa200c95f7f99ce14b8af25ea0be478c722d3cec). This is supposed
to be documented in the user manual.

Maybe something similar was done with the reserved bit 15, even if I
could find no trace of that in the Power8 UM... of course. I'll try
to find clues within IBM.

https://openpowerfoundation.org/?resource_lib=power8-processor-users-manual

but it is indeed mentioned in the Power9 UM:

https://openpowerfoundation.org/?resource_lib=power-processor-users-manual

4.10.7.2 SLB Management Instructions

The POWER9 core implements the SLB management instructions as defined in the
Power ISA (Version 3.0B). Specifically, the following instruction details are
noteworthy:
• The slbmfee and slbmfev instructions can read any SLB entry when UPRT = ‘1’,
if the L-bit in the instruction image is set to a ‘1’. This is an
implementation-specific feature that will only be used in the future if and
when the POWER9 processor core supports UPRT = ‘1’ for HPT translation.

Not sure if we support that in TCG, but it doesn't hurt to relax the check
if that's enough to make AIX's debugger happy.

Yep !

Reviewed-by: Greg Kurz 


Signed-off-by: Ivan Warren 
---

The original creator of the patch is "Zhuowei Zhang"
(https://twitter.com/zhuowei) but I couldn't find any e-mail address.


This is the original patch, correct ?

https://github.com/zhuowei/qemu/commit/c5f305c5d0cd336b2bb31cab8a70f90b72905a1e

Indeed !

After speaking with some QEMU folks on irc, it is okay to ignore the lack
of S-o-b because the patch is trivial. But the general rule is to always
require an S-o-b when posting someone else's patch.

Is it good practice to add a S-o-b without the original author's consent
and/or without an e-mail address ?

Absolutely not.


I thought as much (which is why I didn't do it). However, I still wanted to 
give credit (but not a "Signed-off-by:" tag, since that didn't actually 
occur) to the person who originally found the issue (and the fix). I 
guess it should simply have been included in the commit log as a 
comment, not as a comment in the patch submission message (between the 
--- and the actual diff).


Anyway, I think the patch is sane and carries no systemic risk (it only 
affects AIX KBD, and possibly other low-level debuggers), and there 
is probably no PPC system that actually relies on those instructions 
throwing an exception if this particular bit is not 0 !





Although I would very much doubt he would
have complained.

Anyway, thanks for reviewing and for the tips ! (and sorry for all the
noise).


    target/ppc/translate.c | 4 ++--
    1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 4a5de28036..85f8b147ba 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -7064,8 +7064,8 @@ GEN_HANDLER2(mtsr_64b, "mtsr", 0x1F, 0x12, 0x06,
0x0010F801, PPC_SEGMENT_64B),
    GEN_HANDLER2(mtsrin_64b, "mtsrin", 0x1F, 0x12, 0x07, 0x001F0001,
     PPC_SEGMENT_64B),
    GEN_HANDLER2(slbmte, "slbmte", 0x1F, 0x12, 0x0C, 0x001F0001,
PPC_SEGMENT_64B),
-GEN_HANDLER2(slbmfee, "slbmfee", 0x1F, 0x13, 0x1C, 0x001F0001,
PPC_SEGMENT_64B),
-GEN_HANDLER2(slbmfev, "slbmfev", 0x1F, 0x13, 0x1A, 0x001F0001,
PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfee, "slbmfee", 0x1F, 0x13, 0x1C, 0x001E0001,
PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfev, "slbmfev", 0x1F, 0x13, 0x1A, 0x001E0001,
PPC_SEGMENT_64B),
    GEN_HANDLER2(slbfee_, "slbfee.", 0x1F, 0x13, 0x1E, 0x001F,
PPC_SEGMENT_64B),
    #endif
    GEN_HANDLER(tlbia, 0x1F, 0x12, 0x0B, 0x03FFFC01, PPC_MEM_TLBIA),









Re: [Qemu-devel] [Qemu-ppc] [PATCH] Allow bit 15 to be set to 1 on slbmfee and slbmfev

2019-07-18 Thread Ivan Warren via Qemu-devel


On 7/18/2019 at 7:19 PM, Greg Kurz wrote:

We usually mention the subsystem name in the subject, ie.

target/ppc: Allow bit 15 to be set to 1 on slbmfee and slbmfev
Gotcha ! Still learning the process as I go. Next time I submit 
something, I'll follow the guidelines more accurately.


On Thu, 18 Jul 2019 14:44:49 +0200
Ivan Warren  wrote:


Allow bit 15 to be 1 in the slbmfee and slbmfev in TCG
as per Power ISA 3.0B (Power 9) Book III pages 1029 and 1030.
Per this specification, bit 15 is implementation specific
so it may be 1, but can probably ne safely ignored.


Another typo from me !

s/ne safely/be safely/



Power ISA 2.07B (Power 7/Power 8) indicates the bit is
reserved but some none Linux operating systems do set

s/none Linux/non-Linux

Thanks ! Sorry for the typo !



this bit to 1 when entering the debugger.
So it is likely it is implemented on those systems
but wasn't yet documented.


ISA describes things that are common to several processor types,
but each implementation may do some extra stuff... like giving
a special meaning to an invalid instruction form for example (see
commit fa200c95f7f99ce14b8af25ea0be478c722d3cec). This is supposed
to be documented in the user manual.

Maybe something similar was done with the reserved bit 15, even if I
could find no trace of that in the Power8 UM... of course. I'll try
to find clues within IBM.

https://openpowerfoundation.org/?resource_lib=power8-processor-users-manual

but it is indeed mentioned in the Power9 UM:

https://openpowerfoundation.org/?resource_lib=power-processor-users-manual

4.10.7.2 SLB Management Instructions

The POWER9 core implements the SLB management instructions as defined in the
Power ISA (Version 3.0B). Specifically, the following instruction details are
noteworthy:
• The slbmfee and slbmfev instructions can read any SLB entry when UPRT = ‘1’,
   if the L-bit in the instruction image is set to a ‘1’. This is an
   implementation-specific feature that will only be used in the future if and
   when the POWER9 processor core supports UPRT = ‘1’ for HPT translation.

Not sure if we support that in TCG, but it doesn't hurt to relax the check
if that's enough to make AIX's debugger happy.

Yep !


Reviewed-by: Greg Kurz 


Signed-off-by: Ivan Warren 
---

The original creator of the patch is "Zhuowei Zhang"
(https://twitter.com/zhuowei) but I couldn't find any e-mail address.


This is the original patch, correct ?

https://github.com/zhuowei/qemu/commit/c5f305c5d0cd336b2bb31cab8a70f90b72905a1e

Indeed !


After speaking with some QEMU folks on irc, it is okay to ignore the lack
of S-o-b because the patch is trivial. But the general rule is to always
require an S-o-b when posting someone else's patch.


Is it good practice to add a S-o-b without the original author's consent 
and/or without an e-mail address ? Although I would very much doubt he 
would have complained.


Anyway, thanks for reviewing and for the tips ! (and sorry for all the 
noise).





   target/ppc/translate.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 4a5de28036..85f8b147ba 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -7064,8 +7064,8 @@ GEN_HANDLER2(mtsr_64b, "mtsr", 0x1F, 0x12, 0x06,
0x0010F801, PPC_SEGMENT_64B),
   GEN_HANDLER2(mtsrin_64b, "mtsrin", 0x1F, 0x12, 0x07, 0x001F0001,
    PPC_SEGMENT_64B),
   GEN_HANDLER2(slbmte, "slbmte", 0x1F, 0x12, 0x0C, 0x001F0001,
PPC_SEGMENT_64B),
-GEN_HANDLER2(slbmfee, "slbmfee", 0x1F, 0x13, 0x1C, 0x001F0001,
PPC_SEGMENT_64B),
-GEN_HANDLER2(slbmfev, "slbmfev", 0x1F, 0x13, 0x1A, 0x001F0001,
PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfee, "slbmfee", 0x1F, 0x13, 0x1C, 0x001E0001,
PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfev, "slbmfev", 0x1F, 0x13, 0x1A, 0x001E0001,
PPC_SEGMENT_64B),
   GEN_HANDLER2(slbfee_, "slbfee.", 0x1F, 0x13, 0x1E, 0x001F,
PPC_SEGMENT_64B),
   #endif
   GEN_HANDLER(tlbia, 0x1F, 0x12, 0x0B, 0x03FFFC01, PPC_MEM_TLBIA),
--
2.20.1








[Qemu-devel] [PATCH] Allow bit 15 to be set to 1 on slbmfee and slbmfev

2019-07-18 Thread Ivan Warren via Qemu-devel

Allow bit 15 to be 1 in the slbmfee and slbmfev in TCG
as per Power ISA 3.0B (Power 9) Book III pages 1029 and 1030.
Per this specification, bit 15 is implementation specific
so it may be 1, but can probably be safely ignored.

Power ISA 2.07B (Power 7/Power 8) indicates the bit is
reserved but some non-Linux operating systems do set
this bit to 1 when entering the debugger.
So it is likely that it is implemented on those systems
but wasn't yet documented.

Signed-off-by: Ivan Warren 
---

The original creator of the patch is "Zhuowei Zhang" 
(https://twitter.com/zhuowei) but I couldn't find any e-mail address.


 target/ppc/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 4a5de28036..85f8b147ba 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -7064,8 +7064,8 @@ GEN_HANDLER2(mtsr_64b, "mtsr", 0x1F, 0x12, 0x06, 
0x0010F801, PPC_SEGMENT_64B),

 GEN_HANDLER2(mtsrin_64b, "mtsrin", 0x1F, 0x12, 0x07, 0x001F0001,
  PPC_SEGMENT_64B),
 GEN_HANDLER2(slbmte, "slbmte", 0x1F, 0x12, 0x0C, 0x001F0001, 
PPC_SEGMENT_64B),
-GEN_HANDLER2(slbmfee, "slbmfee", 0x1F, 0x13, 0x1C, 0x001F0001, 
PPC_SEGMENT_64B),
-GEN_HANDLER2(slbmfev, "slbmfev", 0x1F, 0x13, 0x1A, 0x001F0001, 
PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfee, "slbmfee", 0x1F, 0x13, 0x1C, 0x001E0001, 
PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfev, "slbmfev", 0x1F, 0x13, 0x1A, 0x001E0001, 
PPC_SEGMENT_64B),
 GEN_HANDLER2(slbfee_, "slbfee.", 0x1F, 0x13, 0x1E, 0x001F, 
PPC_SEGMENT_64B),

 #endif
 GEN_HANDLER(tlbia, 0x1F, 0x12, 0x0B, 0x03FFFC01, PPC_MEM_TLBIA),
--
2.20.1






[Qemu-devel] Allow Bit 15 in slbmfee and slbmfev per Power ISA 3.02B Book III pages 1299 and 1300

2019-07-16 Thread Ivan Warren via Qemu-devel
My previous message might have fallen through the cracks due to some 
improper formatting.




diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 4a5de28036..85f8b147ba 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -7064,8 +7064,8 @@ GEN_HANDLER2(mtsr_64b, "mtsr", 0x1F, 0x12, 0x06, 
0x0010F801, PPC_SEGMENT_64B),

 GEN_HANDLER2(mtsrin_64b, "mtsrin", 0x1F, 0x12, 0x07, 0x001F0001,
  PPC_SEGMENT_64B),
 GEN_HANDLER2(slbmte, "slbmte", 0x1F, 0x12, 0x0C, 0x001F0001, 
PPC_SEGMENT_64B),
-GEN_HANDLER2(slbmfee, "slbmfee", 0x1F, 0x13, 0x1C, 0x001F0001, 
PPC_SEGMENT_64B),
-GEN_HANDLER2(slbmfev, "slbmfev", 0x1F, 0x13, 0x1A, 0x001F0001, 
PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfee, "slbmfee", 0x1F, 0x13, 0x1C, 0x001E0001, 
PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfev, "slbmfev", 0x1F, 0x13, 0x1A, 0x001E0001, 
PPC_SEGMENT_64B),
 GEN_HANDLER2(slbfee_, "slbfee.", 0x1F, 0x13, 0x1E, 0x001F, 
PPC_SEGMENT_64B),

 #endif
 GEN_HANDLER(tlbia, 0x1F, 0x12, 0x0B, 0x03FFFC01, PPC_MEM_TLBIA),



If this is already being considered, please disregard,

--Ivan






[Qemu-devel] TCG - Allow bit 15 to 1 for slbmfee and slbmfev

2019-07-16 Thread Ivan Warren via Qemu-devel

All,

Submitting proposal :

Per Power ISA 3.0B Book III at pages 1029 and 1030, bit 15 of the 
slbmfee and slbmfev instructions is now assigned to an implementation 
specific bit and is no longer reserved, meaning it can be set to 1 but 
can probably be safely ignored.


2.07B still indicates bit 15 is reserved, but the debugger of some non-Linux 
operating systems DOES set this bit to 1 (so it was probably valid yet not 
documented for Power 7/8).
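
To illustrate why changing the mask from 0x001F0001 to 0x001E0001 has this effect, here is a minimal sketch. It assumes, as a simplification of how the decoder treats the last numeric GEN_HANDLER2 argument, that an opcode is rejected when it sets any bit of that mask, and that "bit 15" uses IBM numbering (bit 0 is the most significant bit of the 32-bit instruction word). The `demo_` helpers are made up for illustration and are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of the 32-bit instruction word. */
static uint32_t demo_ibm_bit(unsigned bit)
{
    assert(bit < 32);
    return UINT32_C(1) << (31 - bit);
}

/* Assumed rule: an opcode is rejected if it sets any bit of the mask. */
static int demo_has_invalid_bits(uint32_t opcode, uint32_t inval_mask)
{
    return (opcode & inval_mask) != 0;
}
```

Under these assumptions, clearing `demo_ibm_bit(15)` from 0x001F0001 gives exactly 0x001E0001, so an encoding with bit 15 set is rejected by the old mask and accepted by the new one.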


Therefore I propose :

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 4a5de28036..85f8b147ba 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -7064,8 +7064,8 @@ GEN_HANDLER2(mtsr_64b, "mtsr", 0x1F, 0x12, 0x06, 
0x0010F801, PPC_SEGMENT_64B),

 GEN_HANDLER2(mtsrin_64b, "mtsrin", 0x1F, 0x12, 0x07, 0x001F0001,
  PPC_SEGMENT_64B),
 GEN_HANDLER2(slbmte, "slbmte", 0x1F, 0x12, 0x0C, 0x001F0001, 
PPC_SEGMENT_64B),
-GEN_HANDLER2(slbmfee, "slbmfee", 0x1F, 0x13, 0x1C, 0x001F0001, 
PPC_SEGMENT_64B),
-GEN_HANDLER2(slbmfev, "slbmfev", 0x1F, 0x13, 0x1A, 0x001F0001, 
PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfee, "slbmfee", 0x1F, 0x13, 0x1C, 0x001E0001, 
PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfev, "slbmfev", 0x1F, 0x13, 0x1A, 0x001E0001, 
PPC_SEGMENT_64B),
 GEN_HANDLER2(slbfee_, "slbfee.", 0x1F, 0x13, 0x1E, 0x001F, 
PPC_SEGMENT_64B),

 #endif
 GEN_HANDLER(tlbia, 0x1F, 0x12, 0x0B, 0x03FFFC01, PPC_MEM_TLBIA),

PS : This patch is not mine, but gleaned from "Zhuowei Zhang" (no known 
e-mail address). I am just attempting to have it validated.








Re: [Qemu-devel] [PATCH for 4.1] Fix broken build with WHPX enabled

2019-07-15 Thread Justin Terry (VM) via Qemu-devel
Thanks Philippe, LGTM (not a maintainer)

FYI here are some docs on the WHvSetPartitionProperty method. I know they aren't 
comprehensive but they help: 
https://docs.microsoft.com/en-us/virtualization/api/hypervisor-platform/funcs/whvsetpartitionproperty

Not sure if you are asking for something with the link to the VirtualBox 
comment, but the API allows setting any of the WHV_PARTITION_PROPERTY values found at: 
https://docs.microsoft.com/en-us/virtualization/api/hypervisor-platform/funcs/whvpartitionpropertydatatypes
(refer to the .h file in the SDK matching your build for the most up-to-date 
definitions). This is why it's not a simple "9 bytes" as indicated below.
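
A minimal sketch of the point about buffer size: when a property is passed as a union, the size handed to the API is the size of the whole union, regardless of which member is filled in. The union below is a hypothetical stand-in and does not reproduce the real WHV_PARTITION_PROPERTY layout (see the SDK header for that); `demo_prepare_processor_count` is likewise an invented helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in, NOT the real WHV_PARTITION_PROPERTY definition. */
typedef union demo_partition_property {
    uint32_t processor_count;                       /* small member */
    uint8_t cl_flush_size;                          /* even smaller member */
    struct { uint64_t lo; uint64_t hi; } wide_value; /* largest member */
} demo_partition_property;

/*
 * Zero the whole union, set one member, and return the size a caller
 * would pass alongside the buffer: always the size of the full union.
 */
static size_t demo_prepare_processor_count(demo_partition_property *prop,
                                           uint32_t cpus)
{
    assert(prop != NULL);
    memset(prop, 0, sizeof(*prop));
    prop->processor_count = cpus;
    return sizeof(*prop);
}
```

The returned size is driven by the largest member, which is why a caller cannot trim it to just the bytes of the field being set.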

Thanks,
Justin

> -Original Message-
> From: Philippe Mathieu-Daudé 
> Sent: Friday, July 12, 2019 7:35 AM
> To: Stefan Weil ; qemu-devel@nongnu.org
> Cc: Paolo Bonzini ; Eduardo Habkost
> ; Like Xu ; Richard
> Henderson ; Justin Terry (VM) 
> Subject: Re: [Qemu-devel] [PATCH for 4.1] Fix broken build with WHPX
> enabled
> 
> Cc'ing Justin
> 
> Maybe we should add a MAINTAINERS section for the WHPX files.
> 
> On 7/12/19 3:26 PM, Stefan Weil wrote:
> > Signed-off-by: Stefan Weil 
> > ---
> >  target/i386/whpx-all.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/target/i386/whpx-all.c b/target/i386/whpx-all.c index
> > 31d47320e4..ed95105eae 100644
> > --- a/target/i386/whpx-all.c
> > +++ b/target/i386/whpx-all.c
> > @@ -1396,7 +1396,7 @@ static int whpx_accel_init(MachineState *ms)
> >  }
> >
> >  memset(&prop, 0, sizeof(WHV_PARTITION_PROPERTY));
> > -prop.ProcessorCount = smp_cpus;
> > +prop.ProcessorCount = ms->smp.cpus;
> 
> I tried to understand how the Windows Hypervisor would respond to an
> invalid or zeroed property, but I can't find documentation for
> WHvPartitionPropertyCodeProcessorCount.
> 
> There is a funny comment in VirtualBox though:
> 
>  /**
>   * @todo Someone at Microsoft please explain another weird API:
>   *  - Why this API doesn't take the WHV_PARTITION_PROPERTY_CODE value as an
>   *    argument rather than as part of the struct.  That is so weird if you've
>   *    used any other NT or windows API,  including WHvGetCapability().
>   *  - Why use PVOID when WHV_PARTITION_PROPERTY is what's expected.  We
>   *    technically only need 9 bytes for setting/getting
>   *    WHVPartitionPropertyCodeProcessorClFlushSize, but the API insists on 16. */
> 
> https://www.virtualbox.org/svn/vbox/trunk/src/VBox/VMM/VMMR3/NEMR3Native-win.cpp
> 
> >  hr = whp_dispatch.WHvSetPartitionProperty(
> >  whpx->partition,
> >  WHvPartitionPropertyCodeProcessorCount,
> > @@ -1405,7 +1405,7 @@ static int whpx_accel_init(MachineState *ms)
> >
> >  if (FAILED(hr)) {
> >  error_report("WHPX: Failed to set partition core count to %d,"
> > - " hr=%08lx", smp_cores, hr);
> > + " hr=%08lx", ms->smp.cores, hr);
> >  ret = -EINVAL;
> >  goto error;
> >  }
> >


Re: [Qemu-devel] [PATCH v5 3/5] 9p: Added virtfs option 'remap_inodes'

2019-07-06 Thread Christian Schoenebeck via Qemu-devel
On Wednesday, 3 July 2019 13:13:26 CEST Christian Schoenebeck wrote:
> To support multiple devices on the 9p share, and avoid
> qid path collisions we take the device id as input
[snip]
>  - Fixed v9fs_do_readdir() having exposed info outside
>export root with '..' entry.
[snip]
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index 8cc65c2c67..39c6c2a894 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
[snip]
> @@ -1940,6 +2041,19 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu,
> V9fsFidState *fidp, int32_t count = 0;
>  off_t saved_dir_pos;
>  struct dirent *dent;
> +struct stat stbuf;
> +bool fidIsExportRoot;
> +
> +/*
> + * determine if fidp is the export root, which is required for safe
> + * handling of ".." below
> + */
> +err = v9fs_co_lstat(pdu, &fidp->path, &stbuf);
> +if (err < 0) {
> +return err;
> +}
> +fidIsExportRoot = pdu->s->dev_id == stbuf.st_dev &&
> +  pdu->s->root_ino == stbuf.st_ino;
> 
>  /* save the directory position */
>  saved_dir_pos = v9fs_co_telldir(pdu, fidp);
> @@ -1964,16 +2078,51 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU
> *pdu, V9fsFidState *fidp, v9fs_string_free(&name);
>  return count;
>  }
> -/*
> - * Fill up just the path field of qid because the client uses
> - * only that. To fill the entire qid structure we will have
> - * to stat each dirent found, which is expensive
> - */
> -size = MIN(sizeof(dent->d_ino), sizeof(qid.path));
> -memcpy(&qid.path, &dent->d_ino, size);
> -/* Fill the other fields with dummy values */
> -qid.type = 0;
> -qid.version = 0;
> +
> +if (fidIsExportRoot && !strcmp("..", dent->d_name)) {
> +/*
> + * if "." is export root, then return qid of export root for
> + * ".." to avoid exposing anything outside the export
> + */
> +err = fid_to_qid(pdu, fidp, &qid);
> +if (err < 0) {
> +v9fs_readdir_unlock(&fidp->fs.dir);
> +v9fs_co_seekdir(pdu, fidp, saved_dir_pos);
> +v9fs_string_free(&name);
> +return err;
> +}

Hmm, I am starting to wonder whether I should postpone that particular bug fix 
and not make it part of this QID fix patch series (not even as a separate patch 
there), because the fix needs some more adjustments. E.g. I should adjust 
dent->d_type here as well; more notably, it should also distinguish whether or 
not the export root is mounted as / on the guest, and that's where this fix 
could become ugly and grow in size.

To make the case clear: calling on the guest

readdir(pathOfSome9pExportRootOnGuest);

currently always returns, for its ".." result entry, the inode number and 
d_type of the export root's parent directory on the host, so it exposes host 
information from outside the 9p export.

I don't see that as a security issue, since the information revealed is limited 
to the inode number and d_type, but it is definitely incorrect behaviour.
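
As a rough sketch of the behaviour I would expect instead (simplified, with made-up `demo_` types rather than the real 9pfs structures, and ignoring d_type and the "mounted as /" distinction mentioned above): for the export root, ".." should resolve to the export root itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of a directory being read. */
typedef struct demo_dir {
    uint64_t ino;        /* inode of the directory itself */
    uint64_t parent_ino; /* inode of its parent directory on the host */
    int is_export_root;  /* non-zero if this directory is the export root */
} demo_dir;

/*
 * Pick the inode number to report for one directory entry. For ".." of
 * the export root, report the export root instead of the host parent.
 */
static uint64_t demo_ino_for_entry(const demo_dir *dir, const char *name,
                                   uint64_t entry_ino)
{
    assert(dir != NULL && name != NULL);
    if (strcmp(name, "..") == 0) {
        return dir->is_export_root ? dir->ino : dir->parent_ino;
    }
    return entry_ino;
}
```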

Best regards,
Christian Schoenebeck



[Qemu-devel] [PATCH v5 5/5] 9p: Use variable length suffixes for inode remapping

2019-07-03 Thread Christian Schoenebeck via Qemu-devel
Use variable length suffixes for inode remapping instead of the fixed
16 bit size prefixes used before. With this change the inode numbers on guest
will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48
with the previous fixed size inode remapping).

Additionally this solution is more efficient, since inode numbers in
practice can take almost their entire 64 bit range on guest as well, so
there is less likely a need for generating and tracking additional suffixes,
which might also be beneficial for nested virtualization where each level of
virtualization would shift up the inode bits and increase the chance of
expensive remapping actions.

The "Exponential Golomb" algorithm is used as basis for generating the
variable length suffixes. The algorithm has a parameter k which controls the
distribution of bits on increasing indices (minimum bits at low index vs.
maximum bits at high index). With k=0 the generated suffixes look like:

Index Dec/Bin -> Generated Suffix Bin
1 [1] -> [1] (1 bits)
2 [10] -> [010] (3 bits)
3 [11] -> [110] (3 bits)
4 [100] -> [00100] (5 bits)
5 [101] -> [10100] (5 bits)
6 [110] -> [01100] (5 bits)
7 [111] -> [11100] (5 bits)
8 [1000] -> [0001000] (7 bits)
9 [1001] -> [1001000] (7 bits)
10 [1010] -> [0101000] (7 bits)
11 [1011] -> [1101000] (7 bits)
12 [1100] -> [0011000] (7 bits)
...
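65533 [1111111111111101] -> [1011111111111111000000000000000] (31 bits)
65534 [1111111111111110] -> [0111111111111111000000000000000] (31 bits)
65535 [1111111111111111] -> [1111111111111111000000000000000] (31 bits)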
Hence minBits=1 maxBits=31

And with k=5 they would look like:

Index Dec/Bin -> Generated Suffix Bin
1 [1] -> [000001] (6 bits)
2 [10] -> [100001] (6 bits)
3 [11] -> [010001] (6 bits)
4 [100] -> [110001] (6 bits)
5 [101] -> [001001] (6 bits)
6 [110] -> [101001] (6 bits)
7 [111] -> [011001] (6 bits)
8 [1000] -> [111001] (6 bits)
9 [1001] -> [000101] (6 bits)
10 [1010] -> [100101] (6 bits)
11 [1011] -> [010101] (6 bits)
12 [1100] -> [110101] (6 bits)
...
65533 [1111111111111101] -> [0011100000000000100000000000] (28 bits)
65534 [1111111111111110] -> [1011100000000000100000000000] (28 bits)
65535 [1111111111111111] -> [0111100000000000100000000000] (28 bits)
Hence minBits=6 maxBits=28
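
For readers who want to reproduce the two tables above, here is a minimal standalone sketch. It is not the code from this patch: the `demo_` names are invented, and the formula (add 2^k - 1 to the index, pad with leading zeros as in order-k Exponential Golomb coding, then mirror the bits) is an assumption inferred from the listed values.

```c
#include <assert.h>
#include <stdint.h>

typedef struct demo_suffix {
    uint64_t value; /* suffix bits, already mirrored */
    int bits;       /* number of significant bits in value */
} demo_suffix;

/* Number of bits needed to represent v (0 for v == 0). */
static int demo_bit_length(uint64_t v)
{
    int n = 0;
    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Reverse the lowest 'bits' bits of v. */
static uint64_t demo_mirror(uint64_t v, int bits)
{
    uint64_t out = 0;
    for (int i = 0; i < bits; i++) {
        out = (out << 1) | (v & 1);
        v >>= 1;
    }
    return out;
}

/* Order-k Exp. Golomb code of index (>= 1), mirrored into a suffix. */
static demo_suffix demo_exp_golomb_suffix(uint64_t index, int k)
{
    assert(index >= 1 && index <= UINT32_MAX && k >= 0 && k < 32);
    uint64_t value = index + (UINT64_C(1) << k) - 1;
    int bits = demo_bit_length(value);
    int zeros = bits - 1 - k; /* leading zeros of the Exp. Golomb code */
    if (zeros < 0) {
        zeros = 0;
    }
    demo_suffix s = { demo_mirror(value, bits + zeros), bits + zeros };
    return s;
}
```

For example, `demo_exp_golomb_suffix(5, 0)` yields the 5 bit pattern 10100 and `demo_exp_golomb_suffix(3, 5)` yields the 6 bit pattern 010001, matching the rows listed above.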

Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 247 ---
 hw/9pfs/9p.h |  34 +++-
 2 files changed, 251 insertions(+), 30 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 549e279462..5bbd3e2d14 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -26,6 +26,7 @@
 #include "migration/blocker.h"
 #include "sysemu/qtest.h"
 #include "qemu/xxhash.h"
+#include 
 
 int open_fd_hw;
 int total_open_fd;
@@ -572,6 +573,107 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
 P9_STAT_MODE_NAMED_PIPE |   \
 P9_STAT_MODE_SOCKET)
 
+/* Mirrors all bits of a byte. So e.g. binary 10100000 would become 00000101. */
+static inline uint8_t mirror8bit(uint8_t byte)
+{
+return (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023;
+}
+
+/* Same as mirror8bit() just for a 64 bit data type instead for a byte. */
+static inline uint64_t mirror64bit(uint64_t value)
+{
+return ((uint64_t)mirror8bit( value& 0xff) << 56) |
+   ((uint64_t)mirror8bit((value >> 8)  & 0xff) << 48) |
+   ((uint64_t)mirror8bit((value >> 16) & 0xff) << 40) |
+   ((uint64_t)mirror8bit((value >> 24) & 0xff) << 32) |
+   ((uint64_t)mirror8bit((value >> 32) & 0xff) << 24) |
+   ((uint64_t)mirror8bit((value >> 40) & 0xff) << 16) |
+   ((uint64_t)mirror8bit((value >> 48) & 0xff) << 8 ) |
+   ((uint64_t)mirror8bit((value >> 56) & 0xff)  ) ;
+}
+
+/** @brief Parameter k for the Exponential Golomb algorithm to be used.
+ *
+ * The smaller this value, the smaller the minimum bit count for the Exp.
+ * Golomb generated affixes will be (at lowest index) however for the
+ * price of having higher maximum bit count of generated affixes (at highest
+ * index). Likewise increasing this parameter yields in smaller maximum bit
+ * count for the price of having higher minimum bit count.
+ *
+ * In practice that means: a good value for k depends on the expected amount
+ * of devices to be exposed by one export. For a small amount of devices k
+ * should be small, for a large amount of devices k might be increased
+ * instead. The default of k=0 should be fine for most users though.
+ *
+ * @b IMPORTANT: In case this ever becomes a runtime parameter; the value of
+ * k should not change as long as guest is still running! Because that would
+ * cause completely different inode numbers to be genera

[Qemu-devel] [PATCH v5 2/5] 9p: Treat multiple devices on one export as an error

2019-07-03 Thread Christian Schoenebeck via Qemu-devel
The QID path should uniquely identify a file. However, the
inode of a file is currently used as the QID path, which
on its own only uniquely identifies files within a device.
Here we track the device hosting the 9pfs share, in order
to prevent security issues with QID path collisions from
other devices.

Signed-off-by: Antonios Motakis 
[CS: - Assign dev_id to export root's device already in
   v9fs_device_realize_common(), not postponed in
   stat_to_qid().
 - error_report_once() if more than one device was
   shared by export.
 - Return -ENODEV instead of -ENOSYS in stat_to_qid().
 - Fixed typo in log comment. ]
Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 69 
 hw/9pfs/9p.h |  1 +
 2 files changed, 56 insertions(+), 14 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 586a6dccba..8cc65c2c67 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -572,10 +572,18 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
 P9_STAT_MODE_SOCKET)
 
 /* This is the algorithm from ufs in spfs */
-static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp)
+static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp)
 {
 size_t size;
 
+if (pdu->s->dev_id != stbuf->st_dev) {
+error_report_once(
+"9p: Multiple devices detected in same VirtFS export. "
+"You must use a separate export for each device."
+);
+return -ENODEV;
+}
+
 memset(&qidp->path, 0, sizeof(qidp->path));
 size = MIN(sizeof(stbuf->st_ino), sizeof(qidp->path));
 memcpy(&qidp->path, &stbuf->st_ino, size);
@@ -587,6 +595,8 @@ static void stat_to_qid(const struct stat *stbuf, V9fsQID 
*qidp)
 if (S_ISLNK(stbuf->st_mode)) {
 qidp->type |= P9_QID_TYPE_SYMLINK;
 }
+
+return 0;
 }
 
 static int coroutine_fn fid_to_qid(V9fsPDU *pdu, V9fsFidState *fidp,
@@ -599,7 +609,10 @@ static int coroutine_fn fid_to_qid(V9fsPDU *pdu, 
V9fsFidState *fidp,
 if (err < 0) {
 return err;
 }
-stat_to_qid(&stbuf, qidp);
+err = stat_to_qid(pdu, &stbuf, qidp);
+if (err < 0) {
+return err;
+}
 return 0;
 }
 
@@ -830,7 +843,10 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, 
V9fsPath *path,
 
 memset(v9stat, 0, sizeof(*v9stat));
 
-stat_to_qid(stbuf, &v9stat->qid);
+err = stat_to_qid(pdu, stbuf, &v9stat->qid);
+if (err < 0) {
+return err;
+}
 v9stat->mode = stat_to_v9mode(stbuf);
 v9stat->atime = stbuf->st_atime;
 v9stat->mtime = stbuf->st_mtime;
@@ -891,7 +907,7 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, 
V9fsPath *path,
 #define P9_STATS_ALL   0x3fffULL /* Mask for All fields above */
 
 
-static void stat_to_v9stat_dotl(V9fsState *s, const struct stat *stbuf,
+static int stat_to_v9stat_dotl(V9fsPDU *pdu, const struct stat *stbuf,
 V9fsStatDotl *v9lstat)
 {
 memset(v9lstat, 0, sizeof(*v9lstat));
@@ -913,7 +929,7 @@ static void stat_to_v9stat_dotl(V9fsState *s, const struct 
stat *stbuf,
 /* Currently we only support BASIC fields in stat */
 v9lstat->st_result_mask = P9_STATS_BASIC;
 
-stat_to_qid(stbuf, &v9lstat->qid);
+return stat_to_qid(pdu, stbuf, &v9lstat->qid);
 }
 
 static void print_sg(struct iovec *sg, int cnt)
@@ -1115,7 +1131,6 @@ static void coroutine_fn v9fs_getattr(void *opaque)
 uint64_t request_mask;
 V9fsStatDotl v9stat_dotl;
 V9fsPDU *pdu = opaque;
-V9fsState *s = pdu->s;
 
 retval = pdu_unmarshal(pdu, offset, "dq", &fid, &request_mask);
 if (retval < 0) {
@@ -1136,7 +1151,10 @@ static void coroutine_fn v9fs_getattr(void *opaque)
 if (retval < 0) {
 goto out;
 }
-stat_to_v9stat_dotl(s, &stbuf, &v9stat_dotl);
+retval = stat_to_v9stat_dotl(pdu, &stbuf, &v9stat_dotl);
+if (retval < 0) {
+goto out;
+}
 
 /*  fill st_gen if requested and supported by underlying fs */
 if (request_mask & P9_STATS_GEN) {
@@ -1381,7 +1399,10 @@ static void coroutine_fn v9fs_walk(void *opaque)
 if (err < 0) {
 goto out;
 }
-stat_to_qid(&stbuf, &qid);
+err = stat_to_qid(pdu, &stbuf, &qid);
+if (err < 0) {
+goto out;
+}
 v9fs_path_copy(&dpath, &path);
 }
 memcpy(&qids[name_idx], &qid, sizeof(qid));
@@ -1483,7 +1504,10 @@ static void coroutine_fn v9fs_open(void *opaque)
 if (err < 0) {
 goto out;
 }
-stat_to_qid(&stbuf, &qid);
+err = stat_to_qid(pdu, &stbuf, &qid);
+if (err < 0) {
+goto out;
+}
 if (S_ISDIR(stbuf.st_mode)) {
 err = v9fs_co_opendir(pdu, fidp);
 if (err < 0) {
@@ -1593,7 +1617,10 @@ static void coroutine_fn v9fs_lcreate(void *opaque)
 fidp->flags |= FID_NON_RECLAIMABLE;
 }
 iounit =  get_iounit(pdu

[Qemu-devel] [PATCH v5 3/5] 9p: Added virtfs option 'remap_inodes'

2019-07-03 Thread Christian Schoenebeck via Qemu-devel
To support multiple devices on the 9p share, and avoid
qid path collisions we take the device id as input
to generate a unique QID path. The lowest 48 bits of
the path will be set equal to the file inode, and the
top bits will be uniquely assigned based on the top
16 bits of the inode and the device id.

Signed-off-by: Antonios Motakis 
[CS: - Rebased to master head.
 - Updated hash calls to new xxhash API.
 - Added virtfs option 'remap_inodes', original patch
   simply did the inode remapping without being asked.
 - Updated docs for new option 'remap_inodes'.
 - Capture root_ino in v9fs_device_realize_common() as
   well, not just the device id.
 - Fixed v9fs_do_readdir() having exposed info outside
   export root with '..' entry.
 - Fixed v9fs_do_readdir() not having remapped inodes.
 - Log error message when running out of prefixes in
   qid_path_prefixmap().
 - Fixed definition of QPATH_INO_MASK.
 - Dropped unnecessary parentheses in qpp_lookup_func().
 - Dropped unnecessary g_malloc0() result checks. ]
Signed-off-by: Christian Schoenebeck 
---
 fsdev/file-op-9p.h  |   1 +
 fsdev/qemu-fsdev-opts.c |   7 +-
 fsdev/qemu-fsdev.c  |   6 ++
 hw/9pfs/9p.c| 196 +++-
 hw/9pfs/9p.h|  13 
 qemu-options.hx |  25 --
 vl.c|   3 +
 7 files changed, 224 insertions(+), 27 deletions(-)

diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
index c757c8099f..6c1663c252 100644
--- a/fsdev/file-op-9p.h
+++ b/fsdev/file-op-9p.h
@@ -59,6 +59,7 @@ typedef struct ExtendedOps {
 #define V9FS_RDONLY 0x0040
 #define V9FS_PROXY_SOCK_FD  0x0080
 #define V9FS_PROXY_SOCK_NAME0x0100
+#define V9FS_REMAP_INODES   0x0200
 
 #define V9FS_SEC_MASK   0x003C
 
diff --git a/fsdev/qemu-fsdev-opts.c b/fsdev/qemu-fsdev-opts.c
index 7c31af..64a8052266 100644
--- a/fsdev/qemu-fsdev-opts.c
+++ b/fsdev/qemu-fsdev-opts.c
@@ -31,7 +31,9 @@ static QemuOptsList qemu_fsdev_opts = {
 }, {
 .name = "readonly",
 .type = QEMU_OPT_BOOL,
-
+}, {
+.name = "remap_inodes",
+.type = QEMU_OPT_BOOL,
 }, {
 .name = "socket",
 .type = QEMU_OPT_STRING,
@@ -76,6 +78,9 @@ static QemuOptsList qemu_virtfs_opts = {
 .name = "readonly",
 .type = QEMU_OPT_BOOL,
 }, {
+.name = "remap_inodes",
+.type = QEMU_OPT_BOOL,
+}, {
 .name = "socket",
     .type = QEMU_OPT_STRING,
 }, {
diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index 077a8c4e2b..b6fa9799be 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -121,6 +121,7 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp)
 const char *fsdev_id = qemu_opts_id(opts);
 const char *fsdriver = qemu_opt_get(opts, "fsdriver");
 const char *writeout = qemu_opt_get(opts, "writeout");
+bool remap_inodes = qemu_opt_get_bool(opts, "remap_inodes", 0);
 bool ro = qemu_opt_get_bool(opts, "readonly", 0);
 
 if (!fsdev_id) {
@@ -161,6 +162,11 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp)
 } else {
 fsle->fse.export_flags &= ~V9FS_RDONLY;
 }
+if (remap_inodes) {
+fsle->fse.export_flags |= V9FS_REMAP_INODES;
+} else {
+fsle->fse.export_flags &= ~V9FS_REMAP_INODES;
+}
 
 if (fsle->fse.ops->parse_opts) {
 if (fsle->fse.ops->parse_opts(opts, &fsle->fse, errp)) {
diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 8cc65c2c67..39c6c2a894 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -25,6 +25,7 @@
 #include "trace.h"
 #include "migration/blocker.h"
 #include "sysemu/qtest.h"
+#include "qemu/xxhash.h"
 
 int open_fd_hw;
 int total_open_fd;
@@ -571,22 +572,98 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
 P9_STAT_MODE_NAMED_PIPE |   \
 P9_STAT_MODE_SOCKET)
 
-/* This is the algorithm from ufs in spfs */
+
+/* creative abuse of tb_hash_func7, which is based on xxhash */
+static uint32_t qpp_hash(QppEntry e)
+{
+return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0);
+}
+
+static bool qpp_lookup_func(const void *obj, const void *userp)
+{
+const QppEntry *e1 = obj, *e2 = userp;
+return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix;
+}
+
+static void qpp_table_remove(void *p, uint32_t h, void *up)
+{
+g_free(p);
+}
+
+static void qpp_table_destroy(struct qht *ht)
+{
+qht_iter(ht, qpp_table_remove, NULL);
+qht_destroy(ht);
+}
+
+/* stat_to_qid needs to map inode number (64 bits) and device id (32 bits)
+ * to a unique 

[Qemu-devel] [PATCH v5 1/5] 9p: unsigned type for type, version, path

2019-07-03 Thread Christian Schoenebeck via Qemu-devel
There is no need for signedness on these QID fields for 9p.
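
One consequence worth spelling out is how comparisons against -1 behave once the fields are unsigned. The sketch below uses made-up `demo_` helpers and assumes the usual case where int is wider than 8 bits and no wider than 32 bits: a uint8_t is promoted to int and can never equal -1, whereas for uint32_t and uint64_t the -1 is converted to the all-ones value.

```c
#include <assert.h>
#include <stdint.h>

/* A uint8_t operand is promoted to int (0..255), so it never equals -1. */
static int demo_u8_equals_minus_one(uint8_t v)
{
    assert(sizeof(int) > sizeof(uint8_t)); /* assumption of this sketch */
    int promoted = v; /* the integer promotion, made explicit */
    return promoted == -1;
}

/* The comparison that works for an 8 bit unsigned field. */
static int demo_u8_is_all_ones(uint8_t v)
{
    return v == 0xff;
}

/* For wider unsigned fields, -1 converts to the all-ones value. */
static int demo_u32_is_all_ones(uint32_t v)
{
    return v == (uint32_t)-1;
}

static int demo_u64_is_all_ones(uint64_t v)
{
    return v == (uint64_t)-1;
}
```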

Signed-off-by: Antonios Motakis 
[CS: - Also make QID type unsigned.
 - Adjust donttouch_stat() to new types.
 - Adjust trace-events to new types. ]
Signed-off-by: Christian Schoenebeck 
---
 fsdev/9p-marshal.h   |  6 +++---
 hw/9pfs/9p.c |  6 +++---
 hw/9pfs/trace-events | 14 +++---
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fsdev/9p-marshal.h b/fsdev/9p-marshal.h
index c8823d878f..8f3babb60a 100644
--- a/fsdev/9p-marshal.h
+++ b/fsdev/9p-marshal.h
@@ -9,9 +9,9 @@ typedef struct V9fsString
 
 typedef struct V9fsQID
 {
-int8_t type;
-int32_t version;
-int64_t path;
+uint8_t type;
+uint32_t version;
+uint64_t path;
 } V9fsQID;
 
 typedef struct V9fsStat
diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 55821343e5..586a6dccba 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -743,9 +743,9 @@ static int donttouch_stat(V9fsStat *stat)
 {
 if (stat->type == -1 &&
 stat->dev == -1 &&
-stat->qid.type == -1 &&
-stat->qid.version == -1 &&
-stat->qid.path == -1 &&
+stat->qid.type == 0xff &&
+stat->qid.version == (uint32_t) -1 &&
+stat->qid.path == (uint64_t) -1 &&
 stat->mode == -1 &&
 stat->atime == -1 &&
 stat->mtime == -1 &&
diff --git a/hw/9pfs/trace-events b/hw/9pfs/trace-events
index c0a0a4ab5d..10188daf7f 100644
--- a/hw/9pfs/trace-events
+++ b/hw/9pfs/trace-events
@@ -6,7 +6,7 @@ v9fs_rerror(uint16_t tag, uint8_t id, int err) "tag %d id %d 
err %d"
 v9fs_version(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d 
id %d msize %d version %s"
 v9fs_version_return(uint16_t tag, uint8_t id, int32_t msize, char* version) 
"tag %d id %d msize %d version %s"
 v9fs_attach(uint16_t tag, uint8_t id, int32_t fid, int32_t afid, char* uname, 
char* aname) "tag %u id %u fid %d afid %d uname %s aname %s"
-v9fs_attach_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, 
int64_t path) "tag %d id %d type %d version %d path %"PRId64
+v9fs_attach_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, 
uint64_t path) "tag %u id %u type %u version %u path %"PRIu64
 v9fs_stat(uint16_t tag, uint8_t id, int32_t fid) "tag %d id %d fid %d"
 v9fs_stat_return(uint16_t tag, uint8_t id, int32_t mode, int32_t atime, 
int32_t mtime, int64_t length) "tag %d id %d stat={mode %d atime %d mtime %d 
length %"PRId64"}"
 v9fs_getattr(uint16_t tag, uint8_t id, int32_t fid, uint64_t request_mask) 
"tag %d id %d fid %d request_mask %"PRIu64
@@ -14,9 +14,9 @@ v9fs_getattr_return(uint16_t tag, uint8_t id, uint64_t 
result_mask, uint32_t mod
 v9fs_walk(uint16_t tag, uint8_t id, int32_t fid, int32_t newfid, uint16_t 
nwnames) "tag %d id %d fid %d newfid %d nwnames %d"
 v9fs_walk_return(uint16_t tag, uint8_t id, uint16_t nwnames, void* qids) "tag 
%d id %d nwnames %d qids %p"
 v9fs_open(uint16_t tag, uint8_t id, int32_t fid, int32_t mode) "tag %d id %d 
fid %d mode %d"
-v9fs_open_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, 
int64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} 
iounit %d"
+v9fs_open_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, 
uint64_t path, int iounit) "tag %u id %u qid={type %u version %u path 
%"PRIu64"} iounit %d"
 v9fs_lcreate(uint16_t tag, uint8_t id, int32_t dfid, int32_t flags, int32_t 
mode, uint32_t gid) "tag %d id %d dfid %d flags %d mode %d gid %u"
-v9fs_lcreate_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, 
int64_t path, int32_t iounit) "tag %d id %d qid={type %d version %d path 
%"PRId64"} iounit %d"
+v9fs_lcreate_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, 
uint64_t path, int32_t iounit) "tag %u id %u qid={type %u version %u path 
%"PRIu64"} iounit %d"
 v9fs_fsync(uint16_t tag, uint8_t id, int32_t fid, int datasync) "tag %d id %d 
fid %d datasync %d"
 v9fs_clunk(uint16_t tag, uint8_t id, int32_t fid) "tag %d id %d fid %d"
 v9fs_read(uint16_t tag, uint8_t id, int32_t fid, uint64_t off, uint32_t 
max_count) "tag %d id %d fid %d off %"PRIu64" max_count %u"
@@ -26,21 +26,21 @@ v9fs_readdir_return(uint16_t tag, uint8_t id, uint32_t 
count, ssize_t retval) "t
 v9fs_write(uint16_t tag, uint8_t id, int32_t fid, uint64_t off, uint32_t 
count, int cnt) "tag %d id %d fid %d off %"PRIu64" count %u cnt %d"
 v9fs_write_return(uint16_t tag, uint8_t id, int32_t total, ssize_t err) "tag 
%d id %d total %d err %zd"
 v9fs_create(uint16_t tag, uint8_t id, int32_t fid, char* name, int32_t perm, 
int8_t mode) "tag %d id %d fid %d name %s perm %d mode %d"
-v9fs_create_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, 
int64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} 
iounit %d"
+v9fs_create_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, 
uint64_t path, int iounit) "tag %u id %u qid={type %u version %u path 
%"PRIu64"} iounit %d"
 v9fs_symlink(u

[Qemu-devel] [PATCH v5 4/5] 9p: stat_to_qid: implement slow path

2019-07-03 Thread Christian Schoenebeck via Qemu-devel
stat_to_qid attempts via qid_path_prefixmap to map unique files (which are
identified by 64 bit inode nr and 32 bit device id) to a 64 bit QID path value.
However this implementation makes some assumptions about inode number
generation on the host.

If qid_path_prefixmap fails, we still have 48 bits available in the QID
path to fall back to a less memory efficient full mapping.

Signed-off-by: Antonios Motakis 
[CS: - Rebased to master head.
 - Updated hash calls to new xxhash API.
 - Removed unnecessary parentheses in qpf_lookup_func().
 - Removed unnecessary g_malloc0() result checks.
 - Log error message when running out of prefixes in
   qid_path_fullmap().
 - Log error message about potential degraded performance in
   qid_path_prefixmap().
 - Fixed typo in comment. ]
Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 70 ++--
 hw/9pfs/9p.h |  9 
 2 files changed, 72 insertions(+), 7 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 39c6c2a894..549e279462 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -579,23 +579,73 @@ static uint32_t qpp_hash(QppEntry e)
 return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0);
 }
 
+static uint32_t qpf_hash(QpfEntry e)
+{
+return qemu_xxhash7(e.ino, e.dev, 0, 0, 0);
+}
+
 static bool qpp_lookup_func(const void *obj, const void *userp)
 {
 const QppEntry *e1 = obj, *e2 = userp;
 return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix;
 }
 
-static void qpp_table_remove(void *p, uint32_t h, void *up)
+static bool qpf_lookup_func(const void *obj, const void *userp)
+{
+const QpfEntry *e1 = obj, *e2 = userp;
+return e1->dev == e2->dev && e1->ino == e2->ino;
+}
+
+static void qp_table_remove(void *p, uint32_t h, void *up)
 {
 g_free(p);
 }
 
-static void qpp_table_destroy(struct qht *ht)
+static void qp_table_destroy(struct qht *ht)
 {
-qht_iter(ht, qpp_table_remove, NULL);
+qht_iter(ht, qp_table_remove, NULL);
 qht_destroy(ht);
 }
 
+static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf,
+uint64_t *path)
+{
+QpfEntry lookup = {
+.dev = stbuf->st_dev,
+.ino = stbuf->st_ino
+}, *val;
+uint32_t hash = qpf_hash(lookup);
+
+/* most users won't need the fullmap, so init the table lazily */
+if (!pdu->s->qpf_table.map) {
+qht_init(&pdu->s->qpf_table, qpf_lookup_func, 1 << 16, 
QHT_MODE_AUTO_RESIZE);
+}
+
+val = qht_lookup(&pdu->s->qpf_table, &lookup, hash);
+
+if (!val) {
+if (pdu->s->qp_fullpath_next == 0) {
+/* no more files can be mapped :'( */
+error_report_once(
+"9p: No more prefixes available for remapping inodes from "
+"host to guest."
+);
+return -ENFILE;
+}
+
+val = g_malloc0(sizeof(QpfEntry));
+*val = lookup;
+
+/* new unique inode and device combo */
+val->path = pdu->s->qp_fullpath_next++;
+pdu->s->qp_fullpath_next &= QPATH_INO_MASK;
+qht_insert(&pdu->s->qpf_table, val, hash, NULL);
+}
+
+*path = val->path;
+return 0;
+}
+
 /* stat_to_qid needs to map inode number (64 bits) and device id (32 bits)
  * to a unique QID path (64 bits). To avoid having to map and keep track
  * of up to 2^64 objects, we map only the 16 highest bits of the inode plus
@@ -621,8 +671,7 @@ static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf,
 if (pdu->s->qp_prefix_next == 0) {
 /* we ran out of prefixes */
 error_report_once(
-"9p: No more prefixes available for remapping inodes from "
-"host to guest."
+"9p: Potential degraded performance of inode remapping"
 );
 return -ENFILE;
 }
@@ -647,6 +696,10 @@ static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp)
 if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) {
 /* map inode+device to qid path (fast path) */
 err = qid_path_prefixmap(pdu, stbuf, &qidp->path);
+if (err == -ENFILE) {
+/* fast path didn't work, fall back to full map */
+err = qid_path_fullmap(pdu, stbuf, &qidp->path);
+}
 if (err) {
 return err;
 }
@@ -3827,6 +3880,7 @@ int v9fs_device_realize_common(V9fsState *s, const V9fsTransport *t,
 /* QID path hash table. 1 entry ought to be enough for anybody ;) */
 qht_init(&s->qpp_table, qpp_lookup_func, 1, QHT_MODE_AUTO_RESIZE);
 s->qp_prefix_next = 1; /* reserve 0 to detect overflow */
+s->qp_fullpath_next = 1;
 
 s->ctx.fst = &fse->fst;
 fsdev_throttle_init(s->ctx.fst);
@@ -3841,7 +3895,8 @@ out:
 }
 g_free(s->tag);
 g_free(s->ctx.fs_root);
-qpp_table_destroy(&s->qpp_table);
+qp_table_destroy(&s->qpp_table);
+qp_table_destro

[Qemu-devel] [PATCH v5 0/5] 9p: Fix file ID collisions

2019-07-03 Thread Christian Schoenebeck via Qemu-devel
This is v5 of a proposed patch set for fixing file ID collisions with 9pfs.

v4->v5:

  All Patches:

  * Added details to individual commit logs of what has been changed
exactly by me on top of Antonios' original 4 patches.

  Patch 1:

  * Fixed format specifiers in hw/9pfs/trace-events.

  Patch 2:

  * Fixed typo in commit log.

  * Assign dev_id to export root's device already in
v9fs_device_realize_common(), not postponed in stat_to_qid().

  * Return -ENODEV instead of -ENOSYS in stat_to_qid().

  Patch 3:

  * Added missing manual parts for new virtfs option 'remap_inodes' in
qemu-options.hx.

  * Capture root_ino in v9fs_device_realize_common() as well, not just the
device id.

  * Added function dirent_to_qid().

  * Fixed v9fs_do_readdir() having exposed info outside export root with
'..' entry (no matter if inode remapping was on or not).

  * Fixed v9fs_do_readdir() not having remapped inodes.

  * Fixed definition of QPATH_INO_MASK.

  * Log error message when running out of prefixes in qid_path_prefixmap().

  * Adjusted changes in stat_to_qid() to qemu code style guidelines.

  Patch 4:

  * Log error message when running out of prefixes in qid_path_fullmap().

  * Log error message about potential degraded performance in
qid_path_prefixmap() (that is when qid_path_fullmap() will start to
kick in next).

  * Fixed typo in code comment.

  Patch 5:

  * Dropped fixed (16 bit) size prefix code and hence removed usage of
P9_VARI_LENGTH_INODE_SUFFIXES macro checks all over the place.

  * Dropped function expGolombEncodeK0(uint64_t n) which was optimized for
the expected default value of k=0; the generalized function
expGolombEncode(uint64_t n, int k) is now always used instead.

  * Adjusted changes in hw/9pfs/9p.c to qemu code style guidelines.

  * Adjusted functions' API comments in hw/9pfs/9p.c.

v3->v4:

  * Rebased to latest git master head.

  * Split Antonios' patch set into its original 4 individual patches
(it was previously merged into a single patch).

  * Addressed discussed issues directly on Antonios' patches
(was a separate patch before).

  * Added virtfs command line option "remap_inodes": Unless this option
is enabled, no inode remapping is performed at all, and the user
just gets an error message when trying to use more than 1 device
per export.

  * Dropped persistency feature of QIDs beyond reboots.

  * Dropped disputed "vii" feature.

Christian Schoenebeck (5):
  9p: unsigned type for type, version, path
  9p: Treat multiple devices on one export as an error
  9p: Added virtfs option 'remap_inodes'
  9p: stat_to_qid: implement slow path
  9p: Use variable length suffixes for inode remapping

 fsdev/9p-marshal.h  |   6 +-
 fsdev/file-op-9p.h  |   1 +
 fsdev/qemu-fsdev-opts.c |   7 +-
 fsdev/qemu-fsdev.c  |   6 +
 hw/9pfs/9p.c| 508 +---
 hw/9pfs/9p.h|  51 +
 hw/9pfs/trace-events|  14 +-
 qemu-options.hx |  25 ++-
 vl.c|   3 +
 9 files changed, 573 insertions(+), 48 deletions(-)

-- 
2.11.0




Re: [Qemu-devel] [PATCH v3 19/50] tcg: let plugins instrument memory accesses

2019-07-02 Thread Aaron Lindsay OS via Qemu-devel
On Jul 01 16:00, Alex Bennée wrote:
> Aaron Lindsay OS  writes:
> > - a way for a plugin to reset any instrumentation decisions made in the
> >   past (essentially calls `tb_flush(cpu);` under the covers). We found
> >   this critical for plugins which undergo state changes during the
> >   course of their execution (i.e. watch for event X, then go into a more
> >   detailed profiling mode until you see event Y)
> 
> check:
> 
> /**
>  * qemu_plugin_reset() - Reset a plugin
>  * @id: this plugin's opaque ID
>  * @cb: callback to be called once the plugin has been reset
>  *
>  * Unregisters all callbacks for the plugin given by @id.
>  *
>  * Do NOT assume that the plugin has been reset once this function returns.
>  * Plugins are reset asynchronously, and therefore the given plugin receives
>  * callbacks until @cb is called.
>  */
> void qemu_plugin_reset(qemu_plugin_id_t id, qemu_plugin_simple_cb_t cb);

Is this essentially synchronous for the current cpu, and only
asynchronous for any other running cpus that didn't trigger the callback
from which the call to qemu_plugin_reset() is being made? If not, could
the state resetting be made synchronous for the current cpu (even if the
callback doesn't happen until the others are complete)? This isn't
absolutely critical, but it is often nice to begin capturing precisely
when you mean to.

> > - the ability for a plugin to trigger a checkpoint to be taken
> 
> We don't have this at the moment. Pranith also mentioned it in his
> review comments. I can see its use but I suspect it won't make the
> initial implementation given the broader requirements of QEMU to do
> checkpointing and how to cleanly expose that to plugins.

Sure. Our patch works for us, but I know we're ignoring a few things
that we can externally ensure won't happen while we're attempting a
checkpoint (i.e. migration) that may have to be considered for something
upstream.

-Aaron



Re: [Qemu-devel] [PATCH v3 19/50] tcg: let plugins instrument memory accesses

2019-07-01 Thread Aaron Lindsay OS via Qemu-devel
On Jun 28 21:52, Alex Bennée wrote:
> Aaron Lindsay OS  writes:
> > To make sure I understand - you're implying that one such query will
> > return the PA from the guest's perspective, right?
> 
> Yes - although it will be two queries:
> 
>   struct qemu_plugin_hwaddr *hw = qemu_plugin_get_hwaddr(info, vaddr);
> 
> This does the actual lookup and stores enough information for the
> further queries.
> 
>   uint64_t pa = qemu_plugin_hwaddr_to_raddr(hw);
> 
> will return the physical address (assuming it's a RAM reference and not
> some IO location).

Sounds good, as long as we have a good way to either prevent or cleanly
detect the failure mode for the IO accesses.

> > In terms of our use case - we use QEMU to drive studies to help us
> > design the next generation of processors. As you can imagine, having the
> > right physical addresses is important for some aspects of that. We're
> > currently using a version of Pavel Dovgalyuk's earlier plugin patchset
> > with some of our own patches/fixes on top, but it would obviously make
> > our lives easier to work together to get this sort of infrastructure
> > upstream!
> 
> Was this:
> 
>  Date: Tue, 05 Jun 2018 13:39:15 +0300
>  Message-ID: 
> <152819515565.30857.16834004920507717324.stgit@pasha-ThinkPad-T60>
>  Subject: [Qemu-devel] [RFC PATCH v2 0/7] QEMU binary instrumentation 
> prototype

Yes, that looks like the one.

> What patches did you add on top?

We added:
- plugin support for linux-user mode (I sent that one upstream, I think)
- memory tracing support and a VA->PA conversion helper
- a way for a plugin to request getting a callback just before QEMU
  exits to clean up any internal state
- a way for a plugin to reset any instrumentation decisions made in the
  past (essentially calls `tb_flush(cpu);` under the covers). We found
  this critical for plugins which undergo state changes during the
  course of their execution (i.e. watch for event X, then go into a more
  detailed profiling mode until you see event Y)
- instrumentation at the TB granularity (in addition to the existing
  instruction-level support)
- the ability for a plugin to trigger a checkpoint to be taken

-Aaron



Re: [Qemu-devel] RFC: Why does target/m68k RTE insn. use gen_exception

2019-07-01 Thread Lucien Anti-Spam via Qemu-devel
 

> On Monday, July 1, 2019, 06:10:55 PM GMT+9, Peter Maydell wrote:
> > On Sat, 29 Jun 2019 at 17:37, Lucien Murray-Pitts wrote:
> > > However for the m68k the do_transaction_failed function pointer field
> > > has not been implemented.
> >
> > Er, I implemented that in commit e1aaf3a88e95ab007. Are
> > you working with an out-of-date version of QEMU ?

Sorry, I have not pulled in a long time; you are right, that is there now. I
don't generally check the development list outside of replies, and was focused
on the stepping issue. I will be more careful about that in future. Thanks for
the heads up.

Further to my initial problem, I noticed that TRAP #0 also jumps to the
handler +1 instruction. The same behavior can also be seen with ARM "SWI #0"
(PC shows 0x0C vs the expected 0x08).

Putting a "BRA $" / "B $" so that it loops on the first address of the handler
results in the step stopping, of course, at the expected "first instruction" of
the vector handler.

So it would seem this may be a wider problem than just the m68k. Since I don't
really understand the complete TCG execution method, and how it fits in with
the GNU RSP "s" step command, I am going to take some time to work this out.

I appreciate any hints people can provide, but I don't mind plugging away. I am
learning and surprising myself with how much there is to this.

Cheers,
Luc

  


Re: [Qemu-devel] [PATCH v4 5/5] 9p: Use variable length suffixes for inode remapping

2019-06-29 Thread Christian Schoenebeck via Qemu-devel
On Friday, 28 June 2019 16:56:15 CEST Christian Schoenebeck via Qemu-devel wrote:
> > > + */
> > > +#define EXP_GOLOMB_K0
> > > +
> > > +# if !EXP_GOLOMB_K
> > > +
> > > +/** @brief Exponential Golomb algorithm limited to the case k=0.
> > > + *
> > 
> > This doesn't really help to have a special implementation for k=0. The
> > resulting function is nearly identical to the general case. It is likely
> > that the compiler can optimize it and generate the same code.
> 
> I don't think the compiler's optimizer would really drop all the
> instructions automatically of the generalized variant of that particular
> function. Does it matter in practice? No, I actually just provided that
> optimized variant for the special case k=0 because I think k=0 will be the
> default value for that parameter and because you were already picky about a
> simple
> 
>   if (pdu->s->dev_id == 0)
> 
> to be optimized. In practice users won't feel the runtime difference either
> one of the two optimization scenarios.

I just checked the assumption I made here with a simple C test unit:

// Use manually optimized function for case k=0.
VariLenAffix implK0(uint64_t n) {
return expGolombEncodeK0(n);
}

// Rely on compiler to optimize generalized function for k=0
VariLenAffix implGenK(uint64_t n) {
return expGolombEncode(n, 0);
}

And :   gcc -S -O3 -ffast-math k.c

As expected, the two generated functions are almost identical, except that the
manually optimized variant saves the following 3 instructions:

movl    $0, %eax
...
testl   %edx, %edx
cmovns  %edx, %eax

But like I said, it is a tiny difference of course. Not really relevant in 
practice.

Best regards,
Christian Schoenebeck



Re: [Qemu-devel] [PATCH v4 3/5] 9p: Added virtfs option "remap_inodes"

2019-06-29 Thread Christian Schoenebeck via Qemu-devel
On Friday, 28 June 2019 16:23:08 CEST Greg Kurz wrote:
> > > This feature applies to all backends IIUC. We don't really care for the
> > > synth backend since it generates non-colliding inode numbers by design,
> > > but the proxy backend has the same issue as local. So...
> > 
> > Yeah, I was not sure about these, because I did not even know what these
> > two were for exactly. :)  [ lazyness disclaimer end]
> 
> "proxy" is a backend where all I/O accesses are performed by a separate
> process running the virtfs-proxy-helper command. It runs with root
> privileges, which provides the same level of functionality as "local"
> with security_model=passthrough. It also chroot() into the shared
> folder for extra security. But it is slower since all requests
> still go through the virtio-9p device in QEMU. This would call
> for a vhost-9p implementation, but it's yet another story.
> 
> "synth" is a software pseudo-backend, currently used to test 9pfs
> with QTest (see tests/virtio-9p-test.c).

Thanks for the clarification!

So the proxy backend sounds like an idea that has not been fully implemented
to its end. I guess it is not really used in production environments, right?
What is the actual motivation for this proxy backend?

And now that I look at it, I am a bit surprised that there is this pure Unix
pipe socket based proxy variant, but no TCP/IP network socket variant. I mean
the latter is AFAIK the original idea behind the 9p protocol and IMO might be
interesting for physically separating pure storage backends that way.

Best regards,
Christian Schoenebeck



Re: [Qemu-devel] [PATCH v3 19/50] tcg: let plugins instrument memory accesses

2019-06-28 Thread Aaron Lindsay OS via Qemu-devel
On Jun 28 18:11, Alex Bennée wrote:
> Aaron Lindsay OS  writes:
> > On Jun 14 18:11, Alex Bennée wrote:
> >> From: "Emilio G. Cota" 
> >>
> >> Here the trickiest feature is passing the host address to
> >> memory callbacks that request it. Perhaps it would be more
> >> appropriate to pass a "physical" address to plugins, but since
> >> in QEMU host addr ~= guest physical, I'm going with that for
> >> simplicity.
> >
> > How much more difficult would it be to get the true physical address (on
> > the guest)?
> 
> Previously there was a helper that converted host address (i.e. where
> QEMU actually stores that value) back to the physical address (ram
> offset + ram base). However the code for calculating all of this is
> pretty invasive and requires tweaks to all the softmmu TCG backends as
> well as hooks into a slew of memory functions.
> 
> I'm re-working this now so we just have the one memory callback and we
> provide a helper function that can provide an opaque hwaddr struct which
> can then be queried.

To make sure I understand - you're implying that one such query will
return the PA from the guest's perspective, right?

> The catch is you can only call this helper during a
> memory callback.

Does this mean it will be difficult to get the physical address for the
bytes containing the instruction encoding itself?

> I'm not sure if having this restriction violates our
> aim of not leaking implementation details to the plugin but it makes the
> code simpler.

Assuming that the purpose of "not leaking implementation details" is to
allow the same plugin interface to work with other backend
implementations in the future, isn't this probably fine? It may add an
unnecessary limitation for another backend driving the same plugin
interface, but I don't think it likely changes the structure of the
interface itself. And that seems like the sort of restriction that could
easily be dropped in the future while remaining backwards-compatible.

> Internally what the helper does is simply re-query the SoftMMU TLB. As
> the TLBs are per-CPU nothing else can have touched the TLB and the cache
> should be hot so the cost of lookup should be minor. We could also
> potentially expand the helpers so if you are interested in only IO
> accesses we can do the full resolution and figure out what device we
> just accessed.

Oh, so you're already working on doing just what I asked about?

> > This is important enough to me that I would be willing to help if
> > pointed in the right direction.
> 
> Well I'll certainly CC on the next series (hopefully posted Monday,
> softfreeze starts Tuesday). I'll welcome any testing and review. Also if
> you can tell us more about your use case that will help.

Awesome, thanks!

In terms of our use case - we use QEMU to drive studies to help us
design the next generation of processors. As you can imagine, having the
right physical addresses is important for some aspects of that. We're
currently using a version of Pavel Dovgalyuk's earlier plugin patchset
with some of our own patches/fixes on top, but it would obviously make
our lives easier to work together to get this sort of infrastructure
upstream!

-Aaron



Re: [Qemu-devel] [PATCH v3 19/50] tcg: let plugins instrument memory accesses

2019-06-28 Thread Aaron Lindsay OS via Qemu-devel
On Jun 14 18:11, Alex Bennée wrote:
> From: "Emilio G. Cota" 
> 
> Here the trickiest feature is passing the host address to
> memory callbacks that request it. Perhaps it would be more
> appropriate to pass a "physical" address to plugins, but since
> in QEMU host addr ~= guest physical, I'm going with that for
> simplicity.

How much more difficult would it be to get the true physical address (on
the guest)?

This is important enough to me that I would be willing to help if
pointed in the right direction.

-Aaron



Re: [Qemu-devel] [PATCH v4 5/5] 9p: Use variable length suffixes for inode remapping

2019-06-28 Thread Christian Schoenebeck via Qemu-devel
On Friday, 28 June 2019 13:50:15 CEST Greg Kurz wrote:
> > And with k=5 they would look like:
> > 
> > Index Dec/Bin -> Generated Suffix Bin
> > 1 [1] -> [01] (6 bits)
> > 2 [10] -> [11] (6 bits)
> > 3 [11] -> [010001] (6 bits)
> > 4 [100] -> [110001] (6 bits)
> > 5 [101] -> [001001] (6 bits)
> > 6 [110] -> [101001] (6 bits)
> > 7 [111] -> [011001] (6 bits)
> > 8 [1000] -> [111001] (6 bits)
> > 9 [1001] -> [000101] (6 bits)
> > 10 [1010] -> [100101] (6 bits)
> > 11 [1011] -> [010101] (6 bits)
> > 12 [1100] -> [110101] (6 bits)
> > ...
> > 65533 [1101] -> [001110001000] (28 bits)
> > 65534 [1110] -> [101110001000] (28 bits)
> > 65535 [] -> [00001000] (28 bits)
> > Hence minBits=6 maxBits=28
> 
> IIUC, this k control parameter should be constant for the
> lifetime of QIDs. So it all boils down to choose a _good_
> value that would cover most scenarios, right ?

That's correct. In the end, choosing an appropriate value for k is just a
matter of how many devices you expect to be exposed with the same 9p export.
The parameter k must be constant at least until the guest is rebooted;
otherwise you would end up with completely different inode numbers if you
changed k while the guest is still running.

Like I mentioned before, I can send a short C file if you want to play around
with that parameter k to see what the generated suffixes would look like. The
console output is actually as shown above. Additionally, the C demo also
checks and proves that all generated suffixes really generate unique numbers
for the entire possible 64 bit range, so that should take away the scare about
what this algorithm does.

Since you said before that you find it already exotic to have more than 1 
device being exported by 9p, I hence did not even bother to make that 
parameter "k" a runtime parameter. I *think* k=0 should be fine in practice.
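To make the minBits/maxBits trade-off above easier to check, here is a minimal sketch that computes only the bit count of an order-k Exp-Golomb code. It is not the code from the patch: the helper names bit_length() and exp_golomb_bits() are hypothetical, and it assumes a 1-based index that is shifted by (2^k - 1) before encoding, which reproduces the 6 bit and 28 bit figures quoted for k=5.

```c
#include <assert.h>
#include <stdint.h>

/* Number of significant bits in v (0 for v == 0). */
static int bit_length(uint64_t v)
{
    int n = 0;

    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/*
 * Bit count of the order-k Exp-Golomb code for a 1-based index n.
 * Assumption: the index is shifted by (2^k - 1) before encoding.
 * Overflow of n + 2^k - 1 for huge n is not handled in this sketch.
 */
static int exp_golomb_bits(uint64_t n, int k)
{
    uint64_t value;
    int bits, extra;

    assert(n >= 1 && k >= 0 && k < 32);
    value = n + ((uint64_t)1 << k) - 1;
    bits = bit_length(value);   /* bits of the shifted value itself */
    extra = bits - 1 - k;       /* additional length-marker bits */
    return bits + (extra > 0 ? extra : 0);
}
```

With k=5, indexes 1 to 12 all need 6 bits and index 65535 needs 28 bits; with k=0 the first index needs a single bit, which is why a small k favours exports with only a few devices.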

> For now, I just have some _cosmetic_ remarks.
> 
> > Signed-off-by: Christian Schoenebeck 
> > ---
> > 
> >  hw/9pfs/9p.c | 267 ++-
> >  hw/9pfs/9p.h |  67 ++-
> >  2 files changed, 312 insertions(+), 22 deletions(-)
> > 
> > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > index e6e410972f..46c9f11384 100644
> > --- a/hw/9pfs/9p.c
> > +++ b/hw/9pfs/9p.c
> > @@ -26,6 +26,7 @@
> > 
> >  #include "migration/blocker.h"
> >  #include "sysemu/qtest.h"
> >  #include "qemu/xxhash.h"
> > 
> > +#include 
> > 
> >  int open_fd_hw;
> >  int total_open_fd;
> > 
> > @@ -572,6 +573,123 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
> > 
> >  P9_STAT_MODE_NAMED_PIPE |   \
> >  P9_STAT_MODE_SOCKET)
> > 
> > +#if P9_VARI_LENGTH_INODE_SUFFIXES
> 
> The numerous locations guarded by P9_VARI_LENGTH_INODE_SUFFIXES
> really obfuscate the code, and don't ease review (for me at least).
> And anyway, if we go for variable length suffixes, we won't keep
> the fixed length prefix code.

I just did that to provide a direct comparison between the fixed size prefix
vs. variable size suffix code, since the fixed size prefix code is easier to
understand.

If you want I can add a 6th "cleanup" patch that would remove the fixed size 
prefix code and all the #ifs ?

> > +
> > +/* Mirrors all bits of a byte. So e.g. binary 1010 would become 0101. */
> > +static inline uint8_t mirror8bit(uint8_t byte) {
> 
> From CODING_STYLE:
> 
> 4. Block structure
> 
> [...]
> 
> for reasons of tradition and clarity it comes on a line by itself:
> 
> void a_function(void)
> {
> do_something();
> }

Got it.

> > +/* Parameter k for the Exponential Golomb algorithm to be used.
> > + *
> > + * The smaller this value, the smaller the minimum bit count for the Exp.
> > + * Golomb generated affixes will be (at lowest index) however for the
> > + * price of having higher maximum bit count of generated affixes (at
> > + * highest index). Likewise increasing this parameter yields in smaller
> > + * maximum bit count for the price of having higher minimum bit count.
> 
> Forgive my laziness but what are the benefits of a smaller or larger
> value, in term of user experience ?

Well, the goal in general is to keep the generated suffix numbers (or their
bit count actually) as small as possible, because you

Re: [Qemu-devel] [PATCH v4 3/5] 9p: Added virtfs option "remap_inodes"

2019-06-28 Thread Christian Schoenebeck via Qemu-devel
On Friday, 28 June 2019 12:09:31 CEST Greg Kurz wrote:
> On Wed, 26 Jun 2019 20:42:13 +0200
> 
> Christian Schoenebeck via Qemu-devel  wrote:
> > To support multiple devices on the 9p share, and avoid
> > qid path collisions we take the device id as input
> > to generate a unique QID path. The lowest 48 bits of
> > the path will be set equal to the file inode, and the
> > top bits will be uniquely assigned based on the top
> > 16 bits of the inode and the device id.
> > 
> > Signed-off-by: Antonios Motakis 
> 
> Same remark about changes to the original patch.

ack_once();   :)

> BTW, I had a concern with the way v9fs_do_readdir() open-codes QID
> generation without calling stat_to_qid().
> 
> See discussion here:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02724.html
> 
> I guess you should ensure in a preliminary patch that QIDs only
> come out of stat_to_qid().

Well, actually I first omitted your suggestion deliberately, because I first
thought it was an overkill, pure visibility issue limited to the default case
remap_inodes==false, but now that I look at it again, it is actually an issue
even when remap_inodes==true, since dirent would expose wrong inode numbers on
guest as well.

I will see what to do about it. However about your other concern here, quote:

"Also, if we hit a collision while reading the directory, I'm
 afraid the remaining entries won't be read at all. I'm not
 sure this is really what we want."

That, however, is still a concern that I would consider overkill to address.
I mean, if a user gets into that situation, it is because of a configuration
error that must be corrected by the user; the point of this patch set is to
prevent undefined behaviour and to make the user aware of the root cause of the
overall issue; the purpose is not to address all possible issues while there
is still a configuration error.

> > +static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf,
> > +uint64_t *path)
> > +{
> > +QppEntry lookup = {
> > +.dev = stbuf->st_dev,
> > +.ino_prefix = (uint16_t) (stbuf->st_ino >> 48)
> > +}, *val;
> > +uint32_t hash = qpp_hash(lookup);
> > +
> > +val = qht_lookup(&pdu->s->qpp_table, &lookup, hash);
> > +
> > +if (!val) {
> > +if (pdu->s->qp_prefix_next == 0) {
> > +/* we ran out of prefixes */
> 
> And we won't ever be able to allocate a new one. Maybe worth
> adding an error_report_once() to inform the user ?

Yeah, I thought about that as well. Will do.
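For readers following the prefix discussion, the mapping can be sketched in a few lines. This is only an illustration under stated assumptions, not the patch itself: it replaces the qht hash table with a tiny linear array, and the names prefix_map, prefix_map_init() and prefix_map_path() are hypothetical. It keeps the two ideas from the quoted code: the low 48 bits of the QID path come from the inode, and prefix value 0 is reserved so that a wrapped counter signals exhaustion.

```c
#include <errno.h>
#include <stdint.h>

#define INO_BITS     48
#define INO_MASK     ((UINT64_C(1) << INO_BITS) - 1)  /* 64-bit safe mask */
#define MAX_PREFIXES 8   /* tiny table, for illustration only */

struct prefix_entry {
    uint64_t dev;         /* host device id */
    uint16_t ino_prefix;  /* top 16 bits of the host inode */
    uint16_t qp_prefix;   /* prefix assigned for the guest */
};

struct prefix_map {
    struct prefix_entry entries[MAX_PREFIXES];
    int count;
    uint16_t next;        /* starts at 1; 0 means "no prefixes left" */
};

static void prefix_map_init(struct prefix_map *m)
{
    m->count = 0;
    m->next = 1;          /* reserve 0 to detect overflow */
}

/* Map (dev, ino) to a 64-bit path: assigned prefix on top, inode below. */
static int prefix_map_path(struct prefix_map *m, uint64_t dev, uint64_t ino,
                           uint64_t *path)
{
    uint16_t ino_prefix = (uint16_t)(ino >> INO_BITS);
    struct prefix_entry *e;
    int i;

    for (i = 0; i < m->count; i++) {
        e = &m->entries[i];
        if (e->dev == dev && e->ino_prefix == ino_prefix) {
            *path = ((uint64_t)e->qp_prefix << INO_BITS) | (ino & INO_MASK);
            return 0;
        }
    }
    if (m->next == 0 || m->count == MAX_PREFIXES) {
        return -ENFILE;   /* ran out of prefixes */
    }
    e = &m->entries[m->count++];
    e->dev = dev;
    e->ino_prefix = ino_prefix;
    e->qp_prefix = m->next++;   /* uint16_t wraps to 0 after 65535 */
    *path = ((uint64_t)e->qp_prefix << INO_BITS) | (ino & INO_MASK);
    return 0;
}
```

Two files with the same inode number on different devices end up with different top 16 bits, so their QID paths no longer collide.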

> >  static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID
> >  *qidp) {
> > 
> > -size_t size;
> > +int err;
> > 
> > -if (pdu->s->dev_id == 0) {
> > -pdu->s->dev_id = stbuf->st_dev;
> > -} else if (pdu->s->dev_id != stbuf->st_dev) {
> > -error_report_once(
> > -"9p: Multiple devices detected in same VirtFS export. "
> > -"You must use a separate export for each device."
> > -);
> > -return -ENOSYS;
> > +if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) {
> > +/* map inode+device to qid path (fast path) */
> > +err = qid_path_prefixmap(pdu, stbuf, &qidp->path);
> > +if (err) {
> > +return err;
> > +}
> > +} else {
> > +if (pdu->s->dev_id == 0) {
> > +pdu->s->dev_id = stbuf->st_dev;
> > +} else if (pdu->s->dev_id != stbuf->st_dev) {
> > +error_report_once(
> > +"9p: Multiple devices detected in same VirtFS export. "
> > +"You must either use a separate export for each device "
> > +"shared from host or enable virtfs option 'remap_inodes'."
> > +);
> > +return -ENOSYS;
> > +}
> > +    size_t size;
> 
> From CODING_STYLE:
> 
> 5. Declarations
> 
> Mixed declarations (interleaving statements and declarations within
> blocks) are generally not allowed; declarations should be at the beginning
> of blocks.
> 
> Please do so for "size" and add an extra blank line.

Ok.

> > +#define QPATH_INO_MASK (((unsigned long)1 << 48) - 1)
> 
> This won't give the expected result on a 32-bit host. Since this
> is a mask for 64-bit entities, it should rather be:
> 
> #define QPATH

Re: [Qemu-devel] [PATCH v4 4/5] 9p: stat_to_qid: implement slow path

2019-06-28 Thread Christian Schoenebeck via Qemu-devel
On Friday, 28 June 2019 12:21:20 CEST Greg Kurz wrote:
> > +static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf,
> > +uint64_t *path)
> > +{
> > +QpfEntry lookup = {
> > +.dev = stbuf->st_dev,
> > +.ino = stbuf->st_ino
> > +}, *val;
> > +uint32_t hash = qpf_hash(lookup);
> > +
> > +/* most users won't need the fullmap, so init the table lazily */
> > +if (!pdu->s->qpf_table.map) {
> > +qht_init(&pdu->s->qpf_table, qpf_lookup_func, 1 << 16, QHT_MODE_AUTO_RESIZE);
> > +}
> > +
> > +val = qht_lookup(&pdu->s->qpf_table, &lookup, hash);
> > +
> > +if (!val) {
> > +if (pdu->s->qp_fullpath_next == 0) {
> > +/* no more files can be mapped :'( */
> 
> This would be the place to put the error_report_once() suggested
> in the previous patch actually.

I will add the suggested error message to qid_path_prefixmap() in patch 3 and 
then will move over that error message to qid_path_fullmap() in patch 4.

Or if you want I can also leave an error_report_once() in qid_path_prefixmap() 
in patch 4 about potential degraded performance.
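The interplay of the two paths discussed here can be sketched as follows. This is a simplified stand-in, not the patch: the struct qid_state and the functions fast_map(), slow_map() and map_path() are hypothetical, the fast path is reduced to a counter of free prefixes, and the slow path omits the hash-table lookup of already mapped files. What it does show is the fallback on -ENFILE and how masking the incremented counter makes it wrap to the reserved value 0 once the slow path is exhausted as well.

```c
#include <errno.h>
#include <stdint.h>

struct qid_state {
    int prefixes_left;       /* stand-in for free fast-path prefixes */
    uint64_t fullpath_next;  /* next slow-path id; 0 means exhausted */
    uint64_t fullpath_mask;  /* limits the id range of the slow path */
};

/* Fast path stand-in: fails with -ENFILE once no prefix is left. */
static int fast_map(struct qid_state *s, uint64_t ino, uint64_t *path)
{
    if (s->prefixes_left == 0) {
        return -ENFILE;
    }
    *path = ino;             /* placeholder for "prefix | inode" */
    return 0;
}

/* Slow path: hand out one fresh id per file until the counter wraps. */
static int slow_map(struct qid_state *s, uint64_t *path)
{
    if (s->fullpath_next == 0) {
        return -ENFILE;      /* no more files can be mapped */
    }
    *path = s->fullpath_next++;
    s->fullpath_next &= s->fullpath_mask;  /* wraps to 0 when exhausted */
    return 0;
}

/* Try the fast path first and fall back to the slow path on -ENFILE. */
static int map_path(struct qid_state *s, uint64_t ino, uint64_t *path)
{
    int err = fast_map(s, ino, path);

    if (err == -ENFILE) {
        err = slow_map(s, path);
    }
    return err;
}
```

With a mask of 0x3 the slow path hands out ids 1, 2 and 3 and then reports -ENFILE, which is the point where an error message about running out of mappings would belong.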

> > @@ -3779,7 +3831,8 @@ void v9fs_device_unrealize_common(V9fsState *s, Error **errp)
> >  }
> >  fsdev_throttle_cleanup(s->ctx.fst);
> >  g_free(s->tag);
> > 
> > -qpp_table_destroy(&s->qpp_table);
> > +qp_table_destroy(&s->qpp_table);
> > +qp_table_destroy(&s->qpf_table);
> 
> I'm starting to think v9fs_device_unrealize_common() should be made
> idempotent, so that it can be used to handle rollback of a partially
> realized device, and thus avoid the code duplication. But this is
> out-of-scope for this series.

Well, I can also make that e.g.:

if (s->qpf_table.map)
qp_table_destroy(&s->qpf_table);

if you prefer to reduce the number of occurrences.

Best regards,
Christian Schoenebeck



Re: [Qemu-devel] [PATCH v4 2/5] 9p: Treat multiple devices on one export as an error

2019-06-28 Thread Christian Schoenebeck via Qemu-devel
On Thursday, 27 June 2019 19:26:22 CEST Greg Kurz wrote:
> On Wed, 26 Jun 2019 20:30:41 +0200
> 
> Christian Schoenebeck via Qemu-devel  wrote:
> > The QID path should uniquely identify a file. However, the
> > inode of a file is currently used as the QID path, which
> > on its own only uniquely identifies wiles within a device.
> 
> s/wile/files

Ah right. :)

> > Here we track the device hosting the 9pfs share, in order
> > to prevent security issues with QID path collisions from
> > other devices.
> > 
> > Signed-off-by: Antonios Motakis 
> 
> You should mention here the changes you made to the original patch.

Got it. Will do for the other cases as well of course.

> > -static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp)
> > +static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp)
> >  {
> >  
> >  size_t size;
> > 
> > +if (pdu->s->dev_id == 0) {
> > +pdu->s->dev_id = stbuf->st_dev;
> 
> st_dev should be captured in v9fs_device_realize_common() since we
> lstat() the root there, instead of every request doing the check.

Ok.

> > +} else if (pdu->s->dev_id != stbuf->st_dev) {
> > +error_report_once(
> > +"9p: Multiple devices detected in same VirtFS export. "
> > +"You must use a separate export for each device."
> > +);
> > +return -ENOSYS;
> 
> This error is likely to end up as the return value of a
> syscall in the guest and -ENOSYS usually means the syscall
> isn't implemented, which is obviously not the case. Maybe
> return -EPERM instead ?

I would rather suggest -ENODEV. The entire device of the requested file/dir is 
not available on guest.

-EPERM IMO would rather lead users to look for file system permission settings
on individual files instead, and probably not to check the host's logs for the
detailed error message.
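Both points agreed on above (capture the device id once at realize time, and report -ENODEV for files on any other device) fit in a very small sketch. The names export_state, export_realize() and export_check_device() are hypothetical and only stand in for the corresponding parts of v9fs_device_realize_common() and stat_to_qid().

```c
#include <errno.h>
#include <stdint.h>

struct export_state {
    uint64_t dev_id;   /* device id of the export root */
};

/* Called once at realize time, after the export root has been lstat()ed. */
static void export_realize(struct export_state *s, uint64_t root_dev)
{
    s->dev_id = root_dev;
}

/* Per-request check: files on any other device are rejected. */
static int export_check_device(const struct export_state *s,
                               uint64_t file_dev)
{
    return s->dev_id == file_dev ? 0 : -ENODEV;
}
```

Because the device id is fixed at realize time, the per-request check is a single comparison and no longer needs the "dev_id == 0" special case.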

> > @@ -3633,6 +3674,8 @@ int v9fs_device_realize_common(V9fsState *s, const V9fsTransport *t,
> >  goto out;
> >  
> >  }
> > 
> > +s->dev_id = 0;
> > +
> 
> Set it to stat->st_dev after lstat() was called later in this function.

I guess you mean "earlier", not "later". The lstat() call is just before that
dev_id initialization line. But in general your suggestion makes sense of
course.

Best regards,
Christian Schoenebeck



Re: [Qemu-devel] [PATCH v4 1/5] 9p: unsigned type for type, version, path

2019-06-28 Thread Christian Schoenebeck via Qemu-devel
On Thursday, 27 June 2019 18:12:03 CEST Greg Kurz wrote:
> On Wed, 26 Jun 2019 20:25:55 +0200
> Christian Schoenebeck via Qemu-devel  wrote:
> > There is no need for signedness on these QID fields for 9p.
> > 
> > Signed-off-by: Antonios Motakis 
> 
> You should mention here the changes you made on top of Antonios
> original patch. Something like:
> 
> [CS: - also convert path
>  - adapted trace-events and donttouch_stat()]

Haven't seen that comment style in the git logs. Any example hash for that?

> > diff --git a/hw/9pfs/trace-events b/hw/9pfs/trace-events
> > index c0a0a4ab5d..6964756922 100644
> > --- a/hw/9pfs/trace-events
> > +++ b/hw/9pfs/trace-events
> > @@ -6,7 +6,7 @@ v9fs_rerror(uint16_t tag, uint8_t id, int err) "tag %d id %d err %d"
> >  v9fs_version(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s"
> >  v9fs_version_return(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s"
> >  v9fs_attach(uint16_t tag, uint8_t id, int32_t fid, int32_t afid, char* uname, char* aname) "tag %u id %u fid %d afid %d uname %s aname %s"
> > -v9fs_attach_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path) "tag %d id %d type %d version %d path %"PRId64
> > +v9fs_attach_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path) "tag %d id %d type %d version %d path %"PRId64
>
> I was expecting to see PRIu64 for an uint64_t but I now realize that %d
> seems to be used all over the place for unsigned types... :-\
> 
> At least, please fix the masks of the lines you're changing in this
> patch so that unsigned are passed to "u" or PRIu64. The rest of the
> mess can be fixed later in a followup.

If you don't mind I will restrict it to your latter suggestion for now, that 
is adjusting it using the short format specifiers e.g. "u", the rest would IMO 
be out of the scope of this patch series.

Too bad that no format specifier warnings are thrown on these.
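
As a small illustration of the specifier issue (a sketch only, not the
trace-events change itself): format_qid() below is a hypothetical helper that
prints the three QID fields with unsigned specifiers. With a signed specifier
such as PRId64, path values above INT64_MAX would typically show up as
negative numbers.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the QID fields with unsigned specifiers.
 * Returns what snprintf() returns (the length that would have been written).
 */
static int format_qid(char *buf, size_t len,
                      uint8_t type, uint32_t version, uint64_t path)
{
    /* Casts keep the arguments matching "%u" exactly after promotion. */
    return snprintf(buf, len, "type %u version %u path %" PRIu64,
                    (unsigned)type, (unsigned)version, path);
}
```

For example, format_qid(buf, sizeof(buf), 0x80, 4000000000u, UINT64_MAX)
yields the full unsigned values rather than wrapped negative ones.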

Best regards,
Christian Schoenebeck



Re: [Qemu-devel] RFC: Why does target/m68k RTE insn. use gen_exception

2019-06-27 Thread Lucien Anti-Spam via Qemu-devel
Hi Laurent / Richard,
(resent email)
Does anyone know why
gen_exception(s, s->base.pc_next, EXCP_RTE);

is generated for the "RTE" instruction, whereas "RTS" uses a gen_jmp?
(Note: see DISAS_INSN(rte) and DISAS_INSN(rts) in target/m68k/translate.c.)

Cheers,
Luc


[Qemu-devel] RFC: Why does target/m68k RTE insn. use gen_exception

2019-06-27 Thread Lucien Anti-Spam via Qemu-devel
Hi folks,
Does anyone have any knowledge why
gen_exception(s, s->base.pc_next, EXCP_RTE);


[Qemu-devel] [PATCH v4 0/5] 9p: Fix file ID collisions

2019-06-26 Thread Christian Schoenebeck via Qemu-devel
This is v4 of a proposed patch set for fixing file ID collisions with 9pfs.

v3->v4:

  * Rebased to latest git master head.

  * Split Antonios' patch set back into its original 4 individual patches
    (it was previously merged into a single patch).

  * Addressed the discussed issues directly in Antonios' patches
    (this was a separate patch before).

  * Added virtfs command line option "remap_inodes": unless this option
    is enabled, no inode remapping is performed at all, and the user
    just gets an error message when trying to use more than 1 device
    per export.

  * Dropped persistence of QIDs across reboots.

  * Dropped disputed "vii" feature.
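
The behaviour described for "remap_inodes" can be summarised by the small
decision sketch below. QidPolicy and qid_policy() are hypothetical names used
only for this illustration; the actual checks live in stat_to_qid() and may
differ in detail.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical outcome of generating a QID path for one file. */
typedef enum {
    QID_USE_INODE,  /* remapping off, same device as the export root */
    QID_REMAP,      /* remapping on: inode+device are mapped to a QID path */
    QID_ERROR       /* remapping off, file lives on another device */
} QidPolicy;

static QidPolicy qid_policy(dev_t export_dev, dev_t file_dev,
                            bool remap_inodes)
{
    if (remap_inodes) {
        return QID_REMAP;
    }
    if (file_dev != export_dev) {
        /* More than one device per export without remapping: report it. */
        return QID_ERROR;
    }
    return QID_USE_INODE;
}
```

In this sketch an export that only ever touches one device behaves exactly as
before unless the option is switched on.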

Greg, please check if I am doing anything superfluous in patch 3 regarding
the new command line parameter "remap_inodes".

Daniel, I also have a libvirt patch for this new "remap_inodes" command
line parameter, but I guess I will wait for this qemu patch set to get through.

Christian Schoenebeck (5):
  9p: unsigned type for type, version, path
  9p: Treat multiple devices on one export as an error
  9p: Added virtfs option "remap_inodes"
  9p: stat_to_qid: implement slow path
  9p: Use variable length suffixes for inode remapping

 fsdev/9p-marshal.h      |   6 +-
 fsdev/file-op-9p.h      |   1 +
 fsdev/qemu-fsdev-opts.c |   7 +-
 fsdev/qemu-fsdev.c      |   6 +
 hw/9pfs/9p.c            | 448 +---
 hw/9pfs/9p.h            |  83 +
 hw/9pfs/trace-events    |  14 +-
 qemu-options.hx         |  17 +-
 vl.c                    |   3 +
 9 files changed, 550 insertions(+), 35 deletions(-)

-- 
2.11.0




[Qemu-devel] [PATCH v4 4/5] 9p: stat_to_qid: implement slow path

2019-06-26 Thread Christian Schoenebeck via Qemu-devel
stat_to_qid attempts via qid_path_prefixmap to map unique files (which are
identified by 64 bit inode nr and 32 bit device id) to a 64 bit QID path value.
However this implementation makes some assumptions about inode number
generation on the host.

If qid_path_prefixmap fails, we still have 48 bits available in the QID
path to fall back to a less memory efficient full mapping.
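
A rough standalone sketch of the two layouts follows. It assumes that
QPATH_INO_MASK covers the low 48 bits and that the fast path places a 16 bit
prefix in the top bits; prefix_path(), FullAllocator and fullmap_alloc() are
hypothetical names for this illustration and the hash table lookup is left
out entirely.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: the low 48 bits of the QID path hold the inode-derived part. */
#define QPATH_INO_MASK ((UINT64_C(1) << 48) - 1)

/* Fast path: 16 bit prefix on top, low 48 bits of the inode number below. */
static uint64_t prefix_path(uint16_t prefix, uint64_t ino)
{
    return ((uint64_t)prefix << 48) | (ino & QPATH_INO_MASK);
}

/* Slow path: sequential 48 bit values; next == 0 means "exhausted". */
typedef struct {
    uint64_t next;  /* starts at 1 */
} FullAllocator;

static int fullmap_alloc(FullAllocator *a, uint64_t *path)
{
    if (a->next == 0) {
        return -ENFILE;  /* no more files can be mapped */
    }
    *path = a->next++;
    a->next &= QPATH_INO_MASK;  /* wraps to 0 after the last 48 bit value */
    return 0;
}
```

Since prefix 0 is reserved, values handed out by the slow path (top 16 bits
all zero) cannot collide with fast path values in this sketch.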

Signed-off-by: Antonios Motakis 
Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 63 +++-
 hw/9pfs/9p.h |  9 +
 2 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 7ccc68a829..e6e410972f 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -579,23 +579,69 @@ static uint32_t qpp_hash(QppEntry e)
 return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0);
 }
 
+static uint32_t qpf_hash(QpfEntry e)
+{
+return qemu_xxhash7(e.ino, e.dev, 0, 0, 0);
+}
+
 static bool qpp_lookup_func(const void *obj, const void *userp)
 {
 const QppEntry *e1 = obj, *e2 = userp;
 return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix;
 }
 
-static void qpp_table_remove(void *p, uint32_t h, void *up)
+static bool qpf_lookup_func(const void *obj, const void *userp)
+{
+const QpfEntry *e1 = obj, *e2 = userp;
+return e1->dev == e2->dev && e1->ino == e2->ino;
+}
+
+static void qp_table_remove(void *p, uint32_t h, void *up)
 {
 g_free(p);
 }
 
-static void qpp_table_destroy(struct qht *ht)
+static void qp_table_destroy(struct qht *ht)
 {
-qht_iter(ht, qpp_table_remove, NULL);
+qht_iter(ht, qp_table_remove, NULL);
 qht_destroy(ht);
 }
 
+static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf,
+uint64_t *path)
+{
+QpfEntry lookup = {
+.dev = stbuf->st_dev,
+.ino = stbuf->st_ino
+}, *val;
+uint32_t hash = qpf_hash(lookup);
+
+/* most users won't need the fullmap, so init the table lazily */
+if (!pdu->s->qpf_table.map) {
+qht_init(&pdu->s->qpf_table, qpf_lookup_func, 1 << 16, QHT_MODE_AUTO_RESIZE);
+}
+
+val = qht_lookup(&pdu->s->qpf_table, &lookup, hash);
+
+if (!val) {
+if (pdu->s->qp_fullpath_next == 0) {
+/* no more files can be mapped :'( */
+return -ENFILE;
+}
+
+val = g_malloc0(sizeof(QpfEntry));
+*val = lookup;
+
+/* new unique inode and device combo */
+val->path = pdu->s->qp_fullpath_next++;
+pdu->s->qp_fullpath_next &= QPATH_INO_MASK;
+qht_insert(&pdu->s->qpf_table, val, hash, NULL);
+}
+
+*path = val->path;
+return 0;
+}
+
 /* stat_to_qid needs to map inode number (64 bits) and device id (32 bits)
  * to a unique QID path (64 bits). To avoid having to map and keep track
  * of up to 2^64 objects, we map only the 16 highest bits of the inode plus
@@ -642,6 +688,10 @@ static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp)
 if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) {
 /* map inode+device to qid path (fast path) */
 err = qid_path_prefixmap(pdu, stbuf, &qidp->path);
+if (err == -ENFILE) {
+/* fast path didn't work, fall back to full map */
+err = qid_path_fullmap(pdu, stbuf, &qidp->path);
+}
 if (err) {
 return err;
 }
@@ -3752,6 +3802,7 @@ int v9fs_device_realize_common(V9fsState *s, const V9fsTransport *t,
 /* QID path hash table. 1 entry ought to be enough for anybody ;) */
 qht_init(&s->qpp_table, qpp_lookup_func, 1, QHT_MODE_AUTO_RESIZE);
 s->qp_prefix_next = 1; /* reserve 0 to detect overflow */
+s->qp_fullpath_next = 1;
 
 s->ctx.fst = &fse->fst;
 fsdev_throttle_init(s->ctx.fst);
@@ -3766,7 +3817,8 @@ out:
 }
 g_free(s->tag);
 g_free(s->ctx.fs_root);
-qpp_table_destroy(&s->qpp_table);
+qp_table_destroy(&s->qpp_table);
+qp_table_destroy(&s->qpf_table);
 v9fs_path_free(&path);
 }
 return rc;
@@ -3779,7 +3831,8 @@ void v9fs_device_unrealize_common(V9fsState *s, Error **errp)
 }
 fsdev_throttle_cleanup(s->ctx.fst);
 g_free(s->tag);
-qpp_table_destroy(&s->qpp_table);
+qp_table_destroy(&s->qpp_table);
+qp_table_destroy(&s->qpf_table);
 g_free(s->ctx.fs_root);
 }
 
diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
index 0200e04176..2b74561030 100644
--- a/hw/9pfs/9p.h
+++ b/hw/9pfs/9p.h
@@ -245,6 +245,13 @@ typedef struct {
 uint16_t qp_prefix;
 } QppEntry;
 
+/* QID path full entry, as above */
+typedef struct {
+dev_t dev;
+ino_t ino;
+uint64_t path;
+} QpfEntry;
+
 struct V9fsState
 {
 QLIST_HEAD(, V9fsPDU) free_list;
@@ -268,7 +275,9 @@ struct V9fsState
 V9fsQID root_qid;
 dev_t dev_id;
 struct qht qpp_table;
+struct qht qpf_table;
 uint16_t qp_prefix_next;
+uint64_t qp_fullpath_next;
 };
 
 /* 9p2000.L open flags */
-- 
2.11.0



