Re: [Qemu-devel] virtio-blk broken after system reset

2010-11-12 Thread Jan Kiszka
Am 13.11.2010 08:49, Stefan Hajnoczi wrote:
> On Fri, Nov 12, 2010 at 10:02 PM, Jan Kiszka  wrote:
>> Hi,
>>
>> both after hard and guest-initiated reset, something is seriously broken
>> with virtio block devices. If I reset my Linux guest while still in
>> grub, the bios will simply fail to read from the disk after the reboot. If I
>> reset after Linux touched the device, qemu terminates:
>>
>> Breakpoint 1, 0x74b945b0 in _exit () from /lib64/libc.so.6
>> (gdb) bt
>> #0  0x74b945b0 in _exit () from /lib64/libc.so.6
>> #1  0x74b2948d in __run_exit_handlers () from /lib64/libc.so.6
>> #2  0x74b29535 in exit () from /lib64/libc.so.6
>> #3  0x00568da3 in virtqueue_num_heads (vq=0x17040e0, idx=0) at 
>> /data/qemu/hw/virtio.c:258
>> #4  0x00569511 in virtqueue_pop (vq=0x17040e0, elem=0x17cea58) at 
>> /data/qemu/hw/virtio.c:388
>> #5  0x00419e31 in virtio_blk_get_request (s=0x1704010) at 
>> /data/qemu/hw/virtio-blk.c:132
>> #6  virtio_blk_handle_output (vdev=0x1704010, vq=) at 
>> /data/qemu/hw/virtio-blk.c:369
>>
>> This is with current qemu.git head, haven't tried older versions. Known bug?
> 
> This is a known issue.  Gleb has posted a SeaBIOS fix:
> 
> http://www.mail-archive.com/qemu-devel@nongnu.org/msg45849.html
> 
> Currently the patch only appears in SeaBIOS master.  Gleb and Kevin
> have discussed putting it into 0.6.1.2 stable (see linked thread).
> QEMU should then pick that release up.

Ah, good.

And what about the guest-triggerable qemu exit above?

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] virtio-blk broken after system reset

2010-11-12 Thread Stefan Hajnoczi
On Fri, Nov 12, 2010 at 10:02 PM, Jan Kiszka  wrote:
> Hi,
>
> both after hard and guest-initiated reset, something is seriously broken
> with virtio block devices. If I reset my Linux guest while still in
> grub, the bios will simply fail to read from the disk after the reboot. If I
> reset after Linux touched the device, qemu terminates:
>
> Breakpoint 1, 0x74b945b0 in _exit () from /lib64/libc.so.6
> (gdb) bt
> #0  0x74b945b0 in _exit () from /lib64/libc.so.6
> #1  0x74b2948d in __run_exit_handlers () from /lib64/libc.so.6
> #2  0x74b29535 in exit () from /lib64/libc.so.6
> #3  0x00568da3 in virtqueue_num_heads (vq=0x17040e0, idx=0) at 
> /data/qemu/hw/virtio.c:258
> #4  0x00569511 in virtqueue_pop (vq=0x17040e0, elem=0x17cea58) at 
> /data/qemu/hw/virtio.c:388
> #5  0x00419e31 in virtio_blk_get_request (s=0x1704010) at 
> /data/qemu/hw/virtio-blk.c:132
> #6  virtio_blk_handle_output (vdev=0x1704010, vq=) at 
> /data/qemu/hw/virtio-blk.c:369
>
> This is with current qemu.git head, haven't tried older versions. Known bug?

This is a known issue.  Gleb has posted a SeaBIOS fix:

http://www.mail-archive.com/qemu-devel@nongnu.org/msg45849.html

Currently the patch only appears in SeaBIOS master.  Gleb and Kevin
have discussed putting it into 0.6.1.2 stable (see linked thread).
QEMU should then pick that release up.

Stefan



[Qemu-devel] [Bug 510612] Re: sound broken in qemu 0.12.x

2010-11-12 Thread Bug Watch Updater
** Changed in: qemu-kvm (Debian)
   Status: Fix Committed => Fix Released

-- 
sound broken in qemu 0.12.x
https://bugs.launchpad.net/bugs/510612
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New
Status in “qemu” package in Debian: New
Status in “qemu-kvm” package in Debian: Fix Released

Bug description:
In qemu 0.12.x there is a jitter when sound is played by guest Windows XP SP2. 
Also there are warnings on console:

alsa: Unexpected state 1
alsa: Unexpected state 1

and qemu tries to eat 100% cpu. According to strace it polls alsa device 
continuously:

futex(0x849e60, FUTEX_WAKE_PRIVATE, 1)  = 1
select(23, [6 10 12 20 22], [18], [], {1, 0}) = 1 (out [18], left {0, 97})
poll([{fd=18, events=POLLOUT|POLLERR|POLLNVAL}], 1, 0) = 1 ([{fd=18, 
revents=POLLOUT|POLLERR}])
write(2, "alsa: "..., 6)= 6
write(2, "Unexpected state 1\n"..., 19) = 19

The bug appears with ac97 and es1370 guest sound devices (i didn't test others).

With OSS host sound driver there are no warnings and consumption of cpu by qemu 
is normal, but jitter in the sound is still there.





[Qemu-devel] [Bug 595438] Re: KVM segmentation fault, using SCSI+writeback and linux 2.4 guest

2010-11-12 Thread Bug Watch Updater
** Changed in: qemu-kvm (Debian)
   Status: Confirmed => Fix Released

-- 
KVM segmentation fault, using SCSI+writeback and linux 2.4 guest
https://bugs.launchpad.net/bugs/595438
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in Kernel Virtual Machine: Confirmed
Status in QEMU: Fix Committed
Status in qemu-kvm: Fix Released
Status in “qemu-kvm” package in Ubuntu: Fix Released
Status in “qemu-kvm” source package in Lucid: Fix Committed
Status in “qemu-kvm” package in Debian: Fix Released

Bug description:
I Use Ubuntu 32 bit 10.04 with standard KVM.
I have Intel E7600  @ 3.06GHz processor with VMX

In this system I Run:
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -M pc-0.12 -enable-kvm -m 256 -smp 1 -name 
spamsender -uuid b9cacd5e-08f7-41fd-78c8-89cec59af881 -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/spamsender.monitor,server,nowait 
-monitor chardev:monitor -boot d -drive 
file=/mnt/megadiff/cdiso_400_130.iso,if=ide,media=cdrom,index=2 -drive 
file=/home/mmarkk/spamsender2.img,if=scsi,index=0,format=qcow2,cache=writeback 
-net nic,macaddr=00:00:00:00:00:00,vlan=0,name=nic.0 -net tap,vlan=0,name=tap.0 
-chardev pty,id=serial0 -serial chardev:serial0 -parallel none -usb -vnc 
127.0.0.1:0 -vga cirrus

.iso image contain custom distro of 2.4-linux kernel based system. During 
install process (when .tar.gz actively unpacked), kvm dead with segmentation 
fault.

And ONLY when I choose scsi virtual disk and writeback simultaneously.
But, writeback+ide, writethrough+scsi works OK.

I use qcow2. It seems, that qcow does not have such problems.

Virtual machine get down at random time during file copy. It seems, when qcow2 
file size need to be expanded.

IMPACT: kvm used with scsi virtual disk and writeback dies with segfault.

FIX: is the inclusion of a patch cherry-picked from upstream which dequeues
requests before invoking callbacks.  It is at
http://bazaar.launchpad.net/~serge-hallyn/ubuntu/lucid/qemu-kvm/fix-scsi-writeback/revision/70

TO REPRODUCE: See the command above.

REGRESSION POTENTIAL: this is cherry-picked from upstream, and has been
tested by the bug reporter with no ill effects.






[Qemu-devel] [Bug 674740] Re: qemu segfaults when security_model=none using virtio-9p-pci driver

2010-11-12 Thread Mike Ryan
Add the space in 'thesecurity'.

** Patch added: "0002-include-a-missing-space-in-and-error-message.patch"
   
https://bugs.launchpad.net/qemu/+bug/674740/+attachment/1731776/+files/0002-include-a-missing-space-in-and-error-message.patch

-- 
qemu segfaults when security_model=none using virtio-9p-pci driver
https://bugs.launchpad.net/bugs/674740
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
qemu version: 0.13
commit-id: 6ed912999d6ef636a5be5ccb266d7d3c0f0310b4

example invocation:
$ qemu -virtfs local,path=/tmp,security_model=none,mount_tag=mmm r.img
one of the following must be specified as thesecurity option:
 security_model=passthrough 
 security_model=mapped
Segmentation fault

Patch is attached. Also attached is a patch that addes the space between 'the' 
and 'security' in 'thesecurity'.

Does not affect trunk.





[Qemu-devel] [Bug 674740] Re: qemu segfaults when security_model=none using virtio-9p-pci driver

2010-11-12 Thread Mike Ryan

** Patch added: "patch to fix the bug"
   
https://bugs.launchpad.net/bugs/674740/+attachment/1731775/+files/0001-return-an-error-code-if-the-virtio-9p-pci-driver-can.patch

-- 
qemu segfaults when security_model=none using virtio-9p-pci driver
https://bugs.launchpad.net/bugs/674740
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
qemu version: 0.13
commit-id: 6ed912999d6ef636a5be5ccb266d7d3c0f0310b4

example invocation:
$ qemu -virtfs local,path=/tmp,security_model=none,mount_tag=mmm r.img
one of the following must be specified as thesecurity option:
 security_model=passthrough 
 security_model=mapped
Segmentation fault

Patch is attached. Also attached is a patch that addes the space between 'the' 
and 'security' in 'thesecurity'.

Does not affect trunk.





[Qemu-devel] [Bug 674740] [NEW] qemu segfaults when security_model=none using virtio-9p-pci driver

2010-11-12 Thread Mike Ryan
Public bug reported:

qemu version: 0.13
commit-id: 6ed912999d6ef636a5be5ccb266d7d3c0f0310b4

example invocation:
$ qemu -virtfs local,path=/tmp,security_model=none,mount_tag=mmm r.img
one of the following must be specified as thesecurity option:
 security_model=passthrough 
 security_model=mapped
Segmentation fault

Patch is attached. Also attached is a patch that addes the space between
'the' and 'security' in 'thesecurity'.

Does not affect trunk.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
qemu segfaults when security_model=none using virtio-9p-pci driver
https://bugs.launchpad.net/bugs/674740
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
qemu version: 0.13
commit-id: 6ed912999d6ef636a5be5ccb266d7d3c0f0310b4

example invocation:
$ qemu -virtfs local,path=/tmp,security_model=none,mount_tag=mmm r.img
one of the following must be specified as thesecurity option:
 security_model=passthrough 
 security_model=mapped
Segmentation fault

Patch is attached. Also attached is a patch that addes the space between 'the' 
and 'security' in 'thesecurity'.

Does not affect trunk.





Re: [Qemu-devel] [PATCH 4/8] ARM: Return correct result for float-to-integer conversion of NaN

2010-11-12 Thread Peter Maydell
On 11 November 2010 19:21, Nathan Froyd  wrote:
> On Thu, Nov 11, 2010 at 06:23:58PM +, Peter Maydell wrote:
>> The ARM architecture mandates that converting a NaN value to
>> integer gives zero. This isn't the behaviour of the SoftFloat
>> library, so NaNs must be special-cased.
>
> This is correct, but it's really only correct if FP traps are disabled.

True (in that if you take a trap you don't return any result at all),
but isn't it a bit of a red herring for this patchset given that qemu
doesn't implement trapping on FP exceptions?

I can make the commit message read "gives zero (if Invalid
Operation FP exceptions are not being trapped)" if you think
that would be clearer.

-- PMM



[Qemu-devel] virtio-blk broken after system reset

2010-11-12 Thread Jan Kiszka
Hi,

both after hard and guest-initiated reset, something is seriously broken
with virtio block devices. If I reset my Linux guest while still in
grub, the bios will simply fail to read from the disk after the reboot. If I
reset after Linux touched the device, qemu terminates:

Breakpoint 1, 0x74b945b0 in _exit () from /lib64/libc.so.6
(gdb) bt
#0  0x74b945b0 in _exit () from /lib64/libc.so.6
#1  0x74b2948d in __run_exit_handlers () from /lib64/libc.so.6
#2  0x74b29535 in exit () from /lib64/libc.so.6
#3  0x00568da3 in virtqueue_num_heads (vq=0x17040e0, idx=0) at 
/data/qemu/hw/virtio.c:258
#4  0x00569511 in virtqueue_pop (vq=0x17040e0, elem=0x17cea58) at 
/data/qemu/hw/virtio.c:388
#5  0x00419e31 in virtio_blk_get_request (s=0x1704010) at 
/data/qemu/hw/virtio-blk.c:132
#6  virtio_blk_handle_output (vdev=0x1704010, vq=) at 
/data/qemu/hw/virtio-blk.c:369

This is with current qemu.git head, haven't tried older versions. Known bug?

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] No Virtual Console

2010-11-12 Thread Mulyadi Santosa
On Fri, Nov 12, 2010 at 16:49,   wrote:
> Yes - that's it! The SDL window. I can't get it to show up ... any ideas?

make sure you have "SDL" package installed. Also, if you compile Qemu
by your own, please install "SDL-devel" package before ./configure and
make.

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com



[Qemu-devel] [PATCH v2 2/9] pci: Remove pci_enable_capability_support()

2010-11-12 Thread Alex Williamson
This interface doesn't make much sense, adding a capability can
take care of everything, just provide a means to register
capability read/write handlers.

Device assignment does it's own thing, so requires a couple
ugly hacks that will be cleaned by subsequent patches.

Signed-off-by: Alex Williamson 
---

 hw/device-assignment.c |   12 ---
 hw/pci.c   |   52 +---
 hw/pci.h   |9 +++-
 3 files changed, 35 insertions(+), 38 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index bde231d..311c197 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1292,7 +1292,12 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 PCIRegion *pci_region = dev->real_device.regions;
 int next_cap_pt = 0;
 
+pci_dev->cap.supported = 1;
+pci_dev->cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR;
 pci_dev->cap.length = 0;
+pci_dev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+pci_dev->config[PCI_CAPABILITY_LIST] = pci_dev->cap.start;
+
 #ifdef KVM_CAP_IRQ_ROUTING
 #ifdef KVM_CAP_DEVICE_MSI
 /* Expose MSI capability
@@ -1468,9 +1473,10 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 dev->h_busnr = dev->host.bus;
 dev->h_devfn = PCI_DEVFN(dev->host.dev, dev->host.func);
 
-if (pci_enable_capability_support(pci_dev, 0, NULL,
-assigned_device_pci_cap_write_config,
-assigned_device_pci_cap_init) < 0)
+pci_register_capability_handlers(pci_dev, NULL,
+ assigned_device_pci_cap_write_config);
+
+if (assigned_device_pci_cap_init(pci_dev) < 0)
 goto out;
 
 /* assign device to guest */
diff --git a/hw/pci.c b/hw/pci.c
index 4bc5882..607eb27 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -787,6 +787,8 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 config_write = pci_default_write_config;
 pci_dev->config_read = config_read;
 pci_dev->config_write = config_write;
+pci_dev->cap.config_read = pci_default_cap_read_config;
+pci_dev->cap.config_write = pci_default_cap_write_config;
 bus->devices[devfn] = pci_dev;
 pci_dev->irq = qemu_allocate_irqs(pci_set_irq, pci_dev, PCI_NUM_PINS);
 pci_dev->version_id = 2; /* Current pci device vmstate version */
@@ -1894,35 +1896,21 @@ PCIDevice *pci_create_simple_multifunction(PCIBus *bus, 
int devfn,
 return dev;
 }
 
-int pci_enable_capability_support(PCIDevice *pci_dev,
-  uint32_t config_start,
-  PCICapConfigReadFunc *config_read,
-  PCICapConfigWriteFunc *config_write,
-  PCICapConfigInitFunc *config_init)
+void pci_register_capability_handlers(PCIDevice *pdev,
+  PCICapConfigReadFunc *config_read,
+  PCICapConfigWriteFunc *config_write)
 {
-if (!pci_dev)
-return -ENODEV;
-
-pci_dev->config[0x06] |= 0x10; // status = capabilities
-
-if (config_start == 0)
-   pci_dev->cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR;
-else if (config_start >= 0x40 && config_start < 0xff)
-pci_dev->cap.start = config_start;
-else
-return -EINVAL;
+if (config_read) {
+pdev->cap.config_read = config_read;
+} else {
+pdev->cap.config_read = pci_default_cap_read_config;
+}
 
-if (config_read)
-pci_dev->cap.config_read = config_read;
-else
-pci_dev->cap.config_read = pci_default_cap_read_config;
-if (config_write)
-pci_dev->cap.config_write = config_write;
-else
-pci_dev->cap.config_write = pci_default_cap_write_config;
-pci_dev->cap.supported = 1;
-pci_dev->config[PCI_CAPABILITY_LIST] = pci_dev->cap.start;
-return config_init(pci_dev);
+if (config_write) {
+pdev->cap.config_write = config_write;
+} else {
+pdev->cap.config_write = pci_default_cap_write_config;
+}
 }
 
 PCIDevice *pci_create(PCIBus *bus, int devfn, const char *name)
@@ -2046,12 +2034,16 @@ int pci_add_capability_at_offset(PCIDevice *pdev, 
uint8_t cap_id,
 config[PCI_CAP_LIST_ID] = cap_id;
 config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
 pdev->config[PCI_CAPABILITY_LIST] = offset;
-pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
 memset(pdev->used + offset, 0xFF, size);
 /* Make capability read-only by default */
 memset(pdev->wmask + offset, 0, size);
 /* Check capability by default */
 memset(pdev->cmask + offset, 0xFF, size);
+
+pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+pdev->cap.supported = 1;
+pdev->cap.start = pdev->cap.start ? MIN(pdev->cap.start, offset) : offset;
+
 return offset;
 }
 
@@ -2079,8 +2071,10 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, 

[Qemu-devel] [PATCH v2 9/9] pci: Store capability offsets in PCIDevice

2010-11-12 Thread Alex Williamson
This not only makes pci_find_capability a directly lookup, but also
allows us to better track added capabilities and avoids the proliferation
of random additional capability offset markers.

Signed-off-by: Alex Williamson 
---

 hw/msix.c |   15 +++
 hw/pci.c  |   20 ++--
 hw/pci.h  |5 +++--
 3 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index b98b34a..060f27b 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -204,7 +204,6 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned 
short nentries,
 pci_set_long(config + MSIX_PBA_OFFSET, (bar_size + MSIX_PAGE_PENDING) |
  bar_nr);
 }
-pdev->msix_cap = config_offset;
 /* Make flags bit writeable. */
 pdev->wmask[config_offset + MSIX_CONTROL_OFFSET] |= MSIX_ENABLE_MASK |
MSIX_MASKALL_MASK;
@@ -253,7 +252,8 @@ static void msix_clr_pending(PCIDevice *dev, int vector)
 
 static int msix_function_masked(PCIDevice *dev)
 {
-return dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] & 
MSIX_MASKALL_MASK;
+return dev->config[dev->caps[PCI_CAP_ID_MSIX] +
+   MSIX_CONTROL_OFFSET] & MSIX_MASKALL_MASK;
 }
 
 static int msix_is_masked(PCIDevice *dev, int vector)
@@ -275,7 +275,7 @@ static void msix_handle_mask_update(PCIDevice *dev, int 
vector)
 void msix_write_config(PCIDevice *dev, uint32_t addr,
uint32_t val, int len)
 {
-unsigned enable_pos = dev->msix_cap + MSIX_CONTROL_OFFSET;
+unsigned enable_pos = dev->caps[PCI_CAP_ID_MSIX] + MSIX_CONTROL_OFFSET;
 int vector;
 
 if (!range_covers_byte(addr, len, enable_pos)) {
@@ -334,7 +334,7 @@ static CPUReadMemoryFunc * const msix_mmio_read[] = {
 void msix_mmio_map(PCIDevice *d, int region_num,
pcibus_t addr, pcibus_t size, int type)
 {
-uint8_t *config = d->config + d->msix_cap;
+uint8_t *config = d->config + d->caps[PCI_CAP_ID_MSIX];
 uint32_t table = pci_get_long(config + MSIX_TABLE_OFFSET);
 uint32_t offset = table & ~(MSIX_PAGE_SIZE - 1);
 /* TODO: for assigned devices, we'll want to make it possible to map
@@ -437,7 +437,6 @@ int msix_uninit(PCIDevice *dev)
 if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
 return 0;
 pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
-dev->msix_cap = 0;
 msix_free_irq_entries(dev);
 dev->msix_entries_nr = 0;
 cpu_unregister_io_memory(dev->msix_mmio_index);
@@ -493,7 +492,7 @@ int msix_present(PCIDevice *dev)
 int msix_enabled(PCIDevice *dev)
 {
 return (dev->cap_present & QEMU_PCI_CAP_MSIX) &&
-(dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
+(dev->config[dev->caps[PCI_CAP_ID_MSIX] + MSIX_CONTROL_OFFSET] &
  MSIX_ENABLE_MASK);
 }
 
@@ -534,8 +533,8 @@ void msix_reset(PCIDevice *dev)
 if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
 return;
 msix_free_irq_entries(dev);
-dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &=
-   ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET];
+dev->config[dev->caps[PCI_CAP_ID_MSIX] + MSIX_CONTROL_OFFSET] &=
+   ~dev->wmask[dev->caps[PCI_CAP_ID_MSIX] + MSIX_CONTROL_OFFSET];
 memset(dev->msix_table_page, 0, MSIX_PAGE_SIZE);
 msix_mask_all(dev, dev->msix_entries_nr);
 }
diff --git a/hw/pci.c b/hw/pci.c
index bc25be7..773afa5 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1990,15 +1990,24 @@ int pci_add_capability_at_offset(PCIDevice *pdev, 
uint8_t cap_id,
 {
 uint8_t i, *config = pdev->config + offset;
 
+/* Check overlap with existing capabilities, valid cap, already added */
 for (i = 0; i < size; i++) {
 if (pdev->config_map[offset + i]) {
 return -EFAULT;
 }
 }
 
+if (!cap_id || cap_id > PCI_CAP_ID_MAX) {
+return -EINVAL;
+}
+
+if (pdev->caps[cap_id]) {
+return -EFAULT;
+}
+
 config[PCI_CAP_LIST_ID] = cap_id;
 config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
-pdev->config[PCI_CAPABILITY_LIST] = offset;
+pdev->caps[cap_id] = pdev->config[PCI_CAPABILITY_LIST] = offset;
 memset(pdev->config_map + offset, cap_id, size);
 /* Make capability read-only by default */
 memset(pdev->wmask + offset, 0, size);
@@ -2033,6 +2042,7 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, 
uint8_t size)
 /* Clear cmask as device-specific registers can't be checked */
 memset(pdev->cmask + offset, 0, size);
 memset(pdev->config_map + offset, 0, size);
+pdev->caps[cap_id] = 0;
 
 if (!pdev->config[PCI_CAPABILITY_LIST]) {
 pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
@@ -2041,7 +2051,13 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, 
uint8_t size)
 
 uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
 {
-return pci_find_capability_list(pdev, cap_id, NULL);
+if (cap_id == PCI_CAP_ID_BASIC) {
+return 0;
+}
+if (cap_id && cap_id <= PCI_CAP_ID_MAX

[Qemu-devel] [PATCH v2 8/9] pci: Remove capability read/write config handlers

2010-11-12 Thread Alex Williamson
These are just as easy to handle out of the main config read/write
handlers.  Also expand cap_map to config_map so we can use it to
track all of config space.

Signed-off-by: Alex Williamson 
---

 hw/device-assignment.c |   22 +++-
 hw/pci.c   |   66 
 hw/pci.h   |   25 +++---
 3 files changed, 35 insertions(+), 78 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 179c7dc..85fa50d 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -63,6 +63,11 @@ static void assigned_dev_load_option_rom(AssignedDevice 
*dev);
 
 static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
 
+static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
+ uint8_t cap_id,
+ uint32_t address,
+ uint32_t val, int len);
+
 static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
uint32_t addr, int len, uint32_t *val)
 {
@@ -400,20 +405,25 @@ static void assigned_dev_pci_write_config(PCIDevice *d, 
uint32_t address,
 {
 int fd;
 ssize_t ret;
+uint8_t cap_id = pci_get_cap_id(d, address);
 AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
 
 DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
   ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
   (uint16_t) address, val, len);
 
+if (cap_id && cap_id != PCI_CAP_ID_BASIC) {
+return assigned_device_pci_cap_write_config(d, cap_id, address,
+val, len);
+}
+
 if (address == 0x4) {
 pci_default_write_config(d, address, val, len);
 /* Continue to program the card */
 }
 
 if ((address >= 0x10 && address <= 0x24) || address == 0x30 ||
-address == 0x34 || address == 0x3c || address == 0x3d ||
-pci_access_cap_config(d, address, len)) {
+address == 0x34 || address == 0x3c || address == 0x3d) {
 /* used for update-mappings (BAR emulation) */
 pci_default_write_config(d, address, val, len);
 return;
@@ -443,13 +453,14 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice 
*d, uint32_t address,
 {
 uint32_t val = 0;
 int fd;
+uint8_t cap_id = pci_get_cap_id(d, address);
 ssize_t ret;
 AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
 
 if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) ||
(address >= 0x10 && address <= 0x24) || address == 0x30 ||
 address == 0x34 || address == 0x3c || address == 0x3d ||
-pci_access_cap_config(d, address, len)) {
+(cap_id && cap_id != PCI_CAP_ID_BASIC)) {
 val = pci_default_read_config(d, address, len);
 DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
   (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
@@ -1249,7 +1260,7 @@ static void 
assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
  uint32_t address,
  uint32_t val, int len)
 {
-pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
+pci_default_write_config(pci_dev, address, val, len);
 
 switch (cap_id) {
 #ifdef KVM_CAP_IRQ_ROUTING
@@ -1466,9 +1477,6 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 dev->h_busnr = dev->host.bus;
 dev->h_devfn = PCI_DEVFN(dev->host.dev, dev->host.func);
 
-pci_register_capability_handlers(pci_dev, NULL,
- assigned_device_pci_cap_write_config);
-
 if (assigned_device_pci_cap_init(pci_dev) < 0)
 goto out;
 
diff --git a/hw/pci.c b/hw/pci.c
index 337afc4..bc25be7 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -730,7 +730,7 @@ static void pci_config_alloc(PCIDevice *pci_dev)
 pci_dev->config = qemu_mallocz(config_size);
 pci_dev->cmask = qemu_mallocz(config_size);
 pci_dev->wmask = qemu_mallocz(config_size);
-pci_dev->cap_map = qemu_mallocz(config_size);
+pci_dev->config_map = qemu_mallocz(config_size);
 }
 
 static void pci_config_free(PCIDevice *pci_dev)
@@ -738,7 +738,7 @@ static void pci_config_free(PCIDevice *pci_dev)
 qemu_free(pci_dev->config);
 qemu_free(pci_dev->cmask);
 qemu_free(pci_dev->wmask);
-qemu_free(pci_dev->cap_map);
+qemu_free(pci_dev->config_map);
 }
 
 /* -1 for devfn means auto assign */
@@ -767,6 +767,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 pstrcpy(pci_dev->name, sizeof(pci_dev->name), name);
 pci_dev->irq_state = 0;
 pci_config_alloc(pci_dev);
+memset(pci_dev->config_map, PCI_CAP_ID_BASIC, PCI_CONFIG_HEADER_SIZE);
 
 if (!is_bridge) {
 pci_set_default_subsystem_id(pci_dev);
@@ -787,8 +788,6 @@ s

[Qemu-devel] [PATCH v2 6/9] device-assignment: Move PCI capabilities to match physical hardware

2010-11-12 Thread Alex Williamson
Now that common PCI code doesn't have a hangup on capabilities
being contiguous, move assigned device capabilities to match
their offset on physical hardware.  This helps for drivers that
assume a capability configuration and don't bother searching.

We can also remove several calls to assigned_dev_pci_read_* because
we're overlaying the capability at the same location as the initial
copy we made of config space.  We can therefore just use pci_get_*.

Signed-off-by: Alex Williamson 
---

 hw/device-assignment.c |   67 +++-
 1 files changed, 21 insertions(+), 46 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 322fa9f..39f19be 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -366,16 +366,6 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, 
int pos)
 return (uint8_t)assigned_dev_pci_read(d, pos, 1);
 }
 
-static uint16_t assigned_dev_pci_read_word(PCIDevice *d, int pos)
-{
-return (uint16_t)assigned_dev_pci_read(d, pos, 2);
-}
-
-static uint32_t assigned_dev_pci_read_long(PCIDevice *d, int pos)
-{
-return assigned_dev_pci_read(d, pos, 4);
-}
-
 static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
 {
 int id;
@@ -1285,6 +1275,7 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 {
 AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev);
 PCIRegion *pci_region = dev->real_device.regions;
+int pos;
 
 /* Clear initial capabilities pointer and status copied from hw */
 pci_set_byte(pci_dev->config + PCI_CAPABILITY_LIST, 0);
@@ -1296,60 +1287,44 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 #ifdef KVM_CAP_DEVICE_MSI
 /* Expose MSI capability
  * MSI capability is the 1st capability in capability config */
-if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) {
-int vpos, ppos;
-uint16_t flags;
-
+if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI))) {
 dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
-vpos = pci_add_capability(pci_dev, PCI_CAP_ID_MSI,
-  PCI_CAPABILITY_CONFIG_MSI_LENGTH);
-
-memset(pci_dev->config + vpos + PCI_CAP_FLAGS, 0,
-   PCI_CAPABILITY_CONFIG_MSI_LENGTH - PCI_CAP_FLAGS);
+pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_MSI, pos,
+ PCI_CAPABILITY_CONFIG_MSI_LENGTH);
 
 /* Only 32-bit/no-mask currently supported */
-ppos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI);
-flags = assigned_dev_pci_read_word(pci_dev, ppos + PCI_MSI_FLAGS);
-flags &= PCI_MSI_FLAGS_QMASK;
-pci_set_word(pci_dev->config + vpos + PCI_MSI_FLAGS, flags);
+pci_set_word(pci_dev->config + pos + PCI_MSI_FLAGS,
+ pci_get_word(pci_dev->config + pos + PCI_MSI_FLAGS) &
+ PCI_MSI_FLAGS_QMASK);
+pci_set_long(pci_dev->config + pos + PCI_MSI_ADDRESS_LO, 0);
+pci_set_word(pci_dev->config + pos + PCI_MSI_DATA_32, 0);
 
 /* Set writable fields */
-pci_set_word(pci_dev->wmask + vpos + PCI_MSI_FLAGS,
+pci_set_word(pci_dev->wmask + pos + PCI_MSI_FLAGS,
  PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
-pci_set_long(pci_dev->wmask + vpos + PCI_MSI_ADDRESS_LO, 0xfffc);
-pci_set_long(pci_dev->wmask + vpos + PCI_MSI_DATA_32, 0x);
+pci_set_long(pci_dev->wmask + pos + PCI_MSI_ADDRESS_LO, 0xfffc);
+pci_set_word(pci_dev->wmask + pos + PCI_MSI_DATA_32, 0x);
 }
 #endif
 #ifdef KVM_CAP_DEVICE_MSIX
 /* Expose MSI-X capability */
-if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX)) {
-int vpos, ppos, entry_nr, bar_nr;
+if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX))) {
+int bar_nr;
 uint32_t msix_table_entry;
 
 dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
-vpos = pci_add_capability(pci_dev, PCI_CAP_ID_MSIX,
-   PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
+pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_MSIX, pos,
+ PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
 
-memset(pci_dev->config + vpos + PCI_CAP_FLAGS, 0,
-   PCI_CAPABILITY_CONFIG_MSIX_LENGTH - PCI_CAP_FLAGS);
+pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS,
+ pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
+ PCI_MSIX_TABSIZE);
 
 /* Only enable and function mask bits are writable */
-pci_set_word(pci_dev->wmask + vpos + PCI_MSIX_FLAGS,
+pci_set_word(pci_dev->wmask + pos + PCI_MSIX_FLAGS,
  PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
 
-ppos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX);
-
-entry_nr = assigned_dev_pci_read_word(pci_dev, ppos + PCI_MSIX_FLAGS);
-entry_nr &= PCI_MSIX_TABSIZE;
-pci_se

[Qemu-devel] [PATCH v2 5/9] pci: Remove cap.length, cap.start, cap.supported

2010-11-12 Thread Alex Williamson
Capabilities aren't required to be contiguous, so cap.length never
really made much sense.  Likewise, cap.start is mostly meaningless
too.  Both of these are better served by the capability map.  We
can also get rid of cap.supported, since it's really now unused
and redundant with flag in the status word anyway.

Signed-off-by: Alex Williamson 
---

 hw/device-assignment.c |4 
 hw/pci.c   |8 +---
 hw/pci.h   |2 --
 3 files changed, 1 insertions(+), 13 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 74cdd26..322fa9f 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1292,8 +1292,6 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
  pci_get_word(pci_dev->config + PCI_STATUS) &
  ~PCI_STATUS_CAP_LIST);
 
-pci_dev->cap.length = 0;
-
 #ifdef KVM_CAP_IRQ_ROUTING
 #ifdef KVM_CAP_DEVICE_MSI
 /* Expose MSI capability
@@ -1320,7 +1318,6 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
  PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
 pci_set_long(pci_dev->wmask + vpos + PCI_MSI_ADDRESS_LO, 0xfffc);
 pci_set_long(pci_dev->wmask + vpos + PCI_MSI_DATA_32, 0x);
-pci_dev->cap.length += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
 }
 #endif
 #ifdef KVM_CAP_DEVICE_MSIX
@@ -1356,7 +1353,6 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 bar_nr = msix_table_entry & PCI_MSIX_BIR;
 msix_table_entry &= ~PCI_MSIX_BIR;
 dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
-pci_dev->cap.length += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
 }
 #endif
 #endif
diff --git a/hw/pci.c b/hw/pci.c
index 80610b3..a0a6126 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1191,10 +1191,7 @@ static void pci_write_config_with_mask(PCIDevice *d, 
uint32_t addr,
 
 int pci_access_cap_config(PCIDevice *pci_dev, uint32_t address, int len)
 {
-if (pci_dev->cap.supported && address >= pci_dev->cap.start &&
-(address + len) < pci_dev->cap.start + pci_dev->cap.length)
-return 1;
-return 0;
+return pci_dev->cap_map[address];
 }
 
 uint32_t pci_default_cap_read_config(PCIDevice *pci_dev,
@@ -2041,8 +2038,6 @@ int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t 
cap_id,
 memset(pdev->cmask + offset, 0xFF, size);
 
 pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
-pdev->cap.supported = 1;
-pdev->cap.start = pdev->cap.start ? MIN(pdev->cap.start, offset) : offset;
 
 return offset;
 }
@@ -2073,7 +2068,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, 
uint8_t size)
 
 if (!pdev->config[PCI_CAPABILITY_LIST]) {
 pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
-pdev->cap.start = pdev->cap.length = 0;
 }
 }
 
diff --git a/hw/pci.h b/hw/pci.h
index 2265c70..177008a 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -208,8 +208,6 @@ struct PCIDevice {
 
 /* Device capability configuration space */
 struct {
-int supported;
-unsigned int start, length;
 PCICapConfigReadFunc *config_read;
 PCICapConfigWriteFunc *config_write;
 } cap;




[Qemu-devel] [PATCH v2 4/9] pci: Replace used bitmap with capability byte map

2010-11-12 Thread Alex Williamson
Capabilities are allocated in bytes, so we can track both whether
a byte is used and by what capability in the same structure.

Remove pci_reserve_capability() as there are no users.

Signed-off-by: Alex Williamson 
---

 hw/pci.c |   16 +---
 hw/pci.h |6 ++
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 607eb27..80610b3 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -730,7 +730,7 @@ static void pci_config_alloc(PCIDevice *pci_dev)
 pci_dev->config = qemu_mallocz(config_size);
 pci_dev->cmask = qemu_mallocz(config_size);
 pci_dev->wmask = qemu_mallocz(config_size);
-pci_dev->used = qemu_mallocz(config_size);
+pci_dev->cap_map = qemu_mallocz(config_size);
 }
 
 static void pci_config_free(PCIDevice *pci_dev)
@@ -738,7 +738,7 @@ static void pci_config_free(PCIDevice *pci_dev)
 qemu_free(pci_dev->config);
 qemu_free(pci_dev->cmask);
 qemu_free(pci_dev->wmask);
-qemu_free(pci_dev->used);
+qemu_free(pci_dev->cap_map);
 }
 
 /* -1 for devfn means auto assign */
@@ -1929,7 +1929,7 @@ static int pci_find_space(PCIDevice *pdev, uint8_t size)
 int offset = PCI_CONFIG_HEADER_SIZE;
 int i;
 for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; ++i)
-if (pdev->used[i])
+if (pdev->cap_map[i])
 offset = i + 1;
 else if (i - offset + 1 == size)
 return offset;
@@ -2034,7 +2034,7 @@ int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t 
cap_id,
 config[PCI_CAP_LIST_ID] = cap_id;
 config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
 pdev->config[PCI_CAPABILITY_LIST] = offset;
-memset(pdev->used + offset, 0xFF, size);
+memset(pdev->cap_map + offset, cap_id, size);
 /* Make capability read-only by default */
 memset(pdev->wmask + offset, 0, size);
 /* Check capability by default */
@@ -2069,7 +2069,7 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, 
uint8_t size)
 memset(pdev->wmask + offset, 0xff, size);
 /* Clear cmask as device-specific registers can't be checked */
 memset(pdev->cmask + offset, 0, size);
-memset(pdev->used + offset, 0, size);
+memset(pdev->cap_map + offset, 0, size);
 
 if (!pdev->config[PCI_CAPABILITY_LIST]) {
 pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
@@ -2077,12 +2077,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, 
uint8_t size)
 }
 }
 
-/* Reserve space for capability at a known offset (to call after load). */
-void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
-{
-memset(pdev->used + offset, 0xff, size);
-}
-
 uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
 {
 return pci_find_capability_list(pdev, cap_id, NULL);
diff --git a/hw/pci.h b/hw/pci.h
index 0712e55..2265c70 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -151,8 +151,8 @@ struct PCIDevice {
 /* Used to implement R/W bytes */
 uint8_t *wmask;
 
-/* Used to allocate config space for capabilities. */
-uint8_t *used;
+/* Used to allocate config space and track capabilities. */
+uint8_t *cap_map;
 
 /* the following fields are read only */
 PCIBus *bus;
@@ -239,8 +239,6 @@ int pci_add_capability_at_offset(PCIDevice *pci_dev, 
uint8_t cap_id,
 
 void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
 
-void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t size);
-
 uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
 
 uint32_t pci_default_read_config(PCIDevice *d,




[Qemu-devel] [PATCH v2 7/9] pci: Pass ID for capability read/write handlers

2010-11-12 Thread Alex Williamson
Any handlers that actually want to interact with specific capabilities
are going to want to know the capability ID being accessed.  With the
capability map, this is readily available, so we can save handlers the
trouble of figuring it out.

Signed-off-by: Alex Williamson 
---

 hw/device-assignment.c |   36 ++--
 hw/pci.c   |   14 --
 hw/pci.h   |8 
 3 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 39f19be..179c7dc 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1244,30 +1244,38 @@ static void assigned_dev_update_msix(PCIDevice 
*pci_dev, unsigned int ctrl_pos)
 #endif
 #endif
 
-static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t 
address,
+static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
+ uint8_t cap_id,
+ uint32_t address,
  uint32_t val, int len)
 {
-AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
+pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
 
-pci_default_cap_write_config(pci_dev, address, val, len);
+switch (cap_id) {
 #ifdef KVM_CAP_IRQ_ROUTING
+case PCI_CAP_ID_MSI:
 #ifdef KVM_CAP_DEVICE_MSI
-if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
-int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSI);
-if (ranges_overlap(address, len, pos + PCI_MSI_FLAGS, 1)) {
-assigned_dev_update_msi(pci_dev, pos + PCI_MSI_FLAGS);
+{
+uint8_t cap = pci_find_capability(pci_dev, cap_id);
+if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
+assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
+}
 }
-}
 #endif
+break;
+
+case PCI_CAP_ID_MSIX:
 #ifdef KVM_CAP_DEVICE_MSIX
-if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
-int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
-if (ranges_overlap(address, len, pos + PCI_MSIX_FLAGS + 1, 1)) {
-assigned_dev_update_msix(pci_dev, pos + PCI_MSIX_FLAGS);
-   }
-}
+{
+uint8_t cap = pci_find_capability(pci_dev, cap_id);
+if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
+assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
+}
+}
 #endif
+break;
 #endif
+}
 return;
 }
 
diff --git a/hw/pci.c b/hw/pci.c
index a0a6126..337afc4 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1168,10 +1168,11 @@ static uint32_t pci_read_config(PCIDevice *d,
 uint32_t pci_default_read_config(PCIDevice *d,
  uint32_t address, int len)
 {
+uint8_t cap_id;
 assert(len == 1 || len == 2 || len == 4);
 
-if (pci_access_cap_config(d, address, len)) {
-return d->cap.config_read(d, address, len);
+if ((cap_id = pci_access_cap_config(d, address, len))) {
+return d->cap.config_read(d, cap_id, address, len);
 }
 
 return pci_read_config(d, address, len);
@@ -1194,13 +1195,13 @@ int pci_access_cap_config(PCIDevice *pci_dev, uint32_t 
address, int len)
 return pci_dev->cap_map[address];
 }
 
-uint32_t pci_default_cap_read_config(PCIDevice *pci_dev,
+uint32_t pci_default_cap_read_config(PCIDevice *pci_dev, uint8_t cap_id,
  uint32_t address, int len)
 {
 return pci_read_config(pci_dev, address, len);
 }
 
-void pci_default_cap_write_config(PCIDevice *pci_dev,
+void pci_default_cap_write_config(PCIDevice *pci_dev, uint8_t cap_id,
   uint32_t address, uint32_t val, int len)
 {
 pci_write_config_with_mask(pci_dev, address, val, len);
@@ -1209,9 +1210,10 @@ void pci_default_cap_write_config(PCIDevice *pci_dev,
 void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
 int was_irq_disabled = pci_irq_disabled(d);
+uint8_t cap_id;
 
-if (pci_access_cap_config(d, addr, l)) {
-d->cap.config_write(d, addr, val, l);
+if ((cap_id = pci_access_cap_config(d, addr, l))) {
+d->cap.config_write(d, cap_id, addr, val, l);
 return;
 }
 
diff --git a/hw/pci.h b/hw/pci.h
index 177008a..3f0b4e0 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -83,9 +83,9 @@ typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int 
region_num,
 pcibus_t addr, pcibus_t size, int type);
 typedef int PCIUnregisterFunc(PCIDevice *pci_dev);
 
-typedef void PCICapConfigWriteFunc(PCIDevice *pci_dev,
+typedef void PCICapConfigWriteFunc(PCIDevice *pci_dev, uint8_t cap_id,
uint32_t address, uint32_t val, int len);
-typedef uint32_t PCICapConfigReadFunc(PCIDevice *pci_dev,
+typedef uint32_

[Qemu-devel] [PATCH v2 3/9] device-assignment: Use PCI capabilities support

2010-11-12 Thread Alex Williamson
Convert to use common pci_add_capabilities() rather than creating
our own mess.

Signed-off-by: Alex Williamson 
---

 hw/device-assignment.c |  112 +++-
 1 files changed, 63 insertions(+), 49 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 311c197..74cdd26 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -38,6 +38,7 @@
 #include "device-assignment.h"
 #include "loader.h"
 #include "monitor.h"
+#include "range.h"
 #include 
 
 /* From linux/ioport.h */
@@ -1075,17 +1076,17 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, 
unsigned int ctrl_pos)
 }
 
 if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
+int pos = ctrl_pos - PCI_MSI_FLAGS;
 assigned_dev->entry = calloc(1, sizeof(struct kvm_irq_routing_entry));
 if (!assigned_dev->entry) {
 perror("assigned_dev_update_msi: ");
 return;
 }
 assigned_dev->entry->u.msi.address_lo =
-*(uint32_t *)(pci_dev->config + pci_dev->cap.start +
-  PCI_MSI_ADDRESS_LO);
+pci_get_long(pci_dev->config + pos + PCI_MSI_ADDRESS_LO);
 assigned_dev->entry->u.msi.address_hi = 0;
-assigned_dev->entry->u.msi.data = *(uint16_t *)(pci_dev->config +
-pci_dev->cap.start + PCI_MSI_DATA_32);
+assigned_dev->entry->u.msi.data =
+pci_get_word(pci_dev->config + pos + PCI_MSI_DATA_32);
 assigned_dev->entry->type = KVM_IRQ_ROUTING_MSI;
 r = kvm_get_irq_route_gsi();
 if (r < 0) {
@@ -1123,10 +1124,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 struct kvm_assigned_msix_entry msix_entry;
 void *va = adev->msix_table_page;
 
-if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
-pos = pci_dev->cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
-else
-pos = pci_dev->cap.start;
+pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
 entries_max_nr = *(uint16_t *)(pci_dev->config + pos + 2);
 entries_max_nr &= PCI_MSIX_TABSIZE;
@@ -1260,26 +1258,23 @@ static void 
assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
  uint32_t val, int len)
 {
 AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
-unsigned int pos = pci_dev->cap.start, ctrl_pos;
 
 pci_default_cap_write_config(pci_dev, address, val, len);
 #ifdef KVM_CAP_IRQ_ROUTING
 #ifdef KVM_CAP_DEVICE_MSI
 if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
-ctrl_pos = pos + PCI_MSI_FLAGS;
-if (address <= ctrl_pos && address + len > ctrl_pos)
-assigned_dev_update_msi(pci_dev, ctrl_pos);
-pos += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
+int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSI);
+if (ranges_overlap(address, len, pos + PCI_MSI_FLAGS, 1)) {
+assigned_dev_update_msi(pci_dev, pos + PCI_MSI_FLAGS);
+}
 }
 #endif
 #ifdef KVM_CAP_DEVICE_MSIX
 if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
-ctrl_pos = pos + 3;
-if (address <= ctrl_pos && address + len > ctrl_pos) {
-ctrl_pos--; /* control is word long */
-assigned_dev_update_msix(pci_dev, ctrl_pos);
+int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
+if (ranges_overlap(address, len, pos + PCI_MSIX_FLAGS + 1, 1)) {
+assigned_dev_update_msix(pci_dev, pos + PCI_MSIX_FLAGS);
}
-pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
 }
 #endif
 #endif
@@ -1290,58 +1285,77 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 {
 AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev);
 PCIRegion *pci_region = dev->real_device.regions;
-int next_cap_pt = 0;
 
-pci_dev->cap.supported = 1;
-pci_dev->cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR;
+/* Clear initial capabilities pointer and status copied from hw */
+pci_set_byte(pci_dev->config + PCI_CAPABILITY_LIST, 0);
+pci_set_word(pci_dev->config + PCI_STATUS,
+ pci_get_word(pci_dev->config + PCI_STATUS) &
+ ~PCI_STATUS_CAP_LIST);
+
 pci_dev->cap.length = 0;
-pci_dev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
-pci_dev->config[PCI_CAPABILITY_LIST] = pci_dev->cap.start;
 
 #ifdef KVM_CAP_IRQ_ROUTING
 #ifdef KVM_CAP_DEVICE_MSI
 /* Expose MSI capability
  * MSI capability is the 1st capability in capability config */
 if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) {
+int vpos, ppos;
+uint16_t flags;
+
 dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
-memset(&pci_dev->config[pci_dev->cap.start + pci_dev->cap.length],
-   0, PCI_CAPABILITY_CONFIG_MSI_LENGTH);
-pci_dev->config[pci_dev->cap.start + pci_dev->cap.length] =
-PCI_CAP_ID_MSI;
+ 

[Qemu-devel] [PATCH v2 1/9] pci: pci_default_cap_write_config ignores wmask

2010-11-12 Thread Alex Williamson
Make use of wmask, just like the rest of config space.

Signed-off-by: Alex Williamson 
---

 hw/pci.c |   22 ++
 1 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 92aaa85..4bc5882 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1175,13 +1175,15 @@ uint32_t pci_default_read_config(PCIDevice *d,
 return pci_read_config(d, address, len);
 }
 
-static void pci_write_config(PCIDevice *pci_dev,
- uint32_t address, uint32_t val, int len)
+static void pci_write_config_with_mask(PCIDevice *d, uint32_t addr,
+   uint32_t val, int l)
 {
 int i;
-for (i = 0; i < len; i++) {
-pci_dev->config[address + i] = val & 0xff;
-val >>= 8;
+uint32_t config_size = pci_config_size(d);
+
+for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
+uint8_t wmask = d->wmask[addr + i];
+d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
 }
 }
 
@@ -1202,23 +1204,19 @@ uint32_t pci_default_cap_read_config(PCIDevice *pci_dev,
 void pci_default_cap_write_config(PCIDevice *pci_dev,
   uint32_t address, uint32_t val, int len)
 {
-pci_write_config(pci_dev, address, val, len);
+pci_write_config_with_mask(pci_dev, address, val, len);
 }
 
 void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
-int i, was_irq_disabled = pci_irq_disabled(d);
-uint32_t config_size = pci_config_size(d);
+int was_irq_disabled = pci_irq_disabled(d);
 
 if (pci_access_cap_config(d, addr, l)) {
 d->cap.config_write(d, addr, val, l);
 return;
 }
 
-for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
-uint8_t wmask = d->wmask[addr + i];
-d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
-}
+pci_write_config_with_mask(d, addr, val, l);
 
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
 if (kvm_enabled() && kvm_irqchip_in_kernel() &&




[Qemu-devel] [PATCH v2 0/9] PCI capability and device assignment improvements

2010-11-12 Thread Alex Williamson
v2:

 - Fixed the function name in 1/8 per Michael's suggestion.
 - Removed capability specific config read/write registration
 - Added more checks to add_capability
 - Added capability lookup table to PCIDevice

I've dropped the RFC patch to add more capabilities to device
assignment while I do some more work on it.  Please feel free
to comment on the v1 version though.  Patches still against
qemu-kvm, but I hope some of this makes it easier to bring
qemu & qemu-kvm closer here.  Thanks,

Alex

v1:

This series attempts to clean up capability support between common
code and device assignment.  In doing so, we can move existing MSI &
MSI-X capabilities to offsets matching real hardware, and further
enable more capabilities to be exposed.

The last patch is only for RFC, I'd like some input on what we should
pass directly and where we should only provide read-only/emulated
access.  Patches 1-7 are submitted for commit.  Thanks,

Alex

---

Alex Williamson (9):
  pci: Store capability offsets in PCIDevice
  pci: Remove capability read/write config handlers
  pci: Pass ID for capability read/write handlers
  device-assignment: Move PCI capabilities to match physical hardware
  pci: Remove cap.length, cap.start, cap.supported
  pci: Replace used bitmap with capability byte map
  device-assignment: Use PCI capabilities support
  pci: Remove pci_enable_capability_support()
  pci: pci_default_cap_write_config ignores wmask


 hw/device-assignment.c |  157 +---
 hw/msix.c  |   15 ++---
 hw/pci.c   |  132 +++-
 hw/pci.h   |   39 ++--
 4 files changed, 145 insertions(+), 198 deletions(-)



Re: [Qemu-devel] [PATCH 0/2] v7 Decouple block device removal from device removal

2010-11-12 Thread Markus Armbruster
Kevin Wolf  writes:

> I have to admit that I didn't follow your discussion very closely any
> more after a few versions, so just to confirm: You came to the
> conclusion that we want to add drive_del to QMP and not only the human
> monitor, even though there is no drive_add in QMP?

Fair question.

-drive and drive_add conflate host and guest part.  They were hacked up
to work with -device, and it shows.  I'd rather not copy the mess to
QMP.

Maybe it's best to leave out the QMP part for now, i.e. drop PATCH 2/2.
If we can't get blockdev_add, blockdev_del done in time, we reconsider.

> The HMP help text doesn't seem to be completely right (sent a comment),
> but once that's fixed, I'm going to merge the series based on Markus's ACK.



Re: [Qemu-devel] Re: [PATCH v4 3/5] qed: Table, L2 cache, and cluster functions

2010-11-12 Thread Stefan Hajnoczi
On Fri, Nov 12, 2010 at 5:26 PM, Kevin Wolf  wrote:
> Am 28.10.2010 13:01, schrieb Stefan Hajnoczi:
>> This patch adds code to look up data cluster offsets in the image via
>> the L1/L2 tables.  The L2 tables are writethrough cached in memory for
>> performance (each read/write requires a lookup so it is essential to
>> cache the tables).
>>
>> With cluster lookup code in place it is possible to implement
>> bdrv_is_allocated() to query the number of contiguous
>> allocated/unallocated clusters.
>>
>> Signed-off-by: Stefan Hajnoczi 
>
> Didn't find any obvious problems with this one, but maybe that just
> means that I'm tired and should stop working for today. ;-)

Thanks for your review effort, Kevin.

Stefan



[Qemu-devel] Re: [PATCH 0/2] v8 Decouple block device removal from device removal

2010-11-12 Thread Kevin Wolf
Am 12.11.2010 18:07, schrieb Ryan Harper:
> details, details, v8
> 
> This patch series decouples the detachment of a block device from the
> removal of the backing pci-device.  Removal of a hotplugged pci device
> requires the guest to respond before qemu tears down the block device.
> In some cases, the guest may not respond leaving the guest with
> continued access to the block device.  Mgmt layer doesn't have a
> reliable method to force a disconnect.  
> 
> The new monitor command, drive_del, will revoke a guests access to the
> block device independently of the removal of the pci device.
> 
> The first patch implements drive_del, the second patch implements the
> qmp version of the monitor command.
> 
> Changes since v7:
> - Fixed up doc strings (delete -> drive_del)
> Changes since v6:
> - Updated patch description
> - Dropped bdrv_unplug and inlined in drive_del
> - Explicitly invoke drive_uninit()
> Changes since v5:
> - Removed dangling pointers in guest and host state.  This ensures things 
> like 
>   info block no longer displays the deleted drive; though info pci will
>   continue to display the pci device until the guest responds to the removal
>   request.
> - Renamed drive_unplug -> drive_del
> Changes since v4:
> - Droppped drive_get_by_id patch and use bdrv_find() instead
> - Added additional details about drive_unplug to hmp/qmp interface
> 
> Changes since v3:
> - Moved QMP command for drive_unplug() to separate patch
> 
> Changes since v2:
> - Added QMP command for drive_unplug()
> 
> Changes since v1:
> - CodingStyle fixes
> - Added qemu_aio_flush() to bdrv_unplug()
> 
> Signed-off-by: Ryan Harper 

Thanks, applied both to the block branch.

Kevin



[Qemu-devel] Re: [PATCH v4 3/5] qed: Table, L2 cache, and cluster functions

2010-11-12 Thread Kevin Wolf
Am 28.10.2010 13:01, schrieb Stefan Hajnoczi:
> This patch adds code to look up data cluster offsets in the image via
> the L1/L2 tables.  The L2 tables are writethrough cached in memory for
> performance (each read/write requires a lookup so it is essential to
> cache the tables).
> 
> With cluster lookup code in place it is possible to implement
> bdrv_is_allocated() to query the number of contiguous
> allocated/unallocated clusters.
> 
> Signed-off-by: Stefan Hajnoczi 

Didn't find any obvious problems with this one, but maybe that just
means that I'm tired and should stop working for today. ;-)

Kevin



Re: [Qemu-devel] Re: [PATCH v4 2/5] qed: Add QEMU Enhanced Disk image format

2010-11-12 Thread Kevin Wolf
Am 12.11.2010 18:24, schrieb Markus Armbruster:
> Kevin Wolf  writes:
> 
>> Am 12.11.2010 17:34, schrieb Stefan Hajnoczi:
>>> On Fri, Nov 12, 2010 at 3:43 PM, Kevin Wolf  wrote:
 Am 28.10.2010 13:01, schrieb Stefan Hajnoczi:
> +/**
> + * Check whether an image format is raw
> + *
> + * @fmt:Backing file format, may be NULL
> + */
> +static bool qed_fmt_is_raw(const char *fmt)
> +{
> +return fmt && strcmp(fmt, "raw") == 0;
> +}

 People shouldn't use them directly, but should we also consider file,
 host_device, etc.?
>>>
>>> Hrm..I will look into it for v5.  I thought we always have a "raw"
>>> format on top of "file", "host_device", etc protocols?
>>
>> That's how it's meant to be used. I think at the moment we still allow
>> to directly use file etc. and you know that if something is possible,
>> users will do it.
> 
> It's possible, it's perfectly well-defined, and it works.

Wow, I'm not used to be confirmed that quickly!

Yes, it's possible, it works, but it's only exposed because we didn't
pay attention when host_* was split out. There's no real reason to use
it (or do you have any example where format=raw doesn't work?) and it
should be considered an implementation detail.

Kevin



Re: [Qemu-devel] Re: [PATCH v4 2/5] qed: Add QEMU Enhanced Disk image format

2010-11-12 Thread Markus Armbruster
Kevin Wolf  writes:

> Am 12.11.2010 17:34, schrieb Stefan Hajnoczi:
>> On Fri, Nov 12, 2010 at 3:43 PM, Kevin Wolf  wrote:
>>> Am 28.10.2010 13:01, schrieb Stefan Hajnoczi:
 +/**
 + * Check whether an image format is raw
 + *
 + * @fmt:Backing file format, may be NULL
 + */
 +static bool qed_fmt_is_raw(const char *fmt)
 +{
 +return fmt && strcmp(fmt, "raw") == 0;
 +}
>>>
>>> People shouldn't use them directly, but should we also consider file,
>>> host_device, etc.?
>> 
>> Hrm..I will look into it for v5.  I thought we always have a "raw"
>> format on top of "file", "host_device", etc protocols?
>
> That's how it's meant to be used. I think at the moment we still allow
> to directly use file etc. and you know that if something is possible,
> users will do it.

It's possible, it's perfectly well-defined, and it works.

> But instead of listing all of them here, maybe we should just make the
> final step and return an error if they are used directly.

No.

>   Unfortunately,
> that will bring us back to the discussion about formats and protocols... ;-)

/me runs away screaming

[...]



[Qemu-devel] [PATCH 1/2] Implement drive_del to decouple block removal from device removal

2010-11-12 Thread Ryan Harper
Currently device hotplug removal code is tied to device removal via
ACPI.  All pci devices that are removable via device_del() require the
guest to respond to the request.  In some cases the guest may not
respond leaving the device still accessible to the guest.  The management
layer doesn't currently have a reliable way to revoke access to host
resource in the presence of an uncooperative guest.

This patch implements a new monitor command, drive_del, which
provides an explicit command to revoke access to a host block device.

drive_del first quiesces the block device (qemu_aio_flush;
bdrv_flush() and bdrv_close()).  This prevents further IO from being
submitted against the host device.  Finally, drive_del cleans up
pointers between the drive object (host resource) and the device
object (guest resource).

Signed-off-by: Ryan Harper 
---
 blockdev.c  |   39 +++
 blockdev.h  |1 +
 hmp-commands.hx |   18 ++
 3 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 6cb179a..f6ac439 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -14,6 +14,8 @@
 #include "qemu-option.h"
 #include "qemu-config.h"
 #include "sysemu.h"
+#include "hw/qdev.h"
+#include "block_int.h"
 
 static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
QTAILQ_HEAD_INITIALIZER(drives);
 
@@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device,
 }
 return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
+
+int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+const char *id = qdict_get_str(qdict, "id");
+BlockDriverState *bs;
+BlockDriverState **ptr;
+Property *prop;
+
+bs = bdrv_find(id);
+if (!bs) {
+qerror_report(QERR_DEVICE_NOT_FOUND, id);
+return -1;
+}
+
+/* quiesce block driver; prevent further io */
+qemu_aio_flush();
+bdrv_flush(bs);
+bdrv_close(bs);
+
+/* clean up guest state from pointing to host resource by
+ * finding and removing DeviceState "drive" property */
+for (prop = bs->peer->info->props; prop && prop->name; prop++) {
+if (prop->info->type == PROP_TYPE_DRIVE) {
+ptr = qdev_get_prop_ptr(bs->peer, prop);
+if ((*ptr) == bs) {
+bdrv_detach(bs, bs->peer);
+*ptr = NULL;
+break;
+}
+}
+}
+
+/* clean up host side */
+drive_uninit(drive_get_by_blockdev(bs));
+
+return 0;
+}
diff --git a/blockdev.h b/blockdev.h
index 653affc..2a0559e 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
**ret_data);
 int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
 const char *filename, const char *fmt);
+int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e5585ba..c8b0206 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it).
 ETEXI
 
 {
+.name   = "drive_del",
+.args_type  = "id:s",
+.params = "device",
+.help   = "remove host block device",
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_del,
+},
+
+STEXI
+...@item drive_del @var{device}
+...@findex drive_del
+Remove host block device.  The result is that guest generated IO is no longer
+submitted against the host device underlying the disk.  Once a drive has
+been deleted, the QEMU Block layer returns -EIO which results in IO 
+errors in the guest for applications that are reading/writing to the device.
+ETEXI
+
+{
 .name   = "change",
 .args_type  = "device:B,target:F,arg:s?",
 .params = "device filename [format]",
-- 
1.6.3.3




[Qemu-devel] [PATCH 0/2] v8 Decouple block device removal from device removal

2010-11-12 Thread Ryan Harper
details, details, v8

This patch series decouples the detachment of a block device from the
removal of the backing pci-device.  Removal of a hotplugged pci device
requires the guest to respond before qemu tears down the block device.
In some cases, the guest may not respond leaving the guest with
continued access to the block device.  Mgmt layer doesn't have a
reliable method to force a disconnect.  

The new monitor command, drive_del, will revoke a guests access to the
block device independently of the removal of the pci device.

The first patch implements drive_del, the second patch implements the
qmp version of the monitor command.

Changes since v7:
- Fixed up doc strings (delete -> drive_del)
Changes since v6:
- Updated patch description
- Dropped bdrv_unplug and inlined in drive_del
- Explicitly invoke drive_uninit()
Changes since v5:
- Removed dangling pointers in guest and host state.  This ensures things like 
  info block no longer displays the deleted drive; though info pci will
  continue to display the pci device until the guest responds to the removal
  request.
- Renamed drive_unplug -> drive_del
Changes since v4:
- Droppped drive_get_by_id patch and use bdrv_find() instead
- Added additional details about drive_unplug to hmp/qmp interface

Changes since v3:
- Moved QMP command for drive_unplug() to separate patch

Changes since v2:
- Added QMP command for drive_unplug()

Changes since v1:
- CodingStyle fixes
- Added qemu_aio_flush() to bdrv_unplug()

Signed-off-by: Ryan Harper 



[Qemu-devel] [PATCH 2/2] Add qmp version of drive_del

2010-11-12 Thread Ryan Harper
Signed-off-by: Ryan Harper 
---
 qmp-commands.hx |   29 +
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index 793cf1c..1e0d4e9 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -338,6 +338,35 @@ Example:
 EQMP
 
 {
+.name   = "drive_del",
+.args_type  = "id:s",
+.params = "device",
+.help   = "remove host block device",
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_del,
+},
+
+SQMP
+drive del
+--
+
+Remove host block device.  The result is that guest generated IO is no longer
+submitted against the host device underlying the disk.  Once a drive has
+been deleted, the QEMU Block layer returns -EIO which results in IO 
+errors in the guest for applications that are reading/writing to the device.
+
+Arguments:
+
+- "id": the device's ID (json-string)
+
+Example:
+
+-> { "execute": "drive_del", "arguments": { "id": "drive-virtio-blk1" } }
+<- { "return": {} }
+
+EQMP
+
+{
 .name   = "cpu",
 .args_type  = "index:i",
 .params = "index",
-- 
1.6.3.3




Re: [Qemu-devel] Re: [Try2][PATCH] Initial implementation of a mpeg1 layer2 streaming audio driver.

2010-11-12 Thread François Revol

Le 12 nov. 2010 à 15:32, Anthony Liguori a écrit :

>> I did try years ago, but at least the current wav driver really didn't like 
>> fifos back then. I recall trying for hours to get it pipe to ffmpeg or 
>> others without much luck.
>> 
>> Also, this poses several problems about the control of the external process 
>> (respawn on listener disconnection, close on exit...).
> 
> Doing encoding in QEMU is going to starve the guest pretty bad.  For an x86 
> guest, that's going to result in issues with time drift, etc.
> 
> Making reencoding with an external process work a bit more nicely also has 
> the advantage of making this work with other formats like Ogg which are a bit 
> more Open Source friendly.

The current patch uses a separate thread, but since I clone this part from the 
esdaudio code I didn't check what it was doing. It seems it's not really 
queueing anything though except the single mix buffer, which kind of defeats 
the purpose of having a separate thread. This might explain why it lags so much 
here... I tried increasing the buffer size and lowering the threshold but it 
doesn't seem to help.

I agree it'd be better to use external programs when possible but as I said 
it's a bit harder to handle the errors and such, and I wanted to have something 
working.
Also, it requires more work to set it up for users, they must install the 
externals, figure out the command line options...
Possibly we can provide default templates for known programs, either text files 
with % escaping for args, or just a shell script passing env vars maybe...

Besides, external or not, IIRC a pipe is by default 4kB max, which isn't much 
better for decoupling the processing on its own, if the encoder is too slow it 
will still starve the audio thread, and the rest. Also it all requires more 
context switching and IPC, which increase the total processing time.

So I think it might be interesting to have both.

I'll see if I can buffer a bit more in the twolame code and if it helps, then 
I'll try to merge with the failed attempts I have around at using external 
progs.

François.


Re: [Qemu-devel] Re: [PATCH v4 2/5] qed: Add QEMU Enhanced Disk image format

2010-11-12 Thread Kevin Wolf
Am 12.11.2010 17:34, schrieb Stefan Hajnoczi:
> On Fri, Nov 12, 2010 at 3:43 PM, Kevin Wolf  wrote:
>> Am 28.10.2010 13:01, schrieb Stefan Hajnoczi:
>>> +/**
>>> + * Check whether an image format is raw
>>> + *
>>> + * @fmt:Backing file format, may be NULL
>>> + */
>>> +static bool qed_fmt_is_raw(const char *fmt)
>>> +{
>>> +return fmt && strcmp(fmt, "raw") == 0;
>>> +}
>>
>> People shouldn't use them directly, but should we also consider file,
>> host_device, etc.?
> 
> Hrm..I will look into it for v5.  I thought we always have a "raw"
> format on top of "file", "host_device", etc protocols?

That's how it's meant to be used. I think at the moment we still allow
to directly use file etc. and you know that if something is possible,
users will do it.

But instead of listing all of them here, maybe we should just make the
final step and return an error if they are used directly. Unfortunately,
that will bring us back to the discussion about formats and protocols... ;-)

>>> +if (s->header.magic != QED_MAGIC) {
>>> +return -ENOENT;
>>> +}
>>
>> ENOENT seems a bit odd for a wrong magic number, especially if it's used
>> for the error message. Wouldn't EINVAL or ENOTSUP be a closer match?
> 
> You're right, ENOENT is confusing for the user.
> 
>>> +static void bdrv_qed_flush(BlockDriverState *bs)
>>> +{
>>> +bdrv_flush(bs->file);
>>> +}
>>
>> This conflicts with one of my recent changes. bdrv_flush should return
>> an int now.
> 
> Will fix and will also check for bdrv_flush() failures.
> 
>>> +while (options && options->name) {
>>> +if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
>>> +image_size = options->value.n;
>>> +} else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
>>> +backing_file = options->value.s;
>>> +} else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
>>> +backing_fmt = options->value.s;
>>
>> I'm not sure here. It doesn't really matter for QED, but I find it
>> strange that -o backing_fmt=foobar works and gives you a non-raw backing
>> file. Should we check if the format exists at least?
> 
> I see the same issue in QCOW2 so let's solve this generically in a
> separate patch.

Okay, that makes sense.

Kevin



Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver

2010-11-12 Thread Luiz Capitulino
On Fri, 12 Nov 2010 17:06:16 +0100
Markus Armbruster  wrote:

> Luiz Capitulino  writes:
> 
> > On Fri, 12 Nov 2010 16:04:39 +0100
> > Markus Armbruster  wrote:
> >
> >> Luiz Capitulino  writes:
> >> 
> >> > On Fri, 12 Nov 2010 15:16:33 +0100
> >> > Markus Armbruster  wrote:
> >> >
> >> >> Luiz Capitulino  writes:
> >> >> 
> >> >> > On Fri, 12 Nov 2010 11:21:57 +0100
> >> >> > Markus Armbruster  wrote:
> >> >> >
> >> >> >> Luiz Capitulino  writes:
> >> >> [...]
> >> >> >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr)
> >> >> >> > +{
> >> >> >> > +MemoryDriver *d = chr->opaque;
> >> >> >> > +
> >> >> >> > +if (d->outbuf_size == 0) {
> >> >> >> > +return qstring_new();
> >> >> >> > +}
> >> >> >> 
> >> >> >> Why is this necessary?  Is qstring_from_substr() broken for empty
> >> >> >> substrings?  If it is, it ought to be fixed!
> >> >> >
> >> >> > qstring_from_substr() takes a character range; outbuf_size stores a 
> >> >> > size,
> >> >> > not a string length. So we do:
> >> >> >
> >> >> >> > +return qstring_from_substr((char *) d->outbuf, 0, 
> >> >> >> > d->outbuf_size - 1);
> >> >> >
> >> >> > If outbuf_size is 0, we'll be passing a negative value down.
> >> >> 
> >> >> What's wrong with that?
> >> >
> >> > Although it's going to work with the current QString implementation, I 
> >> > don't
> >> > think it's it's a good idea to rely on a negative index.
> >> 
> >> How should I extract the substring of S beginning at index B with length
> >> L?  If I cant't do this for any B, L with interval [B,B+L-1] fully
> >> within [0,length(S)], then the API is flawed, and ought to be replaced.
> >
> > Not sure we're talking about the same problem, anymore. When you said:
> >
> >> >> What's wrong with that?
> >
> > What did you mean? Did you mean 'let's not decrement outbuf_size' or did
> > you mean 'let's pass -1 anyway'?
> 
> Yes, what's wrong with qstring_from_substr(S, 0, -1)?
> 
> Its function comment is imprecise, it doesn't tell us whether the END-th
> character is included in the substring or not.
> 
> The code, however, is clear enough: it *is* included.  And the unit test
> checks that.
> 
> Therefore, qstring_from_substr("abc", 0, 0) returns the qstring "a".
> 
> > Both seem wrong to me: the substring [0,-1] should be invalid
> 
> Why?
> 
> How do you express "the empty substring starting at 0" then?

I didn't consider that when I wrote the code, so it's a matter a defining
the behavior we want it to have.


> 
> >   and not
> > decrementing outbuf_size is wrong, because it contains the buffer size and
> > qstring_from_substr() will consume an additional char from the buffer (which
> > should be '\0' today, but we shouldn't count on that).
> >
> >> 
> >> > Maybe, we could have:
> >> >
> >> > return qstring_from_substr((char *) d->outbuf, 0,
> >> > d->outbuf_size > 0 ? d->outbuf_size - 1 : 0);
> >> >
> >> > A bit harder to read, but makes the function smaller.
> >> 
> >> Err, doesn't qstring_from_substr(s, 0, 0) extract a substring of length
> >> 1?
> >
> > Yeah, it's a bug. But that doesn't change my suggestion, can we do this way?
> >
> > This should fix the bug (not even compiled tested):
> >
> > diff --git a/qstring.c b/qstring.c
> > index 4e2ba08..72a25de 100644
> > --- a/qstring.c
> > +++ b/qstring.c
> > @@ -42,10 +42,10 @@ QString *qstring_from_substr(const char *str, int 
> > start, int end)
> >  
> >  qstring = qemu_malloc(sizeof(*qstring));
> >  
> > -qstring->length = end - start + 1;
> > -qstring->capacity = qstring->length;
> > +qstring->length = end - start;
> > +qstring->capacity = qstring->length + 1;
> >  
> > -qstring->string = qemu_malloc(qstring->capacity + 1);
> > +qstring->string = qemu_malloc(qstring->capacity);
> >  memcpy(qstring->string, str + start, qstring->length);
> >  qstring->string[qstring->length] = 0;
> 
> I suspect this will fail your unit test.

Haven't checked it yet, but maybe it has to be fixed too.



[Qemu-devel] Re: [PATCH 1/2] Implement drive_del to decouple block removal from device removal

2010-11-12 Thread Ryan Harper
* Kevin Wolf  [2010-11-12 10:43]:
> Am 12.11.2010 16:38, schrieb Ryan Harper:
> > Currently device hotplug removal code is tied to device removal via
> > ACPI.  All pci devices that are removable via device_del() require the
> > guest to respond to the request.  In some cases the guest may not
> > respond leaving the device still accessible to the guest.  The management
> > layer doesn't currently have a reliable way to revoke access to host
> > resource in the presence of an uncooperative guest.
> > 
> > This patch implements a new monitor command, drive_del, which
> > provides an explicit command to revoke access to a host block device.
> > 
> > drive_del first quiesces the block device (qemu_aio_flush;
> > bdrv_flush() and bdrv_close()).  This prevents further IO from being
> > submitted against the host device.  Finally, drive_del cleans up
> > pointers between the drive object (host resource) and the device
> > object (guest resource).
> > 
> > Signed-off-by: Ryan Harper 
> > ---
> >  blockdev.c  |   39 +++
> >  blockdev.h  |1 +
> >  hmp-commands.hx |   18 ++
> >  3 files changed, 58 insertions(+), 0 deletions(-)
> > 
> > diff --git a/blockdev.c b/blockdev.c
> > index 6cb179a..f6ac439 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -14,6 +14,8 @@
> >  #include "qemu-option.h"
> >  #include "qemu-config.h"
> >  #include "sysemu.h"
> > +#include "hw/qdev.h"
> > +#include "block_int.h"
> >  
> >  static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
> > QTAILQ_HEAD_INITIALIZER(drives);
> >  
> > @@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device,
> >  }
> >  return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
> >  }
> > +
> > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> > +{
> > +const char *id = qdict_get_str(qdict, "id");
> > +BlockDriverState *bs;
> > +BlockDriverState **ptr;
> > +Property *prop;
> > +
> > +bs = bdrv_find(id);
> > +if (!bs) {
> > +qerror_report(QERR_DEVICE_NOT_FOUND, id);
> > +return -1;
> > +}
> > +
> > +/* quiesce block driver; prevent further io */
> > +qemu_aio_flush();
> > +bdrv_flush(bs);
> > +bdrv_close(bs);
> > +
> > +/* clean up guest state from pointing to host resource by
> > + * finding and removing DeviceState "drive" property */
> > +for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> > +if (prop->info->type == PROP_TYPE_DRIVE) {
> > +ptr = qdev_get_prop_ptr(bs->peer, prop);
> > +if ((*ptr) == bs) {
> > +bdrv_detach(bs, bs->peer);
> > +*ptr = NULL;
> > +break;
> > +}
> > +}
> > +}
> > +
> > +/* clean up host side */
> > +drive_uninit(drive_get_by_blockdev(bs));
> > +
> > +return 0;
> > +}
> > diff --git a/blockdev.h b/blockdev.h
> > index 653affc..2a0559e 100644
> > --- a/blockdev.h
> > +++ b/blockdev.h
> > @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
> > **ret_data);
> >  int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject 
> > **ret_data);
> >  int do_change_block(Monitor *mon, const char *device,
> >  const char *filename, const char *fmt);
> > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
> >  
> >  #endif
> > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > index e5585ba..d6dc18c 100644
> > --- a/hmp-commands.hx
> > +++ b/hmp-commands.hx
> > @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it).
> >  ETEXI
> >  
> >  {
> > +.name   = "drive_del",
> > +.args_type  = "id:s",
> > +.params = "device",
> > +.help   = "remove host block device",
> > +.user_print = monitor_user_noop,
> > +.mhandler.cmd_new = do_drive_del,
> > +},
> > +
> > +STEXI
> > +...@item delete @var{device}
> > +...@findex delete
> 
> I think this should be @item drive_del and @findex drive_del.

Yep.

> 
> Kevin

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com



Re: [Qemu-devel] [PATCH 0/2] v7 Decouple block device removal from device removal

2010-11-12 Thread Kevin Wolf
Am 12.11.2010 17:28, schrieb Markus Armbruster:
> Ryan Harper  writes:
> 
>> Once more dear friends, v7
>>
>> This patch series decouples the detachment of a block device from the
>> removal of the backing pci-device.  Removal of a hotplugged pci device
>> requires the guest to respond before qemu tears down the block device.
>> In some cases, the guest may not respond leaving the guest with
>> continued access to the block device.  Mgmt layer doesn't have a
>> reliable method to force a disconnect.  
>>
>> The new monitor command, drive_del, will revoke a guests access to the
>> block device independently of the removal of the pci device.
>>
>> The first patch implements drive_del, the second patch implements the
>> qmp version of the monitor command.
>>
>> Changes since v6:
>> - Updated patch description
>> - Dropped bdrv_unplug and inlined in drive_del
>> - Explicitly invoke drive_uninit()
>> Changes since v5:
>> - Removed dangling pointers in guest and host state.  This ensures things 
>> like 
>>   info block no longer displays the deleted drive; though info pci will
>>   continue to display the pci device until the guest responds to the removal
>>   request.
>> - Renamed drive_unplug -> drive_del
>> Changes since v4:
>> - Droppped drive_get_by_id patch and use bdrv_find() instead
>> - Added additional details about drive_unplug to hmp/qmp interface
>>
>> Changes since v3:
>> - Moved QMP command for drive_unplug() to separate patch
>>
>> Changes since v2:
>> - Added QMP command for drive_unplug()
>>
>> Changes since v1:
>> - CodingStyle fixes
>> - Added qemu_aio_flush() to bdrv_unplug()
>>
>> Signed-off-by: Ryan Harper 
> 
> ACK series

I have to admit that I didn't follow your discussion very closely any
more after a few versions, so just to confirm: You came to the
conclusion that we want to add drive_del to QMP and not only the human
monitor, even though there is no drive_add in QMP?

The HMP help text doesn't seem to be completely right (sent a comment),
but once that's fixed, I'm going to merge the series based on Markus's ACK.

Kevin



[Qemu-devel] Re: [PATCH 1/2] Implement drive_del to decouple block removal from device removal

2010-11-12 Thread Kevin Wolf
Am 12.11.2010 16:38, schrieb Ryan Harper:
> Currently device hotplug removal code is tied to device removal via
> ACPI.  All pci devices that are removable via device_del() require the
> guest to respond to the request.  In some cases the guest may not
> respond leaving the device still accessible to the guest.  The management
> layer doesn't currently have a reliable way to revoke access to host
> resource in the presence of an uncooperative guest.
> 
> This patch implements a new monitor command, drive_del, which
> provides an explicit command to revoke access to a host block device.
> 
> drive_del first quiesces the block device (qemu_aio_flush;
> bdrv_flush() and bdrv_close()).  This prevents further IO from being
> submitted against the host device.  Finally, drive_del cleans up
> pointers between the drive object (host resource) and the device
> object (guest resource).
> 
> Signed-off-by: Ryan Harper 
> ---
>  blockdev.c  |   39 +++
>  blockdev.h  |1 +
>  hmp-commands.hx |   18 ++
>  3 files changed, 58 insertions(+), 0 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 6cb179a..f6ac439 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -14,6 +14,8 @@
>  #include "qemu-option.h"
>  #include "qemu-config.h"
>  #include "sysemu.h"
> +#include "hw/qdev.h"
> +#include "block_int.h"
>  
>  static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
> QTAILQ_HEAD_INITIALIZER(drives);
>  
> @@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device,
>  }
>  return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
>  }
> +
> +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> +{
> +const char *id = qdict_get_str(qdict, "id");
> +BlockDriverState *bs;
> +BlockDriverState **ptr;
> +Property *prop;
> +
> +bs = bdrv_find(id);
> +if (!bs) {
> +qerror_report(QERR_DEVICE_NOT_FOUND, id);
> +return -1;
> +}
> +
> +/* quiesce block driver; prevent further io */
> +qemu_aio_flush();
> +bdrv_flush(bs);
> +bdrv_close(bs);
> +
> +/* clean up guest state from pointing to host resource by
> + * finding and removing DeviceState "drive" property */
> +for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> +if (prop->info->type == PROP_TYPE_DRIVE) {
> +ptr = qdev_get_prop_ptr(bs->peer, prop);
> +if ((*ptr) == bs) {
> +bdrv_detach(bs, bs->peer);
> +*ptr = NULL;
> +break;
> +}
> +}
> +}
> +
> +/* clean up host side */
> +drive_uninit(drive_get_by_blockdev(bs));
> +
> +return 0;
> +}
> diff --git a/blockdev.h b/blockdev.h
> index 653affc..2a0559e 100644
> --- a/blockdev.h
> +++ b/blockdev.h
> @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
> **ret_data);
>  int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject 
> **ret_data);
>  int do_change_block(Monitor *mon, const char *device,
>  const char *filename, const char *fmt);
> +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
>  
>  #endif
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index e5585ba..d6dc18c 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it).
>  ETEXI
>  
>  {
> +.name   = "drive_del",
> +.args_type  = "id:s",
> +.params = "device",
> +.help   = "remove host block device",
> +.user_print = monitor_user_noop,
> +.mhandler.cmd_new = do_drive_del,
> +},
> +
> +STEXI
> +...@item delete @var{device}
> +...@findex delete

I think this should be @item drive_del and @findex drive_del.

Kevin



Re: [Qemu-devel] Re: [PATCH v4 2/5] qed: Add QEMU Enhanced Disk image format

2010-11-12 Thread Stefan Hajnoczi
On Fri, Nov 12, 2010 at 3:43 PM, Kevin Wolf  wrote:
> Am 28.10.2010 13:01, schrieb Stefan Hajnoczi:
>> +/**
>> + * Check whether an image format is raw
>> + *
>> + * @fmt:    Backing file format, may be NULL
>> + */
>> +static bool qed_fmt_is_raw(const char *fmt)
>> +{
>> +    return fmt && strcmp(fmt, "raw") == 0;
>> +}
>
> People shouldn't use them directly, but should we also consider file,
> host_device, etc.?

Hrm..I will look into it for v5.  I thought we always have a "raw"
format on top of "file", "host_device", etc protocols?

>> +    if (s->header.magic != QED_MAGIC) {
>> +        return -ENOENT;
>> +    }
>
> ENOENT seems a bit odd for a wrong magic number, especially if it's used
> for the error message. Wouldn't EINVAL or ENOTSUP be a closer match?

You're right, ENOENT is confusing for the user.

>> +static void bdrv_qed_flush(BlockDriverState *bs)
>> +{
>> +    bdrv_flush(bs->file);
>> +}
>
> This conflicts with one of my recent changes. bdrv_flush should return
> an int now.

Will fix and will also check for bdrv_flush() failures.

>> +    while (options && options->name) {
>> +        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
>> +            image_size = options->value.n;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
>> +            backing_file = options->value.s;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
>> +            backing_fmt = options->value.s;
>
> I'm not sure here. It doesn't really matter for QED, but I find it
> strange that -o backing_fmt=foobar works and gives you a non-raw backing
> file. Should we check if the format exists at least?

I see the same issue in QCOW2 so let's solve this generically in a
separate patch.

Stefan



Re: [Qemu-devel] [PATCH 0/2] v7 Decouple block device removal from device removal

2010-11-12 Thread Markus Armbruster
Ryan Harper  writes:

> Once more dear friends, v7
>
> This patch series decouples the detachment of a block device from the
> removal of the backing pci-device.  Removal of a hotplugged pci device
> requires the guest to respond before qemu tears down the block device.
> In some cases, the guest may not respond leaving the guest with
> continued access to the block device.  Mgmt layer doesn't have a
> reliable method to force a disconnect.  
>
> The new monitor command, drive_del, will revoke a guests access to the
> block device independently of the removal of the pci device.
>
> The first patch implements drive_del, the second patch implements the
> qmp version of the monitor command.
>
> Changes since v6:
> - Updated patch description
> - Dropped bdrv_unplug and inlined in drive_del
> - Explicitly invoke drive_uninit()
> Changes since v5:
> - Removed dangling pointers in guest and host state.  This ensures things 
> like 
>   info block no longer displays the deleted drive; though info pci will
>   continue to display the pci device until the guest responds to the removal
>   request.
> - Renamed drive_unplug -> drive_del
> Changes since v4:
> - Droppped drive_get_by_id patch and use bdrv_find() instead
> - Added additional details about drive_unplug to hmp/qmp interface
>
> Changes since v3:
> - Moved QMP command for drive_unplug() to separate patch
>
> Changes since v2:
> - Added QMP command for drive_unplug()
>
> Changes since v1:
> - CodingStyle fixes
> - Added qemu_aio_flush() to bdrv_unplug()
>
> Signed-off-by: Ryan Harper 

ACK series



Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver

2010-11-12 Thread Luiz Capitulino
On Fri, 12 Nov 2010 16:54:14 +0100
Markus Armbruster  wrote:

> Luiz Capitulino  writes:
> 
> > On Fri, 12 Nov 2010 11:16:54 +0100
> > Markus Armbruster  wrote:
> >
> >> Luiz Capitulino  writes:
> >> 
> >> > On Thu, 11 Nov 2010 17:32:06 +0100
> >> > Markus Armbruster  wrote:
> >> >
> >> >> Luiz Capitulino  writes:
> >> >> 
> >> >> > On Thu, 11 Nov 2010 16:30:26 +0100
> >> >> > Markus Armbruster  wrote:
> >> >> >
> >> >> >> Luiz Capitulino  writes:
> >> >> >> 
> >> >> >> > This driver handles in-memory chardev operations. That's, all 
> >> >> >> > writes
> >> >> >> > to this driver are stored in an internal buffer and it doesn't talk
> >> >> >> > to the external world in any way.
> >> >> >> >
> >> >> >> > Right now it's very simple: it supports only writes. But it can be
> >> >> >> > easily extended to support more operations.
> >> >> >> >
> >> >> >> > This is going to be used by the monitor's "HMP passthrough via QMP"
> >> >> >> > feature, which needs to run monitor handlers without a backing
> >> >> >> > device.
> >> >> >> >
> >> >> >> > Signed-off-by: Luiz Capitulino 
> >> >> >> > ---
> >> >> >> >  qemu-char.c |   66 
> >> >> >> > +++
> >> >> >> >  qemu-char.h |6 +
> >> >> >> >  2 files changed, 72 insertions(+), 0 deletions(-)
> >> >> >> >
> >> >> >> > diff --git a/qemu-char.c b/qemu-char.c
> >> >> >> > index 88997f9..896df14 100644
> >> >> >> > --- a/qemu-char.c
> >> >> >> > +++ b/qemu-char.c
> >> >> >> > @@ -2275,6 +2275,72 @@ static CharDriverState 
> >> >> >> > *qemu_chr_open_socket(QemuOpts *opts)
> >> >> >> >  return NULL;
> >> >> >> >  }
> >> >> >> >  
> >> >> >> > +/***/
> >> >> >> > +/* Memory chardev */
> >> >> >> > +typedef struct {
> >> >> >> > +size_t outbuf_size;
> >> >> >> > +size_t outbuf_capacity;
> >> >> >> > +uint8_t *outbuf;
> >> >> >> > +} MemoryDriver;
> >> >> >> > +
> >> >> >> > +static int mem_chr_write(CharDriverState *chr, const uint8_t 
> >> >> >> > *buf, int len)
> >> >> >> > +{
> >> >> >> > +MemoryDriver *d = chr->opaque;
> >> >> >> > +
> >> >> >> > +/* TODO: the QString implementation has the same code, we 
> >> >> >> > should
> >> >> >> > + * introduce a generic way to do this in cutils.c */
> >> >> >> > +if (d->outbuf_capacity < d->outbuf_size + len) {
> >> >> >> > +/* grown outbuf */
> >> >> >> 
> >> >> >> Used to say "grow" (sans n) here.  Intentional change?
> >> >> >
> >> >> > Hum, no. I think I've squashed an older commit while rebasing (but 
> >> >> > this seems
> >> >> > to be the only problem).
> >> >> >
> >> >> >> 
> >> >> >> > +d->outbuf_capacity += len;
> >> >> >> > +d->outbuf_capacity *= 2;
> >> >> >> > +d->outbuf = qemu_realloc(d->outbuf, d->outbuf_capacity);
> >> >> >> > +}
> >> >> >> > +
> >> >> >> > +memcpy(d->outbuf + d->outbuf_size, buf, len);
> >> >> >> > +d->outbuf_size += len;
> >> >> >> > +
> >> >> >> > +return len;
> >> >> >> > +}
> >> >> >> > +
> >> >> >> > +#define DEFAULT_BUF_SIZE 4096
> >> >> >> 
> >> >> >> It's the *initial* buffer size, isn't it?
> >> >> >
> >> >> > Yes.
> >> >> 
> >> >> Could we make the name reflect that then?
> >> >> 
> >> >> >> Doubt it's worth a #define (there's just one user), but that's a 
> >> >> >> matter
> >> >> >> of taste.
> >> >> >> 
> >> >> >> > +
> >> >> >> > +void qemu_chr_init_mem(CharDriverState *chr)
> >> >> >> > +{
> >> >> >> > +MemoryDriver *d;
> >> >> >> > +
> >> >> >> > +d = qemu_malloc(sizeof(*d));
> >> >> >> > +d->outbuf_size = 0;
> >> >> >> > +d->outbuf_capacity = DEFAULT_BUF_SIZE;
> >> >> >> > +d->outbuf = qemu_mallocz(d->outbuf_capacity);
> >> >> >> > +
> >> >> >> > +memset(chr, 0, sizeof(*chr));
> >> >> >> > +chr->opaque = d;
> >> >> >> > +chr->chr_write = mem_chr_write;
> >> >> >> > +}
> >> >> >> > +
> >> >> >> > +/* assumes the stored data is a string */
> >> >> >> 
> >> >> >> What else could it be?  Worrying about embedded '\0's?
> >> >> >
> >> >> > Yes, as the driver itself doesn't interpret the contents of its
> >> >> > buffer.
> >> >> 
> >> >> What happens if there are embedded '\0's?
> >> >
> >> > The string will be shorter than expected? And what if it contains
> >> > non-printable characters?
> >> >
> >> > It's just a cautionary comment to help the user identify such problems, 
> >> > I think
> >> > we're making a whole argument about a quite minor thing.
> >> 
> >> When I see "assumes X" in a function comment, I immediately ask "and
> >> what happens when !X?"  The default answer is "it explodes, so don't do
> >> that".  That answer is wrong here.  Therefore, I find the comment
> >> misleading.
> >
> > That's how you interpret it, my interpretation is that I might not get
> > the expected behavior.
> 
> Actually, this function works just fine for embedded '\0's (I tested
> it): it returns the correct QString, with full length and '\0' embedded.

Good.

> Only l

Re: [Qemu-devel] [PATCH 1/2] Implement drive_del to decouple block removal from device removal

2010-11-12 Thread Markus Armbruster
Ryan Harper  writes:

> Currently device hotplug removal code is tied to device removal via
> ACPI.  All pci devices that are removable via device_del() require the
> guest to respond to the request.  In some cases the guest may not
> respond leaving the device still accessible to the guest.  The management
> layer doesn't currently have a reliable way to revoke access to host
> resource in the presence of an uncooperative guest.
>
> This patch implements a new monitor command, drive_del, which
> provides an explicit command to revoke access to a host block device.
>
> drive_del first quiesces the block device (qemu_aio_flush;
> bdrv_flush() and bdrv_close()).  This prevents further IO from being
> submitted against the host device.  Finally, drive_del cleans up
> pointers between the drive object (host resource) and the device
> object (guest resource).
>
> Signed-off-by: Ryan Harper 
> ---
>  blockdev.c  |   39 +++
>  blockdev.h  |1 +
>  hmp-commands.hx |   18 ++
>  3 files changed, 58 insertions(+), 0 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index 6cb179a..f6ac439 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -14,6 +14,8 @@
>  #include "qemu-option.h"
>  #include "qemu-config.h"
>  #include "sysemu.h"
> +#include "hw/qdev.h"
> +#include "block_int.h"
>  
>  static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
> QTAILQ_HEAD_INITIALIZER(drives);
>  
> @@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device,
>  }
>  return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
>  }
> +
> +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> +{
> +const char *id = qdict_get_str(qdict, "id");
> +BlockDriverState *bs;
> +BlockDriverState **ptr;
> +Property *prop;
> +
> +bs = bdrv_find(id);
> +if (!bs) {
> +qerror_report(QERR_DEVICE_NOT_FOUND, id);
> +return -1;
> +}
> +
> +/* quiesce block driver; prevent further io */
> +qemu_aio_flush();
> +bdrv_flush(bs);
> +bdrv_close(bs);
> +
> +/* clean up guest state from pointing to host resource by
> + * finding and removing DeviceState "drive" property */
> +for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> +if (prop->info->type == PROP_TYPE_DRIVE) {
> +ptr = qdev_get_prop_ptr(bs->peer, prop);
> +if ((*ptr) == bs) {

Superfluous parenthesis around *ptr.  Not worth a respin; I've tormented
you enough ;)


> +bdrv_detach(bs, bs->peer);
> +*ptr = NULL;
> +break;
> +}
> +}
> +}
> +
> +/* clean up host side */
> +drive_uninit(drive_get_by_blockdev(bs));
> +
> +return 0;
> +}
> diff --git a/blockdev.h b/blockdev.h
> index 653affc..2a0559e 100644
> --- a/blockdev.h
> +++ b/blockdev.h
> @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
> **ret_data);
>  int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject 
> **ret_data);
>  int do_change_block(Monitor *mon, const char *device,
>  const char *filename, const char *fmt);
> +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
>  
>  #endif
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index e5585ba..d6dc18c 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it).
>  ETEXI
>  
>  {
> +.name   = "drive_del",
> +.args_type  = "id:s",
> +.params = "device",
> +.help   = "remove host block device",
> +.user_print = monitor_user_noop,
> +.mhandler.cmd_new = do_drive_del,
> +},
> +
> +STEXI
> +...@item delete @var{device}
> +...@findex delete
> +Remove host block device.  The result is that guest generated IO is no longer
> +submitted against the host device underlying the disk.  Once a drive has
> +been deleted, the QEMU Block layer returns -EIO which results in IO 
> +errors in the guest for applications that are reading/writing to the device.
> +ETEXI
> +
> +{
>  .name   = "change",
>  .args_type  = "device:B,target:F,arg:s?",
>  .params = "device filename [format]",



[Qemu-devel] Announce: Auto/Lazy-migration Patches RFC on linux-numa list

2010-11-12 Thread Lee Schermerhorn
At last weeks' LPC, there was some interest in my patches for Auto/Lazy
Migration to improve locality and possibly performance of unpinned guest
VMs on a NUMA platform.  As a result of these conversations I have reposted
the patches [4 series, ~40 patches] as RFCs to the linux-numa list.  Links
to threads given below.

I have rebased the patches atop 3Nov10 mmotm series [2.6.36 + 3nov mmotm].
The patched kernel builds, boots and survives some fairly heavy testing on
an 8 node istanbul x86_64.  Under heavy load, I do encounter a race in the
somewhat optional migration cache.  Currently this generates a warning and
carries on, but the one migration cache entry and related page is then
wedged.  This would need to be resolved.


The series/threads in the order applied:

[PATCH/RFC 0/14] Shared Policy Overview
http://markmail.org/message/trvpl3t7gimvwht6

[PATCH/RFC 0/8] numa - Migrate-on-Fault
http://markmail.org/message/mdwbcitql5ka4uws

[PATCH/RFC 0/11] numa - Automatic-migration
http://markmail.org/message/zik3itmqed65mol2

[PATCH/RFC 1/5] numa - migration cache - core implementation
http://markmail.org/message/xvck7enyezx6chyi

RESEND: [PATCH/RFC 1/5] numa - migration cache - core implementation
http://markmail.org/message/xgvvrnn2nk4nsn2e

resend to add back the patch description missing from 1st attempt.







Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver

2010-11-12 Thread Markus Armbruster
Luiz Capitulino  writes:

> On Fri, 12 Nov 2010 16:04:39 +0100
> Markus Armbruster  wrote:
>
>> Luiz Capitulino  writes:
>> 
>> > On Fri, 12 Nov 2010 15:16:33 +0100
>> > Markus Armbruster  wrote:
>> >
>> >> Luiz Capitulino  writes:
>> >> 
>> >> > On Fri, 12 Nov 2010 11:21:57 +0100
>> >> > Markus Armbruster  wrote:
>> >> >
>> >> >> Luiz Capitulino  writes:
>> >> [...]
>> >> >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr)
>> >> >> > +{
>> >> >> > +MemoryDriver *d = chr->opaque;
>> >> >> > +
>> >> >> > +if (d->outbuf_size == 0) {
>> >> >> > +return qstring_new();
>> >> >> > +}
>> >> >> 
>> >> >> Why is this necessary?  Is qstring_from_substr() broken for empty
>> >> >> substrings?  If it is, it ought to be fixed!
>> >> >
>> >> > qstring_from_substr() takes a character range; outbuf_size stores a 
>> >> > size,
>> >> > not a string length. So we do:
>> >> >
>> >> >> > +return qstring_from_substr((char *) d->outbuf, 0, 
>> >> >> > d->outbuf_size - 1);
>> >> >
>> >> > If outbuf_size is 0, we'll be passing a negative value down.
>> >> 
>> >> What's wrong with that?
>> >
>> > Although it's going to work with the current QString implementation, I 
>> > don't
>> > think it's it's a good idea to rely on a negative index.
>> 
>> How should I extract the substring of S beginning at index B with length
>> L?  If I cant't do this for any B, L with interval [B,B+L-1] fully
>> within [0,length(S)], then the API is flawed, and ought to be replaced.
>
> Not sure we're talking about the same problem, anymore. When you said:
>
>> >> What's wrong with that?
>
> What did you mean? Did you mean 'let's not decrement outbuf_size' or did
> you mean 'let's pass -1 anyway'?

Yes, what's wrong with qstring_from_substr(S, 0, -1)?

Its function comment is imprecise, it doesn't tell us whether the END-th
character is included in the substring or not.

The code, however, is clear enough: it *is* included.  And the unit test
checks that.

Therefore, qstring_from_substr("abc", 0, 0) returns the qstring "a".

> Both seem wrong to me: the substring [0,-1] should be invalid

Why?

How do you express "the empty substring starting at 0" then?

>   and not
> decrementing outbuf_size is wrong, because it contains the buffer size and
> qstring_from_substr() will consume an additional char from the buffer (which
> should be '\0' today, but we shouldn't count on that).
>
>> 
>> > Maybe, we could have:
>> >
>> > return qstring_from_substr((char *) d->outbuf, 0,
>> > d->outbuf_size > 0 ? d->outbuf_size - 1 : 0);
>> >
>> > A bit harder to read, but makes the function smaller.
>> 
>> Err, doesn't qstring_from_substr(s, 0, 0) extract a substring of length
>> 1?
>
> Yeah, it's a bug. But that doesn't change my suggestion, can we do this way?
>
> This should fix the bug (not even compiled tested):
>
> diff --git a/qstring.c b/qstring.c
> index 4e2ba08..72a25de 100644
> --- a/qstring.c
> +++ b/qstring.c
> @@ -42,10 +42,10 @@ QString *qstring_from_substr(const char *str, int start, 
> int end)
>  
>  qstring = qemu_malloc(sizeof(*qstring));
>  
> -qstring->length = end - start + 1;
> -qstring->capacity = qstring->length;
> +qstring->length = end - start;
> +qstring->capacity = qstring->length + 1;
>  
> -qstring->string = qemu_malloc(qstring->capacity + 1);
> +qstring->string = qemu_malloc(qstring->capacity);
>  memcpy(qstring->string, str + start, qstring->length);
>  qstring->string[qstring->length] = 0;

I suspect this will fail your unit test.



[Qemu-devel] Re: [RfC PATCH] spice: qmp windup: connection events & info command.

2010-11-12 Thread Luiz Capitulino
On Thu, 11 Nov 2010 13:49:32 +0100
Gerd Hoffmann  wrote:

>   Hi,
> 
> Looking for comments especially from Luiz and Daniel (aka qmp and
> libvirt masters) ...
> 
> This patch adds support for connection events to spice.  The events are
> quite simliar to the vnc events.  Unlike vnc spice uses multiple tcp
> channels though.  qemu will report every single tcp connection (aka
> spice channel).  If you want track spice sessions only you can filter
> for the main channel (channel-type == 1).
> 
> The patch also adds a 'query-spice' monitor command which returns
> informations about the spice server configuration and also a list of
> channel connections.

The first thing we have to check with Daniel is whether or not we're
providing the info they would need/expect, apart from that I have the
following general comments:

 1. It's missing documentation in QMP/qmp-events.txt and qmp-commands.hx
(yeah, docs are far from code, hope to fix soon)
 2. Can you please split this in two patches? One adding the events and
the other adding the query command

Some small comments follow.

> 
> cheers,
>   Gerd
> 
> Cc: lcapitul...@redhat.com
> Cc: berra...@redhat.com
> ---
>  monitor.c   |   43 +++
>  monitor.h   |3 +
>  ui/qemu-spice.h |2 +
>  ui/spice-core.c |  160 
> +++
>  4 files changed, 208 insertions(+), 0 deletions(-)
> 
> diff --git a/monitor.c b/monitor.c
> index 8cee35d..4f18cbc 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -59,6 +59,7 @@
>  #ifdef CONFIG_SIMPLE_TRACE
>  #include "trace.h"
>  #endif
> +#include "ui/qemu-spice.h"
>  
>  //#define DEBUG
>  //#define DEBUG_COMPLETION
> @@ -459,6 +460,15 @@ void monitor_protocol_event(MonitorEvent event, QObject 
> *data)
>  case QEVENT_WATCHDOG:
>  event_name = "WATCHDOG";
>  break;
> +case QEVENT_SPICE_CONNECTED:
> +event_name = "SPICE_CONNECTED";
> +break;
> +case QEVENT_SPICE_INITIALIZED:
> +event_name = "SPICE_INITIALIZED";
> +break;
> +case QEVENT_SPICE_DISCONNECTED:
> +event_name = "SPICE_DISCONNECTED";
> +break;
>  default:
>  abort();
>  break;
> @@ -640,6 +650,19 @@ static void user_async_info_handler(Monitor *mon, const 
> mon_cmd_t *cmd)
>  }
>  }
>  
> +/*
> + * generic print handler for hmp 'info $what'
> + * simply pretty-print the josn representation
> + */
> +#if defined(CONFIG_SPICE) /* because 'info spice' is the only user */
> +static void do_info_generic_print(Monitor *mon, const QObject *data)
> +{
> +QString *json = qobject_to_json_pretty(data);
> +monitor_printf(mon, "%s\n", qstring_get_str(json));
> +QDECREF(json);
> +}
> +#endif

We definitely need a generic print handler, but I don't that stringifying
JSON makes a minimal good user interface.

>  static void do_info(Monitor *mon, const QDict *qdict)
>  {
>  const mon_cmd_t *cmd;
> @@ -2533,6 +2556,16 @@ static const mon_cmd_t info_cmds[] = {
>  .user_print = do_info_vnc_print,
>  .mhandler.info_new = do_info_vnc,
>  },
> +#if defined(CONFIG_SPICE)
> +{
> +.name   = "spice",
> +.args_type  = "",
> +.params = "",
> +.help   = "show the spice server status",
> +.user_print = do_info_generic_print,
> +.mhandler.info_new = do_info_spice,
> +},
> +#endif
>  {
>  .name   = "name",
>  .args_type  = "",
> @@ -2720,6 +2753,16 @@ static const mon_cmd_t qmp_query_cmds[] = {
>  .user_print = do_info_vnc_print,
>  .mhandler.info_new = do_info_vnc,
>  },
> +#if defined(CONFIG_SPICE)
> +{
> +.name   = "spice",
> +.args_type  = "",
> +.params = "",
> +.help   = "show the spice server status",
> +.user_print = do_info_generic_print,
> +.mhandler.info_new = do_info_spice,
> +},
> +#endif
>  {
>  .name   = "name",
>  .args_type  = "",
> diff --git a/monitor.h b/monitor.h
> index 2d36bba..4f2d328 100644
> --- a/monitor.h
> +++ b/monitor.h
> @@ -32,6 +32,9 @@ typedef enum MonitorEvent {
>  QEVENT_BLOCK_IO_ERROR,
>  QEVENT_RTC_CHANGE,
>  QEVENT_WATCHDOG,
> +QEVENT_SPICE_CONNECTED,
> +QEVENT_SPICE_INITIALIZED,
> +QEVENT_SPICE_DISCONNECTED,
>  QEVENT_MAX,
>  } MonitorEvent;
>  
> diff --git a/ui/qemu-spice.h b/ui/qemu-spice.h
> index 0e3ad9b..33c1e71 100644
> --- a/ui/qemu-spice.h
> +++ b/ui/qemu-spice.h
> @@ -33,6 +33,8 @@ void qemu_spice_audio_init(void);
>  void qemu_spice_display_init(DisplayState *ds);
>  int qemu_spice_add_interface(SpiceBaseInstance *sin);
>  
> +void do_info_spice(Monitor *mon, QObject **ret_data);
> +
>  #else  /* CONFIG_SPICE */
>  
>  #define using_spice 0
> diff --git a/ui/spice-core.c b/ui/spice-core.c
> index b7fa031..d2c4a70 100644
> --- a/ui/spice-core.c
> +++ b/ui/spice

[Qemu-devel] Re: [patch 0/3] block migration fixes

2010-11-12 Thread Marcelo Tosatti
On Thu, Nov 11, 2010 at 06:06:02PM +0900, Yoshiaki Tamura wrote:
> Marcelo Tosatti wrote:
> >On Tue, Nov 09, 2010 at 03:02:12PM +0900, Yoshiaki Tamura wrote:
> >>Marcelo Tosatti wrote:
> >>>Following patchset fixes block migration corruption issues.
> >>
> >>Hi Marcelo,
> >>
> >>Thanks for looking into this issue.  Although we tried your patches, we're 
> >>still
> >>seeing the corruption.  If we execute block migration while copying a file
> >>locally, md5sum of the copied file doesn't match with the original.  
> >>Sometimes,
> >>the filesystem returns an I/O error.
> >>
> >>Could you let us know how you tested and debugged?  Did you use blkverify?
> >
> >Yoshiaki,
> >
> >I first reproduced corruption by copying a large file during "migrate
> >-i", with shared base on qcow2 filesystem, as in your original report.
> >
> >To debug the problem, file with different byte pattern at every 1MB
> >(size of dirty chunk) was created and copied directly to an IDE disk in
> >the guest. Raw format used for the disk image.
> >
> >With this patchset, i'm not able to reproduce the original issue
> >anymore.
> >
> >Can you please provide more details on how to reproduce?
> 
> Marcelo,
> 
> We double checked and the patchset does seem to fix the problem.
> The was a mistake in our test procedure.  Sorry for the confusion.
> 
> Thanks,
> 
> Yoshi

I was also experiencing corruption on automated test, but it turned out
to be fixed by kvm.git's ae8894c00b560bde4.

Thanks.




Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver

2010-11-12 Thread Markus Armbruster
Luiz Capitulino  writes:

> On Fri, 12 Nov 2010 11:16:54 +0100
> Markus Armbruster  wrote:
>
>> Luiz Capitulino  writes:
>> 
>> > On Thu, 11 Nov 2010 17:32:06 +0100
>> > Markus Armbruster  wrote:
>> >
>> >> Luiz Capitulino  writes:
>> >> 
>> >> > On Thu, 11 Nov 2010 16:30:26 +0100
>> >> > Markus Armbruster  wrote:
>> >> >
>> >> >> Luiz Capitulino  writes:
>> >> >> 
>> >> >> > This driver handles in-memory chardev operations. That's, all writes
>> >> >> > to this driver are stored in an internal buffer and it doesn't talk
>> >> >> > to the external world in any way.
>> >> >> >
>> >> >> > Right now it's very simple: it supports only writes. But it can be
>> >> >> > easily extended to support more operations.
>> >> >> >
>> >> >> > This is going to be used by the monitor's "HMP passthrough via QMP"
>> >> >> > feature, which needs to run monitor handlers without a backing
>> >> >> > device.
>> >> >> >
>> >> >> > Signed-off-by: Luiz Capitulino 
>> >> >> > ---
>> >> >> >  qemu-char.c |   66 
>> >> >> > +++
>> >> >> >  qemu-char.h |6 +
>> >> >> >  2 files changed, 72 insertions(+), 0 deletions(-)
>> >> >> >
>> >> >> > diff --git a/qemu-char.c b/qemu-char.c
>> >> >> > index 88997f9..896df14 100644
>> >> >> > --- a/qemu-char.c
>> >> >> > +++ b/qemu-char.c
>> >> >> > @@ -2275,6 +2275,72 @@ static CharDriverState 
>> >> >> > *qemu_chr_open_socket(QemuOpts *opts)
>> >> >> >  return NULL;
>> >> >> >  }
>> >> >> >  
>> >> >> > +/***/
>> >> >> > +/* Memory chardev */
>> >> >> > +typedef struct {
>> >> >> > +size_t outbuf_size;
>> >> >> > +size_t outbuf_capacity;
>> >> >> > +uint8_t *outbuf;
>> >> >> > +} MemoryDriver;
>> >> >> > +
>> >> >> > +static int mem_chr_write(CharDriverState *chr, const uint8_t *buf, 
>> >> >> > int len)
>> >> >> > +{
>> >> >> > +MemoryDriver *d = chr->opaque;
>> >> >> > +
>> >> >> > +/* TODO: the QString implementation has the same code, we should
>> >> >> > + * introduce a generic way to do this in cutils.c */
>> >> >> > +if (d->outbuf_capacity < d->outbuf_size + len) {
>> >> >> > +/* grown outbuf */
>> >> >> 
>> >> >> Used to say "grow" (sans n) here.  Intentional change?
>> >> >
>> >> > Hum, no. I think I've squashed an older commit while rebasing (but this 
>> >> > seems
>> >> > to be the only problem).
>> >> >
>> >> >> 
>> >> >> > +d->outbuf_capacity += len;
>> >> >> > +d->outbuf_capacity *= 2;
>> >> >> > +d->outbuf = qemu_realloc(d->outbuf, d->outbuf_capacity);
>> >> >> > +}
>> >> >> > +
>> >> >> > +memcpy(d->outbuf + d->outbuf_size, buf, len);
>> >> >> > +d->outbuf_size += len;
>> >> >> > +
>> >> >> > +return len;
>> >> >> > +}
>> >> >> > +
>> >> >> > +#define DEFAULT_BUF_SIZE 4096
>> >> >> 
>> >> >> It's the *initial* buffer size, isn't it?
>> >> >
>> >> > Yes.
>> >> 
>> >> Could we make the name reflect that then?
>> >> 
>> >> >> Doubt it's worth a #define (there's just one user), but that's a matter
>> >> >> of taste.
>> >> >> 
>> >> >> > +
>> >> >> > +void qemu_chr_init_mem(CharDriverState *chr)
>> >> >> > +{
>> >> >> > +MemoryDriver *d;
>> >> >> > +
>> >> >> > +d = qemu_malloc(sizeof(*d));
>> >> >> > +d->outbuf_size = 0;
>> >> >> > +d->outbuf_capacity = DEFAULT_BUF_SIZE;
>> >> >> > +d->outbuf = qemu_mallocz(d->outbuf_capacity);
>> >> >> > +
>> >> >> > +memset(chr, 0, sizeof(*chr));
>> >> >> > +chr->opaque = d;
>> >> >> > +chr->chr_write = mem_chr_write;
>> >> >> > +}
>> >> >> > +
>> >> >> > +/* assumes the stored data is a string */
>> >> >> 
>> >> >> What else could it be?  Worrying about embedded '\0's?
>> >> >
>> >> > Yes, as the driver itself doesn't interpret the contents of its
>> >> > buffer.
>> >> 
>> >> What happens if there are embedded '\0's?
>> >
>> > The string will be shorter than expected? And what if it contains
>> > non-printable characters?
>> >
>> > It's just a cautionary comment to help the user identify such problems, I 
>> > think
>> > we're making a whole argument about a quite minor thing.
>> 
>> When I see "assumes X" in a function comment, I immediately ask "and
>> what happens when !X?"  The default answer is "it explodes, so don't do
>> that".  That answer is wrong here.  Therefore, I find the comment
>> misleading.
>
> That's how you interpret it, my interpretation is that I might not get
> the expected behavior.

Actually, this function works just fine for embedded '\0's (I tested
it): it returns the correct QString, with full length and '\0' embedded.

Only later, when we attempt to put that QString on the wire do we screw
up, in to_json().  It fails to consider the length, and stops at the
first 0.  In fact, there's not even a way to get the length of a
QString!  There's only qstring_get_str().  I'd call that an API bug.
You might call it a restriction instead ;)

If anything needs a comment, it's qobject_to_json().  But I th

[Qemu-devel] Re: [PATCH 1/8] pci: pci_default_cap_write_config ignores wmask

2010-11-12 Thread Alex Williamson
On Fri, 2010-11-12 at 10:48 +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2010 at 11:03:19PM -0700, Alex Williamson wrote:
> > On Fri, 2010-11-12 at 07:22 +0200, Michael S. Tsirkin wrote:
> > > On Thu, Nov 11, 2010 at 07:55:01PM -0700, Alex Williamson wrote:
> > > > Make use of wmask, just like the rest of config space.
> > > > 
> > > > Signed-off-by: Alex Williamson 
> > > > ---
> > > > 
> > > >  hw/pci.c |   19 ---
> > > >  1 files changed, 8 insertions(+), 11 deletions(-)
> > > > 
> > > > diff --git a/hw/pci.c b/hw/pci.c
> > > > index 92aaa85..12c47ac 100644
> > > > --- a/hw/pci.c
> > > > +++ b/hw/pci.c
> > > > @@ -1175,13 +1175,14 @@ uint32_t pci_default_read_config(PCIDevice *d,
> > > >  return pci_read_config(d, address, len);
> > > >  }
> > > >  
> > > > -static void pci_write_config(PCIDevice *pci_dev,
> > > > - uint32_t address, uint32_t val, int len)
> > > > +static void pci_write_config(PCIDevice *d, uint32_t addr, uint32_t 
> > > > val, int l)
> > > >  {
> > > >  int i;
> > > > -for (i = 0; i < len; i++) {
> > > > -pci_dev->config[address + i] = val & 0xff;
> > > > -val >>= 8;
> > > > +uint32_t config_size = pci_config_size(d);
> > > > +
> > > > +for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
> > > > +uint8_t wmask = d->wmask[addr + i];
> > > > +d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & 
> > > > wmask);
> > > >  }
> > > >  }
> > > >
> > > 
> > > 
> > > Let's not name an internal static helper pci_write_config.
> > > This is really update_config_by_mask or something like that.
> > > But see below: maybe we don't need it at all?
> > 
> > The function already exists, I just made it do what it seems like it
> > should have been doing already.
> 
> Yep. But since you are rewriting this function - could you rename it as
> well?

Ok

> > > > @@ -1207,18 +1208,14 @@ void pci_default_cap_write_config(PCIDevice 
> > > > *pci_dev,
> > > >  
> > > >  void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t 
> > > > val, int l)
> > > >  {
> > > > -int i, was_irq_disabled = pci_irq_disabled(d);
> > > > -uint32_t config_size = pci_config_size(d);
> > > > +int was_irq_disabled = pci_irq_disabled(d);
> > > >  
> > > >  if (pci_access_cap_config(d, addr, l)) {
> > > >  d->cap.config_write(d, addr, val, l);
> > > >  return;
> > > >  }
> > > >  
> > > 
> > > I would like to also examine the need for _cap_
> > > functions. Why can assigned devices just do
> > > 
> > >   pci_default_write_config 
> > >   if (range_overlap(...msi)) {
> > >   }
> > >   if (range_overlap(...msix)) {
> > >   }
> > > and then we could remove all the _cap_ extensions
> > > altogether?
> > 
> > I think that somewhere we need to track what capabilities are at what
> > offset, config space isn't a performance path, but that look horribly
> > inefficient and gets worse with more capabilities.
> 
> Looks like premature optimization to me.  I guess when we get more than
> say 8 capabilities to support, I'll start to worry.
> Even then, these optimizations are better internal in pci core.

It's not just an optimization, as noted in another reply, we should be
using it to make sure we don't have collisions, and it simply makes the
callback code much cleaner to be able to do a switch statement instead
of a pile of 'if (ranges_overlap)', IMO.

> > Why don't we define capability id 0xff as normal config space (first 64
> > bytes), then add the capability id to read/write_config (this is what
> > vfio does).  Then the driver can split capability handling off from
> > their main functions if they want.
> 
> My feeling is we need higher level APIs than 'capability write'.
> Otherwise we get the PCI config handling all over the place.
> E.g. a callback when msi is enabled/disabled would make sense,
> so that pci core can keep track of current state and only notify
> the device when there are things to do.

I agree, but it's difficult to provide the flexibility to meet all the
needs.  Device assignment might want to be called for more or less bit
flips than an emulated device, PM is probably a good example of this.
We could actually change state on a PMCSR write, but I'm not sure what
an emulated device would do.  Does that mean we add a callback
specifically for that, or do we provide some generic interface that
drivers can register which bits they want to know about changing?

> >  Anyway, I think such an improvement
> > could be added incrementally later.  Thanks,
> > 
> > Alex
> 
> Sure.
> 
> > > > -for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
> > > > -uint8_t wmask = d->wmask[addr + i];
> > > > -d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & 
> > > > wmask);
> > > > -}
> > > > +pci_write_config(d, addr, val, l);
> > > >  
> > > >  #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
> > > >  if (kvm_enabled() && kvm_i

[Qemu-devel] Re: [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps

2010-11-12 Thread Alex Williamson
On Fri, 2010-11-12 at 11:11 +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2010 at 11:30:07PM -0700, Alex Williamson wrote:
> > On Fri, 2010-11-12 at 07:36 +0200, Michael S. Tsirkin wrote:
> > > On Thu, Nov 11, 2010 at 07:56:46PM -0700, Alex Williamson wrote:
> > > > Some drivers depend on finding capabilities like power management,
> > > > PCI express/X, vital product data, or vendor specific fields.  Now
> > > > that we have better capability support, we can pass more of these
> > > > tables through to the guest.  Note that VPD and VNDR are direct pass
> > > > through capabilies, the rest are mostly empty shells with a few
> > > > writable bits where necessary.
> > > > 
> > > > Signed-off-by: Alex Williamson 
> > > > ---
> > > > 
> > > >  hw/device-assignment.c |  160 
> > > > +---
> > > >  1 files changed, 149 insertions(+), 11 deletions(-)
> > > > 
> > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > > index 179c7dc..1b228ad 100644
> > > > --- a/hw/device-assignment.c
> > > > +++ b/hw/device-assignment.c
> > > > @@ -366,6 +366,27 @@ static uint8_t 
> > > > assigned_dev_pci_read_byte(PCIDevice *d, int pos)
> > > >  return (uint8_t)assigned_dev_pci_read(d, pos, 1);
> > > >  }
> > > >  
> > > > +static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t 
> > > > val, int len)
> > > > +{
> > > > +AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
> > > > +ssize_t ret;
> > > > +int fd = pci_dev->real_device.config_fd;
> > > > +
> > > > +again:
> > > > +ret = pwrite(fd, &val, len, pos);
> > > > +if (ret != len) {
> > > > +   if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> > > > +   goto again;
> > > 
> > > 
> > > do {} while() ?
> > 
> > Sure, this is just a copy of another place that does something similar.
> > They should either be merged or both converted in a separate patch.
> > 
> > > > +
> > > > +   fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
> > > > +   __func__, ret, errno);
> > > > +
> > > > +   exit(1);
> > > > +}
> > > > +
> > > > +return;
> > > > +}
> > > > +
> > > >  static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
> > > >  {
> > > >  int id;
> > > > @@ -1244,37 +1265,75 @@ static void assigned_dev_update_msix(PCIDevice 
> > > > *pci_dev, unsigned int ctrl_pos)
> > > >  #endif
> > > >  #endif
> > > >  
> > > > +static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> > > > +uint8_t cap_id,
> > > > +uint32_t address, 
> > > > int len)
> > > > +{
> > > > +uint8_t cap;
> > > > +
> > > > +switch (cap_id) {
> > > > +
> > > > +case PCI_CAP_ID_VPD:
> > > > +cap = pci_find_capability(pci_dev, cap_id);
> > > > +if (address - cap >= PCI_CAP_FLAGS) {
> > > > +return assigned_dev_pci_read(pci_dev, address, len);
> > > > +}
> > > > +break;
> > > > +
> > > > +case PCI_CAP_ID_VNDR:
> > > > +cap = pci_find_capability(pci_dev, cap_id);
> > > > +if (address - cap > PCI_CAP_FLAGS) {
> > > > +return assigned_dev_pci_read(pci_dev, address, len);
> > > > +}
> > > > +break;
> > > > +}
> > > > +
> > > > +return pci_default_cap_read_config(pci_dev, cap_id, address, len);
> > > > +}
> > > > +
> > > >  static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> > > >   uint8_t cap_id,
> > > >   uint32_t address,
> > > >   uint32_t val, int len)
> > > >  {
> > > > +uint8_t cap;
> > > > +
> > > >  pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
> > > >  
> > > >  switch (cap_id) {
> > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > >  case PCI_CAP_ID_MSI:
> > > >  #ifdef KVM_CAP_DEVICE_MSI
> > > > -{
> > > > -uint8_t cap = pci_find_capability(pci_dev, cap_id);
> > > > -if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> > > > -assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> > > > -}
> > > > +cap = pci_find_capability(pci_dev, cap_id);
> > > > +if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> > > > +assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> > > >  }
> > > >  #endif
> > > >  break;
> > > >  
> > > >  case PCI_CAP_ID_MSIX:
> > > >  #ifdef KVM_CAP_DEVICE_MSIX
> > > > -{
> > > > -uint8_t cap = pci_find_capability(pci_dev, cap_id);
> > > > -if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 
> > > > 1)) {
> > > > -assigned_dev_update_msix(pci_dev, cap + 
> > > > PCI_MSIX_FLAGS);
> > > > -}
> > > > +cap = p

[Qemu-devel] [PATCH 2/2] Add qmp version of drive_del

2010-11-12 Thread Ryan Harper
Signed-off-by: Ryan Harper 
---
 qmp-commands.hx |   29 +
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index 793cf1c..1e0d4e9 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -338,6 +338,35 @@ Example:
 EQMP
 
 {
+.name   = "drive_del",
+.args_type  = "id:s",
+.params = "device",
+.help   = "remove host block device",
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_del,
+},
+
+SQMP
+drive del
+--
+
+Remove host block device.  The result is that guest generated IO is no longer
+submitted against the host device underlying the disk.  Once a drive has
+been deleted, the QEMU Block layer returns -EIO which results in IO 
+errors in the guest for applications that are reading/writing to the device.
+
+Arguments:
+
+- "id": the device's ID (json-string)
+
+Example:
+
+-> { "execute": "drive_del", "arguments": { "id": "drive-virtio-blk1" } }
+<- { "return": {} }
+
+EQMP
+
+{
 .name   = "cpu",
 .args_type  = "index:i",
 .params = "index",
-- 
1.6.3.3




[Qemu-devel] Re: [PATCH v4 2/5] qed: Add QEMU Enhanced Disk image format

2010-11-12 Thread Kevin Wolf
Am 28.10.2010 13:01, schrieb Stefan Hajnoczi:
> This patch introduces the qed on-disk layout and implements image
> creation.  Later patches add read/write and other functionality.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  Makefile.objs |1 +
>  block/qed.c   |  548 
> +
>  block/qed.h   |  145 +++
>  block_int.h   |1 +
>  4 files changed, 695 insertions(+), 0 deletions(-)
>  create mode 100644 block/qed.c
>  create mode 100644 block/qed.h
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index f07fb01..7bae72a 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -14,6 +14,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>  
>  block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o 
> vpc.o vvfat.o
>  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
> +block-nested-y += qed.o
>  block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>  block-nested-$(CONFIG_WIN32) += raw-win32.o
>  block-nested-$(CONFIG_POSIX) += raw-posix.o
> diff --git a/block/qed.c b/block/qed.c
> new file mode 100644
> index 000..8469cf0
> --- /dev/null
> +++ b/block/qed.c
> @@ -0,0 +1,548 @@
> +/*
> + * QEMU Enhanced Disk Format
> + *
> + * Copyright IBM, Corp. 2010
> + *
> + * Authors:
> + *  Stefan Hajnoczi   
> + *  Anthony Liguori   
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "qed.h"
> +
> +static int bdrv_qed_probe(const uint8_t *buf, int buf_size,
> +  const char *filename)
> +{
> +const QEDHeader *header = (const QEDHeader *)buf;
> +
> +if (buf_size < sizeof(*header)) {
> +return 0;
> +}
> +if (le32_to_cpu(header->magic) != QED_MAGIC) {
> +return 0;
> +}
> +return 100;
> +}
> +
> +/**
> + * Check whether an image format is raw
> + *
> + * @fmt:Backing file format, may be NULL
> + */
> +static bool qed_fmt_is_raw(const char *fmt)
> +{
> +return fmt && strcmp(fmt, "raw") == 0;
> +}

People shouldn't use them directly, but should we also consider file,
host_device, etc.?

> +
> +static void qed_header_le_to_cpu(const QEDHeader *le, QEDHeader *cpu)
> +{
> +cpu->magic = le32_to_cpu(le->magic);
> +cpu->cluster_size = le32_to_cpu(le->cluster_size);
> +cpu->table_size = le32_to_cpu(le->table_size);
> +cpu->header_size = le32_to_cpu(le->header_size);
> +cpu->features = le64_to_cpu(le->features);
> +cpu->compat_features = le64_to_cpu(le->compat_features);
> +cpu->autoclear_features = le64_to_cpu(le->autoclear_features);
> +cpu->l1_table_offset = le64_to_cpu(le->l1_table_offset);
> +cpu->image_size = le64_to_cpu(le->image_size);
> +cpu->backing_filename_offset = le32_to_cpu(le->backing_filename_offset);
> +cpu->backing_filename_size = le32_to_cpu(le->backing_filename_size);
> +}
> +
> +static void qed_header_cpu_to_le(const QEDHeader *cpu, QEDHeader *le)
> +{
> +le->magic = cpu_to_le32(cpu->magic);
> +le->cluster_size = cpu_to_le32(cpu->cluster_size);
> +le->table_size = cpu_to_le32(cpu->table_size);
> +le->header_size = cpu_to_le32(cpu->header_size);
> +le->features = cpu_to_le64(cpu->features);
> +le->compat_features = cpu_to_le64(cpu->compat_features);
> +le->autoclear_features = cpu_to_le64(cpu->autoclear_features);
> +le->l1_table_offset = cpu_to_le64(cpu->l1_table_offset);
> +le->image_size = cpu_to_le64(cpu->image_size);
> +le->backing_filename_offset = cpu_to_le32(cpu->backing_filename_offset);
> +le->backing_filename_size = cpu_to_le32(cpu->backing_filename_size);
> +}
> +
> +static int qed_write_header_sync(BDRVQEDState *s)
> +{
> +QEDHeader le;
> +int ret;
> +
> +qed_header_cpu_to_le(&s->header, &le);
> +ret = bdrv_pwrite(s->bs->file, 0, &le, sizeof(le));
> +if (ret != sizeof(le)) {
> +return ret;
> +}
> +return 0;
> +}
> +
> +static uint64_t qed_max_image_size(uint32_t cluster_size, uint32_t 
> table_size)
> +{
> +uint64_t table_entries;
> +uint64_t l2_size;
> +
> +table_entries = (table_size * cluster_size) / sizeof(uint64_t);
> +l2_size = table_entries * cluster_size;
> +
> +return l2_size * table_entries;
> +}
> +
> +static bool qed_is_cluster_size_valid(uint32_t cluster_size)
> +{
> +if (cluster_size < QED_MIN_CLUSTER_SIZE ||
> +cluster_size > QED_MAX_CLUSTER_SIZE) {
> +return false;
> +}
> +if (cluster_size & (cluster_size - 1)) {
> +return false; /* not power of 2 */
> +}
> +return true;
> +}
> +
> +static bool qed_is_table_size_valid(uint32_t table_size)
> +{
> +if (table_size < QED_MIN_TABLE_SIZE ||
> +table_size > QED_MAX_TABLE_SIZE) {
> +return false;
> +}
> +if (table_size & (table_size - 1)) {
> +return false; /* not power of 2 */
> +}
> +retur

Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver

2010-11-12 Thread Luiz Capitulino
On Fri, 12 Nov 2010 16:04:39 +0100
Markus Armbruster  wrote:

> Luiz Capitulino  writes:
> 
> > On Fri, 12 Nov 2010 15:16:33 +0100
> > Markus Armbruster  wrote:
> >
> >> Luiz Capitulino  writes:
> >> 
> >> > On Fri, 12 Nov 2010 11:21:57 +0100
> >> > Markus Armbruster  wrote:
> >> >
> >> >> Luiz Capitulino  writes:
> >> [...]
> >> >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr)
> >> >> > +{
> >> >> > +MemoryDriver *d = chr->opaque;
> >> >> > +
> >> >> > +if (d->outbuf_size == 0) {
> >> >> > +return qstring_new();
> >> >> > +}
> >> >> 
> >> >> Why is this necessary?  Is qstring_from_substr() broken for empty
> >> >> substrings?  If it is, it ought to be fixed!
> >> >
> >> > qstring_from_substr() takes a character range; outbuf_size stores a size,
> >> > not a string length. So we do:
> >> >
> >> >> > +return qstring_from_substr((char *) d->outbuf, 0, d->outbuf_size 
> >> >> > - 1);
> >> >
> >> > If outbuf_size is 0, we'll be passing a negative value down.
> >> 
> >> What's wrong with that?
> >
> > Although it's going to work with the current QString implementation, I don't
> > think it's it's a good idea to rely on a negative index.
> 
> How should I extract the substring of S beginning at index B with length
> L?  If I cant't do this for any B, L with interval [B,B+L-1] fully
> within [0,length(S)], then the API is flawed, and ought to be replaced.

Not sure we're talking about the same problem, anymore. When you said:

> >> What's wrong with that?

What did you mean? Did you mean 'let's not decrement outbuf_size' or did
you mean 'let's pass -1 anyway'?

Both seem wrong to me: the substring [0,-1] should be invalid and not
decrementing outbuf_size is wrong, because it contains the buffer size and
qstring_from_substr() will consume an additional char from the buffer (which
should be '\0' today, but we shouldn't count on that).

> 
> > Maybe, we could have:
> >
> > return qstring_from_substr((char *) d->outbuf, 0,
> > d->outbuf_size > 0 ? d->outbuf_size - 1 : 0);
> >
> > A bit harder to read, but makes the function smaller.
> 
> Err, doesn't qstring_from_substr(s, 0, 0) extract a substring of length
> 1?

Yeah, it's a bug. But that doesn't change my suggestion, can we do this way?

This should fix the bug (not even compiled tested):

diff --git a/qstring.c b/qstring.c
index 4e2ba08..72a25de 100644
--- a/qstring.c
+++ b/qstring.c
@@ -42,10 +42,10 @@ QString *qstring_from_substr(const char *str, int start, 
int end)
 
 qstring = qemu_malloc(sizeof(*qstring));
 
-qstring->length = end - start + 1;
-qstring->capacity = qstring->length;
+qstring->length = end - start;
+qstring->capacity = qstring->length + 1;
 
-qstring->string = qemu_malloc(qstring->capacity + 1);
+qstring->string = qemu_malloc(qstring->capacity);
 memcpy(qstring->string, str + start, qstring->length);
 qstring->string[qstring->length] = 0;



[Qemu-devel] [PATCH 1/2] Implement drive_del to decouple block removal from device removal

2010-11-12 Thread Ryan Harper
Currently device hotplug removal code is tied to device removal via
ACPI.  All pci devices that are removable via device_del() require the
guest to respond to the request.  In some cases the guest may not
respond leaving the device still accessible to the guest.  The management
layer doesn't currently have a reliable way to revoke access to host
resource in the presence of an uncooperative guest.

This patch implements a new monitor command, drive_del, which
provides an explicit command to revoke access to a host block device.

drive_del first quiesces the block device (qemu_aio_flush;
bdrv_flush() and bdrv_close()).  This prevents further IO from being
submitted against the host device.  Finally, drive_del cleans up
pointers between the drive object (host resource) and the device
object (guest resource).

Signed-off-by: Ryan Harper 
---
 blockdev.c  |   39 +++
 blockdev.h  |1 +
 hmp-commands.hx |   18 ++
 3 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 6cb179a..f6ac439 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -14,6 +14,8 @@
 #include "qemu-option.h"
 #include "qemu-config.h"
 #include "sysemu.h"
+#include "hw/qdev.h"
+#include "block_int.h"
 
 static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
QTAILQ_HEAD_INITIALIZER(drives);
 
@@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device,
 }
 return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
+
+int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+const char *id = qdict_get_str(qdict, "id");
+BlockDriverState *bs;
+BlockDriverState **ptr;
+Property *prop;
+
+bs = bdrv_find(id);
+if (!bs) {
+qerror_report(QERR_DEVICE_NOT_FOUND, id);
+return -1;
+}
+
+/* quiesce block driver; prevent further io */
+qemu_aio_flush();
+bdrv_flush(bs);
+bdrv_close(bs);
+
+/* clean up guest state from pointing to host resource by
+ * finding and removing DeviceState "drive" property */
+for (prop = bs->peer->info->props; prop && prop->name; prop++) {
+if (prop->info->type == PROP_TYPE_DRIVE) {
+ptr = qdev_get_prop_ptr(bs->peer, prop);
+if ((*ptr) == bs) {
+bdrv_detach(bs, bs->peer);
+*ptr = NULL;
+break;
+}
+}
+}
+
+/* clean up host side */
+drive_uninit(drive_get_by_blockdev(bs));
+
+return 0;
+}
diff --git a/blockdev.h b/blockdev.h
index 653affc..2a0559e 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
**ret_data);
 int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
 const char *filename, const char *fmt);
+int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e5585ba..d6dc18c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it).
 ETEXI
 
 {
+.name   = "drive_del",
+.args_type  = "id:s",
+.params = "device",
+.help   = "remove host block device",
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_del,
+},
+
+STEXI
+...@item delete @var{device}
+...@findex delete
+Remove host block device.  The result is that guest generated IO is no longer
+submitted against the host device underlying the disk.  Once a drive has
+been deleted, the QEMU Block layer returns -EIO which results in IO 
+errors in the guest for applications that are reading/writing to the device.
+ETEXI
+
+{
 .name   = "change",
 .args_type  = "device:B,target:F,arg:s?",
 .params = "device filename [format]",
-- 
1.6.3.3




[Qemu-devel] [PATCH 0/2] v7 Decouple block device removal from device removal

2010-11-12 Thread Ryan Harper
Once more dear friends, v7

This patch series decouples the detachment of a block device from the
removal of the backing pci-device.  Removal of a hotplugged pci device
requires the guest to respond before qemu tears down the block device.
In some cases, the guest may not respond leaving the guest with
continued access to the block device.  Mgmt layer doesn't have a
reliable method to force a disconnect.  

The new monitor command, drive_del, will revoke a guests access to the
block device independently of the removal of the pci device.

The first patch implements drive_del, the second patch implements the
qmp version of the monitor command.

Changes since v6:
- Updated patch description
- Dropped bdrv_unplug and inlined in drive_del
- Explicitly invoke drive_uninit()
Changes since v5:
- Removed dangling pointers in guest and host state.  This ensures things like 
  info block no longer displays the deleted drive; though info pci will
  continue to display the pci device until the guest responds to the removal
  request.
- Renamed drive_unplug -> drive_del
Changes since v4:
- Droppped drive_get_by_id patch and use bdrv_find() instead
- Added additional details about drive_unplug to hmp/qmp interface

Changes since v3:
- Moved QMP command for drive_unplug() to separate patch

Changes since v2:
- Added QMP command for drive_unplug()

Changes since v1:
- CodingStyle fixes
- Added qemu_aio_flush() to bdrv_unplug()

Signed-off-by: Ryan Harper 



[Qemu-devel] Re: [PATCH 4/8] pci: Replace used bitmap with capability byte map

2010-11-12 Thread Alex Williamson
On Fri, 2010-11-12 at 11:02 +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2010 at 11:07:15PM -0700, Alex Williamson wrote:
> > On Fri, 2010-11-12 at 07:40 +0200, Michael S. Tsirkin wrote:
> > > On Thu, Nov 11, 2010 at 07:55:43PM -0700, Alex Williamson wrote:
> > > > Capabilities are allocated in bytes, so we can track both whether
> > > > a byte is used and by what capability in the same structure.
> > > > 
> > > > Remove pci_reserve_capability() as there are no users.
> > > > 
> > > > Signed-off-by: Alex Williamson 
> > > 
> > > I actually wanted to remove the used array completely and ask
> > > all users to add offsets directly.
> > > Will this be needed then?
> > 
> > Can you give an example, I don't understand what you mean by asking
> > users to add offsets directly.
> 
> Here's a dump of patch in progress.
> Something like the below (untested).  After applying this we can remove
> the whole used array allocator.
> 
> >  I think some kind of tracking what's
> > where in config space needs to be done somewhere and the common PCI code
> > seems like it'd be the place.
> 
> Why do we need it? config lets us scan the capability list
> readily enough. I had this idea that we should pack
> capabilities in config space, but now I think it's silly.
> 
> >  Thanks,
> > 
> > Alex
> 
> pci: remove config space allocator
> 
> pci supports allocating caps in config space so they are packed tightly.
> This doesn't seem to be useful, especially since caps must stay at fixed
> offsets to ensure backwards compatibility (e.g. for migration).
> 
> Remove this support, and make virtio-pci supply the offset to
> the MSIX capability explicitly.
> 
> Signed-off-by: Michael S. Tsirkin 

So add_capability effectively becomes the add_capability_at_offset that
we have in the qemu-kvm tree.  I think that's fine, but doesn't
necessarily support removing the cap_map.  A direct config offset to
capability id map is pretty useful for device assignment, which only
uses add_capability_at_offset after these patches.  We might also want
to use it to check drivers to make sure they're not overlapping caps.
For instance in the below, we now have core virtio code forcing an
offset for the msix cap.  What if the driver already placed something at
that offset, or what if the driver wants to add capability foo, it
should at least error if it tries to place it over something else in the
list.

Alex

> diff --git a/hw/msix.c b/hw/msix.c
> index f66d255..6fd3791 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -52,7 +52,7 @@ int msix_supported;
>   * Original bar size must be a power of 2 or 0.
>   * New bar size is returned. */
>  static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
> -   unsigned bar_nr, unsigned bar_size)
> +   unsigned offset, unsigned bar_nr, unsigned 
> bar_size)
>  {
>  int config_offset;
>  uint8_t *config;
> @@ -75,7 +75,7 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned 
> short nentries,
>  
>  pdev->msix_bar_size = new_size;
>  config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX,
> -   0, MSIX_CAP_LENGTH);
> +   offset, MSIX_CAP_LENGTH);
>  if (config_offset < 0)
>  return config_offset;
>  config = pdev->config + config_offset;
> @@ -237,7 +237,7 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned 
> nentries)
>  /* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size is
>   * modified, it should be retrieved with msix_bar_size. */
>  int msix_init(struct PCIDevice *dev, unsigned short nentries,
> -  unsigned bar_nr, unsigned bar_size)
> +  unsigned offset, unsigned bar_nr, unsigned bar_size)
>  {
>  int ret;
>  /* Nothing to do if MSI is not supported by interrupt controller */
> @@ -261,7 +261,7 @@ int msix_init(struct PCIDevice *dev, unsigned short 
> nentries,
>  }
>  
>  dev->msix_entries_nr = nentries;
> -ret = msix_add_config(dev, nentries, bar_nr, bar_size);
> +ret = msix_add_config(dev, nentries, offset, bar_nr, bar_size);
>  if (ret)
>  goto err_config;
>  
> diff --git a/hw/msix.h b/hw/msix.h
> index a9f7993..b61e42e 100644
> --- a/hw/msix.h
> +++ b/hw/msix.h
> @@ -5,7 +5,7 @@
>  #include "pci.h"
>  
>  int msix_init(PCIDevice *pdev, unsigned short nentries,
> -  unsigned bar_nr, unsigned bar_size);
> +  unsigned offset, unsigned bar_nr, unsigned bar_size);
>  
>  void msix_write_config(PCIDevice *pci_dev, uint32_t address,
> uint32_t val, int len);
> diff --git a/hw/pci.c b/hw/pci.c
> index aed2d42..e411f12 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1737,12 +1737,7 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
> uint8_t offset, uint8_t size)
>  {
>  uint8_t *config;
> -if (!offset) {
> -offset = pci_find_space(pdev, size)

Re: [Qemu-devel] [PATCH] Makefile: Fix check dependency breakage

2010-11-12 Thread Markus Armbruster
Luiz Capitulino  writes:

> Commit b152aa84d52882bb1846485a89baf13aa07c86bc broke the unit-tests
> build, fix it.

Got bitten by that, and the patch fixes it for me.



Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver

2010-11-12 Thread Markus Armbruster
Luiz Capitulino  writes:

> On Fri, 12 Nov 2010 15:16:33 +0100
> Markus Armbruster  wrote:
>
>> Luiz Capitulino  writes:
>> 
>> > On Fri, 12 Nov 2010 11:21:57 +0100
>> > Markus Armbruster  wrote:
>> >
>> >> Luiz Capitulino  writes:
>> [...]
>> >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr)
>> >> > +{
>> >> > +MemoryDriver *d = chr->opaque;
>> >> > +
>> >> > +if (d->outbuf_size == 0) {
>> >> > +return qstring_new();
>> >> > +}
>> >> 
>> >> Why is this necessary?  Is qstring_from_substr() broken for empty
>> >> substrings?  If it is, it ought to be fixed!
>> >
>> > qstring_from_substr() takes a character range; outbuf_size stores a size,
>> > not a string length. So we do:
>> >
>> >> > +return qstring_from_substr((char *) d->outbuf, 0, d->outbuf_size - 
>> >> > 1);
>> >
>> > If outbuf_size is 0, we'll be passing a negative value down.
>> 
>> What's wrong with that?
>
> Although it's going to work with the current QString implementation, I don't
> think it's it's a good idea to rely on a negative index.

How should I extract the substring of S beginning at index B with length
L?  If I cant't do this for any B, L with interval [B,B+L-1] fully
within [0,length(S)], then the API is flawed, and ought to be replaced.

> Maybe, we could have:
>
> return qstring_from_substr((char *) d->outbuf, 0,
> d->outbuf_size > 0 ? d->outbuf_size - 1 : 0);
>
> A bit harder to read, but makes the function smaller.

Err, doesn't qstring_from_substr(s, 0, 0) extract a substring of length
1?



[Qemu-devel] [PATCH] Makefile: Fix check dependency breakage

2010-11-12 Thread Luiz Capitulino
Commit b152aa84d52882bb1846485a89baf13aa07c86bc broke the unit-tests
build, fix it.

Signed-off-by: Luiz Capitulino 
---
 Makefile |   14 --
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 02698e9..719aca9 100644
--- a/Makefile
+++ b/Makefile
@@ -142,12 +142,14 @@ qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
 
 check-qint.o check-qstring.o check-qdict.o check-qlist.o check-qfloat.o 
check-qjson.o: $(GENERATED_HEADERS)
 
-check-qint: check-qint.o qint.o qemu-malloc.o $(trace-obj-y)
-check-qstring: check-qstring.o qstring.o qemu-malloc.o $(trace-obj-y)
-check-qdict: check-qdict.o qdict.o qfloat.o qint.o qstring.o qbool.o 
qemu-malloc.o qlist.o $(trace-obj-y)
-check-qlist: check-qlist.o qlist.o qint.o qemu-malloc.o $(trace-obj-y)
-check-qfloat: check-qfloat.o qfloat.o qemu-malloc.o $(trace-obj-y)
-check-qjson: check-qjson.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o 
qjson.o json-streamer.o json-lexer.o json-parser.o qemu-malloc.o $(trace-obj-y)
+CHECK_PROG_DEPS = qemu-malloc.o $(oslib-obj-y) $(trace-obj-y)
+
+check-qint: check-qint.o qint.o $(CHECK_PROG_DEPS)
+check-qstring: check-qstring.o qstring.o $(CHECK_PROG_DEPS)
+check-qdict: check-qdict.o qdict.o qfloat.o qint.o qstring.o qbool.o qlist.o 
$(CHECK_PROG_DEPS)
+check-qlist: check-qlist.o qlist.o qint.o $(CHECK_PROG_DEPS)
+check-qfloat: check-qfloat.o qfloat.o $(CHECK_PROG_DEPS)
+check-qjson: check-qjson.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o 
qjson.o json-streamer.o json-lexer.o json-parser.o $(CHECK_PROG_DEPS)
 
 clean:
 # avoid old build problems by removing potentially incorrect old files
-- 
1.7.3.2.164.g6f10c




Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver

2010-11-12 Thread Luiz Capitulino
On Fri, 12 Nov 2010 15:16:33 +0100
Markus Armbruster  wrote:

> Luiz Capitulino  writes:
> 
> > On Fri, 12 Nov 2010 11:21:57 +0100
> > Markus Armbruster  wrote:
> >
> >> Luiz Capitulino  writes:
> [...]
> >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr)
> >> > +{
> >> > +MemoryDriver *d = chr->opaque;
> >> > +
> >> > +if (d->outbuf_size == 0) {
> >> > +return qstring_new();
> >> > +}
> >> 
> >> Why is this necessary?  Is qstring_from_substr() broken for empty
> >> substrings?  If it is, it ought to be fixed!
> >
> > qstring_from_substr() takes a character range; outbuf_size stores a size,
> > not a string length. So we do:
> >
> >> > +return qstring_from_substr((char *) d->outbuf, 0, d->outbuf_size - 
> >> > 1);
> >
> > If outbuf_size is 0, we'll be passing a negative value down.
> 
> What's wrong with that?

Although it's going to work with the current QString implementation, I don't
think it's it's a good idea to rely on a negative index.

Maybe, we could have:

return qstring_from_substr((char *) d->outbuf, 0,
d->outbuf_size > 0 ? d->outbuf_size - 1 : 0);

A bit harder to read, but makes the function smaller.



Re: [Qemu-devel] Re: [Try2][PATCH] Initial implementation of a mpeg1 layer2 streaming audio driver.

2010-11-12 Thread Anthony Liguori

On 11/08/2010 12:15 AM, François Revol wrote:

Le 8 nov. 2010 à 04:57, malc a écrit :

   

And can you, please, elaborate some more on usage scenarios of this thing?
 

cf.
http://dev.haiku-os.org/browser/haiku/trunk/3rdparty/mmu_man/onlinedemo/haiku.php
   

Sorry my PHP skills have lapsed aeons ago.
 

Sorry I don't have it installed on a publically accessible machine atm.

   

and possibly http://oszoo.org/wiki/index.php/Main_Page some day...

The idea is to use it along with the -vnc option and the VNC applet
to present a VM on the web.
   

While not very sexy, but can't you just use wav output to a fifo and
compress it via separate process.
 

I did try years ago, but at least the current wav driver really didn't like 
fifos back then. I recall trying for hours to get it pipe to ffmpeg or others 
without much luck.

Also, this poses several problems about the control of the external process 
(respawn on listener disconnection, close on exit...).
   


Doing encoding in QEMU is going to starve the guest pretty bad.  For an 
x86 guest, that's going to result in issues with time drift, etc.


Making reencoding with an external process work a bit more nicely also 
has the advantage of making this work with other formats like Ogg which 
are a bit more Open Source friendly.


Regards,

Anthony Liguori


François.
   





Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver

2010-11-12 Thread Markus Armbruster
Luiz Capitulino  writes:

> On Fri, 12 Nov 2010 11:21:57 +0100
> Markus Armbruster  wrote:
>
>> Luiz Capitulino  writes:
[...]
>> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr)
>> > +{
>> > +MemoryDriver *d = chr->opaque;
>> > +
>> > +if (d->outbuf_size == 0) {
>> > +return qstring_new();
>> > +}
>> 
>> Why is this necessary?  Is qstring_from_substr() broken for empty
>> substrings?  If it is, it ought to be fixed!
>
> qstring_from_substr() takes a character range; outbuf_size stores a size,
> not a string length. So we do:
>
>> > +return qstring_from_substr((char *) d->outbuf, 0, d->outbuf_size - 1);
>
> If outbuf_size is 0, we'll be passing a negative value down.

What's wrong with that?

[...]



Re: [Qemu-devel] Re: [PATCH v4 1/5] docs: Add QED image format specification

2010-11-12 Thread Stefan Hajnoczi
On Fri, Nov 12, 2010 at 1:58 PM, Kevin Wolf  wrote:
> Am 28.10.2010 13:01, schrieb Stefan Hajnoczi:
>> Signed-off-by: Stefan Hajnoczi 
>> ---
>>  docs/specs/qed_spec.txt |  128 
>> +++
>>  1 files changed, 128 insertions(+), 0 deletions(-)
>>  create mode 100644 docs/specs/qed_spec.txt
>>
>> diff --git a/docs/specs/qed_spec.txt b/docs/specs/qed_spec.txt
>> new file mode 100644
>> index 000..e4425c8
>> --- /dev/null
>> +++ b/docs/specs/qed_spec.txt
>> @@ -0,0 +1,128 @@
>> +=Specification=
>> +
>> +The file format looks like this:
>> +
>> + +--+--+--+-+
>> + | cluster0 | cluster1 | cluster2 | ... |
>> + +--+--+--+-+
>> +
>> +The first cluster begins with the '''header'''.  The header contains 
>> information about where regular clusters start; this allows the header to be 
>> extensible and store extra information about the image file.  A regular 
>> cluster may be a '''data cluster''', an '''L2''', or an '''L1 table'''.  L1 
>> and L2 tables are composed of one or more contiguous clusters.
>> +
>> +Normally the file size will be a multiple of the cluster size.  If the file 
>> size is not a multiple, extra information after the last cluster may not be 
>> preserved if data is written.  Legitimate extra information should use space 
>> between the header and the first regular cluster.
>> +
>> +All fields are little-endian.
>> +
>> +==Header==
>> + Header {
>> +     uint32_t magic;               /* QED\0 */
>> +
>> +     uint32_t cluster_size;        /* in bytes */
>> +     uint32_t table_size;          /* for L1 and L2 tables, in clusters */
>> +     uint32_t header_size;         /* in clusters */
>> +
>> +     uint64_t features;            /* format feature bits */
>> +     uint64_t compat_features;     /* compat feature bits */
>> +     uint64_t l1_table_offset;     /* in bytes */
>> +     uint64_t image_size;          /* total logical image size, in bytes */
>> +
>> +     /* if (features & QED_F_BACKING_FILE) */
>> +     uint32_t backing_filename_offset; /* in bytes from start of header */
>> +     uint32_t backing_filename_size;   /* in bytes */
>> + }
>> +
>> +Field descriptions:
>> +* ''cluster_size'' must be a power of 2 in range [2^12, 2^26].
>> +* ''table_size'' must be a power of 2 in range [1, 16].
>> +* ''header_size'' is the number of clusters used by the header and any 
>> additional information stored before regular clusters.
>> +* ''features'', ''compat_features'', and ''autoclear_features'' are file 
>> format extension bitmaps.  They work as follows:
>> +** An image with unknown ''features'' bits enabled must not be opened.  
>> File format changes that are not backwards-compatible must use ''features'' 
>> bits.
>> +** An image with unknown ''compat_features'' bits enabled can be opened 
>> safely.  The unknown features are simply ignored and represent 
>> backwards-compatible changes to the file format.
>> +** An image with unknown ''autoclear_features'' bits enable can be opened 
>> safely after clearing the unknown bits.  This allows for 
>> backwards-compatible changes to the file format which degrade gracefully and 
>> can be re-enabled again by a new program later.
>
> autoclear features aren't even part of the header in the spec.

Thanks for spotting this, I forgot to sync the header with the code.

>> +* ''l1_table_offset'' is the offset of the first byte of the L1 table in 
>> the image file and must be a multiple of ''cluster_size''.
>> +* ''image_size'' is the block device size seen by the guest and must be a 
>> multiple of 512 bytes.
>> +* ''backing_filename'' is a string in (byte offset, byte size) form.  It is 
>> not NUL-terminated and has no alignment constraints.
>> +
>> +Feature bits:
>> +* QED_F_BACKING_FILE = 0x01.  The image uses a backing file.  The backing 
>> filename string is given in the ''backing_filename_{offset,size}'' fields 
>> and may be an absolute path or relative to the image file.
>> +* QED_F_NEED_CHECK = 0x02.  The image needs a consistency check before use.
>> +* QED_F_BACKING_FORMAT_NO_PROBE = 0x04.  The backing file is a raw disk 
>> image and no file format autodetection should be attempted.  This should be 
>> used to ensure that raw backing images are never detected as an image format 
>> if they happen to contain magic constants.
>> +
>> +There are currently no defined ''compat_features'' or 
>> ''autoclear_features'' bits.
>> +
>> +Fields predicated on a feature bit are only used when that feature is set.  
>> The fields always take up header space, regardless of whether or not the 
>> feature bit is set.
>> +
>> +==Tables==
>> +
>> +Tables provide the translation from logical offsets in the block device to 
>> cluster offsets in the file.
>> +
>> + #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
>> +
>> + Table {
>> +     uint64_t offsets[TABLE_NOFFSETS];
>> + }
>> +
>> +The tables are organized as follows:
>> +
>> +  

[Qemu-devel] Re: [PATCH v3 3/3] virtio-pci: Don't use ioeventfd on old kernels

2010-11-12 Thread Stefan Hajnoczi
On Fri, Nov 12, 2010 at 1:24 PM, Stefan Hajnoczi
 wrote:
> @@ -1046,6 +1087,11 @@ int kvm_has_xcrs(void)
>     return kvm_state->xcrs;
>  }
>
> +int kvm_has_many_ioeventfds(void)
> +{
> +    return kvm_state->many_ioeventfds;
> +}
> +

Missing if (!kvm_enabled()) { return 0; }.  Will fix in next version,
would still appreciate review comments on any other aspect of the
patch.

Stefan



[Qemu-devel] Re: [PATCH v4 1/5] docs: Add QED image format specification

2010-11-12 Thread Kevin Wolf
Am 28.10.2010 13:01, schrieb Stefan Hajnoczi:
> Signed-off-by: Stefan Hajnoczi 
> ---
>  docs/specs/qed_spec.txt |  128 
> +++
>  1 files changed, 128 insertions(+), 0 deletions(-)
>  create mode 100644 docs/specs/qed_spec.txt
> 
> diff --git a/docs/specs/qed_spec.txt b/docs/specs/qed_spec.txt
> new file mode 100644
> index 000..e4425c8
> --- /dev/null
> +++ b/docs/specs/qed_spec.txt
> @@ -0,0 +1,128 @@
> +=Specification=
> +
> +The file format looks like this:
> +
> + +--+--+--+-+
> + | cluster0 | cluster1 | cluster2 | ... |
> + +--+--+--+-+
> +
> +The first cluster begins with the '''header'''.  The header contains 
> information about where regular clusters start; this allows the header to be 
> extensible and store extra information about the image file.  A regular 
> cluster may be a '''data cluster''', an '''L2''', or an '''L1 table'''.  L1 
> and L2 tables are composed of one or more contiguous clusters.
> +
> +Normally the file size will be a multiple of the cluster size.  If the file 
> size is not a multiple, extra information after the last cluster may not be 
> preserved if data is written.  Legitimate extra information should use space 
> between the header and the first regular cluster.
> +
> +All fields are little-endian.
> +
> +==Header==
> + Header {
> + uint32_t magic;   /* QED\0 */
> + 
> + uint32_t cluster_size;/* in bytes */
> + uint32_t table_size;  /* for L1 and L2 tables, in clusters */
> + uint32_t header_size; /* in clusters */
> + 
> + uint64_t features;/* format feature bits */
> + uint64_t compat_features; /* compat feature bits */
> + uint64_t l1_table_offset; /* in bytes */
> + uint64_t image_size;  /* total logical image size, in bytes */
> + 
> + /* if (features & QED_F_BACKING_FILE) */
> + uint32_t backing_filename_offset; /* in bytes from start of header */
> + uint32_t backing_filename_size;   /* in bytes */
> + }
> +
> +Field descriptions:
> +* ''cluster_size'' must be a power of 2 in range [2^12, 2^26].
> +* ''table_size'' must be a power of 2 in range [1, 16].
> +* ''header_size'' is the number of clusters used by the header and any 
> additional information stored before regular clusters.
> +* ''features'', ''compat_features'', and ''autoclear_features'' are file 
> format extension bitmaps.  They work as follows:
> +** An image with unknown ''features'' bits enabled must not be opened.  File 
> format changes that are not backwards-compatible must use ''features'' bits.
> +** An image with unknown ''compat_features'' bits enabled can be opened 
> safely.  The unknown features are simply ignored and represent 
> backwards-compatible changes to the file format.
> +** An image with unknown ''autoclear_features'' bits enable can be opened 
> safely after clearing the unknown bits.  This allows for backwards-compatible 
> changes to the file format which degrade gracefully and can be re-enabled 
> again by a new program later.

autoclear features aren't even part of the header in the spec.

> +* ''l1_table_offset'' is the offset of the first byte of the L1 table in the 
> image file and must be a multiple of ''cluster_size''.
> +* ''image_size'' is the block device size seen by the guest and must be a 
> multiple of 512 bytes.
> +* ''backing_filename'' is a string in (byte offset, byte size) form.  It is 
> not NUL-terminated and has no alignment constraints.
> +
> +Feature bits:
> +* QED_F_BACKING_FILE = 0x01.  The image uses a backing file.  The backing 
> filename string is given in the ''backing_filename_{offset,size}'' fields and 
> may be an absolute path or relative to the image file.
> +* QED_F_NEED_CHECK = 0x02.  The image needs a consistency check before use.
> +* QED_F_BACKING_FORMAT_NO_PROBE = 0x04.  The backing file is a raw disk 
> image and no file format autodetection should be attempted.  This should be 
> used to ensure that raw backing images are never detected as an image format 
> if they happen to contain magic constants.
> +
> +There are currently no defined ''compat_features'' or ''autoclear_features'' 
> bits.
> +
> +Fields predicated on a feature bit are only used when that feature is set.  
> The fields always take up header space, regardless of whether or not the 
> feature bit is set.
> +
> +==Tables==
> +
> +Tables provide the translation from logical offsets in the block device to 
> cluster offsets in the file.
> +
> + #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
> +  
> + Table {
> + uint64_t offsets[TABLE_NOFFSETS];
> + }
> +
> +The tables are organized as follows:
> +
> ++--+
> +| L1 table |
> ++--+
> +   ,--'  |  '--.
> +  +--+   |+--+
> +  | L2 table |  ...   | L2 tabl

Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver

2010-11-12 Thread Luiz Capitulino
On Fri, 12 Nov 2010 11:21:57 +0100
Markus Armbruster  wrote:

> Luiz Capitulino  writes:
> 
> > This driver handles in-memory chardev operations. That's, all writes
> > to this driver are stored in an internal buffer and it doesn't talk
> > to the external world in any way.
> >
> > Right now it's very simple: it supports only writes. But it can be
> > easily extended to support more operations.
> >
> > This is going to be used by the monitor's "HMP passthrough via QMP"
> > feature, which needs to run monitor handlers without a backing
> > device.
> >
> > Signed-off-by: Luiz Capitulino 
> > ---
> >  qemu-char.c |   70 
> > +++
> >  qemu-char.h |7 ++
> >  2 files changed, 77 insertions(+), 0 deletions(-)
> >
> > diff --git a/qemu-char.c b/qemu-char.c
> > index 88997f9..36d23c6 100644
> > --- a/qemu-char.c
> > +++ b/qemu-char.c
> > @@ -2275,6 +2275,76 @@ static CharDriverState 
> > *qemu_chr_open_socket(QemuOpts *opts)
> >  return NULL;
> >  }
> >  
> > +/***/
> > +/* Memory chardev */
> > +typedef struct {
> > +size_t outbuf_size;
> > +size_t outbuf_capacity;
> > +uint8_t *outbuf;
> > +} MemoryDriver;
> > +
> > +static int mem_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
> > +{
> > +MemoryDriver *d = chr->opaque;
> > +
> > +/* TODO: the QString implementation has the same code, we should
> > + * introduce a generic way to do this in cutils.c */
> > +if (d->outbuf_capacity < d->outbuf_size + len) {
> > +/* grow outbuf */
> > +d->outbuf_capacity += len;
> > +d->outbuf_capacity *= 2;
> > +d->outbuf = qemu_realloc(d->outbuf, d->outbuf_capacity);
> > +}
> > +
> > +memcpy(d->outbuf + d->outbuf_size, buf, len);
> > +d->outbuf_size += len;
> > +
> > +return len;
> > +}
> > +
> > +void qemu_chr_init_mem(CharDriverState *chr)
> > +{
> > +MemoryDriver *d;
> > +
> > +d = qemu_malloc(sizeof(*d));
> > +d->outbuf_size = 0;
> > +d->outbuf_capacity = 4096;
> > +d->outbuf = qemu_mallocz(d->outbuf_capacity);
> > +
> > +memset(chr, 0, sizeof(*chr));
> > +chr->opaque = d;
> > +chr->chr_write = mem_chr_write;
> > +}
> > +
> > +/* assumes the stored data is a string */
> 
> This could indicate a problem.  See my reply in the thread for v2.

Replied, but I can't see the problem.

> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr)
> > +{
> > +MemoryDriver *d = chr->opaque;
> > +
> > +if (d->outbuf_size == 0) {
> > +return qstring_new();
> > +}
> 
> Why is this necessary?  Is qstring_from_substr() broken for empty
> substrings?  If it is, it ought to be fixed!

qstring_from_substr() takes a character range; outbuf_size stores a size,
not a string length. So we do:

> > +return qstring_from_substr((char *) d->outbuf, 0, d->outbuf_size - 1);

If outbuf_size is 0, we'll be passing a negative value down.

> > +}
> > +
> > +/* NOTE: this driver can not be closed with qemu_chr_close()! */
> > +void qemu_chr_close_mem(CharDriverState *chr)
> > +{
> > +MemoryDriver *d = chr->opaque;
> > +
> > +qemu_free(d->outbuf);
> > +qemu_free(chr->opaque);
> > +chr->opaque = NULL;
> > +chr->chr_write = NULL;
> > +}
> > +
> > +size_t qemu_chr_mem_osize(const CharDriverState *chr)
> > +{
> > +const MemoryDriver *d = chr->opaque;
> > +return d->outbuf_size;
> > +}
> > +
> >  QemuOpts *qemu_chr_parse_compat(const char *label, const char *filename)
> >  {
> >  char host[65], port[33], width[8], height[8];
> > diff --git a/qemu-char.h b/qemu-char.h
> > index 18ad12b..e6ee6c4 100644
> > --- a/qemu-char.h
> > +++ b/qemu-char.h
> > @@ -6,6 +6,7 @@
> >  #include "qemu-option.h"
> >  #include "qemu-config.h"
> >  #include "qobject.h"
> > +#include "qstring.h"
> >  
> >  /* character device */
> >  
> > @@ -100,6 +101,12 @@ CharDriverState *qemu_chr_open_eventfd(int eventfd);
> >  
> >  extern int term_escape_char;
> >  
> > +/* memory chardev */
> > +void qemu_chr_init_mem(CharDriverState *chr);
> > +void qemu_chr_close_mem(CharDriverState *chr);
> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr);
> > +size_t qemu_chr_mem_osize(const CharDriverState *chr);
> > +
> >  /* async I/O support */
> >  
> >  int qemu_set_fd_handler2(int fd,
> 




[Qemu-devel] Re: [PATCH 6/8] device-assignment: Move PCI capabilities to match physical hardware

2010-11-12 Thread Alex Williamson
On Fri, 2010-11-12 at 11:20 +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2010 at 07:56:13PM -0700, Alex Williamson wrote:
> > Now that common PCI code doesn't have a hangup on capabilities
> > being contiguous,
> 
> Hmm, this comment confused me : there's no requirement of
> contigious allocations in current code in pci.c, is there?

Exactly, but the code used to have cap.start and cap.length, which
implied it was contiguous.  Since those were removed in 5/8, we don't
need to worry about where the physical capabilities land in config
space.  Thanks,

Alex




Re: [Qemu-devel] [PATCH 2/2] Add support for generating a systemtap tapset static probes

2010-11-12 Thread Stefan Hajnoczi
On Fri, Nov 12, 2010 at 1:20 PM, Daniel P. Berrange  wrote:
> This introduces generation of a qemu.stp/qemu-system-XXX.stp
> files which provides tapsets with friendly names for static
> probes & their arguments. Instead of
>
>    probe process("qemu").mark("qemu_malloc") {
>        printf("Malloc %d %p\n", $arg1, $arg2);
>    }
>
> It is now possible todo
>
>    probe qemu.system.i386.qemu_malloc {
>        printf("Malloc %d %p\n", size, ptr);
>    }
>
> There is one tapset defined per target arch, for both
> user and system emulators.
>
> * Makefile.target: Generate stp files for each target
> * tracetool: Support for generating systemtap tapsets
> * configure: Check for whether systemtap is available
>  with the DTrace backend
>
> Signed-off-by: Daniel P. Berrange 
> ---
>  Makefile.target |   29 -
>  configure       |    7 +++
>  tracetool       |  128 +++---
>  3 files changed, 146 insertions(+), 18 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver

2010-11-12 Thread Luiz Capitulino
On Fri, 12 Nov 2010 11:16:54 +0100
Markus Armbruster  wrote:

> Luiz Capitulino  writes:
> 
> > On Thu, 11 Nov 2010 17:32:06 +0100
> > Markus Armbruster  wrote:
> >
> >> Luiz Capitulino  writes:
> >> 
> >> > On Thu, 11 Nov 2010 16:30:26 +0100
> >> > Markus Armbruster  wrote:
> >> >
> >> >> Luiz Capitulino  writes:
> >> >> 
> >> >> > This driver handles in-memory chardev operations. That's, all writes
> >> >> > to this driver are stored in an internal buffer and it doesn't talk
> >> >> > to the external world in any way.
> >> >> >
> >> >> > Right now it's very simple: it supports only writes. But it can be
> >> >> > easily extended to support more operations.
> >> >> >
> >> >> > This is going to be used by the monitor's "HMP passthrough via QMP"
> >> >> > feature, which needs to run monitor handlers without a backing
> >> >> > device.
> >> >> >
> >> >> > Signed-off-by: Luiz Capitulino 
> >> >> > ---
> >> >> >  qemu-char.c |   66 
> >> >> > +++
> >> >> >  qemu-char.h |6 +
> >> >> >  2 files changed, 72 insertions(+), 0 deletions(-)
> >> >> >
> >> >> > diff --git a/qemu-char.c b/qemu-char.c
> >> >> > index 88997f9..896df14 100644
> >> >> > --- a/qemu-char.c
> >> >> > +++ b/qemu-char.c
> >> >> > @@ -2275,6 +2275,72 @@ static CharDriverState 
> >> >> > *qemu_chr_open_socket(QemuOpts *opts)
> >> >> >  return NULL;
> >> >> >  }
> >> >> >  
> >> >> > +/***/
> >> >> > +/* Memory chardev */
> >> >> > +typedef struct {
> >> >> > +size_t outbuf_size;
> >> >> > +size_t outbuf_capacity;
> >> >> > +uint8_t *outbuf;
> >> >> > +} MemoryDriver;
> >> >> > +
> >> >> > +static int mem_chr_write(CharDriverState *chr, const uint8_t *buf, 
> >> >> > int len)
> >> >> > +{
> >> >> > +MemoryDriver *d = chr->opaque;
> >> >> > +
> >> >> > +/* TODO: the QString implementation has the same code, we should
> >> >> > + * introduce a generic way to do this in cutils.c */
> >> >> > +if (d->outbuf_capacity < d->outbuf_size + len) {
> >> >> > +/* grown outbuf */
> >> >> 
> >> >> Used to say "grow" (sans n) here.  Intentional change?
> >> >
> >> > Hum, no. I think I've squashed an older commit while rebasing (but this 
> >> > seems
> >> > to be the only problem).
> >> >
> >> >> 
> >> >> > +d->outbuf_capacity += len;
> >> >> > +d->outbuf_capacity *= 2;
> >> >> > +d->outbuf = qemu_realloc(d->outbuf, d->outbuf_capacity);
> >> >> > +}
> >> >> > +
> >> >> > +memcpy(d->outbuf + d->outbuf_size, buf, len);
> >> >> > +d->outbuf_size += len;
> >> >> > +
> >> >> > +return len;
> >> >> > +}
> >> >> > +
> >> >> > +#define DEFAULT_BUF_SIZE 4096
> >> >> 
> >> >> It's the *initial* buffer size, isn't it?
> >> >
> >> > Yes.
> >> 
> >> Could we make the name reflect that then?
> >> 
> >> >> Doubt it's worth a #define (there's just one user), but that's a matter
> >> >> of taste.
> >> >> 
> >> >> > +
> >> >> > +void qemu_chr_init_mem(CharDriverState *chr)
> >> >> > +{
> >> >> > +MemoryDriver *d;
> >> >> > +
> >> >> > +d = qemu_malloc(sizeof(*d));
> >> >> > +d->outbuf_size = 0;
> >> >> > +d->outbuf_capacity = DEFAULT_BUF_SIZE;
> >> >> > +d->outbuf = qemu_mallocz(d->outbuf_capacity);
> >> >> > +
> >> >> > +memset(chr, 0, sizeof(*chr));
> >> >> > +chr->opaque = d;
> >> >> > +chr->chr_write = mem_chr_write;
> >> >> > +}
> >> >> > +
> >> >> > +/* assumes the stored data is a string */
> >> >> 
> >> >> What else could it be?  Worrying about embedded '\0's?
> >> >
> >> > Yes, as the driver itself doesn't interpret the contents of its
> >> > buffer.
> >> 
> >> What happens if there are embedded '\0's?
> >
> > The string will be shorter than expected? And what if it contains
> > non-printable characters?
> >
> > It's just a cautionary comment to help the user identify such problems, I 
> > think
> > we're making a whole argument about a quite minor thing.
> 
> When I see "assumes X" in a function comment, I immediately ask "and
> what happens when !X?"  The default answer is "it explodes, so don't do
> that".  That answer is wrong here.  Therefore, I find the comment
> misleading.

That's how you interpret it, my interpretation is that I might not get
the expected behavior.

> Let's figure out what really happens.  The human command's output is
> sent to the client as a JSON string (response object member return).
> JSON strings can consist of Unicode characters, "except for the
> characters that must be escaped: quotation mark, reverse solidus, and
> the control characters (U+ through U+001F)" (RFC 4627, section 2.5).
> 
> Do we escape these characters?  Where in the code?

Should be in the json parser, but qemu_chr_mem_to_qs() doesn't assume its
users (and it obviously shouldn't).

> 
> >> >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr)
> >> >> > +{
> >> >> > +MemoryDriver *d = chr->opaque;
> >> >> > +
> >> >> > +if (d->o

[Qemu-devel] [PATCH v3 3/3] virtio-pci: Don't use ioeventfd on old kernels

2010-11-12 Thread Stefan Hajnoczi
There used to be a limit of 6 KVM io bus devices inside the kernel.  On
such a kernel, don't use ioeventfd for virtqueue host notification since
the limit is reached too easily.  This ensures that existing vhost-net
setups (which always use ioeventfd) have ioeventfds available so they
can continue to work.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-pci.c |4 
 kvm-all.c   |   46 ++
 kvm-stub.c  |5 +
 kvm.h   |1 +
 4 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 117e855..d3a7a9c 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -661,6 +661,10 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, 
VirtIODevice *vdev,
 pci_register_bar(&proxy->pci_dev, 0, size, PCI_BASE_ADDRESS_SPACE_IO,
virtio_map);
 
+if (!kvm_has_many_ioeventfds()) {
+proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
+}
+
 virtio_bind_device(vdev, &virtio_pci_bindings, proxy);
 proxy->host_features |= 0x1 << VIRTIO_F_NOTIFY_ON_EMPTY;
 proxy->host_features |= 0x1 << VIRTIO_F_BAD_FEATURE;
diff --git a/kvm-all.c b/kvm-all.c
index 37b99c7..ba302bc 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -28,6 +28,11 @@
 #include "kvm.h"
 #include "bswap.h"
 
+/* This check must be after config-host.h is included */
+#ifdef CONFIG_EVENTFD
+#include 
+#endif
+
 /* KVM uses PAGE_SIZE in it's definition of COALESCED_MMIO_MAX */
 #define PAGE_SIZE TARGET_PAGE_SIZE
 
@@ -72,6 +77,7 @@ struct KVMState
 int irqchip_in_kernel;
 int pit_in_kernel;
 int xsave, xcrs;
+int many_ioeventfds;
 };
 
 static KVMState *kvm_state;
@@ -441,6 +447,39 @@ int kvm_check_extension(KVMState *s, unsigned int 
extension)
 return ret;
 }
 
+static int kvm_check_many_ioeventfds(void)
+{
+/* Older kernels have a 6 device limit on the KVM io bus.  Find out so we
+ * can avoid creating too many ioeventfds.
+ */
+#ifdef CONFIG_EVENTFD
+int ioeventfds[7];
+int i, ret = 0;
+for (i = 0; i < ARRAY_SIZE(ioeventfds); i++) {
+ioeventfds[i] = eventfd(0, EFD_CLOEXEC);
+if (ioeventfds[i] < 0) {
+break;
+}
+ret = kvm_set_ioeventfd_pio_word(ioeventfds[i], 0, i, true);
+if (ret < 0) {
+close(ioeventfds[i]);
+break;
+}
+}
+
+/* Decide whether many devices are supported or not */
+ret = i == ARRAY_SIZE(ioeventfds);
+
+while (i-- > 0) {
+kvm_set_ioeventfd_pio_word(ioeventfds[i], 0, i, false);
+close(ioeventfds[i]);
+}
+return ret;
+#else
+return 0;
+#endif
+}
+
 static void kvm_set_phys_mem(target_phys_addr_t start_addr,
 ram_addr_t size,
 ram_addr_t phys_offset)
@@ -717,6 +756,8 @@ int kvm_init(int smp_cpus)
 kvm_state = s;
 cpu_register_phys_memory_client(&kvm_cpu_phys_memory_client);
 
+s->many_ioeventfds = kvm_check_many_ioeventfds();
+
 return 0;
 
 err:
@@ -1046,6 +1087,11 @@ int kvm_has_xcrs(void)
 return kvm_state->xcrs;
 }
 
+int kvm_has_many_ioeventfds(void)
+{
+return kvm_state->many_ioeventfds;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 if (!kvm_has_sync_mmu()) {
diff --git a/kvm-stub.c b/kvm-stub.c
index 5384a4b..33d4476 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -99,6 +99,11 @@ int kvm_has_robust_singlestep(void)
 return 0;
 }
 
+int kvm_has_many_ioeventfds(void)
+{
+return 0;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 }
diff --git a/kvm.h b/kvm.h
index 60a9b42..ce08d42 100644
--- a/kvm.h
+++ b/kvm.h
@@ -42,6 +42,7 @@ int kvm_has_robust_singlestep(void);
 int kvm_has_debugregs(void);
 int kvm_has_xsave(void);
 int kvm_has_xcrs(void);
+int kvm_has_many_ioeventfds(void);
 
 #ifdef NEED_CPU_H
 int kvm_init_vcpu(CPUState *env);
-- 
1.7.2.3




Re: [Qemu-devel] [PATCH 1/2] Add a DTrace tracing backend targetted for SystemTAP compatability

2010-11-12 Thread Stefan Hajnoczi
On Fri, Nov 12, 2010 at 1:20 PM, Daniel P. Berrange  wrote:
> This introduces a new tracing backend that targets the SystemTAP
> implementation of DTrace userspace tracing. The core functionality
> should be applicable and standard across any DTrace implementation
> on Solaris, OS-X, *BSD, but the Makefile rules will likely need
> some small additional changes to cope with OS specific build
> requirements.
>
> This backend builds a little differently from the other tracing
> backends. Specifically there is no 'trace.c' file, because the
> 'dtrace' command line tool generates a '.o' file directly from
> the dtrace probe definition file. The probe definition is usually
> named with a '.d' extension but QEMU uses '.d' files for its
> external makefile dependancy tracking, so this uses '.dtrace' as
> the extension for the probe definition file.
>
> The 'tracetool' program gains the ability to generate a trace.h
> file for DTrace, and also to generate the trace.d file containing
> the dtrace probe definition.
>
> Example usage of a dtrace probe in systemtap looks like:
>
>  probe process("qemu").mark("qemu_malloc") {
>    printf("Malloc %d %p\n", $arg1, $arg2);
>  }
>
> * .gitignore: Ignore trace-dtrace.*
> * Makefile: Extra rules for generating DTrace files
> * Makefile.obj: Don't build trace.o for DTrace, use
>  trace-dtrace.o generated by 'dtrace' instead
> * tracetool: Support for generating DTrace data files
>
> Signed-off-by: Daniel P. Berrange 
> ---
>  .gitignore    |    2 +
>  Makefile      |   23 +++
>  Makefile.objs |    4 ++
>  configure     |   14 ++-
>  tracetool     |  122 
> -
>  5 files changed, 154 insertions(+), 11 deletions(-)

Reviewed-by: Stefan Hajnoczi 



[Qemu-devel] [PATCH v3 1/3] virtio-pci: Rename bugs field to flags

2010-11-12 Thread Stefan Hajnoczi
The VirtIOPCIProxy bugs field is currently used to enable workarounds
for older guests.  Rename it to flags so that other per-device behavior
can be tracked.

A later patch uses the flags field to remember whether ioeventfd should
be used for virtqueue host notification.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-pci.c |   15 +++
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 729917d..549118d 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -80,9 +80,8 @@
  * 12 is historical, and due to x86 page size. */
 #define VIRTIO_PCI_QUEUE_ADDR_SHIFT12
 
-/* We can catch some guest bugs inside here so we continue supporting older
-   guests. */
-#define VIRTIO_PCI_BUG_BUS_MASTER  (1 << 0)
+/* Flags track per-device state like workarounds for quirks in older guests. */
+#define VIRTIO_PCI_FLAG_BUS_MASTER_BUG  (1 << 0)
 
 /* QEMU doesn't strictly need write barriers since everything runs in
  * lock-step.  We'll leave the calls to wmb() in though to make it obvious for
@@ -95,7 +94,7 @@
 typedef struct {
 PCIDevice pci_dev;
 VirtIODevice *vdev;
-uint32_t bugs;
+uint32_t flags;
 uint32_t addr;
 uint32_t class_code;
 uint32_t nvectors;
@@ -159,7 +158,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile 
*f)
in ready state. Then we have a buggy guest OS. */
 if ((proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
 !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
-proxy->bugs |= VIRTIO_PCI_BUG_BUS_MASTER;
+proxy->flags |= VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
 }
 return 0;
 }
@@ -185,7 +184,7 @@ static void virtio_pci_reset(DeviceState *d)
 VirtIOPCIProxy *proxy = container_of(d, VirtIOPCIProxy, pci_dev.qdev);
 virtio_reset(proxy->vdev);
 msix_reset(&proxy->pci_dev);
-proxy->bugs = 0;
+proxy->flags = 0;
 }
 
 static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
@@ -235,7 +234,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
some safety checks. */
 if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
 !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
-proxy->bugs |= VIRTIO_PCI_BUG_BUS_MASTER;
+proxy->flags |= VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
 }
 break;
 case VIRTIO_MSI_CONFIG_VECTOR:
@@ -403,7 +402,7 @@ static void virtio_write_config(PCIDevice *pci_dev, 
uint32_t address,
 
 if (PCI_COMMAND == address) {
 if (!(val & PCI_COMMAND_MASTER)) {
-if (!(proxy->bugs & VIRTIO_PCI_BUG_BUS_MASTER)) {
+if (!(proxy->flags & VIRTIO_PCI_FLAG_BUS_MASTER_BUG)) {
 virtio_set_status(proxy->vdev,
   proxy->vdev->status & 
~VIRTIO_CONFIG_S_DRIVER_OK);
 }
-- 
1.7.2.3




Re: [Qemu-devel] Re: [PATCH] Implement a virtio GPU transport

2010-11-12 Thread Anthony Liguori

On 11/12/2010 06:14 AM, Ian Molton wrote:

On 10/11/10 17:47, Anthony Liguori wrote:

On 11/10/2010 11:22 AM, Ian Molton wrote:

Ping ?


I think the best way forward is to post patches.


I posted links to the git trees. I can post patches, but they are 
*large*. Do you really want me to post them?


Yes, and they have to be split up into something reviewable.




To summarize what I was trying to express in the thread, I think this is
not the right long term architecture but am not opposed to it as a short
term solution. I think having a new virtio device is a bad design choice
but am not totally opposed to it.


Ok! (I agree (that this should be a short term solution) :) )


you want to go for the path of integration, you're going to have to fix
all of the coding style issues and make the code fit into QEMU. Dropping
a bunch of junk into target-i386/ is not making the code fit into QEMU.


I agree. how about hw/gl for the renderer and hw/ for the virtio module?


That would be fine.


If you post just what you have now in patch form, I can try to provide
more concrete advice ignoring the coding style problems.


I can post patches, although I dont think LKML would appreciate the 
volume! I can post them to the qemu list if you do.


Yes, qemu is where I was suggesting you post them.

Regards,

Anthony Liguori


-Ian





[Qemu-devel] [PATCH v3 2/3] virtio-pci: Use ioeventfd for virtqueue notify

2010-11-12 Thread Stefan Hajnoczi
Virtqueue notify is currently handled synchronously in userspace virtio.  This
prevents the vcpu from executing guest code while hardware emulation code
handles the notify.

On systems that support KVM, the ioeventfd mechanism can be used to make
virtqueue notify a lightweight exit by deferring hardware emulation to the
iothread and allowing the VM to continue execution.  This model is similar to
how vhost receives virtqueue notifies.

The result of this change is improved performance for userspace virtio devices.
Virtio-blk throughput increases especially for multithreaded scenarios and
virtio-net transmit throughput increases substantially.

Some virtio devices are known to have guest drivers which expect a notify to be
processed synchronously and spin waiting for completion.  Only enable ioeventfd
for virtio-blk and virtio-net for now.

Care must be taken not to interfere with vhost-net, which already uses
ioeventfd host notifiers.  The following list shows the behavior implemented in
this patch and is designed to take vhost-net into account:

 * VIRTIO_CONFIG_S_DRIVER_OK -> assign host notifiers, 
qemu_set_fd_handler(virtio_pci_host_notifier_read)
 * !VIRTIO_CONFIG_S_DRIVER_OK -> qemu_set_fd_handler(NULL), deassign host 
notifiers
 * virtio_pci_set_host_notifier(true) -> qemu_set_fd_handler(NULL)
 * virtio_pci_set_host_notifier(false) -> 
qemu_set_fd_handler(virtio_pci_host_notifier_read)

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-pci.c |  152 ++
 hw/virtio.c |   14 -
 hw/virtio.h |   13 +
 3 files changed, 153 insertions(+), 26 deletions(-)

Now toggles host notifiers based on VIRTIO_CONFIG_S_DRIVER_OK status changes.
The cleanest way I could see was to introduce pre and a post set_status()
callbacks.  They allow a binding to hook status changes, including the status
change from virtio_reset().

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 549118d..117e855 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -83,6 +83,11 @@
 /* Flags track per-device state like workarounds for quirks in older guests. */
 #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG  (1 << 0)
 
+/* Performance improves when virtqueue kick processing is decoupled from the
+ * vcpu thread using ioeventfd for some devices. */
+#define VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT 1
+#define VIRTIO_PCI_FLAG_USE_IOEVENTFD   (1 << 
VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT)
+
 /* QEMU doesn't strictly need write barriers since everything runs in
  * lock-step.  We'll leave the calls to wmb() in though to make it obvious for
  * KVM or if kqemu gets SMP support.
@@ -179,12 +184,125 @@ static int virtio_pci_load_queue(void * opaque, int n, 
QEMUFile *f)
 return 0;
 }
 
+static int virtio_pci_set_host_notifier_ioeventfd(VirtIOPCIProxy *proxy,
+  int n, bool assign)
+{
+VirtQueue *vq = virtio_get_queue(proxy->vdev, n);
+EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
+int r;
+if (assign) {
+r = event_notifier_init(notifier, 1);
+if (r < 0) {
+return r;
+}
+r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
+   proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
+   n, assign);
+if (r < 0) {
+event_notifier_cleanup(notifier);
+}
+} else {
+r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
+   proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
+   n, assign);
+if (r < 0) {
+return r;
+}
+event_notifier_cleanup(notifier);
+}
+return r;
+}
+
+static void virtio_pci_host_notifier_read(void *opaque)
+{
+VirtQueue *vq = opaque;
+EventNotifier *n = virtio_queue_get_host_notifier(vq);
+if (event_notifier_test_and_clear(n)) {
+virtio_queue_notify_vq(vq);
+}
+}
+
+static void virtio_pci_set_host_notifier_fd_handler(VirtIOPCIProxy *proxy,
+int n, bool assign)
+{
+VirtQueue *vq = virtio_get_queue(proxy->vdev, n);
+EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
+if (assign) {
+qemu_set_fd_handler(event_notifier_get_fd(notifier),
+virtio_pci_host_notifier_read, NULL, vq);
+} else {
+qemu_set_fd_handler(event_notifier_get_fd(notifier),
+NULL, NULL, NULL);
+}
+}
+
+static int virtio_pci_set_host_notifiers(VirtIOPCIProxy *proxy, bool assign)
+{
+int n, r;
+
+for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) {
+if (!virtio_queue_get_num(proxy->vdev, n)) {
+continue;
+}
+
+if (assign) {
+r = virtio_pci_set_host_notifier_ioeventfd(proxy, n, true);
+if (r < 0) {
+goto assign_error;
+}
+
+

[Qemu-devel] [PATCH 2/2] Add support for generating a systemtap tapset static probes

2010-11-12 Thread Daniel P. Berrange
This introduces generation of a qemu.stp/qemu-system-XXX.stp
files which provides tapsets with friendly names for static
probes & their arguments. Instead of

probe process("qemu").mark("qemu_malloc") {
printf("Malloc %d %p\n", $arg1, $arg2);
}

It is now possible todo

probe qemu.system.i386.qemu_malloc {
printf("Malloc %d %p\n", size, ptr);
}

There is one tapset defined per target arch, for both
user and system emulators.

* Makefile.target: Generate stp files for each target
* tracetool: Support for generating systemtap tapsets
* configure: Check for whether systemtap is available
  with the DTrace backend

Signed-off-by: Daniel P. Berrange 
---
 Makefile.target |   29 -
 configure   |7 +++
 tracetool   |  128 +++---
 3 files changed, 146 insertions(+), 18 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 91e6e74..7c3671c 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -40,7 +40,27 @@ kvm.o kvm-all.o vhost.o vhost_net.o: 
QEMU_CFLAGS+=$(KVM_CFLAGS)
 config-target.h: config-target.h-timestamp
 config-target.h-timestamp: config-target.mak
 
-all: $(PROGS)
+ifdef CONFIG_SYSTEMTAP_TRACE
+stap: $(QEMU_PROG).stp
+
+ifdef CONFIG_USER_ONLY
+TARGET_TYPE=user
+else
+TARGET_TYPE=system
+endif
+
+$(QEMU_PROG).stp:
+   $(call quiet-command,sh $(SRC_PATH)/tracetool \
+   --$(TRACE_BACKEND) \
+   --binary $(bindir)/$(QEMU_PROG) \
+   --target-arch $(TARGET_ARCH) \
+   --target-type $(TARGET_TYPE) \
+   --stap < $(SRC_PATH)/trace-events > $(QEMU_PROG).stp,"  GEN   
$(QEMU_PROG).stp")
+else
+stap:
+endif
+
+all: $(PROGS) stap
 
 # Dummy command so that make thinks it has done something
@true
@@ -340,6 +360,9 @@ clean:
rm -f *.o *.a *~ $(PROGS) nwfpe/*.o fpu/*.o
rm -f *.d */*.d tcg/*.o ide/*.o
rm -f hmp-commands.h qmp-commands.h gdbstub-xml.c
+ifdef CONFIG_SYSTEMTAP_TRACE
+   rm -f *.stp
+endif
 
 install: all
 ifneq ($(PROGS),)
@@ -348,6 +371,10 @@ ifneq ($(STRIP),)
$(STRIP) $(patsubst %,"$(DESTDIR)$(bindir)/%",$(PROGS))
 endif
 endif
+ifdef CONFIG_SYSTEMTAP_TRACE
+   $(INSTALL_DIR) "$(DESTDIR)$(datadir)/../systemtap/tapset"
+   $(INSTALL_DATA) $(QEMU_PROG).stp 
"$(DESTDIR)$(datadir)/../systemtap/tapset"
+endif
 
 # Include automatically generated dependency files
 -include $(wildcard *.d */*.d)
diff --git a/configure b/configure
index f8dad3e..2917874 100755
--- a/configure
+++ b/configure
@@ -2203,6 +2203,10 @@ if test "$trace_backend" = "dtrace"; then
 echo
 exit 1
   fi
+  trace_backend_stap="no"
+  if has 'stap' ; then
+trace_backend_stap="yes"
+  fi
 fi
 
 ##
@@ -2645,6 +2649,9 @@ fi
 if test "$trace_backend" = "simple"; then
   trace_file="\"$trace_file-%u\""
 fi
+if test "$trace_backend" = "dtrace" -a "$trace_backend_stap" = "yes" ; then
+  echo "CONFIG_SYSTEMTAP_TRACE=y" >> $config_host_mak
+fi
 echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
 
 echo "TOOLS=$tools" >> $config_host_mak
diff --git a/tracetool b/tracetool
index 1ade103..fce491c 100755
--- a/tracetool
+++ b/tracetool
@@ -23,9 +23,16 @@ Backends:
   --dtrace  DTrace/SystemTAP backend
 
 Output formats:
-  -hGenerate .h file
-  -cGenerate .c file
-  -dGenerate .d file (DTrace only)
+  -h Generate .h file
+  -c Generate .c file
+  -d Generate .d file (DTrace only)
+  --stap Generate .stp file (DTrace with SystemTAP only)
+
+Options:
+  --binary  [path]  Full path to QEMU binary
+  --target-arch [arch]  QEMU emulator target arch
+  --target-type [type]  QEMU emulator target type ('system' or 'user')
+
 EOF
 exit 1
 }
@@ -396,6 +403,51 @@ linetod_end_dtrace()
 EOF
 }
 
+linetostap_begin_dtrace()
+{
+return
+}
+
+linetostap_dtrace()
+{
+local i arg name args arglist state
+name=$(get_name "$1")
+args=$(get_args "$1")
+arglist=$(get_argnames "$1", "")
+state=$(get_state "$1")
+if [ "$state" = "0" ] ; then
+name=${name##disable }
+fi
+
+# Define prototype for probe arguments
+cat <

[Qemu-devel] [PATCH 1/2] Add a DTrace tracing backend targetted for SystemTAP compatability

2010-11-12 Thread Daniel P. Berrange
This introduces a new tracing backend that targets the SystemTAP
implementation of DTrace userspace tracing. The core functionality
should be applicable and standard across any DTrace implementation
on Solaris, OS-X, *BSD, but the Makefile rules will likely need
some small additional changes to cope with OS specific build
requirements.

This backend builds a little differently from the other tracing
backends. Specifically there is no 'trace.c' file, because the
'dtrace' command line tool generates a '.o' file directly from
the dtrace probe definition file. The probe definition is usually
named with a '.d' extension but QEMU uses '.d' files for its
external makefile dependancy tracking, so this uses '.dtrace' as
the extension for the probe definition file.

The 'tracetool' program gains the ability to generate a trace.h
file for DTrace, and also to generate the trace.d file containing
the dtrace probe definition.

Example usage of a dtrace probe in systemtap looks like:

  probe process("qemu").mark("qemu_malloc") {
printf("Malloc %d %p\n", $arg1, $arg2);
  }

* .gitignore: Ignore trace-dtrace.*
* Makefile: Extra rules for generating DTrace files
* Makefile.obj: Don't build trace.o for DTrace, use
  trace-dtrace.o generated by 'dtrace' instead
* tracetool: Support for generating DTrace data files

Signed-off-by: Daniel P. Berrange 
---
 .gitignore|2 +
 Makefile  |   23 +++
 Makefile.objs |4 ++
 configure |   14 ++-
 tracetool |  122 -
 5 files changed, 154 insertions(+), 11 deletions(-)

diff --git a/.gitignore b/.gitignore
index a43e4d1..3efb4ec 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,6 +4,8 @@ config-host.*
 config-target.*
 trace.h
 trace.c
+trace-dtrace.h
+trace-dtrace.dtrace
 *-timestamp
 *-softmmu
 *-darwin-user
diff --git a/Makefile b/Makefile
index 02698e9..554ad97 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,9 @@
 # Makefile for QEMU.
 
 GENERATED_HEADERS = config-host.h trace.h qemu-options.def
+ifeq ($(TRACE_BACKEND),dtrace)
+GENERATED_HEADERS += trace-dtrace.h
+endif
 
 ifneq ($(wildcard config-host.mak),)
 # Put the all: rule here so that config-host.mak can contain dependencies.
@@ -108,7 +111,11 @@ ui/vnc.o: QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
 
 bt-host.o: QEMU_CFLAGS += $(BLUEZ_CFLAGS)
 
+ifeq ($(TRACE_BACKEND),dtrace)
+trace.h: trace.h-timestamp trace-dtrace.h
+else
 trace.h: trace.h-timestamp
+endif
 trace.h-timestamp: $(SRC_PATH)/trace-events config-host.mak
$(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -h < 
$< > $@,"  GEN   trace.h")
@cmp -s $@ trace.h || cp $@ trace.h
@@ -120,6 +127,20 @@ trace.c-timestamp: $(SRC_PATH)/trace-events config-host.mak
 
 trace.o: trace.c $(GENERATED_HEADERS)
 
+trace-dtrace.h: trace-dtrace.dtrace
+   $(call quiet-command,dtrace -o $@ -h -s $<, "  GEN   trace-dtrace.h")
+
+# Normal practice is to name DTrace probe file with a '.d' extension
+# but that gets picked up by QEMU's Makefile as an external dependancy
+# rule file. So we use '.dtrace' instead
+trace-dtrace.dtrace: trace-dtrace.dtrace-timestamp
+trace-dtrace.dtrace-timestamp: $(SRC_PATH)/trace-events config-host.mak
+   $(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -d < 
$< > $@,"  GEN   trace-dtrace.dtrace")
+   @cmp -s $@ trace-dtrace.dtrace || cp $@ trace-dtrace.dtrace
+
+trace-dtrace.o: trace-dtrace.dtrace $(GENERATED_HEADERS)
+   $(call quiet-command,dtrace -o $@ -G -s $<, "  GEN trace-dtrace.o")
+
 simpletrace.o: simpletrace.c $(GENERATED_HEADERS)
 
 version.o: $(SRC_PATH)/version.rc config-host.mak
@@ -157,6 +178,8 @@ clean:
rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d 
net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d
rm -f qemu-img-cmds.h
rm -f trace.c trace.h trace.c-timestamp trace.h-timestamp
+   rm -f trace-dtrace.dtrace trace-dtrace.dtrace-timestamp
+   rm -f trace-dtrace.h trace-dtrace.h-timestamp
$(MAKE) -C tests clean
for d in $(ALL_SUBDIRS) libhw32 libhw64 libuser libdis libdis-user; do \
if test -d $$d; then $(MAKE) -C $$d $@ || exit 1; fi; \
diff --git a/Makefile.objs b/Makefile.objs
index 15569af..23b17ce 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -286,11 +286,15 @@ libdis-$(CONFIG_SPARC_DIS) += sparc-dis.o
 ##
 # trace
 
+ifeq ($(TRACE_BACKEND),dtrace)
+trace-obj-y = trace-dtrace.o
+else
 trace-obj-y = trace.o
 ifeq ($(TRACE_BACKEND),simple)
 trace-obj-y += simpletrace.o
 user-obj-y += qemu-timer-common.o
 endif
+endif
 
 vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS)
 
diff --git a/configure b/configure
index 7025d2b..f8dad3e 100755
--- a/configure
+++ b/configure
@@ -929,7 +929,7 @@ echo "  --enable-docsenable documentation build"
 echo "  --disable-docs   disable documentation build"
 echo "  --disable-vhost-net  disable vhost-net acceleratio

[Qemu-devel] Add support for SystemTAP and DTrace tracing backends (v6)

2010-11-12 Thread Daniel P. Berrange
An update of the SystemTAP/DTrace patches from

  http://lists.gnu.org/archive/html/qemu-devel/2010-11/msg00563.html

The first patch contains the generic DTrace tracing backend support.
The second patch contains additional pieces for SystemTAP to generate
a tapset for each of the qemu-system-XXX binaries, to simplify life
for admins wanting to use the tracing backend.

Changed in v6:

 - Fix handling of probes with no-args in DTrace provider
 - Generate one tapset per target, for both system and user
   emulators
 - Re-write command line arg parsing in tracetool so it
   doesn't assume a particular ordering of args
 - Change '-s' to '--stap' in tracetool to avoid confusion
 - Pass full binary name into tracetool




[Qemu-devel] Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify

2010-11-12 Thread Stefan Hajnoczi
On Fri, Nov 12, 2010 at 9:25 AM, Michael S. Tsirkin  wrote:
> On Fri, Nov 12, 2010 at 09:18:48AM +, Stefan Hajnoczi wrote:
>> On Thu, Nov 11, 2010 at 3:53 PM, Michael S. Tsirkin  wrote:
>> > On Thu, Nov 11, 2010 at 01:47:21PM +, Stefan Hajnoczi wrote:
>> >> Care must be taken not to interfere with vhost-net, which already uses
>> >> ioeventfd host notifiers.  The following list shows the behavior 
>> >> implemented in
>> >> this patch and is designed to take vhost-net into account:
>> >>
>> >>  * VIRTIO_CONFIG_S_DRIVER_OK -> assign host notifiers, 
>> >> qemu_set_fd_handler(virtio_pci_host_notifier_read)
>> >
>> > we should also deassign when VIRTIO_CONFIG_S_DRIVER_OK is cleared
>> > by io write or bus master bit?
>>
>> You're right, I'll fix the lifecycle to trigger symmetrically on
>> status bit changes rather than VIRTIO_CONFIG_S_DRIVER_OK/reset.
>>
>> >> +static void virtio_pci_reset_vdev(VirtIOPCIProxy *proxy)
>> >> +{
>> >> +    /* Poke virtio device so it deassigns its host notifiers (if any) */
>> >> +    virtio_set_status(proxy->vdev, 0);
>> >
>> > Hmm. virtio_reset already sets status to 0.
>> > I guess it should just be fixed to call virtio_set_status?
>>
>> This part is ugly.  The problem is that virtio_reset() calls
>> virtio_set_status(vdev, 0) but doesn't give the transport binding a
>> chance clean up after the virtio device has cleaned up.  Since
>> virtio-net will spot status=0 and deassign its host notifier, we need
>> to perform our own clean up after vhost.
>>
>> What makes this slightly less of a hack is the fact that virtio-pci.c
>> was already causing virtio_set_status(vdev, 0) to be invoked twice
>> during reset.  When 0 is written to the VIRTIO_PCI_STATUS register, we
>> do virtio_set_status(proxy->vdev, val & 0xFF) and then
>> virtio_reset(proxy->vdev).  So the status byte callback already gets
>> invoked twice today.
>>
>> I've just split this out into virtio_pci_reset_vdev() and (ab)used it
>> to correctly clean up virtqueue ioeventfd.
>>
>> The alternative is to add another callback from virtio.c so we are
>> notified after the vdev's reset callback has finished.
>
> Oh, likely not worth it. Mabe put the above explanation in the comment.
> Will this go away now that you move to set notifiers on status write?

For v3 I have switched to a bindings callback.  I wish it wasn't
necessary but the only other ways I can think of catching status
writes are hacks which depend on side-effects too much.

Stefan



[Qemu-devel] Re: Cannot not unplug cold-plugged devices

2010-11-12 Thread Isaku Yamahata
On Fri, Nov 12, 2010 at 01:26:30PM +0200, Michael S. Tsirkin wrote:
> No, I am just trying to understand why is hotplug event dangerous.
> We still get it if we do device add before starting the VM, right?

I'm not sure if it's safe to call enable/disable_device() and
pm_update_sci() before starting VM.
So I'd like to avoid to call them instead of making sure it.
If someone else ensures its safety, I'm willing to eliminate the if clause.
-- 
yamahata



Re: [Qemu-devel] Access to specific isa card with Qemu???

2010-11-12 Thread Jes Sorensen
On 11/08/10 17:21, Djamel Hakkar wrote:
> Hello,
> 
> We have a software that runs on MS-DOS and must communicate with a specific 
> card installed on port isa.
> We  want to use this software in Qemu with a machine that runs XP.
> Is it possible to access to the ISA port with Qemu in this case?
> 
> Do we have to do a specific development?
> Can you help us in this development, how much would it cost?

Right now I don't think there is any support for ISA pass-through, but
it could probably be done, even if it sounds pretty scary. There's a lot
of risk with ISA cards, they can easily take down the whole system, and
if it does DMA you are probably out of luck.

You might find someone on the list willing to do the work on a contract,
but I will let interested parties reply to you directly. I cannot do it.

Cheers,
Jes



Re: [Qemu-devel] [PATCH] virtio-9p: fix build on !CONFIG_UTIMENSAT v2

2010-11-12 Thread Jes Sorensen
On 11/08/10 07:44, M. Mohan Kumar wrote:
>> This patch introduce a fallback mechanism for old systems that do not
>> support utimensat.  This will fix build failure with following warnings:
>>
>> hw/virtio-9p-local.c: In function 'local_utimensat':
>> hw/virtio-9p-local.c:479: warning: implicit declaration of function
>> 'utimensat' hw/virtio-9p-local.c:479: warning: nested extern declaration
>> of 'utimensat'
>>
>> and
>>
>> hw/virtio-9p.c: In function 'v9fs_setattr_post_chmod':
>> hw/virtio-9p.c:1410: error: 'UTIME_NOW' undeclared (first use in this
>> function) hw/virtio-9p.c:1410: error: (Each undeclared identifier is
>> reported only once hw/virtio-9p.c:1410: error: for each function it
>> appears in.)
>> hw/virtio-9p.c:1413: error: 'UTIME_OMIT' undeclared (first use in this
>> function) hw/virtio-9p.c: In function 'v9fs_wstat_post_chmod':
>> hw/virtio-9p.c:2905: error: 'UTIME_OMIT' undeclared (first use in this
>> function)
>>
>> Signed-off-by: Hidetoshi Seto 
>> ---
>>  hw/virtio-9p-local.c |   32 ++--
>>  hw/virtio-9p.h   |   10 ++
>>  2 files changed, 40 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/virtio-9p-local.c b/hw/virtio-9p-local.c
>> index 0d52020..7811d2c 100644
>> --- a/hw/virtio-9p-local.c
>> +++ b/hw/virtio-9p-local.c
>> @@ -479,10 +479,38 @@ static int local_chown(FsContext *fs_ctx, const char
>> *path, FsCred *credp) return -1;
>>  }
>>
>> +/* TODO: relocate this to proper file, and make it more generic */
>> +static int qemu_utimensat(int dirfd, const char *path,
>> +  const struct timespec *times, int flags)
>> +{
> 
> IMHO, this code can be moved to cutils.c

It's not a C utility function, so it really belongs in oslib-posix.c,
but otherwise I agree. This is emulation of a C library function, it
shouldn't be in the 9p code.

Cheers,
Jes



Re: [Qemu-devel] Re: [PATCH] Implement a virtio GPU transport

2010-11-12 Thread Ian Molton

On 10/11/10 17:47, Anthony Liguori wrote:

On 11/10/2010 11:22 AM, Ian Molton wrote:

Ping ?


I think the best way forward is to post patches.


I posted links to the git trees. I can post patches, but they are 
*large*. Do you really want me to post them?



To summarize what I was trying to express in the thread, I think this is
not the right long term architecture but am not opposed to it as a short
term solution. I think having a new virtio device is a bad design choice
but am not totally opposed to it.


Ok! (I agree (that this should be a short term solution) :) )


you want to go for the path of integration, you're going to have to fix
all of the coding style issues and make the code fit into QEMU. Dropping
a bunch of junk into target-i386/ is not making the code fit into QEMU.


I agree. how about hw/gl for the renderer and hw/ for the virtio module?


If you post just what you have now in patch form, I can try to provide
more concrete advice ignoring the coding style problems.


I can post patches, although I dont think LKML would appreciate the 
volume! I can post them to the qemu list if you do.


-Ian



[Qemu-devel] Re: Cannot not unplug cold-plugged devices

2010-11-12 Thread Michael S. Tsirkin
On Fri, Nov 12, 2010 at 06:59:52PM +0900, Isaku Yamahata wrote:
> On Fri, Nov 12, 2010 at 11:18:57AM +0200, Michael S. Tsirkin wrote:
> > On Fri, Nov 12, 2010 at 04:24:11PM +0900, Isaku Yamahata wrote:
> > > On Thu, Nov 11, 2010 at 11:29:39PM -0700, Cam Macdonell wrote:
> > > > Hi,
> > > > 
> > > > I was trying to do a "device_del" on my ivshmem device and it won't
> > > > work unless the device is added via hotplug.  If the device is
> > > > coldplugged (added at startup) then nothing happens.  I think I
> > > > tracked this behaviour to the patch below.
> > > > 
> > > > Is not allowing coldplugged devices to be unplugged the desired 
> > > > behaviour?
> > > 
> > > Oh, my bad. Does the following patch help?
> > > 
> > > >From 45914b2fc95750b65685dfb98a435f58e38b45ba Mon Sep 17 00:00:00 2001
> > > Message-Id: 
> > > <45914b2fc95750b65685dfb98a435f58e38b45ba.1289546596.git.yamah...@valinux.co.jp>
> > > From: Isaku Yamahata 
> > > Date: Fri, 12 Nov 2010 16:21:35 +0900
> > > Subject: [PATCH] acpi/piix4: allow pci hotplug for cold plugged device.
> > > 
> > > This patch fixes 5beb8ad503c88a76f2b8106c3b74b4ce485a60e1
> > > reported by Cam Macdonell .
> > > Before the change set, cold plugged device can be hot unplugged.
> > > This patch unbreaks it.
> > > 
> > > Signed-off-by: Isaku Yamahata 
> > 
> > 
> > 
> > > ---
> > >  hw/acpi_piix4.c |4 +++-
> > >  1 files changed, 3 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> > > index 66c7885..bca330e 100644
> > > --- a/hw/acpi_piix4.c
> > > +++ b/hw/acpi_piix4.c
> > > @@ -621,8 +621,10 @@ static int piix4_device_hotplug(DeviceState *qdev, 
> > > PCIDevice *dev, int state)
> > >  PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev,
> > >  DO_UPCAST(PCIDevice, qdev, qdev));
> > >  
> > > -if (!dev->qdev.hotplugged)
> > > +/* qemu initialization case */
> > 
> > That comment is pretty obscure :)
> 
> How about the following?
> 
> Reached here during qemu creating a machine before
> VM runs. hot plug event shouldn't be triggered
> because it's dangerous.
> 
> > 
> > 
> > > +if (state && !dev->qdev.hotplugged) {
> > >  return 0;
> > > +}
> > 
> > Do we even need this check at all?
> 
> Hmm, I suppose you don't like the condition more complex.
> How about "if (qdev_hotplug)".
> An access function to qdev_hotplug in qdev.c would be desirable.


No, I am just trying to understand why is hotplug event dangerous.
We still get it if we do device add before starting the VM, right?

> 
> > 
> > >  
> > >  s->pci0_status.up = 0;
> > >  s->pci0_status.down = 0;
> > > -- 
> > > 1.7.1.1
> > > 
> > > 
> > > 
> > > -- 
> > > yamahata
> > 
> 
> -- 
> yamahata



Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver

2010-11-12 Thread Markus Armbruster
Luiz Capitulino  writes:

> On Thu, 11 Nov 2010 17:32:06 +0100
> Markus Armbruster  wrote:
>
>> Luiz Capitulino  writes:
>> 
>> > On Thu, 11 Nov 2010 16:30:26 +0100
>> > Markus Armbruster  wrote:
>> >
>> >> Luiz Capitulino  writes:
>> >> 
>> >> > This driver handles in-memory chardev operations. That's, all writes
>> >> > to this driver are stored in an internal buffer and it doesn't talk
>> >> > to the external world in any way.
>> >> >
>> >> > Right now it's very simple: it supports only writes. But it can be
>> >> > easily extended to support more operations.
>> >> >
>> >> > This is going to be used by the monitor's "HMP passthrough via QMP"
>> >> > feature, which needs to run monitor handlers without a backing
>> >> > device.
>> >> >
>> >> > Signed-off-by: Luiz Capitulino 
>> >> > ---
>> >> >  qemu-char.c |   66 
>> >> > +++
>> >> >  qemu-char.h |6 +
>> >> >  2 files changed, 72 insertions(+), 0 deletions(-)
>> >> >
>> >> > diff --git a/qemu-char.c b/qemu-char.c
>> >> > index 88997f9..896df14 100644
>> >> > --- a/qemu-char.c
>> >> > +++ b/qemu-char.c
>> >> > @@ -2275,6 +2275,72 @@ static CharDriverState 
>> >> > *qemu_chr_open_socket(QemuOpts *opts)
>> >> >  return NULL;
>> >> >  }
>> >> >  
>> >> > +/***/
>> >> > +/* Memory chardev */
>> >> > +typedef struct {
>> >> > +size_t outbuf_size;
>> >> > +size_t outbuf_capacity;
>> >> > +uint8_t *outbuf;
>> >> > +} MemoryDriver;
>> >> > +
>> >> > +static int mem_chr_write(CharDriverState *chr, const uint8_t *buf, int 
>> >> > len)
>> >> > +{
>> >> > +MemoryDriver *d = chr->opaque;
>> >> > +
>> >> > +/* TODO: the QString implementation has the same code, we should
>> >> > + * introduce a generic way to do this in cutils.c */
>> >> > +if (d->outbuf_capacity < d->outbuf_size + len) {
>> >> > +/* grown outbuf */
>> >> 
>> >> Used to say "grow" (sans n) here.  Intentional change?
>> >
>> > Hum, no. I think I've squashed an older commit while rebasing (but this 
>> > seems
>> > to be the only problem).
>> >
>> >> 
>> >> > +d->outbuf_capacity += len;
>> >> > +d->outbuf_capacity *= 2;
>> >> > +d->outbuf = qemu_realloc(d->outbuf, d->outbuf_capacity);
>> >> > +}
>> >> > +
>> >> > +memcpy(d->outbuf + d->outbuf_size, buf, len);
>> >> > +d->outbuf_size += len;
>> >> > +
>> >> > +return len;
>> >> > +}
>> >> > +
>> >> > +#define DEFAULT_BUF_SIZE 4096
>> >> 
>> >> It's the *initial* buffer size, isn't it?
>> >
>> > Yes.
>> 
>> Could we make the name reflect that then?
>> 
>> >> Doubt it's worth a #define (there's just one user), but that's a matter
>> >> of taste.
>> >> 
>> >> > +
>> >> > +void qemu_chr_init_mem(CharDriverState *chr)
>> >> > +{
>> >> > +MemoryDriver *d;
>> >> > +
>> >> > +d = qemu_malloc(sizeof(*d));
>> >> > +d->outbuf_size = 0;
>> >> > +d->outbuf_capacity = DEFAULT_BUF_SIZE;
>> >> > +d->outbuf = qemu_mallocz(d->outbuf_capacity);
>> >> > +
>> >> > +memset(chr, 0, sizeof(*chr));
>> >> > +chr->opaque = d;
>> >> > +chr->chr_write = mem_chr_write;
>> >> > +}
>> >> > +
>> >> > +/* assumes the stored data is a string */
>> >> 
>> >> What else could it be?  Worrying about embedded '\0's?
>> >
>> > Yes, as the driver itself doesn't interpret the contents of its
>> > buffer.
>> 
>> What happens if there are embedded '\0's?
>
> The string will be shorter than expected? And what if it contains
> non-printable characters?
>
> It's just a cautionary comment to help the user identify such problems, I 
> think
> we're making a whole argument about a quite minor thing.

When I see "assumes X" in a function comment, I immediately ask "and
what happens when !X?"  The default answer is "it explodes, so don't do
that".  That answer is wrong here.  Therefore, I find the comment
misleading.

Let's figure out what really happens.  The human command's output is
sent to the client as a JSON string (response object member return).
JSON strings can consist of Unicode characters, "except for the
characters that must be escaped: quotation mark, reverse solidus, and
the control characters (U+ through U+001F)" (RFC 4627, section 2.5).

Do we escape these characters?  Where in the code?

>> >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr)
>> >> > +{
>> >> > +MemoryDriver *d = chr->opaque;
>> >> > +
>> >> > +if (d->outbuf_size == 0) {
>> >> > +return NULL;
>> >> > +}
>> >> 
>> >> Did you forget to change this?  We agreed to return an empty QString
>> >> when chr contains an empty string.
>> >
>> > I've changed my mind and forgot to mention it: I thought that we would
>> > need to return NULL on error conditions, but turns out that this function
>> > never fails.
>> >
>> > So, I do think it's better to let it that way for two reasons:
>> >
>> >  1. An empty has at least the '\0' character, but in this case the buffer
>> > is rea

Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver

2010-11-12 Thread Markus Armbruster
Luiz Capitulino  writes:

> This driver handles in-memory chardev operations. That's, all writes
> to this driver are stored in an internal buffer and it doesn't talk
> to the external world in any way.
>
> Right now it's very simple: it supports only writes. But it can be
> easily extended to support more operations.
>
> This is going to be used by the monitor's "HMP passthrough via QMP"
> feature, which needs to run monitor handlers without a backing
> device.
>
> Signed-off-by: Luiz Capitulino 
> ---
>  qemu-char.c |   70 
> +++
>  qemu-char.h |7 ++
>  2 files changed, 77 insertions(+), 0 deletions(-)
>
> diff --git a/qemu-char.c b/qemu-char.c
> index 88997f9..36d23c6 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -2275,6 +2275,76 @@ static CharDriverState *qemu_chr_open_socket(QemuOpts 
> *opts)
>  return NULL;
>  }
>  
> +/***/
> +/* Memory chardev */
> +typedef struct {
> +size_t outbuf_size;
> +size_t outbuf_capacity;
> +uint8_t *outbuf;
> +} MemoryDriver;
> +
> +static int mem_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
> +{
> +MemoryDriver *d = chr->opaque;
> +
> +/* TODO: the QString implementation has the same code, we should
> + * introduce a generic way to do this in cutils.c */
> +if (d->outbuf_capacity < d->outbuf_size + len) {
> +/* grow outbuf */
> +d->outbuf_capacity += len;
> +d->outbuf_capacity *= 2;
> +d->outbuf = qemu_realloc(d->outbuf, d->outbuf_capacity);
> +}
> +
> +memcpy(d->outbuf + d->outbuf_size, buf, len);
> +d->outbuf_size += len;
> +
> +return len;
> +}
> +
> +void qemu_chr_init_mem(CharDriverState *chr)
> +{
> +MemoryDriver *d;
> +
> +d = qemu_malloc(sizeof(*d));
> +d->outbuf_size = 0;
> +d->outbuf_capacity = 4096;
> +d->outbuf = qemu_mallocz(d->outbuf_capacity);
> +
> +memset(chr, 0, sizeof(*chr));
> +chr->opaque = d;
> +chr->chr_write = mem_chr_write;
> +}
> +
> +/* assumes the stored data is a string */

This could indicate a problem.  See my reply in the thread for v2.

> +QString *qemu_chr_mem_to_qs(CharDriverState *chr)
> +{
> +MemoryDriver *d = chr->opaque;
> +
> +if (d->outbuf_size == 0) {
> +return qstring_new();
> +}

Why is this necessary?  Is qstring_from_substr() broken for empty
substrings?  If it is, it ought to be fixed!

> +
> +return qstring_from_substr((char *) d->outbuf, 0, d->outbuf_size - 1);
> +}
> +
> +/* NOTE: this driver can not be closed with qemu_chr_close()! */
> +void qemu_chr_close_mem(CharDriverState *chr)
> +{
> +MemoryDriver *d = chr->opaque;
> +
> +qemu_free(d->outbuf);
> +qemu_free(chr->opaque);
> +chr->opaque = NULL;
> +chr->chr_write = NULL;
> +}
> +
> +size_t qemu_chr_mem_osize(const CharDriverState *chr)
> +{
> +const MemoryDriver *d = chr->opaque;
> +return d->outbuf_size;
> +}
> +
>  QemuOpts *qemu_chr_parse_compat(const char *label, const char *filename)
>  {
>  char host[65], port[33], width[8], height[8];
> diff --git a/qemu-char.h b/qemu-char.h
> index 18ad12b..e6ee6c4 100644
> --- a/qemu-char.h
> +++ b/qemu-char.h
> @@ -6,6 +6,7 @@
>  #include "qemu-option.h"
>  #include "qemu-config.h"
>  #include "qobject.h"
> +#include "qstring.h"
>  
>  /* character device */
>  
> @@ -100,6 +101,12 @@ CharDriverState *qemu_chr_open_eventfd(int eventfd);
>  
>  extern int term_escape_char;
>  
> +/* memory chardev */
> +void qemu_chr_init_mem(CharDriverState *chr);
> +void qemu_chr_close_mem(CharDriverState *chr);
> +QString *qemu_chr_mem_to_qs(CharDriverState *chr);
> +size_t qemu_chr_mem_osize(const CharDriverState *chr);
> +
>  /* async I/O support */
>  
>  int qemu_set_fd_handler2(int fd,



[Qemu-devel] Re: [PATCH] scsi-disk: Move active request asserts

2010-11-12 Thread Kevin Wolf
Am 12.11.2010 10:57, schrieb Stefan Hajnoczi:
> SCSI read/write requests should not be re-issued before the current
> fragment of I/O completes.  There are asserts in scsi-disk.c that guard
> this constraint but they trigger on SPARC Linux 2.4.  It turns out that
> the asserts are too early in the code path and don't allow for read
> requests to terminate.
> 
> Only the read assert needs to be moved but move the write assert too for
> consistency.
> 
> Reported-by: Nigel Horne 
> Signed-off-by: Stefan Hajnoczi 

Thanks, applied to the block branch.

Kevin



[Qemu-devel] Re: Cannot not unplug cold-plugged devices

2010-11-12 Thread Isaku Yamahata
On Fri, Nov 12, 2010 at 11:18:57AM +0200, Michael S. Tsirkin wrote:
> On Fri, Nov 12, 2010 at 04:24:11PM +0900, Isaku Yamahata wrote:
> > On Thu, Nov 11, 2010 at 11:29:39PM -0700, Cam Macdonell wrote:
> > > Hi,
> > > 
> > > I was trying to do a "device_del" on my ivshmem device and it won't
> > > work unless the device is added via hotplug.  If the device is
> > > coldplugged (added at startup) then nothing happens.  I think I
> > > tracked this behaviour to the patch below.
> > > 
> > > Is not allowing coldplugged devices to be unplugged the desired behaviour?
> > 
> > Oh, my bad. Does the following patch help?
> > 
> > >From 45914b2fc95750b65685dfb98a435f58e38b45ba Mon Sep 17 00:00:00 2001
> > Message-Id: 
> > <45914b2fc95750b65685dfb98a435f58e38b45ba.1289546596.git.yamah...@valinux.co.jp>
> > From: Isaku Yamahata 
> > Date: Fri, 12 Nov 2010 16:21:35 +0900
> > Subject: [PATCH] acpi/piix4: allow pci hotplug for cold plugged device.
> > 
> > This patch fixes 5beb8ad503c88a76f2b8106c3b74b4ce485a60e1
> > reported by Cam Macdonell .
> > Before the change set, cold plugged device can be hot unplugged.
> > This patch unbreaks it.
> > 
> > Signed-off-by: Isaku Yamahata 
> 
> 
> 
> > ---
> >  hw/acpi_piix4.c |4 +++-
> >  1 files changed, 3 insertions(+), 1 deletions(-)
> > 
> > diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> > index 66c7885..bca330e 100644
> > --- a/hw/acpi_piix4.c
> > +++ b/hw/acpi_piix4.c
> > @@ -621,8 +621,10 @@ static int piix4_device_hotplug(DeviceState *qdev, 
> > PCIDevice *dev, int state)
> >  PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev,
> >  DO_UPCAST(PCIDevice, qdev, qdev));
> >  
> > -if (!dev->qdev.hotplugged)
> > +/* qemu initialization case */
> 
> That comment is pretty obscure :)

How about the following?

Reached here during qemu creating a machine before
VM runs. hot plug event shouldn't be triggered
because it's dangerous.

> 
> 
> > +if (state && !dev->qdev.hotplugged) {
> >  return 0;
> > +}
> 
> Do we even need this check at all?

Hmm, I suppose you don't like the condition more complex.
How about "if (qdev_hotplug)".
An access function to qdev_hotplug in qdev.c would be desirable.


> 
> >  
> >  s->pci0_status.up = 0;
> >  s->pci0_status.down = 0;
> > -- 
> > 1.7.1.1
> > 
> > 
> > 
> > -- 
> > yamahata
> 

-- 
yamahata



[Qemu-devel] Re: [patch] fix scsi-generic

2010-11-12 Thread Paolo Bonzini

On 08/09/2010 01:51 AM, adq wrote:

Figured out what the problem is - READ DVD STRUCTURE has its xfer
length in an unexpected place, so hw/scsi-bus.c retrieves completely
the wrong value for the transfer length. Attached nasty hacky (!)
patch fixes it as a proof of concept, will see what I can do to clean
it up later. I'd probably want it to warn if it sees SCSI commands it
doesn't know how to explicitly handle to aid debugging this sort of
thing in future.


Hi Andrew, are you going to submit a similar patch in definitive form?

Paolo



[Qemu-devel] [PATCH 3/4] pci: fix accesses to pci status register

2010-11-12 Thread Isaku Yamahata
pci status register is 16 bit, not 8 bit.
So use helper function to manipulate status register.

Signed-off-by: Isaku Yamahata 
---
 hw/pci.c |   21 +
 1 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index a734e14..a16e763 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -127,9 +127,11 @@ static void pci_change_irq_level(PCIDevice *pci_dev, int 
irq_num, int change)
 static void pci_update_irq_status(PCIDevice *dev)
 {
 if (dev->irq_state) {
-dev->config[PCI_STATUS] |= PCI_STATUS_INTERRUPT;
+pci_word_test_and_set_mask(dev->config + PCI_STATUS,
+   PCI_STATUS_INTERRUPT);
 } else {
-dev->config[PCI_STATUS] &= ~PCI_STATUS_INTERRUPT;
+pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
+ PCI_STATUS_INTERRUPT);
 }
 }
 
@@ -413,7 +415,7 @@ void pci_device_save(PCIDevice *s, QEMUFile *f)
  * in irq_state which we are saving.
  * This makes us compatible with old devices
  * which never set or clear this bit. */
-s->config[PCI_STATUS] &= ~PCI_STATUS_INTERRUPT;
+pci_word_test_and_clear_mask(s->config + PCI_STATUS, PCI_STATUS_INTERRUPT);
 vmstate_save_state(f, pci_get_vmstate(s), s);
 /* Restore the interrupt status bit. */
 pci_update_irq_status(s);
@@ -626,7 +628,7 @@ static void pci_init_cmask(PCIDevice *dev)
 {
 pci_set_word(dev->cmask + PCI_VENDOR_ID, 0x);
 pci_set_word(dev->cmask + PCI_DEVICE_ID, 0x);
-dev->cmask[PCI_STATUS] = PCI_STATUS_CAP_LIST;
+pci_set_word(dev->cmask + PCI_STATUS, PCI_STATUS_CAP_LIST);
 dev->cmask[PCI_REVISION_ID] = 0xff;
 dev->cmask[PCI_CLASS_PROG] = 0xff;
 pci_set_word(dev->cmask + PCI_CLASS_DEVICE, 0x);
@@ -1793,8 +1795,9 @@ static uint8_t pci_find_capability_list(PCIDevice *pdev, 
uint8_t cap_id,
 {
 uint8_t next, prev;
 
-if (!(pdev->config[PCI_STATUS] & PCI_STATUS_CAP_LIST))
+if (!(pci_get_word(pdev->config + PCI_STATUS) & PCI_STATUS_CAP_LIST)) {
 return 0;
+}
 
 for (prev = PCI_CAPABILITY_LIST; (next = pdev->config[prev]);
  prev = next + PCI_CAP_LIST_NEXT)
@@ -1900,7 +1903,7 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
 config[PCI_CAP_LIST_ID] = cap_id;
 config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
 pdev->config[PCI_CAPABILITY_LIST] = offset;
-pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+pci_word_test_and_set_mask(pdev->config + PCI_STATUS, PCI_STATUS_CAP_LIST);
 memset(pdev->used + offset, 0xFF, size);
 /* Make capability read-only by default */
 memset(pdev->wmask + offset, 0, size);
@@ -1923,8 +1926,10 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, 
uint8_t size)
 memset(pdev->cmask + offset, 0, size);
 memset(pdev->used + offset, 0, size);
 
-if (!pdev->config[PCI_CAPABILITY_LIST])
-pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
+if (!pdev->config[PCI_CAPABILITY_LIST]) {
+pci_word_test_and_clear_mask(pdev->config + PCI_STATUS,
+ PCI_STATUS_CAP_LIST);
+}
 }
 
 /* Reserve space for capability at a known offset (to call after load). */
-- 
1.7.1.1




[Qemu-devel] [PATCH] scsi-disk: Move active request asserts

2010-11-12 Thread Stefan Hajnoczi
SCSI read/write requests should not be re-issued before the current
fragment of I/O completes.  There are asserts in scsi-disk.c that guard
this constraint but they trigger on SPARC Linux 2.4.  It turns out that
the asserts are too early in the code path and don't allow for read
requests to terminate.

Only the read assert needs to be moved but move the write assert too for
consistency.

Reported-by: Nigel Horne 
Signed-off-by: Stefan Hajnoczi 
---
 hw/scsi-disk.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index dc71957..7d85899 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -170,6 +170,9 @@ static void scsi_read_request(SCSIDiskReq *r)
 return;
 }
 
+/* No data transfer may already be in progress */
+assert(r->req.aiocb == NULL);
+
 n = r->sector_count;
 if (n > SCSI_DMA_BUF_SIZE / 512)
 n = SCSI_DMA_BUF_SIZE / 512;
@@ -197,9 +200,6 @@ static void scsi_read_data(SCSIDevice *d, uint32_t tag)
 return;
 }
 
-/* No data transfer may already be in progress */
-assert(r->req.aiocb == NULL);
-
 scsi_read_request(r);
 }
 
@@ -269,6 +269,9 @@ static void scsi_write_request(SCSIDiskReq *r)
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 uint32_t n;
 
+/* No data transfer may already be in progress */
+assert(r->req.aiocb == NULL);
+
 n = r->iov.iov_len / 512;
 if (n) {
 qemu_iovec_init_external(&r->qiov, &r->iov, 1);
@@ -298,9 +301,6 @@ static int scsi_write_data(SCSIDevice *d, uint32_t tag)
 return 1;
 }
 
-/* No data transfer may already be in progress */
-assert(r->req.aiocb == NULL);
-
 scsi_write_request(r);
 
 return 0;
-- 
1.7.2.3




[Qemu-devel] [PATCH 2/4] pci: clean up pci command register io/memory bit initialization

2010-11-12 Thread Isaku Yamahata
This patch fixes the initialization of io/memory bit of command register.
Those bits for type 1 device is RW.
Those bits for type 0 device is
  RO = 0 if it has no io/memory BAR
  RW if it has io/memory BAR

Signed-off-by: Isaku Yamahata 
---
 hw/pci.c |   20 +---
 1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 86900a2..a734e14 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -644,10 +644,14 @@ static void pci_init_wmask(PCIDevice *dev)
 /*
  * bit 0: PCI_COMMAND_IO
  *type 0: if IO BAR is used, RW
- *type 1: RW
+ *This is handled by pci_register_bar()
+ *type 1: RW:
+ *This is fixed by pci_init_wmask_bridge()
  * bit 1: PCI_COMMAND_MEMORY
  *type 0: if IO BAR is used, RW
+ *This is handled by pci_register_bar()
  *type 1: RW
+ *This is fixed by pci_init_wmask_bridge()
  * bit 2: PCI_COMMAND_MASTER
  *type 0: RW if bus master
  *type 1: RW
@@ -682,8 +686,7 @@ static void pci_init_wmask(PCIDevice *dev)
  * bit 11-15: reserved
  */
 pci_set_word(dev->wmask + PCI_COMMAND,
- PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
- PCI_COMMAND_PARITY | PCI_COMMAND_SERR |
+ PCI_COMMAND_MASTER | PCI_COMMAND_PARITY | PCI_COMMAND_SERR |
  PCI_COMMAND_INTX_DISABLE);
 
 memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff,
@@ -692,6 +695,9 @@ static void pci_init_wmask(PCIDevice *dev)
 
 static void pci_init_wmask_bridge(PCIDevice *d)
 {
+pci_word_test_and_set_mask(d->wmask + PCI_COMMAND,
+   PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
+
 /* PCI_PRIMARY_BUS, PCI_SECONDARY_BUS, PCI_SUBORDINATE_BUS and
PCI_SEC_LETENCY_TIMER */
 memset(d->wmask + PCI_PRIMARY_BUS, 0xff, 4);
@@ -929,6 +935,14 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
 if (region_num == PCI_ROM_SLOT) {
 /* ROM enable bit is writeable */
 wmask |= PCI_ROM_ADDRESS_ENABLE;
+} else {
+if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
+pci_word_test_and_set_mask(pci_dev->wmask + PCI_COMMAND,
+   PCI_COMMAND_IO);
+} else {
+pci_word_test_and_set_mask(pci_dev->wmask + PCI_COMMAND,
+   PCI_COMMAND_MEMORY);
+}
 }
 pci_set_long(pci_dev->config + addr, type);
 if (!(r->type & PCI_BASE_ADDRESS_SPACE_IO) &&
-- 
1.7.1.1




[Qemu-devel] [PATCH 1/4] pci: revise pci command register initialization

2010-11-12 Thread Isaku Yamahata
This patch cleans up command register initialization with
comments.

Signed-off-by: Isaku Yamahata 
---
 hw/pci.c |   42 ++
 1 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 8b79ad6..86900a2 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -640,8 +640,50 @@ static void pci_init_wmask(PCIDevice *dev)
 
 dev->wmask[PCI_CACHE_LINE_SIZE] = 0xff;
 dev->wmask[PCI_INTERRUPT_LINE] = 0xff;
+
+/*
+ * bit 0: PCI_COMMAND_IO
+ *type 0: if IO BAR is used, RW
+ *type 1: RW
+ * bit 1: PCI_COMMAND_MEMORY
+ *type 0: if IO BAR is used, RW
+ *type 1: RW
+ * bit 2: PCI_COMMAND_MASTER
+ *type 0: RW if bus master
+ *type 1: RW
+ * bit 3: PCI_COMMAND_SPECIAL
+ *RO=0, optionally RW: Such device should set this bit itself
+ * bit 4: PCI_COMMAND_INVALIDATE
+ *RO=0, optionally RW: Such device should set this bit itself
+ * bit 5: PCI_COMMAND_VGA_PALETTE
+ *RO=0, optionally RW: Such device should set this bit itself
+ * bit 6: PCI_COMMAND_PARITY
+ *RW with exceptions: Such device should clear this bit itself
+ *Given that qemu doesn't emulate pci bus cycles, so that there
+ *is no place to generate parity error. So just make this
+ *register RW is safe because there is no place which refers
+ *this bit.
+ *TODO: When device assignment tried to inject PERR# into qemu,
+ *  some extra work would be needed.
+ * bit 7: PCI_COMMAND_WAIT: reserved (PCI 3.0)
+ *RO=0
+ * bit 8: PCI_COMMAND_SERR
+ *RW with exceptions: Such device should clear this bit itself
+ *Given that qemu doesn't emulate pci bus cycles, so that there
+ *is no place to generate system error. So just make this
+ *register RW is safe because there is no place which refers
+ *this bit.
+ *TODO: When device assignment tried to inject SERR# into qemu,
+ *  some extra work would be needed.
+ * bit 9: PCI_COMMAND_FAST_BACK
+ *RO=0, optionally RW: Such device should set this bit itself
+ * bit 10: PCI_COMMAND_INTX_DISABLE
+ * RW
+ * bit 11-15: reserved
+ */
 pci_set_word(dev->wmask + PCI_COMMAND,
  PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
+ PCI_COMMAND_PARITY | PCI_COMMAND_SERR |
  PCI_COMMAND_INTX_DISABLE);
 
 memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff,
-- 
1.7.1.1




[Qemu-devel] [PATCH 0/4] pci: clean up of command/status register

2010-11-12 Thread Isaku Yamahata
This patch set cleans up of initialization of pci command/status register.

Isaku Yamahata (4):
  pci: revise pci command register initialization
  pci: clean up pci command register io/memory bit initialization
  pci: fix accesses to pci status register
  pci: clean up of pci status register

 hw/pci.c |  119 --
 1 files changed, 108 insertions(+), 11 deletions(-)




[Qemu-devel] [PATCH 4/4] pci: clean up of pci status register

2010-11-12 Thread Isaku Yamahata
This patch refine the initialization/reset of
pci status registers.

Signed-off-by: Isaku Yamahata 
---
 hw/pci.c |   40 ++--
 1 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index a16e763..184db6c 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -154,6 +154,9 @@ static void pci_device_reset(PCIDevice *dev)
 pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
  pci_get_word(dev->wmask + PCI_COMMAND) |
  pci_get_word(dev->w1cmask + PCI_COMMAND));
+pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
+ pci_get_word(dev->wmask + PCI_STATUS) |
+ pci_get_word(dev->w1cmask + PCI_STATUS));
 dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
 dev->config[PCI_INTERRUPT_LINE] = 0x0;
 for (r = 0; r < PCI_NUM_REGIONS; ++r) {
@@ -636,7 +639,7 @@ static void pci_init_cmask(PCIDevice *dev)
 dev->cmask[PCI_CAPABILITY_LIST] = 0xff;
 }
 
-static void pci_init_wmask(PCIDevice *dev)
+static void pci_init_wmask_w1cmask(PCIDevice *dev)
 {
 int config_size = pci_config_size(dev);
 
@@ -691,6 +694,39 @@ static void pci_init_wmask(PCIDevice *dev)
  PCI_COMMAND_MASTER | PCI_COMMAND_PARITY | PCI_COMMAND_SERR |
  PCI_COMMAND_INTX_DISABLE);
 
+/*
+ * bit 0-2: reserved
+ * bit 3: PCI_STATUS_INTERRUPT: RO
+ * bit 4: PCI_STATUS_CAP_LIST: RO
+ * bit 5: PCI_STATUS_66MHZ: RO
+ * bit 6: PCI_STATUS_UDF: reserved (PCI 2.2-)
+ * bit 7: PCI_STATUS_FAST_BACK: RO
+ * bit 8: PCI_STATUS_PARITY
+ *type 0: RW for bus master
+ *type 1: RW1C
+ * bit 9-10: PCI_STATUS_DEVSEL: RO
+ * bit 11: PCI_STATUS_SIG_TARGET_ABORT
+ * type 0: RW1C for targets that is capable of terminating
+ * a transaction.
+ * type 1: RW1C
+ * bit 12: PCI_STATUS_REC_TARGET_ABORT
+ * type 0: RW1C for masters
+ * type 1: RW1C
+ * bit 13: PCI_STATUS_REC_MASTER_ABORT
+ * type 0: RW1C for masters
+ * type 1: RW1C
+ * bit 14: PCI_STATUS_SIG_SYSTEM_ERROR
+ * type 0: RW1C with execptions
+ * type 1: RW1C
+ * bit : PCI_STATUS_DETECTED_PARITY: RW1C
+ *
+ * It's safe to set w1mask even for RO.
+ */
+pci_set_word(dev->w1cmask + PCI_STATUS,
+ PCI_STATUS_PARITY | PCI_STATUS_SIG_TARGET_ABORT |
+ PCI_STATUS_REC_TARGET_ABORT | PCI_STATUS_REC_MASTER_ABORT |
+ PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY);
+
 memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff,
config_size - PCI_CONFIG_HEADER_SIZE);
 }
@@ -821,7 +857,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 pci_set_default_subsystem_id(pci_dev);
 }
 pci_init_cmask(pci_dev);
-pci_init_wmask(pci_dev);
+pci_init_wmask_w1cmask(pci_dev);
 if (is_bridge) {
 pci_init_wmask_bridge(pci_dev);
 }
-- 
1.7.1.1




Re: [Qemu-devel] No Virtual Console

2010-11-12 Thread qemu
Yes - that's it! The SDL window. I can't get it to show up ... any ideas?

 

Thanks!

 


On Fri, Nov 12, 2010 00:18  AM, Mulyadi Santosa  
wrote:


> 
Hi...
> 
> On Fri, Nov 12, 2010 at 03:30, Russell Morris  wrote:
> > Hi,
> >
> >
> >
> > Yes, I am in runlevel 5. I have to admit, I did check /etc/inittab, but I'm
> > not sure what I'm looking for ... :-(. A bit lost here as to what you're
> > saying, sorry! Can you clarify a bit?
> 
> i mean, do you something like below in your /etc/inittab?
> 1:2345:respawn:/sbin/mingetty tty1
> 
> it could be tty2, tty3 and so on...it represents the console number.
> So it reads "in console 1, in run level 2 up to 5, execute mingetty
> and make it servicing tty1"
> 
> > Yes, VNC works - I was trying to say that, just not very clearly. What I'm
> > after though is to have a separate window open when I launch qemu, which is
> > the "target" OS ... make sense?
> 
> I think what you mean here is the standart graphical window, right?
> the SDL one, right?
> 
> --
> regards,
> 
> Mulyadi Santosa
> Freelance Linux trainer and consultant
> 
> blog: the-hydra.blogspot.com
> training: mulyaditraining.blogspot.com
> 
> 

[Qemu-devel] (no subject)

2010-11-12 Thread Stefan Hajnoczi



[Qemu-devel] [Bug 671831] Re: Sparc guest assert error

2010-11-12 Thread Stefan Hajnoczi
** Changed in: qemu
   Status: New => In Progress

** Changed in: qemu
 Assignee: (unassigned) => Stefan Hajnoczi (stefanha)

-- 
Sparc guest assert error
https://bugs.launchpad.net/bugs/671831
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: In Progress

Bug description:
The latest version in git (d33ea50a958b2e050d2b28e5f17e3b55e91c6d74) crashes 
with an assert error when booting a Sparc/Linux guest.

The last time I tried it (about a week ago) it worked fine.  Yesterdai, I did a 
git pull, make clean, reran configure and compiled.

Host OS: Debian Linux/x86_64 5.0
C Compiler: 4.4.5
Guest OS: Linux/Sparc (2.4)
Command Line: qemu-system-sparc -hda ~njh/qemu/sparc/debian.img -nographic -m 
128
Build Configure: ./configure --enable-linux-aio --enable-io-thread --enable-kvm
GIT commit: d33ea50a958b2e050d2b28e5f17e3b55e91c6d74

Output:

Adding Swap: 122532k swap-space (priority -1)
.
Will now check root file system:fsck 1.40-WIP (14-Nov-2006)
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a -C0 /dev/sda2 
qemu-system-sparc: /home/njh/src/qemu/hw/scsi-disk.c:201: scsi_read_data: 
Assertion `r->req.aiocb == ((void *)0)' failed.


It crashes in the same place every time.

(gdb) thread apply all bt:

Thread 3 (Thread 17643):
#0  0x7f4db21bc8d3 in select () at ../sysdeps/unix/syscall-template.S:82
#1  0x004d02c4 in main_loop_wait (nonblocking=)
at /home/njh/src/qemu/vl.c:1246
#2  0x004d0e57 in main_loop (argc=, 
argv=, envp=)
at /home/njh/src/qemu/vl.c:1309
#3  main (argc=, argv=, 
envp=) at /home/njh/src/qemu/vl.c:2999

Thread 2 (Thread 17645):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
#1  0x0042450b in cond_timedwait (unused=)
at posix-aio-compat.c:104
#2  aio_thread (unused=) at posix-aio-compat.c:325
#3  0x7f4db3b818ba in start_thread (arg=)
at pthread_create.c:300
#4  0x7f4db21c302d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x in ?? ()
Current language:  auto
The current source language is "auto; currently asm".

Thread 1 (Thread 17644):
#0  0x7f4db2126165 in *__GI_raise (sig=)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x7f4db2128f70 in *__GI_abort () at abort.c:92
#2  0x7f4db211f2b1 in *__GI___assert_fail (
assertion=0x52690a "r->req.aiocb == ((void *)0)", 
file=, line=201, function=0x527480 "scsi_read_data")
at assert.c:81
#3  0x0044f363 in scsi_read_data (d=, tag=0)
at /home/njh/src/qemu/hw/scsi-disk.c:201
#4  0x004ebd6c in esp_do_dma (s=0x20679d0)
at /home/njh/src/qemu/hw/esp.c:377
#5  0x004ec781 in handle_ti (opaque=0x20679d0, 
addr=, val=)
at /home/njh/src/qemu/hw/esp.c:443
#6  esp_mem_writeb (opaque=0x20679d0, addr=, 
val=) at /home/njh/src/qemu/hw/esp.c:595
#7  0x41b2d971 in ?? ()
#8  0x in ?? ()
#9  0x031ad000 in ?? ()
#10 0x000301adfa20 in ?? ()
#11 0x1007 in ?? ()
#12 0x7f4daf80e8a0 in ?? ()
#13 0x0001 in ?? ()
#14 0x in ?? ()





[Qemu-devel] Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify

2010-11-12 Thread Michael S. Tsirkin
On Fri, Nov 12, 2010 at 09:18:48AM +, Stefan Hajnoczi wrote:
> On Thu, Nov 11, 2010 at 3:53 PM, Michael S. Tsirkin  wrote:
> > On Thu, Nov 11, 2010 at 01:47:21PM +, Stefan Hajnoczi wrote:
> >> Care must be taken not to interfere with vhost-net, which already uses
> >> ioeventfd host notifiers.  The following list shows the behavior 
> >> implemented in
> >> this patch and is designed to take vhost-net into account:
> >>
> >>  * VIRTIO_CONFIG_S_DRIVER_OK -> assign host notifiers, 
> >> qemu_set_fd_handler(virtio_pci_host_notifier_read)
> >
> > we should also deassign when VIRTIO_CONFIG_S_DRIVER_OK is cleared
> > by io write or bus master bit?
> 
> You're right, I'll fix the lifecycle to trigger symmetrically on
> status bit changes rather than VIRTIO_CONFIG_S_DRIVER_OK/reset.
> 
> >> +static void virtio_pci_reset_vdev(VirtIOPCIProxy *proxy)
> >> +{
> >> +    /* Poke virtio device so it deassigns its host notifiers (if any) */
> >> +    virtio_set_status(proxy->vdev, 0);
> >
> > Hmm. virtio_reset already sets status to 0.
> > I guess it should just be fixed to call virtio_set_status?
> 
> This part is ugly.  The problem is that virtio_reset() calls
> virtio_set_status(vdev, 0) but doesn't give the transport binding a
> chance clean up after the virtio device has cleaned up.  Since
> virtio-net will spot status=0 and deassign its host notifier, we need
> to perform our own clean up after vhost.
> 
> What makes this slightly less of a hack is the fact that virtio-pci.c
> was already causing virtio_set_status(vdev, 0) to be invoked twice
> during reset.  When 0 is written to the VIRTIO_PCI_STATUS register, we
> do virtio_set_status(proxy->vdev, val & 0xFF) and then
> virtio_reset(proxy->vdev).  So the status byte callback already gets
> invoked twice today.
> 
> I've just split this out into virtio_pci_reset_vdev() and (ab)used it
> to correctly clean up virtqueue ioeventfd.
> 
> The alternative is to add another callback from virtio.c so we are
> notified after the vdev's reset callback has finished.

Oh, likely not worth it. Mabe put the above explanation in the comment.
Will this go away now that you move to set notifiers on status write?

> >> @@ -223,10 +322,16 @@ static void virtio_ioport_write(void *opaque, 
> >> uint32_t addr, uint32_t val)
> >>          virtio_queue_notify(vdev, val);
> >>          break;
> >>      case VIRTIO_PCI_STATUS:
> >> -        virtio_set_status(vdev, val & 0xFF);
> >> -        if (vdev->status == 0) {
> >> -            virtio_reset(proxy->vdev);
> >> -            msix_unuse_all_vectors(&proxy->pci_dev);
> >> +        if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> >> +            !(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
> >> +            (proxy->flags & VIRTIO_PCI_FLAG_USE_IOEVENTFD)) {
> >> +            virtio_pci_set_host_notifiers(proxy, true);
> >> +        }
> >
> > So we set host notifiers to true from here, but to false
> > only on reset? This seems strange. Should not we disable
> > notifiers when driver clears OK status?
> > How about on bus master disable?
> 
> You're right, this needs to be fixed.
> 
> >> @@ -714,6 +803,8 @@ static PCIDeviceInfo virtio_info[] = {
> >>          .exit       = virtio_net_exit_pci,
> >>          .romfile    = "pxe-virtio.bin",
> >>          .qdev.props = (Property[]) {
> >> +            DEFINE_PROP_UINT32("flags", VirtIOPCIProxy, flags,
> >> +                               VIRTIO_PCI_FLAG_USE_IOEVENTFD),
> >>              DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3),
> >>              DEFINE_VIRTIO_NET_FEATURES(VirtIOPCIProxy, host_features),
> >>              DEFINE_NIC_PROPERTIES(VirtIOPCIProxy, nic),
> >
> > This ties interface to an internal macro value.  Further, user gets to
> > tweak other fields in this integer which we don't want.  Finally, the
> > interface is extremely unfriendly.
> > Please use a bit property instead: DEFINE_PROP_BIT.
> 
> Will fix in v3.
> 
> >> diff --git a/hw/virtio.c b/hw/virtio.c
> >> index a2a657e..f588e29 100644
> >> --- a/hw/virtio.c
> >> +++ b/hw/virtio.c
> >> @@ -582,6 +582,11 @@ void virtio_queue_notify(VirtIODevice *vdev, int n)
> >>      }
> >>  }
> >>
> >> +void virtio_queue_notify_vq(VirtQueue *vq)
> >> +{
> >> +    virtio_queue_notify(vq->vdev, vq - vq->vdev->vq);
> >
> > Let's implement virtio_queue_notify in terms of virtio_queue_notify_vq.
> > Not the other way around.
> 
> Will fix in v3.
> 
> Stefan



Re: [Qemu-devel] Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify

2010-11-12 Thread Stefan Hajnoczi
On Thu, Nov 11, 2010 at 4:45 PM, Christoph Hellwig  wrote:
> On Thu, Nov 11, 2010 at 01:47:21PM +, Stefan Hajnoczi wrote:
>> Some virtio devices are known to have guest drivers which expect a notify to 
>> be
>> processed synchronously and spin waiting for completion.  Only enable 
>> ioeventfd
>> for virtio-blk and virtio-net for now.
>
> Who guarantees that less common virtio-blk and virtio-net guest drivers
> for non-Linux OSes are fine with it?  Maybe you should add a feature flag
> that the guest has to ACK to enable it.

Virtio-blk and virtio-net are fine.  Both of those devices are
expected to operate asynchronously.  SeaBIOS and gPXE virtio-net
drivers spin but they expect to and it is okay in those environments.
They already burn CPU today.

Virtio-console expects synchronous virtqueue kick.  In Linux,
virtio_console.c __send_control_msg() and send_buf() will spin.  Qemu
userspace is able to complete those requests synchronously so that the
guest never actually burns CPU (e.g.
hw/virtio-serial-bus.c:send_control_msg()).  I don't want to burn CPU
in places where we previously didn't.

It's good that QEMU can decide whether or not to handle virtqueue kick
in the vcpu thread.  For high performance asynchronous devices like
virtio-net and virtio-blk it makes sense to use ioeventfd.  For others
it may not be useful.  I'm not sure a feature bit that exposes this
detail to the guest would be useful.

Stefan



[Qemu-devel] Re: [PATCH 6/8] device-assignment: Move PCI capabilities to match physical hardware

2010-11-12 Thread Michael S. Tsirkin
On Thu, Nov 11, 2010 at 07:56:13PM -0700, Alex Williamson wrote:
> Now that common PCI code doesn't have a hangup on capabilities
> being contiguous,

Hmm, this comment confused me : there's no requirement of
contigious allocations in current code in pci.c, is there?

-- 
MST



[Qemu-devel] Re: Cannot not unplug cold-plugged devices

2010-11-12 Thread Michael S. Tsirkin
On Fri, Nov 12, 2010 at 04:24:11PM +0900, Isaku Yamahata wrote:
> On Thu, Nov 11, 2010 at 11:29:39PM -0700, Cam Macdonell wrote:
> > Hi,
> > 
> > I was trying to do a "device_del" on my ivshmem device and it won't
> > work unless the device is added via hotplug.  If the device is
> > coldplugged (added at startup) then nothing happens.  I think I
> > tracked this behaviour to the patch below.
> > 
> > Is not allowing coldplugged devices to be unplugged the desired behaviour?
> 
> Oh, my bad. Does the following patch help?
> 
> >From 45914b2fc95750b65685dfb98a435f58e38b45ba Mon Sep 17 00:00:00 2001
> Message-Id: 
> <45914b2fc95750b65685dfb98a435f58e38b45ba.1289546596.git.yamah...@valinux.co.jp>
> From: Isaku Yamahata 
> Date: Fri, 12 Nov 2010 16:21:35 +0900
> Subject: [PATCH] acpi/piix4: allow pci hotplug for cold plugged device.
> 
> This patch fixes 5beb8ad503c88a76f2b8106c3b74b4ce485a60e1
> reported by Cam Macdonell .
> Before the change set, cold plugged device can be hot unplugged.
> This patch unbreaks it.
> 
> Signed-off-by: Isaku Yamahata 



> ---
>  hw/acpi_piix4.c |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> index 66c7885..bca330e 100644
> --- a/hw/acpi_piix4.c
> +++ b/hw/acpi_piix4.c
> @@ -621,8 +621,10 @@ static int piix4_device_hotplug(DeviceState *qdev, 
> PCIDevice *dev, int state)
>  PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev,
>  DO_UPCAST(PCIDevice, qdev, qdev));
>  
> -if (!dev->qdev.hotplugged)
> +/* qemu initialization case */

That comment is pretty obscure :)


> +if (state && !dev->qdev.hotplugged) {
>  return 0;
> +}

Do we even need this check at all?

>  
>  s->pci0_status.up = 0;
>  s->pci0_status.down = 0;
> -- 
> 1.7.1.1
> 
> 
> 
> -- 
> yamahata



[Qemu-devel] Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify

2010-11-12 Thread Stefan Hajnoczi
On Thu, Nov 11, 2010 at 3:53 PM, Michael S. Tsirkin  wrote:
> On Thu, Nov 11, 2010 at 01:47:21PM +, Stefan Hajnoczi wrote:
>> Care must be taken not to interfere with vhost-net, which already uses
>> ioeventfd host notifiers.  The following list shows the behavior implemented 
>> in
>> this patch and is designed to take vhost-net into account:
>>
>>  * VIRTIO_CONFIG_S_DRIVER_OK -> assign host notifiers, 
>> qemu_set_fd_handler(virtio_pci_host_notifier_read)
>
> we should also deassign when VIRTIO_CONFIG_S_DRIVER_OK is cleared
> by io write or bus master bit?

You're right, I'll fix the lifecycle to trigger symmetrically on
status bit changes rather than VIRTIO_CONFIG_S_DRIVER_OK/reset.

>> +static void virtio_pci_reset_vdev(VirtIOPCIProxy *proxy)
>> +{
>> +    /* Poke virtio device so it deassigns its host notifiers (if any) */
>> +    virtio_set_status(proxy->vdev, 0);
>
> Hmm. virtio_reset already sets status to 0.
> I guess it should just be fixed to call virtio_set_status?

This part is ugly.  The problem is that virtio_reset() calls
virtio_set_status(vdev, 0) but doesn't give the transport binding a
chance clean up after the virtio device has cleaned up.  Since
virtio-net will spot status=0 and deassign its host notifier, we need
to perform our own clean up after vhost.

What makes this slightly less of a hack is the fact that virtio-pci.c
was already causing virtio_set_status(vdev, 0) to be invoked twice
during reset.  When 0 is written to the VIRTIO_PCI_STATUS register, we
do virtio_set_status(proxy->vdev, val & 0xFF) and then
virtio_reset(proxy->vdev).  So the status byte callback already gets
invoked twice today.

I've just split this out into virtio_pci_reset_vdev() and (ab)used it
to correctly clean up virtqueue ioeventfd.

The alternative is to add another callback from virtio.c so we are
notified after the vdev's reset callback has finished.

>> @@ -223,10 +322,16 @@ static void virtio_ioport_write(void *opaque, uint32_t 
>> addr, uint32_t val)
>>          virtio_queue_notify(vdev, val);
>>          break;
>>      case VIRTIO_PCI_STATUS:
>> -        virtio_set_status(vdev, val & 0xFF);
>> -        if (vdev->status == 0) {
>> -            virtio_reset(proxy->vdev);
>> -            msix_unuse_all_vectors(&proxy->pci_dev);
>> +        if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
>> +            !(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
>> +            (proxy->flags & VIRTIO_PCI_FLAG_USE_IOEVENTFD)) {
>> +            virtio_pci_set_host_notifiers(proxy, true);
>> +        }
>
> So we set host notifiers to true from here, but to false
> only on reset? This seems strange. Should not we disable
> notifiers when driver clears OK status?
> How about on bus master disable?

You're right, this needs to be fixed.

>> @@ -714,6 +803,8 @@ static PCIDeviceInfo virtio_info[] = {
>>          .exit       = virtio_net_exit_pci,
>>          .romfile    = "pxe-virtio.bin",
>>          .qdev.props = (Property[]) {
>> +            DEFINE_PROP_UINT32("flags", VirtIOPCIProxy, flags,
>> +                               VIRTIO_PCI_FLAG_USE_IOEVENTFD),
>>              DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3),
>>              DEFINE_VIRTIO_NET_FEATURES(VirtIOPCIProxy, host_features),
>>              DEFINE_NIC_PROPERTIES(VirtIOPCIProxy, nic),
>
> This ties interface to an internal macro value.  Further, user gets to
> tweak other fields in this integer which we don't want.  Finally, the
> interface is extremely unfriendly.
> Please use a bit property instead: DEFINE_PROP_BIT.

Will fix in v3.

>> diff --git a/hw/virtio.c b/hw/virtio.c
>> index a2a657e..f588e29 100644
>> --- a/hw/virtio.c
>> +++ b/hw/virtio.c
>> @@ -582,6 +582,11 @@ void virtio_queue_notify(VirtIODevice *vdev, int n)
>>      }
>>  }
>>
>> +void virtio_queue_notify_vq(VirtQueue *vq)
>> +{
>> +    virtio_queue_notify(vq->vdev, vq - vq->vdev->vq);
>
> Let's implement virtio_queue_notify in terms of virtio_queue_notify_vq.
> Not the other way around.

Will fix in v3.

Stefan



[Qemu-devel] Re: [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps

2010-11-12 Thread Michael S. Tsirkin
On Thu, Nov 11, 2010 at 11:30:07PM -0700, Alex Williamson wrote:
> On Fri, 2010-11-12 at 07:36 +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 11, 2010 at 07:56:46PM -0700, Alex Williamson wrote:
> > > Some drivers depend on finding capabilities like power management,
> > > PCI express/X, vital product data, or vendor specific fields.  Now
> > > that we have better capability support, we can pass more of these
> > > tables through to the guest.  Note that VPD and VNDR are direct pass
> > > through capabilies, the rest are mostly empty shells with a few
> > > writable bits where necessary.
> > > 
> > > Signed-off-by: Alex Williamson 
> > > ---
> > > 
> > >  hw/device-assignment.c |  160 
> > > +---
> > >  1 files changed, 149 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > index 179c7dc..1b228ad 100644
> > > --- a/hw/device-assignment.c
> > > +++ b/hw/device-assignment.c
> > > @@ -366,6 +366,27 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice 
> > > *d, int pos)
> > >  return (uint8_t)assigned_dev_pci_read(d, pos, 1);
> > >  }
> > >  
> > > +static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, 
> > > int len)
> > > +{
> > > +AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
> > > +ssize_t ret;
> > > +int fd = pci_dev->real_device.config_fd;
> > > +
> > > +again:
> > > +ret = pwrite(fd, &val, len, pos);
> > > +if (ret != len) {
> > > + if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> > > + goto again;
> > 
> > 
> > do {} while() ?
> 
> Sure, this is just a copy of another place that does something similar.
> They should either be merged or both converted in a separate patch.
> 
> > > +
> > > + fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
> > > + __func__, ret, errno);
> > > +
> > > + exit(1);
> > > +}
> > > +
> > > +return;
> > > +}
> > > +
> > >  static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
> > >  {
> > >  int id;
> > > @@ -1244,37 +1265,75 @@ static void assigned_dev_update_msix(PCIDevice 
> > > *pci_dev, unsigned int ctrl_pos)
> > >  #endif
> > >  #endif
> > >  
> > > +static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> > > +uint8_t cap_id,
> > > +uint32_t address, 
> > > int len)
> > > +{
> > > +uint8_t cap;
> > > +
> > > +switch (cap_id) {
> > > +
> > > +case PCI_CAP_ID_VPD:
> > > +cap = pci_find_capability(pci_dev, cap_id);
> > > +if (address - cap >= PCI_CAP_FLAGS) {
> > > +return assigned_dev_pci_read(pci_dev, address, len);
> > > +}
> > > +break;
> > > +
> > > +case PCI_CAP_ID_VNDR:
> > > +cap = pci_find_capability(pci_dev, cap_id);
> > > +if (address - cap > PCI_CAP_FLAGS) {
> > > +return assigned_dev_pci_read(pci_dev, address, len);
> > > +}
> > > +break;
> > > +}
> > > +
> > > +return pci_default_cap_read_config(pci_dev, cap_id, address, len);
> > > +}
> > > +
> > >  static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> > >   uint8_t cap_id,
> > >   uint32_t address,
> > >   uint32_t val, int len)
> > >  {
> > > +uint8_t cap;
> > > +
> > >  pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
> > >  
> > >  switch (cap_id) {
> > >  #ifdef KVM_CAP_IRQ_ROUTING
> > >  case PCI_CAP_ID_MSI:
> > >  #ifdef KVM_CAP_DEVICE_MSI
> > > -{
> > > -uint8_t cap = pci_find_capability(pci_dev, cap_id);
> > > -if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> > > -assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> > > -}
> > > +cap = pci_find_capability(pci_dev, cap_id);
> > > +if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> > > +assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> > >  }
> > >  #endif
> > >  break;
> > >  
> > >  case PCI_CAP_ID_MSIX:
> > >  #ifdef KVM_CAP_DEVICE_MSIX
> > > -{
> > > -uint8_t cap = pci_find_capability(pci_dev, cap_id);
> > > -if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 
> > > 1)) {
> > > -assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
> > > -}
> > > +cap = pci_find_capability(pci_dev, cap_id);
> > > +if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
> > > +assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
> > >  }
> > >  #endif
> > >  break;
> > >  #endif
> > > +
> > > +case PCI_CAP_ID_VPD:
> > > +cap = pci_find_capability(pci_dev

[Qemu-devel] Re: [PATCH 4/8] pci: Replace used bitmap with capability byte map

2010-11-12 Thread Michael S. Tsirkin
On Thu, Nov 11, 2010 at 11:07:15PM -0700, Alex Williamson wrote:
> On Fri, 2010-11-12 at 07:40 +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 11, 2010 at 07:55:43PM -0700, Alex Williamson wrote:
> > > Capabilities are allocated in bytes, so we can track both whether
> > > a byte is used and by what capability in the same structure.
> > > 
> > > Remove pci_reserve_capability() as there are no users.
> > > 
> > > Signed-off-by: Alex Williamson 
> > 
> > I actually wanted to remove the used array completely and ask
> > all users to add offsets directly.
> > Will this be needed then?
> 
> Can you give an example, I don't understand what you mean by asking
> users to add offsets directly.

Here's a dump of patch in progress.
Something like the below (untested).  After applying this we can remove
the whole used array allocator.

>  I think some kind of tracking what's
> where in config space needs to be done somewhere and the common PCI code
> seems like it'd be the place.

Why do we need it? config lets us scan the capability list
readily enough. I had this idea that we should pack
capabilities in config space, but now I think it's silly.

>  Thanks,
> 
> Alex

pci: remove config space allocator

pci supports allocating caps in config space so they are packed tightly.
This doesn't seem to be useful, especially since caps must stay at fixed
offsets to ensure backwards compatibility (e.g. for migration).

Remove this support, and make virtio-pci supply the offset to
the MSIX capability explicitly.

Signed-off-by: Michael S. Tsirkin 

diff --git a/hw/msix.c b/hw/msix.c
index f66d255..6fd3791 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -52,7 +52,7 @@ int msix_supported;
  * Original bar size must be a power of 2 or 0.
  * New bar size is returned. */
 static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
-   unsigned bar_nr, unsigned bar_size)
+   unsigned offset, unsigned bar_nr, unsigned bar_size)
 {
 int config_offset;
 uint8_t *config;
@@ -75,7 +75,7 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned 
short nentries,
 
 pdev->msix_bar_size = new_size;
 config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX,
-   0, MSIX_CAP_LENGTH);
+   offset, MSIX_CAP_LENGTH);
 if (config_offset < 0)
 return config_offset;
 config = pdev->config + config_offset;
@@ -237,7 +237,7 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned 
nentries)
 /* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size is
  * modified, it should be retrieved with msix_bar_size. */
 int msix_init(struct PCIDevice *dev, unsigned short nentries,
-  unsigned bar_nr, unsigned bar_size)
+  unsigned offset, unsigned bar_nr, unsigned bar_size)
 {
 int ret;
 /* Nothing to do if MSI is not supported by interrupt controller */
@@ -261,7 +261,7 @@ int msix_init(struct PCIDevice *dev, unsigned short 
nentries,
 }
 
 dev->msix_entries_nr = nentries;
-ret = msix_add_config(dev, nentries, bar_nr, bar_size);
+ret = msix_add_config(dev, nentries, offset, bar_nr, bar_size);
 if (ret)
 goto err_config;
 
diff --git a/hw/msix.h b/hw/msix.h
index a9f7993..b61e42e 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -5,7 +5,7 @@
 #include "pci.h"
 
 int msix_init(PCIDevice *pdev, unsigned short nentries,
-  unsigned bar_nr, unsigned bar_size);
+  unsigned offset, unsigned bar_nr, unsigned bar_size);
 
 void msix_write_config(PCIDevice *pci_dev, uint32_t address,
uint32_t val, int len);
diff --git a/hw/pci.c b/hw/pci.c
index aed2d42..e411f12 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1737,12 +1737,7 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
uint8_t offset, uint8_t size)
 {
 uint8_t *config;
-if (!offset) {
-offset = pci_find_space(pdev, size);
-if (!offset) {
-return -ENOSPC;
-}
-}
+assert(offset >= PCI_CONFIG_HEADER_SIZE);
 
 config = pdev->config + offset;
 config[PCI_CAP_LIST_ID] = cap_id;
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 729917d..bcb0266 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -543,7 +543,8 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, 
VirtIODevice *vdev,
 
 config[0x3d] = 1;
 
-if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors, 1, 0)) {
+if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
+ PCI_CONFIG_HEADER_SIZE, 1, 0)) {
 pci_register_bar(&proxy->pci_dev, 1,
  msix_bar_size(&proxy->pci_dev),
  PCI_BASE_ADDRESS_SPACE_MEMORY,



  1   2   >