Re: [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec

2014-06-26 Thread Cam Macdonell
Hi,

Thank you for everyone's interest and work on this.  Sorry I haven't
been...better.  I will offer my knowledge where it helps.  And the server
is GPL in case that was seen as an issue.

On Mon, Jun 23, 2014 at 8:18 AM, Claudio Fontana claudio.font...@huawei.com
 wrote:

 Hi,

 we were reading through this quickly today, and these are some of the
 questions that
 we think can came up when reading this. Answers to some of these questions
 we think
 we have figured out, but I think it's important to put this information
 into the
 documentation.

 I will quote the file in its entirety, and insert some questions inline.

  Device Specification for Inter-VM shared memory device
  --
 
  The Inter-VM shared memory device is designed to share a region of
 memory to
  userspace in multiple virtual guests.

 What does to userspace mean in this context? The userspace of the host,
 or the userspace in the guest?


The memory is intended to be shared between userspaces in the guests.
 However, since the memory is POSIX shm region, it is visible on the host
too.



 What about The Inter-VM shared memory device is designed to share a
 memory region (created on the host via the POSIX shared memory API) between
 multiple QEMU processes running different guests. In order for all guests
 to be able to pick up the shared memory area, it is modeled by QEMU as a
 PCI device exposing said memory to the guest as a PCI BAR.

 Whether in those guests the memory region is used in kernel space or
 userspace, or there is even any meaning for those terms is guest-dependent
 I would think (I think of an OSv here, where the application and kernel
 execute at the same privilege level and in the same address space).


I'm not exactly clear what you're asking here.  The region is visible to
both the guest kernel and userspace (once mounted).



  The memory region does not belong to any
  guest, but is a POSIX memory object on the host.

 Ok that's clear.
 One thing I would ask is, but I don't know if it makes sense to mention
 here, is who creates this memory object on the host?

I understand in some cases it's the contributed server (what you provide in
 contrib/), in some cases it's the user of this device who has to write
 some server code for that, but is it true that also the qemu process itself
 can create this memory object on its own, without any external process
 needed? Is this the use case for host-guest only?


(Answering based on my original server code) When using the server, the
server creates it.  Without the server, each qemu process will check if it
exists and if it does, it will use it.  If it does not exist, the qemu
process will create it.


  Optionally, the device may
  support sending interrupts to other guests sharing the same memory
 region.

 This opens a lot of questions here which are partly answered later (If I
 understand correctly, not only interrupts are involved, but a complete
 communication protocol involving registers in BAR0), but what about staying
 a bit general here, like
 Optionally, the device may also provide a communication mechanism between
 guests sharing the same memory region. More details about that in the
 section 'OPTIONAL ivshmem guest to guest communication protocol'.

 Thinking out loud, I wonder if this communication mechanism should be part
 of this device in QEMU, or it should be provided at another layer..


 
 
  The Inter-VM PCI device
  ---
 
  *BARs*
 
  The device supports three BARs.  BAR0 is a 1 Kbyte MMIO region to support
  registers.  BAR1 is used for MSI-X when it is enabled in the device.
  BAR2 is
  used to map the shared memory object from the host.  The size of BAR2 is
  specified when the guest is started and must be a power of 2 in size.

 Are BAR0 and BAR1 optional? That's what I would think by reading the
 whole, but I'm still not sure.
 Am I forced to map BAR0 and BAR1 anyway? I don't think so, but..


They do not need to be mapped.  You do not need to map them if you don't
want to use them.


 If so, can we separate the explanation into the base shared memory
 feature, and a separate section which explains the OPTIONAL communication
 mechanism, and the OPTIONAL MSI-X BAR?

 For example, say that I am a potential ivshmem user (which I am), and I am
 interested in the shared memory but I want to use my own communication
 mechanism and protocol between guests, can we make it so that I don't have
 to wonder whether some of the info I read applies or not?
 The solution to that I think is to put all the OPTIONAL parts into
 separate sections.

 
  *Registers*

 Ok, so this should I think go into one such OPTIONAL sections.

 
  The device currently supports 4 registers of 32-bits each.  Registers
  are used for synchronization between guests sharing the same memory
 object when
  interrupts are supported (this requires using the shared memory server).

 So use of BAR0 goes together with 

Re: [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec

2014-06-26 Thread Cam Macdonell
Hi Vince,

Yes, I did see the patches for the new server.  I will review within the
week.

Cheers,
Cam


On Thu, Jun 26, 2014 at 9:37 AM, Vincent JARDIN vincent.jar...@6wind.com
wrote:

 Hi Cam,

 FYI, David did implement a new server.
   http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg04978.html

 which is easier to maintain.

 Please, could you review his patch? He'll be back from holiday within 1
 week.

 Best regards,
   Vincent

 PS: thanks for your comments




Re: [Qemu-devel] Why I advise against using ivshmem

2014-06-26 Thread Cam Macdonell
Hello,

Just to add my two bits.

I will fully support getting all the necessary parts of ivshmem into tree
where appropriate, both qemu and a driver in Linux.  I understand those
concerns.

I do not have the time to fully maintain ivshmem at the level needed, but I
will help as much as I can.

Sorry for the delay in contributing to this conversation.

Cheers,
Cam


On Sat, Jun 21, 2014 at 3:34 AM, Stefan Hajnoczi stefa...@gmail.com wrote:

 On Wed, Jun 18, 2014 at 10:57 PM, David Marchand
 david.march...@6wind.com wrote:
  On 06/18/2014 12:48 PM, Stefan Hajnoczi wrote:
 
  One more thing to add to the list:
 
  static void ivshmem_read(void *opaque, const uint8_t * buf, int flags)
 
  The flags argument should be size.  Size should be checked before
  accessing buf.
 
 
  You are welcome to send a fix and I will review it.

 I don't plan to send ivshmem patches in the near future because I
 don't use or support it.

 I thought you were interested in bringing ivshmem up to a level where
 distros feel comfortable enabling and supporting it.  Getting there
 will require effort from you to audit, clean up, and achieve test
 coverage.  That's what a maintainer needs to do in a case like this.

 Stefan
 ___
 Virtualization mailing list
 virtualizat...@lists.linux-foundation.org
 https://lists.linuxfoundation.org/mailman/listinfo/virtualization




Re: [Qemu-devel] [Bug 1314857] Re: seg fault in ivshmem when using ioeventfd=on

2014-05-15 Thread Cam Macdonell
Hello,

The patch for this later bug has been proposed.   I'm not sure why it's not
merged.

http://patchwork.ozlabs.org/patch/316785/

Cheers,
Cam


On Thu, May 1, 2014 at 10:53 AM, Gene Snider g...@cvtt.net wrote:

 When I tried the same thing with git master (latest)  I get a different
 error:
 qemu_chr_fe_claim_no_fail: error chardev (null) already used

 ** Also affects: qemu-kvm (Ubuntu)
Importance: Undecided
Status: New

 --
 You received this bug notification because you are a member of qemu-
 devel-ml, which is subscribed to QEMU.
 https://bugs.launchpad.net/bugs/1314857

 Title:
   seg fault in ivshmem when using ioeventfd=on

 Status in QEMU:
   New
 Status in “qemu-kvm” package in Ubuntu:
   New

 Bug description:
   When launching qemu with the ivshmem device and the nahanni guest
   server there is segmentation fault in the setup_ioeventfds function of
   ivshmem.c. If the ioeventfd=on flag is set the pci_ivshmem_init will
   call setup_ioeventfds at line 668. This function relies on the 'peers'
   member of the server info which is not allocated until line 669.

   To reproduce you will need the nahanni guest server code. The driver
   code is not needed. You will also need a qcow2 or other bootable image
   to use for launching qemu. The error occurs before the actual image
   launch.

   Start the nahanni ivshmem server with a small global memory space (
 although the bug is not allocation specific )
   ivshmem -m 1 -n 2 -p /tmp/ivshmem_socket

   Next launch qemu with initialization for the ivshmem device.
   qemu-system-x86_64 -hda test_iso.qcow2 -localtime -boot c -chardev
 socket,path=/tmp/ivshmem_socket,id=ivshmem_socket -device
 ivshmem,chardev=ivshmem_socket,size=1,ioeventfd=on

   If gdb is used the following error is recorded:
   Program received signal SIGSEGV, Segmentation fault.
   0x5579dd52 in setup_ioeventfds (s=0x56619580)
   at /home/genes/work/ubuntu/qemu-kvm-1.0+noroms/hw/ivshmem.c:367
   367 for (j = 0; j  s-peers[i].nb_eventfds; j++) {
   (gdb) print s-peers
   $2 = (Peer *) 0x0

 To manage notifications about this bug go to:
 https://bugs.launchpad.net/qemu/+bug/1314857/+subscriptions




Re: [Qemu-devel] [PATCH RFC] char: fix avail_connections init in qemu_chr_open_eventfd()

2014-02-13 Thread Cam Macdonell
Hi David,

I'm not sure which is the correct approach.  You could either do what you
did or you could simply remove the qemu_chr_fe_claim_no_fail() from
ivshmem.c.  I'm not sure how your change impacts other devices.

Sincerely,
Cam



On Tue, Feb 4, 2014 at 2:17 PM, David Marchand david.march...@6wind.comwrote:

 Hello,

 First of all, this is a pure RFC patch, I did not take too much time to
 dig into
 qemu source code to find the right solution, but since
 qemu_chr_open_eventfd()
 is only used by the code I was looking at, here is a patch.


 When trying to use a ivshmem server with qemu, ivshmem init code tries to
 create a CharDriverState object for each eventfd retrieved from the server.
 To create this object, a call to qemu_chr_open_eventfd() is done.
 Right after this, before adding a frontend, qemu_chr_fe_claim_no_fail() is
 called.
 qemu_chr_open_eventfd() does not set avail_connections to 1, so no
 frontend can
 be associated because qemu_chr_fe_claim_no_fail() makes qemu stop right
 away.

 I suppose this problem comes from 456d60692310e7ac25cf822cc1e98192ad636ece
 qemu-char: Call fe_claim / fe_release when not using qdev chr properties.

 Fix this, by setting avail_connections to 1 in qemu_chr_open_eventfd().

 Signed-off-by: David Marchand david.march...@6wind.com
 ---
  qemu-char.c |7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

 diff --git a/qemu-char.c b/qemu-char.c
 index 30c5a6a..c0adb04 100644
 --- a/qemu-char.c
 +++ b/qemu-char.c
 @@ -2492,7 +2492,12 @@ static gboolean tcp_chr_read(GIOChannel *chan,
 GIOCondition cond, void *opaque)
  #ifndef _WIN32
  CharDriverState *qemu_chr_open_eventfd(int eventfd)
  {
 -return qemu_chr_open_fd(eventfd, eventfd);
 +CharDriverState *chr = qemu_chr_open_fd(eventfd, eventfd);
 +
 +if (chr)
 +chr-avail_connections = 1;
 +
 +return chr;
  }
  #endif

 --
 David Marchand





Re: [Qemu-devel] help me for nahanni device

2013-07-03 Thread Cam Macdonell
Hello,

Can you please try with the most recent version?  I would suggest building
from the source code which can be found here:

http://qemu-project.org/Main_Page

Sincerely,
Cam


On Tue, Jul 2, 2013 at 7:50 PM, DAI Weibin
weibin@alcatel-sbell.com.cnwrote:

  Hello Cam,

 I am honored that receive your reply.

 ** **

 1 the QEMU version is showed as following, the version is 1.0.50

 *[root@node1 sh-vm]# /opt/qemu-kvm-rt/x86_64-softmmu/qemu-system-x86_64
 -version*

 *QEMU emulator version 1.0.50 (qemu-kvm-devel), Copyright (c) 2003-2008
 Fabrice Bellard*

 *[root@node1 sh-vm]#*

 * *

 2 ”QEMU/KVM doesn't work”

 My mean is that qemu has been start, but I can’t log on VM or ping VM IP.
 

 When only attach a nahanni device, all are ok.

 ** **

 Thank you.

 ** **

 Dai weibin

 ** **

 *From:* c...@ualberta.ca [mailto:c...@ualberta.ca] *On Behalf Of *Cam
 Macdonell
 *Sent:* 2013年7月3日 0:20
 *To:* DAI Weibin
 *Cc:* qemu-devel@nongnu.org
 *Subject:* Re: [Qemu-devel] help me for nahanni device

 ** **

 Hello Dai,

 ** **

 Which qemu version you are using?  When you say QEMU/KVM doesn't work,
 do you mean it does not start?

 ** **

 Cam

 ** **

 On Mon, Jul 1, 2013 at 12:39 AM, DAI Weibin 
 weibin@alcatel-sbell.com.cn wrote:

 Hi all,

 When use nahanni device as following, The QEMU/KVM doesn’t work.

 I attach two ivshmem devices to VM, QEMU/KVM only supports a ivshmem
 device, why?

 Expect your reply!

  

  

 *-device
 virtio-net-pci,vlan=0,id=net0,mac=DE:AD:BE:09:03:01,bus=pci.0,addr=0x5 \**
 ***

 *-net tap,script=/etc/qemu-ifup \*

 *-device ivshmem,size=256m,shm=l1shm \*

 *-device ivshmem,size=256m,shm=l2shm \*

 *-usbdevice tablet \*

 *-vnc :2 \*

 *-daemonize*

  

 Best regards

 戴卫彬

  

  

  

 ** **



Re: [Qemu-devel] help me for nahanni device

2013-07-02 Thread Cam Macdonell
Hello Dai,

Which qemu version you are using?  When you say QEMU/KVM doesn't work, do
you mean it does not start?

Cam


On Mon, Jul 1, 2013 at 12:39 AM, DAI Weibin weibin@alcatel-sbell.com.cn
 wrote:

  Hi all,
 When use nahanni device as following, The QEMU/KVM doesn’t work.
 I attach two ivshmem devices to VM, QEMU/KVM only supports a ivshmem
 device, why?
 Expect your reply!


 *-device
 virtio-net-pci,vlan=0,id=net0,mac=DE:AD:BE:09:03:01,bus=pci.0,addr=0x5 \*
 *-net tap,script=/etc/qemu-ifup \*
 *-device ivshmem,size=256m,shm=l1shm \*
 *-device ivshmem,size=256m,shm=l2shm \*
 *-usbdevice tablet \*
 *-vnc :2 \*
 *-daemonize*

 Best regards
 戴卫彬






Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs

2012-12-05 Thread Cam Macdonell
On Wed, Dec 5, 2012 at 1:50 AM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-12-05 06:34, Cam Macdonell wrote:
  static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
  uint32_t val, int len)
  {
 +bool is_enabled, was_enabled = msi_enabled(pci_dev);
 +
  pci_default_write_config(pci_dev, address, val, len);
 +is_enabled = msi_enabled(pci_dev);

 Problem 1)  in my tests is_enabled is always 0, so I don't think the
 irqfds are getting setup

 You likely want to call msix_enabled here.

Yup, that gets it working.

Liu Ping, can you update the patch to use msix_enabled()?

Also, it seems that with irqfd enabled the user-level handlers are not
triggered, but it may still be a better idea to not add the user-level
handlers to the char devices at all if irqfd is enabled.

Cam


 Jan

 --
 Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
 Corporate Competence Center Embedded Linux




Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs

2012-12-04 Thread Cam Macdonell
On Tue, Dec 4, 2012 at 4:10 AM, Andrew Jones drjo...@redhat.com wrote:


 - Original Message -
 On Thu, Nov 29, 2012 at 1:34 AM, liu ping fan qemul...@gmail.com
 wrote:
  On Thu, Nov 29, 2012 at 12:42 PM, Cam Macdonell
  c...@cs.ualberta.ca wrote:
  On Tue, Nov 27, 2012 at 7:53 PM, liu ping fan qemul...@gmail.com
  wrote:
  On Wed, Nov 28, 2012 at 5:48 AM, Cam Macdonell
  c...@cs.ualberta.ca wrote:
  On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan
  qemul...@gmail.com wrote:
  From: Liu Ping Fan pingf...@linux.vnet.ibm.com
 
  Using irqfd, so we can avoid switch between kernel and user
  when
  VMs interrupts each other.
 
  Nice work.  Due to a hardware failure, there will be a small
  delay in
  me being able to test this.  I'll follow up as soon as I can.
 
  BTW where can I find the latest guest code for test?
  I got the guest code from
  git://gitorious.org/nahanni/guest-code.git.
  But it seems outdated, after fixing the unlocked_ioctl, and vm-id
  bits
  position (the codes conflict with ivshmem spec), it works (I have
  tested for V1).
 
  Hello,
 
  Which device driver are you using?
 
  guest-code/kernel_module/standard/kvm_ivshmem.c

 The uio driver is the recommended one, however if you want to use the
 kvm_ivshmem one and have it working, then feel free to continue.

 If the uio driver is the recommended one, then can you please post it
 to lkml? It should be integrated into drivers/virt with an appropriate
 Kconfig update.


Sure.  Should it go under drivers/virt or drivers/uio?  It seems the
uio drivers all get grouped together.

Thanks,
Cam

 Thanks,
 Drew


 I had deleted it form the repo, but some users had based solutions
 off
 it, so I added it back.

 btw, my hardware issue has been resolved, so I'll get to testing your
 patch soon.

 Sincerely,
 Cam

 
  Cam
 
 
  Regards,
  Pingfan
 
 
  Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
  ---
   hw/ivshmem.c |   54
   +-
   1 files changed, 53 insertions(+), 1 deletions(-)
 
  diff --git a/hw/ivshmem.c b/hw/ivshmem.c
  index 7c8630c..5709e89 100644
  --- a/hw/ivshmem.c
  +++ b/hw/ivshmem.c
  @@ -19,6 +19,7 @@
   #include hw.h
   #include pc.h
   #include pci.h
  +#include msi.h
   #include msix.h
   #include kvm.h
   #include migration.h
  @@ -83,6 +84,7 @@ typedef struct IVShmemState {
   uint32_t vectors;
   uint32_t features;
   EventfdEntry *eventfd_table;
  +int *vector_virqs;
 
   Error *migration_blocker;
 
  @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void
  *opaque, int version_id)
   return 0;
   }
 
  +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector,
  + MSIMessage msg)
  +{
  +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
  +int virq;
  +EventNotifier *n = s-peers[s-vm_id].eventfds[vector];
  +
  +virq = kvm_irqchip_add_msi_route(kvm_state, msg);
  +if (virq = 0  kvm_irqchip_add_irqfd_notifier(kvm_state,
  n, virq) = 0) {
  +s-vector_virqs[vector] = virq;
  +qemu_chr_add_handlers(s-eventfd_chr[vector], NULL,
  NULL, NULL, NULL);
  +} else if (virq = 0) {
  +kvm_irqchip_release_virq(kvm_state, virq);
  +error_report(ivshmem, can not setup irqfd\n);
  +return -1;
  +} else {
  +error_report(ivshmem, no enough msi route to setup
  irqfd\n);
  +return -1;
  +}
  +
  +return 0;
  +}
  +
  +static void ivshmem_vector_release(PCIDevice *dev, unsigned
  vector)
  +{
  +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
  +EventNotifier *n = s-peers[s-vm_id].eventfds[vector];
  +int virq = s-vector_virqs[vector];
  +
  +if (s-vector_virqs[vector] = 0) {
  +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq);
  +kvm_irqchip_release_virq(kvm_state, virq);
  +s-vector_virqs[vector] = -1;
  +}
  +}
  +
   static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t
   address,
   uint32_t val, int len)
   {
  +bool is_enabled, was_enabled = msi_enabled(pci_dev);
  +
   pci_default_write_config(pci_dev, address, val, len);
  +is_enabled = msi_enabled(pci_dev);
  +if (!was_enabled  is_enabled) {
  +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use,
  +ivshmem_vector_release);
  +} else if (was_enabled  !is_enabled) {
  +msix_unset_vector_notifiers(pci_dev);
  +}
   }
 
   static int pci_ivshmem_init(PCIDevice *dev)
   {
   IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
   uint8_t *pci_conf;
  +int i;
 
   if (s-sizearg == NULL)
   s-ivshmem_size = 4  20; /* 4 MB default */
  @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice
  *dev)
   }
 
   s-dev.config_write = ivshmem_write_config;
  -
  +s-vector_virqs = g_new0(int, s-vectors);
  +for (i = 0; i  s-vectors; i++) {
  +s-vector_virqs[i] = -1

Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs

2012-12-04 Thread Cam Macdonell
On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan qemul...@gmail.com wrote:
 From: Liu Ping Fan pingf...@linux.vnet.ibm.com

 Using irqfd, so we can avoid switch between kernel and user when
 VMs interrupts each other.


Hi Liu Ping,

With this patch applied I was still seeing transitions to user-level
on the receipt of an msi interrupt.

uncomment DEBUG_IVSHMEM in hw/ivshmem.c (and fix one compile error in
the debug statement :) )

IVSHMEM: msix initialized (1 vectors)
...
IVSHMEM: interrupt on vector 0x7f2971b1d8d0 0
IVSHMEM: interrupt on vector 0x7f2971b1d8d0 0

if we're using irqfd, this should be avoided.  Here's my command-line:

-device ivshmem,chardev=nahanni,msi=on,ioeventfd=on,size=2048,use64=1,role=peer

There are two issues as I see it:
1) irqfd is not being enabled in my tests
2) the defaults handlers are still being added

One difference is that I'm using the UIO driver, which enables PCI
using pci_enable_msix as follows

pci_enable_msix(ivs_info-dev, ivs_info-msix_entries,
ivs_info-nvectors);

and succeeds

[2.651253] uio_ivshmem :00:04.0: irq 43 for MSI/MSI-X
[2.651326] MSI-X enabled

(continued below)

 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  hw/ivshmem.c |   54 +-
  1 files changed, 53 insertions(+), 1 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 7c8630c..5709e89 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -19,6 +19,7 @@
  #include hw.h
  #include pc.h
  #include pci.h
 +#include msi.h
  #include msix.h
  #include kvm.h
  #include migration.h
 @@ -83,6 +84,7 @@ typedef struct IVShmemState {
  uint32_t vectors;
  uint32_t features;
  EventfdEntry *eventfd_table;
 +int *vector_virqs;

  Error *migration_blocker;

 @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int 
 version_id)
  return 0;
  }

 +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector,
 + MSIMessage msg)
 +{
 +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 +int virq;
 +EventNotifier *n = s-peers[s-vm_id].eventfds[vector];
 +
 +virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +if (virq = 0  kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) = 
 0) {
 +s-vector_virqs[vector] = virq;
 +qemu_chr_add_handlers(s-eventfd_chr[vector], NULL, NULL, NULL, 
 NULL);
 +} else if (virq = 0) {
 +kvm_irqchip_release_virq(kvm_state, virq);
 +error_report(ivshmem, can not setup irqfd\n);
 +return -1;
 +} else {
 +error_report(ivshmem, no enough msi route to setup irqfd\n);
 +return -1;
 +}
 +
 +return 0;
 +}
 +
 +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector)
 +{
 +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 +EventNotifier *n = s-peers[s-vm_id].eventfds[vector];
 +int virq = s-vector_virqs[vector];
 +
 +if (s-vector_virqs[vector] = 0) {
 +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq);
 +kvm_irqchip_release_virq(kvm_state, virq);
 +s-vector_virqs[vector] = -1;
 +}
 +}
 +
  static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
  uint32_t val, int len)
  {
 +bool is_enabled, was_enabled = msi_enabled(pci_dev);
 +
  pci_default_write_config(pci_dev, address, val, len);
 +is_enabled = msi_enabled(pci_dev);

Problem 1)  in my tests is_enabled is always 0, so I don't think the
irqfds are getting setup

 +if (!was_enabled  is_enabled) {
 +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use,
 +ivshmem_vector_release);
 +} else if (was_enabled  !is_enabled) {
 +msix_unset_vector_notifiers(pci_dev);
 +}
  }

  static int pci_ivshmem_init(PCIDevice *dev)
  {
  IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
  uint8_t *pci_conf;
 +int i;

  if (s-sizearg == NULL)
  s-ivshmem_size = 4  20; /* 4 MB default */
 @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice *dev)
  }

  s-dev.config_write = ivshmem_write_config;
 -
 +s-vector_virqs = g_new0(int, s-vectors);
 +for (i = 0; i  s-vectors; i++) {
 +s-vector_virqs[i] = -1;
 +}
  return 0;
  }

 @@ -770,6 +821,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev)
  migrate_del_blocker(s-migration_blocker);
  error_free(s-migration_blocker);
  }
 +g_free(s-vector_virqs);

  memory_region_destroy(s-ivshmem_mmio);
  memory_region_del_subregion(s-bar, s-ivshmem);
 --
 1.7.4.4


Problem 2)
We'll also have to not add the handlers as below if irqfd is present
otherwise we'll get double interrupts, so we'll have to add a check
here too.

/* if MSI is supported we need multiple interrupts */
if (!ivshmem_has_feature(s, IVSHMEM_MSI)) {
s-eventfd_table[vector].pdev = s-dev;

Re: [Qemu-devel] [PATCH v2 1/2] ivshmem: remove msix_write_config

2012-12-04 Thread Cam Macdonell
On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan qemul...@gmail.com wrote:
 From: Liu Ping Fan pingf...@linux.vnet.ibm.com

 This logic has been integrated into pci core, so remove it.

 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
 ---
  hw/ivshmem.c |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index f6dbb21..7c8630c 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -629,7 +629,6 @@ static void ivshmem_write_config(PCIDevice *pci_dev, 
 uint32_t address,
  uint32_t val, int len)
  {
  pci_default_write_config(pci_dev, address, val, len);
 -msix_write_config(pci_dev, address, val, len);
  }

  static int pci_ivshmem_init(PCIDevice *dev)
 --
 1.7.4.4




Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs

2012-11-29 Thread Cam Macdonell
On Thu, Nov 29, 2012 at 1:34 AM, liu ping fan qemul...@gmail.com wrote:
 On Thu, Nov 29, 2012 at 12:42 PM, Cam Macdonell c...@cs.ualberta.ca wrote:
 On Tue, Nov 27, 2012 at 7:53 PM, liu ping fan qemul...@gmail.com wrote:
 On Wed, Nov 28, 2012 at 5:48 AM, Cam Macdonell c...@cs.ualberta.ca wrote:
 On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan qemul...@gmail.com wrote:
 From: Liu Ping Fan pingf...@linux.vnet.ibm.com

 Using irqfd, so we can avoid switch between kernel and user when
 VMs interrupts each other.

 Nice work.  Due to a hardware failure, there will be a small delay in
 me being able to test this.  I'll follow up as soon as I can.

 BTW where can I find the latest guest code for test?
 I got the guest code from git://gitorious.org/nahanni/guest-code.git.
 But it seems outdated, after fixing the unlocked_ioctl, and vm-id bits
 position (the codes conflict with ivshmem spec), it works (I have
 tested for V1).

 Hello,

 Which device driver are you using?

 guest-code/kernel_module/standard/kvm_ivshmem.c

The uio driver is the recommended one, however if you want to use the
kvm_ivshmem one and have it working, then feel free to continue.

I had deleted it form the repo, but some users had based solutions off
it, so I added it back.

btw, my hardware issue has been resolved, so I'll get to testing your
patch soon.

Sincerely,
Cam


 Cam


 Regards,
 Pingfan


 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  hw/ivshmem.c |   54 
 +-
  1 files changed, 53 insertions(+), 1 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 7c8630c..5709e89 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -19,6 +19,7 @@
  #include hw.h
  #include pc.h
  #include pci.h
 +#include msi.h
  #include msix.h
  #include kvm.h
  #include migration.h
 @@ -83,6 +84,7 @@ typedef struct IVShmemState {
  uint32_t vectors;
  uint32_t features;
  EventfdEntry *eventfd_table;
 +int *vector_virqs;

  Error *migration_blocker;

 @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void *opaque, 
 int version_id)
  return 0;
  }

 +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector,
 + MSIMessage msg)
 +{
 +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 +int virq;
 +EventNotifier *n = s-peers[s-vm_id].eventfds[vector];
 +
 +virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +if (virq = 0  kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) 
 = 0) {
 +s-vector_virqs[vector] = virq;
 +qemu_chr_add_handlers(s-eventfd_chr[vector], NULL, NULL, NULL, 
 NULL);
 +} else if (virq = 0) {
 +kvm_irqchip_release_virq(kvm_state, virq);
 +error_report(ivshmem, can not setup irqfd\n);
 +return -1;
 +} else {
 +error_report(ivshmem, no enough msi route to setup irqfd\n);
 +return -1;
 +}
 +
 +return 0;
 +}
 +
 +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector)
 +{
 +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 +EventNotifier *n = s-peers[s-vm_id].eventfds[vector];
 +int virq = s-vector_virqs[vector];
 +
 +if (s-vector_virqs[vector] = 0) {
 +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq);
 +kvm_irqchip_release_virq(kvm_state, virq);
 +s-vector_virqs[vector] = -1;
 +}
 +}
 +
  static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
  uint32_t val, int len)
  {
 +bool is_enabled, was_enabled = msi_enabled(pci_dev);
 +
  pci_default_write_config(pci_dev, address, val, len);
 +is_enabled = msi_enabled(pci_dev);
 +if (!was_enabled  is_enabled) {
 +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use,
 +ivshmem_vector_release);
 +} else if (was_enabled  !is_enabled) {
 +msix_unset_vector_notifiers(pci_dev);
 +}
  }

  static int pci_ivshmem_init(PCIDevice *dev)
  {
  IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
  uint8_t *pci_conf;
 +int i;

  if (s-sizearg == NULL)
  s-ivshmem_size = 4  20; /* 4 MB default */
 @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice *dev)
  }

  s-dev.config_write = ivshmem_write_config;
 -
 +s-vector_virqs = g_new0(int, s-vectors);
 +for (i = 0; i  s-vectors; i++) {
 +s-vector_virqs[i] = -1;
 +}
  return 0;
  }

 @@ -770,6 +821,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev)
  migrate_del_blocker(s-migration_blocker);
  error_free(s-migration_blocker);
  }
 +g_free(s-vector_virqs);

  memory_region_destroy(s-ivshmem_mmio);
  memory_region_del_subregion(s-bar, s-ivshmem);
 --
 1.7.4.4






Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs

2012-11-28 Thread Cam Macdonell
On Tue, Nov 27, 2012 at 7:53 PM, liu ping fan qemul...@gmail.com wrote:
 On Wed, Nov 28, 2012 at 5:48 AM, Cam Macdonell c...@cs.ualberta.ca wrote:
 On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan qemul...@gmail.com wrote:
 From: Liu Ping Fan pingf...@linux.vnet.ibm.com

 Using irqfd, so we can avoid switch between kernel and user when
 VMs interrupts each other.

 Nice work.  Due to a hardware failure, there will be a small delay in
 me being able to test this.  I'll follow up as soon as I can.

 BTW where can I find the latest guest code for test?
 I got the guest code from git://gitorious.org/nahanni/guest-code.git.
 But it seems outdated, after fixing the unlocked_ioctl, and vm-id bits
 position (the codes conflict with ivshmem spec), it works (I have
 tested for V1).

Hello,

Which device driver are you using?

Cam


 Regards,
 Pingfan


 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  hw/ivshmem.c |   54 +-
  1 files changed, 53 insertions(+), 1 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 7c8630c..5709e89 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -19,6 +19,7 @@
  #include hw.h
  #include pc.h
  #include pci.h
 +#include msi.h
  #include msix.h
  #include kvm.h
  #include migration.h
 @@ -83,6 +84,7 @@ typedef struct IVShmemState {
  uint32_t vectors;
  uint32_t features;
  EventfdEntry *eventfd_table;
 +int *vector_virqs;

  Error *migration_blocker;

 @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void *opaque, 
 int version_id)
  return 0;
  }

 +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector,
 + MSIMessage msg)
 +{
 +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 +int virq;
 +EventNotifier *n = s-peers[s-vm_id].eventfds[vector];
 +
 +virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +if (virq = 0  kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) = 
 0) {
 +s-vector_virqs[vector] = virq;
 +qemu_chr_add_handlers(s-eventfd_chr[vector], NULL, NULL, NULL, 
 NULL);
 +} else if (virq = 0) {
 +kvm_irqchip_release_virq(kvm_state, virq);
 +error_report(ivshmem, can not setup irqfd\n);
 +return -1;
 +} else {
 +error_report(ivshmem, no enough msi route to setup irqfd\n);
 +return -1;
 +}
 +
 +return 0;
 +}
 +
 +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector)
 +{
 +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 +EventNotifier *n = s-peers[s-vm_id].eventfds[vector];
 +int virq = s-vector_virqs[vector];
 +
 +if (s-vector_virqs[vector] = 0) {
 +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq);
 +kvm_irqchip_release_virq(kvm_state, virq);
 +s-vector_virqs[vector] = -1;
 +}
 +}
 +
  static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
  uint32_t val, int len)
  {
 +bool is_enabled, was_enabled = msi_enabled(pci_dev);
 +
  pci_default_write_config(pci_dev, address, val, len);
 +is_enabled = msi_enabled(pci_dev);
 +if (!was_enabled  is_enabled) {
 +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use,
 +ivshmem_vector_release);
 +} else if (was_enabled  !is_enabled) {
 +msix_unset_vector_notifiers(pci_dev);
 +}
  }

  static int pci_ivshmem_init(PCIDevice *dev)
  {
  IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
  uint8_t *pci_conf;
 +int i;

  if (s-sizearg == NULL)
  s-ivshmem_size = 4  20; /* 4 MB default */
 @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice *dev)
  }

  s-dev.config_write = ivshmem_write_config;
 -
 +s-vector_virqs = g_new0(int, s-vectors);
 +for (i = 0; i  s-vectors; i++) {
 +s-vector_virqs[i] = -1;
 +}
  return 0;
  }

 @@ -770,6 +821,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev)
  migrate_del_blocker(s-migration_blocker);
  error_free(s-migration_blocker);
  }
 +g_free(s-vector_virqs);

  memory_region_destroy(s-ivshmem_mmio);
  memory_region_del_subregion(s-bar, s-ivshmem);
 --
 1.7.4.4





Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs

2012-11-27 Thread Cam Macdonell
On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan qemul...@gmail.com wrote:
 From: Liu Ping Fan pingf...@linux.vnet.ibm.com

 Using irqfd, so we can avoid switch between kernel and user when
 VMs interrupts each other.

Nice work.  Due to a hardware failure, there will be a small delay in
me being able to test this.  I'll follow up as soon as I can.


 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  hw/ivshmem.c |   54 +-
  1 files changed, 53 insertions(+), 1 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 7c8630c..5709e89 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -19,6 +19,7 @@
  #include hw.h
  #include pc.h
  #include pci.h
 +#include msi.h
  #include msix.h
  #include kvm.h
  #include migration.h
 @@ -83,6 +84,7 @@ typedef struct IVShmemState {
  uint32_t vectors;
  uint32_t features;
  EventfdEntry *eventfd_table;
 +int *vector_virqs;

  Error *migration_blocker;

 @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int 
 version_id)
  return 0;
  }

 +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector,
 + MSIMessage msg)
 +{
 +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 +int virq;
 +EventNotifier *n = s-peers[s-vm_id].eventfds[vector];
 +
 +virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +if (virq = 0  kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) = 
 0) {
 +s-vector_virqs[vector] = virq;
 +qemu_chr_add_handlers(s-eventfd_chr[vector], NULL, NULL, NULL, 
 NULL);
 +} else if (virq = 0) {
 +kvm_irqchip_release_virq(kvm_state, virq);
 +error_report(ivshmem, can not setup irqfd\n);
 +return -1;
 +} else {
 +error_report(ivshmem, no enough msi route to setup irqfd\n);
 +return -1;
 +}
 +
 +return 0;
 +}
 +
 +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector)
 +{
 +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 +EventNotifier *n = s-peers[s-vm_id].eventfds[vector];
 +int virq = s-vector_virqs[vector];
 +
 +if (s-vector_virqs[vector] = 0) {
 +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq);
 +kvm_irqchip_release_virq(kvm_state, virq);
 +s-vector_virqs[vector] = -1;
 +}
 +}
 +
  static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
  uint32_t val, int len)
  {
 +bool is_enabled, was_enabled = msi_enabled(pci_dev);
 +
  pci_default_write_config(pci_dev, address, val, len);
 +is_enabled = msi_enabled(pci_dev);
 +if (!was_enabled  is_enabled) {
 +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use,
 +ivshmem_vector_release);
 +} else if (was_enabled  !is_enabled) {
 +msix_unset_vector_notifiers(pci_dev);
 +}
  }

  static int pci_ivshmem_init(PCIDevice *dev)
  {
  IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
  uint8_t *pci_conf;
 +int i;

  if (s-sizearg == NULL)
  s-ivshmem_size = 4  20; /* 4 MB default */
 @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice *dev)
  }

  s-dev.config_write = ivshmem_write_config;
 -
 +s-vector_virqs = g_new0(int, s-vectors);
 +for (i = 0; i  s-vectors; i++) {
 +s-vector_virqs[i] = -1;
 +}
  return 0;
  }

 @@ -770,6 +821,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev)
  migrate_del_blocker(s-migration_blocker);
  error_free(s-migration_blocker);
  }
 +g_free(s-vector_virqs);

  memory_region_destroy(s-ivshmem_mmio);
  memory_region_del_subregion(s-bar, s-ivshmem);
 --
 1.7.4.4




Re: [Qemu-devel] [PATCH v4] ivshmem: add 64bit option

2012-08-31 Thread Cam Macdonell
Can this patch be included in 1.2?

Thanks,
Cam

On Thu, Aug 23, 2012 at 6:49 AM, Gerd Hoffmann kra...@redhat.com wrote:
 This patch adds a use64 property which will make the ivshmem driver
 register a 64bit memory bar when set, so you have something to play with
 when testing 64bit pci bits.  It also allows to have quite big shared
 memory regions, like this:

 [root@fedora ~]# lspci -vs1:1
 01:01.0 RAM memory: Red Hat, Inc Device 1110
 Subsystem: Red Hat, Inc Device 1100
 Physical Slot: 1-1
 Flags: fast devsel
 Memory at fd40 (32-bit, non-prefetchable) [disabled] [size=256]
 Memory at 804000 (64-bit, prefetchable) [size=1G]

 [ v4: rebase  adapt to latest master again ]
 [ v3: rebase  adapt to latest master ]
 [ v2: default to on as suggested by avi,
   turn off for pc-$old using compat property ]

 Signed-off-by: Gerd Hoffmann kra...@redhat.com
 Tested-by: Cam Macdonell c...@cs.ualberta.ca
 ---
  hw/ivshmem.c |   13 ++---
  hw/pc_piix.c |4 
  2 files changed, 14 insertions(+), 3 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index b4d65a6..8c45aa6 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -71,6 +71,8 @@ typedef struct IVShmemState {
  MemoryRegion bar;
  MemoryRegion ivshmem;
  uint64_t ivshmem_size; /* size of shared memory region */
 +uint32_t ivshmem_attr;
 +uint32_t ivshmem_64bit;
  int shm_fd; /* shared memory file descriptor */

  Peer *peers;
 @@ -339,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int 
 fd) {
  memory_region_add_subregion(s-bar, 0, s-ivshmem);

  /* region for shared memory */
 -pci_register_bar(s-dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, s-bar);
 +pci_register_bar(s-dev, 2, s-ivshmem_attr, s-bar);
  }

  static void ivshmem_add_eventfd(IVShmemState *s, int posn, int i)
 @@ -701,6 +703,11 @@ static int pci_ivshmem_init(PCIDevice *dev)
   s-ivshmem_mmio);

  memory_region_init(s-bar, ivshmem-bar2-container, s-ivshmem_size);
 +s-ivshmem_attr = PCI_BASE_ADDRESS_SPACE_MEMORY |
 +PCI_BASE_ADDRESS_MEM_PREFETCH;
 +if (s-ivshmem_64bit) {
 +s-ivshmem_attr |= PCI_BASE_ADDRESS_MEM_TYPE_64;
 +}

  if ((s-server_chr != NULL) 
  (strncmp(s-server_chr-filename, unix:, 5) == 0)) 
 {
 @@ -726,8 +733,7 @@ static int pci_ivshmem_init(PCIDevice *dev)
  /* allocate/initialize space for interrupt handling */
  s-peers = g_malloc0(s-nb_peers * sizeof(Peer));

 -pci_register_bar(s-dev, 2,
 - PCI_BASE_ADDRESS_SPACE_MEMORY, s-bar);
 +pci_register_bar(s-dev, 2, s-ivshmem_attr, s-bar);

  s-eventfd_chr = g_malloc0(s-vectors * sizeof(CharDriverState *));

 @@ -797,6 +803,7 @@ static Property ivshmem_properties[] = {
  DEFINE_PROP_BIT(msi, IVShmemState, features, IVSHMEM_MSI, true),
  DEFINE_PROP_STRING(shm, IVShmemState, shmobj),
  DEFINE_PROP_STRING(role, IVShmemState, role),
 +DEFINE_PROP_UINT32(use64, IVShmemState, ivshmem_64bit, 1),
  DEFINE_PROP_END_OF_LIST(),
  };

 diff --git a/hw/pc_piix.c b/hw/pc_piix.c
 index 88ff041..c7fbaa0 100644
 --- a/hw/pc_piix.c
 +++ b/hw/pc_piix.c
 @@ -387,6 +387,10 @@ static QEMUMachine pc_machine_v1_2 = {
  .driver   = virtio-blk-pci,\
  .property = config-wce,\
  .value= off,\
 +},{\
 +.driver   = ivshmem,\
 +.property = use64,\
 +.value= 0,\
  }

  static QEMUMachine pc_machine_v1_1 = {
 --
 1.7.1





Re: [Qemu-devel] [PATCH for-1.2] msix: make [un]use vectors on reset/load optional

2012-08-29 Thread Cam Macdonell
Yes, will test shortly.

Cam

On Wed, Aug 29, 2012 at 10:47 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Wed, Aug 29, 2012 at 06:44:49PM +0200, Jan Kiszka wrote:
 On 2012-08-29 18:40, Michael S. Tsirkin wrote:
  The facility to use/unuse vectors dynamically is helpful
  for virtio but little else: everyone just seems to use
  vectors in their init function.
 
  Avoid clearing msix vector use info on reset and load.
  For virtio, clear it explicitly.
  This should fix regressions reported with ivshmem - though
  I didn't test this, I verified that virtio keeps
  working like it did.
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   hw/msix.c   | 13 +++--
   hw/virtio-pci.c |  2 ++
   2 files changed, 13 insertions(+), 2 deletions(-)
 
  diff --git a/hw/msix.c b/hw/msix.c
  index 800fc32..d040cc2 100644
  --- a/hw/msix.c
  +++ b/hw/msix.c
  @@ -340,6 +340,15 @@ static void msix_free_irq_entries(PCIDevice *dev)
   }
   }
 
  +static void msix_clear_all_vectors(PCIDevice *dev)
  +{
  +int vector;
  +
  +for (vector = 0; vector  dev-msix_entries_nr; ++vector) {
  +msix_clr_pending(dev, vector);
  +}
  +}
  +
   /* Clean up resources for the device. */
   void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion 
  *pba_bar)
   {
  @@ -394,7 +403,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
   return;
   }
 
  -msix_free_irq_entries(dev);
  +msix_clear_all_vectors(dev);
   qemu_get_buffer(f, dev-msix_table, n * PCI_MSIX_ENTRY_SIZE);
   qemu_get_buffer(f, dev-msix_pba, (n + 7) / 8);
   msix_update_function_masked(dev);
  @@ -440,7 +449,7 @@ void msix_reset(PCIDevice *dev)
   if (!msix_present(dev)) {
   return;
   }
  -msix_free_irq_entries(dev);
  +msix_clear_all_vectors(dev);
   dev-config[dev-msix_cap + MSIX_CONTROL_OFFSET] =
  ~dev-wmask[dev-msix_cap + MSIX_CONTROL_OFFSET];
   memset(dev-msix_table, 0, dev-msix_entries_nr * 
  PCI_MSIX_ENTRY_SIZE);
  diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
  index 125eded..ca0b204 100644
  --- a/hw/virtio-pci.c
  +++ b/hw/virtio-pci.c
  @@ -131,6 +131,7 @@ static int virtio_pci_load_config(void * opaque, 
  QEMUFile *f)
   if (ret) {
   return ret;
   }
  +msix_unuse_all_vectors(proxy-pci_dev);
   msix_load(proxy-pci_dev, f);
   if (msix_present(proxy-pci_dev)) {
   qemu_get_be16s(f, proxy-vdev-config_vector);
  @@ -246,6 +247,7 @@ void virtio_pci_reset(DeviceState *d)
   VirtIOPCIProxy *proxy = container_of(d, VirtIOPCIProxy, pci_dev.qdev);
   virtio_pci_stop_ioeventfd(proxy);
   virtio_reset(proxy-vdev);
  +msix_unuse_all_vectors(proxy-pci_dev);
   proxy-flags = ~VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
   }
 
 

 Fine with me, but let's ask Cam to test.

 Jan

 Cam deadline is today - can u test quickly pls?

 --
 Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
 Corporate Competence Center Embedded Linux




Re: [Qemu-devel] [PATCH for-1.2] msix: make [un]use vectors on reset/load optional

2012-08-29 Thread Cam Macdonell
Yes, works for me.

On Wed, Aug 29, 2012 at 10:40 AM, Michael S. Tsirkin m...@redhat.com wrote:
 The facility to use/unuse vectors dynamically is helpful
 for virtio but little else: everyone just seems to use
 vectors in their init function.

 Avoid clearing msix vector use info on reset and load.
 For virtio, clear it explicitly.
 This should fix regressions reported with ivshmem - though
 I didn't test this, I verified that virtio keeps
 working like it did.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
Tested-by: Cam Macdonell c...@cs.ualberta.ca
 ---
  hw/msix.c   | 13 +++--
  hw/virtio-pci.c |  2 ++
  2 files changed, 13 insertions(+), 2 deletions(-)

 diff --git a/hw/msix.c b/hw/msix.c
 index 800fc32..d040cc2 100644
 --- a/hw/msix.c
 +++ b/hw/msix.c
 @@ -340,6 +340,15 @@ static void msix_free_irq_entries(PCIDevice *dev)
  }
  }

 +static void msix_clear_all_vectors(PCIDevice *dev)
 +{
 +int vector;
 +
 +for (vector = 0; vector  dev-msix_entries_nr; ++vector) {
 +msix_clr_pending(dev, vector);
 +}
 +}
 +
  /* Clean up resources for the device. */
  void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion 
 *pba_bar)
  {
 @@ -394,7 +403,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
  return;
  }

 -msix_free_irq_entries(dev);
 +msix_clear_all_vectors(dev);
  qemu_get_buffer(f, dev-msix_table, n * PCI_MSIX_ENTRY_SIZE);
  qemu_get_buffer(f, dev-msix_pba, (n + 7) / 8);
  msix_update_function_masked(dev);
 @@ -440,7 +449,7 @@ void msix_reset(PCIDevice *dev)
  if (!msix_present(dev)) {
  return;
  }
 -msix_free_irq_entries(dev);
 +msix_clear_all_vectors(dev);
  dev-config[dev-msix_cap + MSIX_CONTROL_OFFSET] =
 ~dev-wmask[dev-msix_cap + MSIX_CONTROL_OFFSET];
  memset(dev-msix_table, 0, dev-msix_entries_nr * PCI_MSIX_ENTRY_SIZE);
 diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
 index 125eded..ca0b204 100644
 --- a/hw/virtio-pci.c
 +++ b/hw/virtio-pci.c
 @@ -131,6 +131,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile 
 *f)
  if (ret) {
  return ret;
  }
 +msix_unuse_all_vectors(proxy-pci_dev);
  msix_load(proxy-pci_dev, f);
  if (msix_present(proxy-pci_dev)) {
  qemu_get_be16s(f, proxy-vdev-config_vector);
 @@ -246,6 +247,7 @@ void virtio_pci_reset(DeviceState *d)
  VirtIOPCIProxy *proxy = container_of(d, VirtIOPCIProxy, pci_dev.qdev);
  virtio_pci_stop_ioeventfd(proxy);
  virtio_reset(proxy-vdev);
 +msix_unuse_all_vectors(proxy-pci_dev);
  proxy-flags = ~VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
  }

 --
 MST




[Qemu-devel] [PATCH for-1.2] ivshmem: remove redundant ioeventfd configuration

2012-08-27 Thread Cam Macdonell
setup_ioeventfds() is unnecessary and actually causes a segfault when used
ioeventfd=on is used on the command-line.  Since ioeventfds are handled within
the memory API, it can be removed.

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 hw/ivshmem.c |   15 ---
 1 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 068e1f3..d0fe5a2 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -387,17 +387,6 @@ static void close_guest_eventfds(IVShmemState *s, int posn)
 s-peers[posn].nb_eventfds = 0;
 }
 
-static void setup_ioeventfds(IVShmemState *s) {
-
-int i, j;
-
-for (i = 0; i = s-max_peer; i++) {
-for (j = 0; j  s-peers[i].nb_eventfds; j++) {
-ivshmem_add_eventfd(s, i, j);
-}
-}
-}
-
 /* this function increase the dynamic storage need to store data about other
  * guests */
 static void increase_dynamic_storage(IVShmemState *s, int new_min_size) {
@@ -678,10 +667,6 @@ static int pci_ivshmem_init(PCIDevice *dev)
 memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s,
   ivshmem-mmio, IVSHMEM_REG_BAR_SIZE);
 
-if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
-setup_ioeventfds(s);
-}
-
 /* region for registers*/
 pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
  s-ivshmem_mmio);
-- 
1.7.5.4




Re: [Qemu-devel] [PATCH v2][for 1.2?] msix: Drop tracking of used vectors

2012-08-24 Thread Cam Macdonell
On Fri, Aug 24, 2012 at 12:19 AM, Jan Kiszka jan.kis...@web.de wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 This optimization was once used in qemu-kvm to keep KVM route usage low.
 But now we solved that problem via lazy updates. It also tried to handle
 the case of vectors shared between different sources of the same device.
 However, this never really worked and will have to be addressed
 differently anyway. So drop this obsolete interface.

 We still need interfaces to clear pending vectors though. Provide
 msix_clear_vector and msix_clear_all_vectors for this.

 This also unbreaks MSI-X after reset for ivshmem and megasas as device
 models can't easily mark their vectors used afterward (megasas didn't
 even try).

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Tested-by: Cam Macdonell c...@cs.ualberta.ca
 ---

 This patch has been posted some moons again, and we had a discussion
 about the impact on the existing users. At that time, MSI-X refactoring
 for KVM support was not yet merged. Now it is, and it should be better
 much clearer that vector usage tracking has no business with that
 feature.

  hw/ivshmem.c|   20 ---
  hw/megasas.c|4 ---
  hw/msix.c   |   57 --
  hw/msix.h   |5 +--
  hw/pci.h|2 -
  hw/virtio-pci.c |   20 +++---
  6 files changed, 23 insertions(+), 85 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 47f2a16..548 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -523,28 +523,11 @@ static void ivshmem_read(void *opaque, const uint8_t * 
 buf, int flags)
  return;
  }

 -/* Select the MSI-X vectors used by device.
 - * ivshmem maps events to vectors statically, so
 - * we just enable all vectors on init and after reset. */
 -static void ivshmem_use_msix(IVShmemState * s)
 -{
 -int i;
 -
 -if (!msix_present(s-dev)) {
 -return;
 -}
 -
 -for (i = 0; i  s-vectors; i++) {
 -msix_vector_use(s-dev, i);
 -}
 -}
 -
  static void ivshmem_reset(DeviceState *d)
  {
  IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d);

  s-intrstatus = 0;
 -ivshmem_use_msix(s);
  return;
  }

 @@ -586,8 +569,6 @@ static void ivshmem_setup_msi(IVShmemState * s)

  /* allocate QEMU char devices for receiving interrupts */
  s-eventfd_table = g_malloc0(s-vectors * sizeof(EventfdEntry));
 -
 -ivshmem_use_msix(s);
  }

  static void ivshmem_save(QEMUFile* f, void *opaque)
 @@ -629,7 +610,6 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int 
 version_id)

  if (ivshmem_has_feature(proxy, IVSHMEM_MSI)) {
  msix_load(proxy-dev, f);
 -   ivshmem_use_msix(proxy);
  } else {
  proxy-intrstatus = qemu_get_be32(f);
  proxy-intrmask = qemu_get_be32(f);
 diff --git a/hw/megasas.c b/hw/megasas.c
 index c35a15d..4766117 100644
 --- a/hw/megasas.c
 +++ b/hw/megasas.c
 @@ -2121,10 +2121,6 @@ static int megasas_scsi_init(PCIDevice *dev)
  pci_register_bar(s-dev, 2, PCI_BASE_ADDRESS_SPACE_IO, s-port_io);
  pci_register_bar(s-dev, 3, bar_type, s-queue_io);

 -if (megasas_use_msix(s)) {
 -msix_vector_use(s-dev, 0);
 -}
 -
  if (!s-sas_addr) {
  s-sas_addr = ((NAA_LOCALLY_ASSIGNED_ID  24) |
 IEEE_COMPANY_LOCALLY_ASSIGNED)  36;
 diff --git a/hw/msix.c b/hw/msix.c
 index aea340b..163ce4c 100644
 --- a/hw/msix.c
 +++ b/hw/msix.c
 @@ -273,7 +273,6 @@ int msix_init(struct PCIDevice *dev, unsigned short 
 nentries,

  dev-msix_table = g_malloc0(table_size);
  dev-msix_pba = g_malloc0(pba_size);
 -dev-msix_entry_used = g_malloc0(nentries * sizeof 
 *dev-msix_entry_used);

  msix_mask_all(dev, nentries);

 @@ -326,16 +325,6 @@ int msix_init_exclusive_bar(PCIDevice *dev, unsigned 
 short nentries,
  return 0;
  }

 -static void msix_free_irq_entries(PCIDevice *dev)
 -{
 -int vector;
 -
 -for (vector = 0; vector  dev-msix_entries_nr; ++vector) {
 -dev-msix_entry_used[vector] = 0;
 -msix_clr_pending(dev, vector);
 -}
 -}
 -
  /* Clean up resources for the device. */
  void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion 
 *pba_bar)
  {
 @@ -344,7 +333,6 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, 
 MemoryRegion *pba_bar)
  }
  pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
  dev-msix_cap = 0;
 -msix_free_irq_entries(dev);
  dev-msix_entries_nr = 0;
  memory_region_del_subregion(pba_bar, dev-msix_pba_mmio);
  memory_region_destroy(dev-msix_pba_mmio);
 @@ -354,8 +342,6 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, 
 MemoryRegion *pba_bar)
  memory_region_destroy(dev-msix_table_mmio);
  g_free(dev-msix_table);
  dev-msix_table = NULL;
 -g_free(dev-msix_entry_used);
 -dev-msix_entry_used = NULL;
  dev-cap_present = ~QEMU_PCI_CAP_MSIX;
  return;
  }
 @@ -390,7 +376,6 @@ void

[Qemu-devel] MSI-X bug with ivshmem since msix_reset moved to PCI

2012-08-23 Thread Cam Macdonell
Hi Jan,

I've bisected a bug in which MSI interrupts are not being delivered to
the following patch, where msix_reset was moved in tot he PCI core.

commit cbd2d4342b3d42ab33baa99f5b7a23491b5692f2
Author: Jan Kiszka jan.kis...@siemens.com
Date:   Tue May 15 20:09:56 2012 -0300

msi: Invoke msi/msix_reset from PCI core

There is no point in pushing this burden to the devices, they tend to
forget to call them (like intel-hda, ahci, xhci did). Instead, reset
functions are now called from pci_device_reset. They do nothing if
MSI/MSI-X is not in use.

I've been debugging and it seems that when msix_notify() is triggered
the second test in the if fails

/* Send an MSI-X message */
void msix_notify(PCIDevice *dev, unsigned vector)
{
MSIMessage msg;

if (vector = dev-msix_entries_nr || !dev-msix_entry_used[vector])
return;

   …
}

here is some MSI-X debugging statements

msix_init
IVSHMEM: msix initialized (1 vectors)
IVSHMEM: using vector 0
IVSHMEM: ivshmem_reset
IVSHMEM: using vector 0
msix_reset
msix_free_irq_entries 0x7fd52d1cea20

msix_free_irq_entries() sets dev-msix_entries_nr to 0, so I think it
may be the cause.

Shouldn't ivshmem's reset (which reenables the vectors) be triggered
by the msix_reset?

Thanks,
Cam

p.s. And apologies, I should've caught this bug closer to that patch
being merged.



Re: [Qemu-devel] ivshmem assertion failure with EventNotifier

2012-08-22 Thread Cam Macdonell
On Wed, Aug 22, 2012 at 6:01 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 22/08/2012 06:29, Cam Macdonell ha scritto:
 Hi Paolo,

 I've noticed an assertion error when sending interrupts via ivshmem.
 I bisected to this patch.

 Does this help?

Yes, that solves it.


 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index b4d65a6..47f2a16 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -366,6 +366,10 @@ static void close_guest_eventfds(IVShmemState *s,
 int posn)
  {
  int i, guest_curr_max;

 +if (!ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 +return;
 +}
 +
  guest_curr_max = s-peers[posn].nb_eventfds;

  memory_region_transaction_begin();

 Paolo




Re: [Qemu-devel] [PATCH] ivshmem: add 64bit option

2012-08-21 Thread Cam Macdonell
Thanks Gerd,

might need another rebase.  I have had some users request larger memory regions.

Quoting Gerd Hoffmann kra...@redhat.com:
 This patch adds a use64 property which will make the ivshmem driver
 register a 64bit memory bar when set, so you have something to play with
 when testing 64bit pci bits.  It also allows to have quite big shared
 memory regions, like this:
 
 [root@fedora ~]# lspci -vs1:1
 01:01.0 RAM memory: Red Hat, Inc Device 1110
 Subsystem: Red Hat, Inc Device 1100
 Physical Slot: 1-1
 Flags: fast devsel
 Memory at fd40 (32-bit, non-prefetchable) [disabled] [size=256]
 Memory at 804000 (64-bit, prefetchable) [size=1G]
 
 [ v3: rebase  adapt to latest master ]
 [ v2: default to on as suggested by avi,
   turn off for pc-$old using compat property ]
 
 Signed-off-by: Gerd Hoffmann kra...@redhat.com
Tested-by: Cam Macdonell c...@cs.ualberta.ca
 ---
  hw/ivshmem.c |   13 ++---
  hw/pc_piix.c |4 
  2 files changed, 14 insertions(+), 3 deletions(-)
 
 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 05559b6..8a776e2 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -72,6 +72,8 @@ typedef struct IVShmemState {
  MemoryRegion ivshmem;
  MemoryRegion msix_bar;
  uint64_t ivshmem_size; /* size of shared memory region */
 +uint32_t ivshmem_attr;
 +uint32_t ivshmem_64bit;
  int shm_fd; /* shared memory file descriptor */
  
  Peer *peers;
 @@ -344,7 +346,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int 
 fd) {
  memory_region_add_subregion(s-bar, 0, s-ivshmem);
  
  /* region for shared memory */
 -pci_register_bar(s-dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, s-bar);
 +pci_register_bar(s-dev, 2, s-ivshmem_attr, s-bar);
  }
  
  static void close_guest_eventfds(IVShmemState *s, int posn)
 @@ -693,6 +695,11 @@ static int pci_ivshmem_init(PCIDevice *dev)
   s-ivshmem_mmio);
  
  memory_region_init(s-bar, ivshmem-bar2-container, s-ivshmem_size);
 +s-ivshmem_attr = PCI_BASE_ADDRESS_SPACE_MEMORY |
 +PCI_BASE_ADDRESS_MEM_PREFETCH;
 +if (s-ivshmem_64bit) {
 +s-ivshmem_attr |= PCI_BASE_ADDRESS_MEM_TYPE_64;
 +}
  
  if ((s-server_chr != NULL) 
  (strncmp(s-server_chr-filename, unix:, 5) == 0)) 
 {
 @@ -718,8 +725,7 @@ static int pci_ivshmem_init(PCIDevice *dev)
  /* allocate/initialize space for interrupt handling */
  s-peers = g_malloc0(s-nb_peers * sizeof(Peer));
  
 -pci_register_bar(s-dev, 2,
 - PCI_BASE_ADDRESS_SPACE_MEMORY, s-bar);
 +pci_register_bar(s-dev, 2, s-ivshmem_attr, s-bar);
  
  s-eventfd_chr = g_malloc0(s-vectors * sizeof(CharDriverState *));
  
 @@ -791,6 +797,7 @@ static Property ivshmem_properties[] = {
  DEFINE_PROP_BIT(msi, IVShmemState, features, IVSHMEM_MSI, true),
  DEFINE_PROP_STRING(shm, IVShmemState, shmobj),
  DEFINE_PROP_STRING(role, IVShmemState, role),
 +DEFINE_PROP_UINT32(use64, IVShmemState, ivshmem_64bit, 1),
  DEFINE_PROP_END_OF_LIST(),
  };
  
 diff --git a/hw/pc_piix.c b/hw/pc_piix.c
 index 0c0096f..5101f5d 100644
 --- a/hw/pc_piix.c
 +++ b/hw/pc_piix.c
 @@ -375,6 +375,10 @@ static QEMUMachine pc_machine_v1_2 = {
  .driver   = qxl,\
  .property = vgamem_mb,\
  .value= stringify(8),\
 +},{\
 +.driver   = ivshmem,\
 +.property = use64,\
 +.value= 0,\
  }
  
  static QEMUMachine pc_machine_v1_1 = {
 -- 
 1.7.1
 
 



[Qemu-devel] ivshmem assertion failure with EventNotifier

2012-08-21 Thread Cam Macdonell
Hi Paolo,

I've noticed an assertion error when sending interrupts via ivshmem.
I bisected to this patch.

commit 563027cc0c94aa4846c18f9d665a4c90f8c42ba8
Author: Paolo Bonzini pbonz...@redhat.com
Date:   Thu Jul 5 17:16:25 2012 +0200

ivshmem: use EventNotifier and memory API

All of ivshmem's usage of eventfd now has a corresponding API in
EventNotifier.  Simplify the code by using it, and also use the
memory API consistently to set up and tear down the ioeventfds.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

qemu-system-x86_64: /home/cam/src/git/qemu/memory.c:1244: memory_region_del_even
tfd: Assertion `i != mr-ioeventfd_nb' failed.  This assertion failure
occurs when the eventfd is triggered.

I'll continue to dig around, but can you explain what this assertion
is catching.  Is there an initialization that might be missing?

Thanks,
Cam



Re: [Qemu-devel] [PATCH master/v1.0.x] ivshmem: add missing msix calls

2012-01-13 Thread Cam Macdonell
Hello,

Can this patch be merged, please?

Thanks,
Cam

On Mon, Dec 5, 2011 at 12:48 PM, Michael S. Tsirkin m...@redhat.com wrote:
 ivshmem used msix but didn't call it on either reset or
 config write paths. This used to partically work since
 guests don't use all of msi-x configuration fields,
 and reset is rarely used, but the patch 'msix: track function masked
 in pci device state' broke that. Fix by adding appropriate calls.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Reported-by: Cam Macdonell c...@cs.ualberta.ca
 Tested-by: Cam Macdonell c...@cs.ualberta.ca

 ---

 Please apply the following to both master
 and 1.0 stable branch. Thanks!

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..c58f4d3 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -500,11 +500,29 @@ static void ivshmem_read(void *opaque, const uint8_t * 
 buf, int flags)
     return;
  }

 +/* Select the MSI-X vectors used by device.
 + * ivshmem maps events to vectors statically, so
 + * we just enable all vectors on init and after reset. */
 +static void ivshmem_use_msix(IVShmemState * s)
 +{
 +    int i;
 +
 +    if (!msix_present(s-dev)) {
 +        return;
 +    }
 +
 +    for (i = 0; i  s-vectors; i++) {
 +        msix_vector_use(s-dev, i);
 +    }
 +}
 +
  static void ivshmem_reset(DeviceState *d)
  {
     IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d);

     s-intrstatus = 0;
 +    msix_reset(s-dev);
 +    ivshmem_use_msix(s);
     return;
  }

 @@ -535,12 +553,8 @@ static uint64_t ivshmem_get_size(IVShmemState * s) {
     return value;
  }

 -static void ivshmem_setup_msi(IVShmemState * s) {
 -
 -    int i;
 -
 -    /* allocate the MSI-X vectors */
 -
 +static void ivshmem_setup_msi(IVShmemState * s)
 +{
     memory_region_init(s-msix_bar, ivshmem-msix, 4096);
     if (!msix_init(s-dev, s-vectors, s-msix_bar, 1, 0)) {
         pci_register_bar(s-dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
 @@ -551,13 +565,10 @@ static void ivshmem_setup_msi(IVShmemState * s) {
         exit(1);
     }

 -    /* 'activate' the vectors */
 -    for (i = 0; i  s-vectors; i++) {
 -        msix_vector_use(s-dev, i);
 -    }
 -
     /* allocate Qemu char devices for receiving interrupts */
     s-eventfd_table = g_malloc0(s-vectors * sizeof(EventfdEntry));
 +
 +    ivshmem_use_msix(s);
  }

  static void ivshmem_save(QEMUFile* f, void *opaque)
 @@ -610,6 +621,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int 
 version_id)
     return 0;
  }

 +static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
 +                                uint32_t val, int len)
 +{
 +    pci_default_write_config(pci_dev, address, val, len);
 +    msix_write_config(pci_dev, address, val, len);
 +}
 +
  static int pci_ivshmem_init(PCIDevice *dev)
  {
     IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 @@ -734,6 +752,8 @@ static int pci_ivshmem_init(PCIDevice *dev)

     }

 +    s-dev.config_write = ivshmem_write_config;
 +
     return 0;
  }





Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd

2012-01-13 Thread Cam Macdonell
Can this patch please be merged?  It fixes an important regression
with ioeventfd.

Thanks,
Cam

On Thu, Nov 24, 2011 at 3:05 AM,  zanghongy...@huawei.com wrote:
 From: Hongyong Zang zanghongy...@huawei.com

 When a guest boots with ioeventfd, an error (by gdb) occurs:
  Program received signal SIGSEGV, Segmentation fault.
  0x006009cc in setup_ioeventfds (s=0x171dc40)
      at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363
  363             for (j = 0; j  s-peers[i].nb_eventfds; j++) {
 The bug is due to accessing s-peers which is NULL.

 This patch uses the memory region API to replace the old one 
 kvm_set_ioeventfd_mmio_long().
 And this patch makes memory_region_add_eventfd() called in ivshmem_read() 
 when qemu receives
 eventfd information from ivshmem_server.

 Signed-off-by: Hongyong Zang zanghongy...@huawei.com
 ---
  hw/ivshmem.c |   41 ++---
  1 files changed, 14 insertions(+), 27 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..be26f03 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -58,7 +58,6 @@ typedef struct IVShmemState {
     CharDriverState *server_chr;
     MemoryRegion ivshmem_mmio;

 -    pcibus_t mmio_addr;
     /* We might need to register the BAR before we actually have the memory.
      * So prepare a container MemoryRegion for the BAR immediately and
      * add a subregion when we have the memory.
 @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
     guest_curr_max = s-peers[posn].nb_eventfds;

     for (i = 0; i  guest_curr_max; i++) {
 -        kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i],
 -                    s-mmio_addr + DOORBELL, (posn  16) | i, 0);
 +        if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 +            memory_region_del_eventfd(s-ivshmem_mmio,
 +                                     DOORBELL,
 +                                     4,
 +                                     true,
 +                                     (posn  16) | i,
 +                                     s-peers[posn].eventfds[i]);
 +        }
         close(s-peers[posn].eventfds[i]);
     }

 @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
     s-peers[posn].nb_eventfds = 0;
  }

 -static void setup_ioeventfds(IVShmemState *s) {
 -
 -    int i, j;
 -
 -    for (i = 0; i = s-max_peer; i++) {
 -        for (j = 0; j  s-peers[i].nb_eventfds; j++) {
 -            memory_region_add_eventfd(s-ivshmem_mmio,
 -                                      DOORBELL,
 -                                      4,
 -                                      true,
 -                                      (i  16) | j,
 -                                      s-peers[i].eventfds[j]);
 -        }
 -    }
 -}
 -
  /* this function increase the dynamic storage need to store data about other
  * guests */
  static void increase_dynamic_storage(IVShmemState *s, int new_min_size) {
 @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t * 
 buf, int flags)
     }

     if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -        if (kvm_set_ioeventfd_mmio_long(incoming_fd, s-mmio_addr + DOORBELL,
 -                        (incoming_posn  16) | guest_max_eventfd, 1)  0) {
 -            fprintf(stderr, ivshmem: ioeventfd not available\n);
 -        }
 +        memory_region_add_eventfd(s-ivshmem_mmio,
 +                                  DOORBELL,
 +                                  4,
 +                                  true,
 +                                  (incoming_posn  16) | guest_max_eventfd,
 +                                  incoming_fd);
     }

     return;
 @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev)
     memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s,
                           ivshmem-mmio, IVSHMEM_REG_BAR_SIZE);

 -    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -        setup_ioeventfds(s);
 -    }
 -
     /* region for registers*/
     pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
                      s-ivshmem_mmio);
 --
 1.7.1




Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state

2011-12-05 Thread Cam Macdonell
On Mon, Dec 5, 2011 at 2:08 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Sun, Dec 04, 2011 at 04:47:17PM -0700, Cam Macdonell wrote:
 On Sun, Dec 4, 2011 at 3:20 AM, Michael S. Tsirkin m...@redhat.com wrote:
  On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote:
  Based on a git bisect, this patch breaks msi-x interrupt delivery in
  the ivshmem device.
 
  I think the following should fix it. Compiled-only -
  could you pls check? If yes let's apply to the stable branch.

 Thanks for the patch Michael.

 It addresses the need for msix_write_config() to be called, but the
 addition of the msix_reset() is causing a reset of the vectors after
 they've been initialized in pci_ivshmem_init().  So, interrupts still
 aren't delivered with this patch applied as it is.

 In particular, a reset occurs after pci_ivshmem_init runs, so the
 msix_entry_used array is reset to 0s, which causes the interrupt
 delivery to fail.

 If I comment out the msix_reset(), then interrupts are delivered.
 Would the reset be caused by a bug in the guest driver?  or do I need
 to reconfigure the msix after reset?  I'm unclear as to the proper
 behaviour after a reset.

 Thanks,
 Cam


 I didn't anticipate this, virtio only configures vectors when msi-x is 
 enabled.
 For 1.0 it's safest to do the same and just re-enable after reset.

 So we'll end up with something like the following - does it help?

Yes, this fixes interrupt delivery.


 Side note: exit(1) is not the best way to handle
 errors in the init function. I think you should add error_report, then
 goto err on failures to init, cleanup what you have set up
 meanwhile and return an error code.

Thanks, I'll fix that.


 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..c58f4d3 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -500,11 +500,29 @@ static void ivshmem_read(void *opaque, const uint8_t * 
 buf, int flags)
     return;
  }

 +/* Select the MSI-X vectors used by device.
 + * ivshmem maps events to vectors statically, so
 + * we just enable all vectors on init and after reset. */
 +static void ivshmem_use_msix(IVShmemState * s)
 +{
 +    int i;
 +
 +    if (!msix_present(s-dev)) {
 +        return;
 +    }
 +
 +    for (i = 0; i  s-vectors; i++) {
 +        msix_vector_use(s-dev, i);
 +    }
 +}
 +
  static void ivshmem_reset(DeviceState *d)
  {
     IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d);

     s-intrstatus = 0;
 +    msix_reset(s-dev);
 +    ivshmem_use_msix(s);
     return;
  }

 @@ -535,12 +553,8 @@ static uint64_t ivshmem_get_size(IVShmemState * s) {
     return value;
  }

 -static void ivshmem_setup_msi(IVShmemState * s) {
 -
 -    int i;
 -
 -    /* allocate the MSI-X vectors */
 -
 +static void ivshmem_setup_msi(IVShmemState * s)
 +{
     memory_region_init(s-msix_bar, ivshmem-msix, 4096);
     if (!msix_init(s-dev, s-vectors, s-msix_bar, 1, 0)) {
         pci_register_bar(s-dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
 @@ -551,13 +565,10 @@ static void ivshmem_setup_msi(IVShmemState * s) {
         exit(1);
     }

 -    /* 'activate' the vectors */
 -    for (i = 0; i  s-vectors; i++) {
 -        msix_vector_use(s-dev, i);
 -    }
 -
     /* allocate Qemu char devices for receiving interrupts */
     s-eventfd_table = g_malloc0(s-vectors * sizeof(EventfdEntry));
 +
 +    ivshmem_use_msix(s);
  }

  static void ivshmem_save(QEMUFile* f, void *opaque)
 @@ -610,6 +621,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int 
 version_id)
     return 0;
  }

 +static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
 +                                uint32_t val, int len)
 +{
 +    pci_default_write_config(pci_dev, address, val, len);
 +    msix_write_config(pci_dev, address, val, len);
 +}
 +
  static int pci_ivshmem_init(PCIDevice *dev)
  {
     IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 @@ -734,6 +752,8 @@ static int pci_ivshmem_init(PCIDevice *dev)

     }

 +    s-dev.config_write = ivshmem_write_config;
 +
     return 0;
  }





Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd

2011-12-05 Thread Cam Macdonell
2011/12/2 Cam Macdonell c...@cs.ualberta.ca:
 2011/11/30 Cam Macdonell c...@cs.ualberta.ca:
 2011/11/30 Zang Hongyong zanghongy...@huawei.com:
 Can this bug fix patch be applied yet?

 Sorry, for not replying yet.  I'll test your patch within the next day.

 Have you confirmed the proper receipt of interrupts in the receiving guests?

 I can confirm the bug occurs with ioeventfd enabled and that the
 patches fixes it, but sometime after 15.1, I no longer see interrupts
 (MSI or regular) being delivered in the guest.

 I will bisect tomorrow.

With Michael's help we debugged msi-x interrupt delivery.  With that
fix in place, this patch fixes ioeventfd in ivshmem.


 Cam


 With this bug, guest os cannot successfully boot with ioeventfd.
 Thus the new PIO DoorBell patch cannot be posted.

 Well, you can certainly post the new patch, just clarify that it's
 dependent on this patch.

 Sincerely,
 Cam


 Thanks,
 Hongyong

 于 2011/11/24,星期四 18:05, zanghongy...@huawei.com 写道:
 From: Hongyong Zang zanghongy...@huawei.com

 When a guest boots with ioeventfd, an error (by gdb) occurs:
   Program received signal SIGSEGV, Segmentation fault.
   0x006009cc in setup_ioeventfds (s=0x171dc40)
   at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363
   363 for (j = 0; j  s-peers[i].nb_eventfds; j++) {
 The bug is due to accessing s-peers which is NULL.

 This patch uses the memory region API to replace the old one 
 kvm_set_ioeventfd_mmio_long().
 And this patch makes memory_region_add_eventfd() called in ivshmem_read() 
 when qemu receives
 eventfd information from ivshmem_server.

 Signed-off-by: Hongyong Zang zanghongy...@huawei.com
 ---
  hw/ivshmem.c |   41 ++---
  1 files changed, 14 insertions(+), 27 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..be26f03 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -58,7 +58,6 @@ typedef struct IVShmemState {
  CharDriverState *server_chr;
  MemoryRegion ivshmem_mmio;

 -pcibus_t mmio_addr;
  /* We might need to register the BAR before we actually have the 
 memory.
   * So prepare a container MemoryRegion for the BAR immediately and
   * add a subregion when we have the memory.
 @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
  guest_curr_max = s-peers[posn].nb_eventfds;

  for (i = 0; i  guest_curr_max; i++) {
 -kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i],
 -s-mmio_addr + DOORBELL, (posn  16) | i, 0);
 +if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 +memory_region_del_eventfd(s-ivshmem_mmio,
 + DOORBELL,
 + 4,
 + true,
 + (posn  16) | i,
 + s-peers[posn].eventfds[i]);
 +}
  close(s-peers[posn].eventfds[i]);
  }

 @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
  s-peers[posn].nb_eventfds = 0;
  }

 -static void setup_ioeventfds(IVShmemState *s) {
 -
 -int i, j;
 -
 -for (i = 0; i = s-max_peer; i++) {
 -for (j = 0; j  s-peers[i].nb_eventfds; j++) {
 -memory_region_add_eventfd(s-ivshmem_mmio,
 -  DOORBELL,
 -  4,
 -  true,
 -  (i  16) | j,
 -  s-peers[i].eventfds[j]);
 -}
 -}
 -}
 -
  /* this function increase the dynamic storage need to store data about 
 other
   * guests */
  static void increase_dynamic_storage(IVShmemState *s, int new_min_size) {
 @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t 
 * buf, int flags)
  }

  if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -if (kvm_set_ioeventfd_mmio_long(incoming_fd, s-mmio_addr + 
 DOORBELL,
 -(incoming_posn  16) | guest_max_eventfd, 1)  
 0) {
 -fprintf(stderr, ivshmem: ioeventfd not available\n);
 -}
 +memory_region_add_eventfd(s-ivshmem_mmio,
 +  DOORBELL,
 +  4,
 +  true,
 +  (incoming_posn  16) | 
 guest_max_eventfd,
 +  incoming_fd);
  }

  return;
 @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev)
  memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s,
ivshmem-mmio, IVSHMEM_REG_BAR_SIZE);

 -if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -setup_ioeventfds(s);
 -}
 -
  /* region for registers*/
  pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
   s-ivshmem_mmio);





Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state

2011-12-04 Thread Cam Macdonell
On Sun, Dec 4, 2011 at 3:20 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote:
 Based on a git bisect, this patch breaks msi-x interrupt delivery in
 the ivshmem device.

 I think the following should fix it. Compiled-only -
 could you pls check? If yes let's apply to the stable branch.

Thanks for the patch Michael.

It addresses the need for msix_write_config() to be called, but the
addition of the msix_reset() is causing a reset of the vectors after
they've been initialized in pci_ivshmem_init().  So, interrupts still
aren't delivered with this patch applied as it is.

In particular, a reset occurs after pci_ivshmem_init runs, so the
msix_entry_used array is reset to 0s, which causes the interrupt
delivery to fail.

If I comment out the msix_reset(), then interrupts are delivered.
Would the reset be caused by a bug in the guest driver?  or do I need
to reconfigure the msix after reset?  I'm unclear as to the proper
behaviour after a reset.

Thanks,
Cam


 --

 ivshmem: add missing msix calls

 ivshmem used msix but didn't call it on either reset or
 config write paths. This used to partically work since
 guests don't use all of msi-x configuration fields,
 and reset is rarely used, but the patch 'msix: track function masked
 in pci device state' broke that. Fix by adding appropriate calls.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com

 --

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..3680c0f 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -505,6 +505,7 @@ static void ivshmem_reset(DeviceState *d)
     IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d);

     s-intrstatus = 0;
 +    msix_reset(s-dev);
     return;
  }

 @@ -610,6 +611,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int 
 version_id)
     return 0;
  }

 +static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
 +                                uint32_t val, int len)
 +{
 +    pci_default_write_config(pci_dev, address, val, len);
 +    msix_write_config(pci_dev, address, val, len);
 +}
 +
  static int pci_ivshmem_init(PCIDevice *dev)
  {
     IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 @@ -734,6 +742,8 @@ static int pci_ivshmem_init(PCIDevice *dev)

     }

 +    s-dev.config_write = ivshmem_write_config;
 +
     return 0;
  }





Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd

2011-12-02 Thread Cam Macdonell
2011/11/30 Cam Macdonell c...@cs.ualberta.ca:
 2011/11/30 Zang Hongyong zanghongy...@huawei.com:
 Can this bug fix patch be applied yet?

 Sorry, for not replying yet.  I'll test your patch within the next day.

Have you confirmed the proper receipt of interrupts in the receiving guests?

I can confirm the bug occurs with ioeventfd enabled and that the
patches fixes it, but sometime after 15.1, I no longer see interrupts
(MSI or regular) being delivered in the guest.

I will bisect tomorrow.

Cam


 With this bug, guest os cannot successfully boot with ioeventfd.
 Thus the new PIO DoorBell patch cannot be posted.

 Well, you can certainly post the new patch, just clarify that it's
 dependent on this patch.

 Sincerely,
 Cam


 Thanks,
 Hongyong

 于 2011/11/24,星期四 18:05, zanghongy...@huawei.com 写道:
 From: Hongyong Zang zanghongy...@huawei.com

 When a guest boots with ioeventfd, an error (by gdb) occurs:
   Program received signal SIGSEGV, Segmentation fault.
   0x006009cc in setup_ioeventfds (s=0x171dc40)
   at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363
   363 for (j = 0; j  s-peers[i].nb_eventfds; j++) {
 The bug is due to accessing s-peers which is NULL.

 This patch uses the memory region API to replace the old one 
 kvm_set_ioeventfd_mmio_long().
 And this patch makes memory_region_add_eventfd() called in ivshmem_read() 
 when qemu receives
 eventfd information from ivshmem_server.

 Signed-off-by: Hongyong Zang zanghongy...@huawei.com
 ---
  hw/ivshmem.c |   41 ++---
  1 files changed, 14 insertions(+), 27 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..be26f03 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -58,7 +58,6 @@ typedef struct IVShmemState {
  CharDriverState *server_chr;
  MemoryRegion ivshmem_mmio;

 -pcibus_t mmio_addr;
  /* We might need to register the BAR before we actually have the 
 memory.
   * So prepare a container MemoryRegion for the BAR immediately and
   * add a subregion when we have the memory.
 @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
  guest_curr_max = s-peers[posn].nb_eventfds;

  for (i = 0; i  guest_curr_max; i++) {
 -kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i],
 -s-mmio_addr + DOORBELL, (posn  16) | i, 0);
 +if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 +memory_region_del_eventfd(s-ivshmem_mmio,
 + DOORBELL,
 + 4,
 + true,
 + (posn  16) | i,
 + s-peers[posn].eventfds[i]);
 +}
  close(s-peers[posn].eventfds[i]);
  }

 @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
  s-peers[posn].nb_eventfds = 0;
  }

 -static void setup_ioeventfds(IVShmemState *s) {
 -
 -int i, j;
 -
 -for (i = 0; i = s-max_peer; i++) {
 -for (j = 0; j  s-peers[i].nb_eventfds; j++) {
 -memory_region_add_eventfd(s-ivshmem_mmio,
 -  DOORBELL,
 -  4,
 -  true,
 -  (i  16) | j,
 -  s-peers[i].eventfds[j]);
 -}
 -}
 -}
 -
  /* this function increase the dynamic storage need to store data about 
 other
   * guests */
  static void increase_dynamic_storage(IVShmemState *s, int new_min_size) {
 @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t 
 * buf, int flags)
  }

  if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -if (kvm_set_ioeventfd_mmio_long(incoming_fd, s-mmio_addr + 
 DOORBELL,
 -(incoming_posn  16) | guest_max_eventfd, 1)  0) 
 {
 -fprintf(stderr, ivshmem: ioeventfd not available\n);
 -}
 +memory_region_add_eventfd(s-ivshmem_mmio,
 +  DOORBELL,
 +  4,
 +  true,
 +  (incoming_posn  16) | 
 guest_max_eventfd,
 +  incoming_fd);
  }

  return;
 @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev)
  memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s,
ivshmem-mmio, IVSHMEM_REG_BAR_SIZE);

 -if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -setup_ioeventfds(s);
 -}
 -
  /* region for registers*/
  pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
   s-ivshmem_mmio);





Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state

2011-12-02 Thread Cam Macdonell
Based on a git bisect, this patch breaks msi-x interrupt delivery in
the ivshmem device.

On Mon, Nov 21, 2011 at 9:57 AM, Michael S. Tsirkin m...@redhat.com wrote:
 Only go over the table when function is masked.
 This is not really important for qemu.git but helps
 fix a bug in qemu-kvm.git.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/msix.c |   21 ++---
  hw/pci.h  |    2 ++
  2 files changed, 16 insertions(+), 7 deletions(-)

 diff --git a/hw/msix.c b/hw/msix.c
 index b15bafc..63b41b9 100644
 --- a/hw/msix.c
 +++ b/hw/msix.c
 @@ -79,6 +79,7 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned 
 short nentries,
     /* Make flags bit writable. */
     pdev-wmask[config_offset + MSIX_CONTROL_OFFSET] |= MSIX_ENABLE_MASK |
            MSIX_MASKALL_MASK;
 +    pdev-msix_function_masked = true;
     return 0;

iiuc, this masks the msix by default.

  }

 @@ -117,16 +118,11 @@ static void msix_clr_pending(PCIDevice *dev, int vector)
     *msix_pending_byte(dev, vector) = ~msix_pending_mask(vector);
  }

 -static int msix_function_masked(PCIDevice *dev)
 -{
 -    return dev-config[dev-msix_cap + MSIX_CONTROL_OFFSET]  
 MSIX_MASKALL_MASK;
 -}
 -
  static int msix_is_masked(PCIDevice *dev, int vector)
  {
     unsigned offset =
         vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
 -    return msix_function_masked(dev) ||
 +    return dev-msix_function_masked ||
           dev-msix_table_page[offset]  PCI_MSIX_ENTRY_CTRL_MASKBIT;
  }

 @@ -138,24 +134,34 @@ static void msix_handle_mask_update(PCIDevice *dev, int 
 vector)
     }
  }

 +static void msix_update_function_masked(PCIDevice *dev)
 +{
 +    dev-msix_function_masked = !msix_enabled(dev) ||
 +        (dev-config[dev-msix_cap + MSIX_CONTROL_OFFSET]  
 MSIX_MASKALL_MASK);
 +}
 +
  /* Handle MSI-X capability config write. */
  void msix_write_config(PCIDevice *dev, uint32_t addr,
                        uint32_t val, int len)
  {
     unsigned enable_pos = dev-msix_cap + MSIX_CONTROL_OFFSET;
     int vector;
 +    bool was_masked;

     if (!range_covers_byte(addr, len, enable_pos)) {
         return;
     }

 +    was_masked = dev-msix_function_masked;
 +    msix_update_function_masked(dev);
 +
     if (!msix_enabled(dev)) {
         return;
     }

     pci_device_deassert_intx(dev);

 -    if (msix_function_masked(dev)) {
 +    if (dev-msix_function_masked == was_masked) {
         return;
     }

So I believe my bug is due to the fact the new logic included in this
patch requires msix_write_config() to be called to unmask the vectors.
 Virtio-pci calls msix_write_config(), but ivshmem does not (nor does
PCIe so I'm not sure if it's also affected).

I haven't been able to fix the bug yet, but I wanted to make sure I
was looking in the correct place.  Any help of further explanation of
this patch would be greatly appreciated.

Sincerely,
Cam


 @@ -300,6 +306,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
     msix_free_irq_entries(dev);
     qemu_get_buffer(f, dev-msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
     qemu_get_buffer(f, dev-msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
 +    msix_update_function_masked(dev);
  }

  /* Does device support MSI-X? */
 diff --git a/hw/pci.h b/hw/pci.h
 index 4b2e785..625e717 100644
 --- a/hw/pci.h
 +++ b/hw/pci.h
 @@ -178,6 +178,8 @@ struct PCIDevice {
     unsigned *msix_entry_used;
     /* Region including the MSI-X table */
     uint32_t msix_bar_size;
 +    /* MSIX function mask set or MSIX disabled */
 +    bool msix_function_masked;
     /* Version id needed for VMState */
     int32_t version_id;

 --
 1.7.5.53.gc233e





Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd

2011-11-30 Thread Cam Macdonell
2011/11/30 Zang Hongyong zanghongy...@huawei.com:
 Can this bug fix patch be applied yet?

Sorry, for not replying yet.  I'll test your patch within the next day.

 With this bug, guest os cannot successfully boot with ioeventfd.
 Thus the new PIO DoorBell patch cannot be posted.

Well, you can certainly post the new patch, just clarify that it's
dependent on this patch.

Sincerely,
Cam


 Thanks,
 Hongyong

 于 2011/11/24,星期四 18:05, zanghongy...@huawei.com 写道:
 From: Hongyong Zang zanghongy...@huawei.com

 When a guest boots with ioeventfd, an error (by gdb) occurs:
   Program received signal SIGSEGV, Segmentation fault.
   0x006009cc in setup_ioeventfds (s=0x171dc40)
   at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363
   363 for (j = 0; j  s-peers[i].nb_eventfds; j++) {
 The bug is due to accessing s-peers which is NULL.

 This patch uses the memory region API to replace the old one 
 kvm_set_ioeventfd_mmio_long().
 And this patch makes memory_region_add_eventfd() called in ivshmem_read() 
 when qemu receives
 eventfd information from ivshmem_server.

 Signed-off-by: Hongyong Zang zanghongy...@huawei.com
 ---
  hw/ivshmem.c |   41 ++---
  1 files changed, 14 insertions(+), 27 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..be26f03 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -58,7 +58,6 @@ typedef struct IVShmemState {
  CharDriverState *server_chr;
  MemoryRegion ivshmem_mmio;

 -pcibus_t mmio_addr;
  /* We might need to register the BAR before we actually have the memory.
   * So prepare a container MemoryRegion for the BAR immediately and
   * add a subregion when we have the memory.
 @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
  guest_curr_max = s-peers[posn].nb_eventfds;

  for (i = 0; i  guest_curr_max; i++) {
 -kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i],
 -s-mmio_addr + DOORBELL, (posn  16) | i, 0);
 +if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 +memory_region_del_eventfd(s-ivshmem_mmio,
 + DOORBELL,
 + 4,
 + true,
 + (posn  16) | i,
 + s-peers[posn].eventfds[i]);
 +}
  close(s-peers[posn].eventfds[i]);
  }

 @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
  s-peers[posn].nb_eventfds = 0;
  }

 -static void setup_ioeventfds(IVShmemState *s) {
 -
 -int i, j;
 -
 -for (i = 0; i = s-max_peer; i++) {
 -for (j = 0; j  s-peers[i].nb_eventfds; j++) {
 -memory_region_add_eventfd(s-ivshmem_mmio,
 -  DOORBELL,
 -  4,
 -  true,
 -  (i  16) | j,
 -  s-peers[i].eventfds[j]);
 -}
 -}
 -}
 -
  /* this function increase the dynamic storage need to store data about other
   * guests */
  static void increase_dynamic_storage(IVShmemState *s, int new_min_size) {
 @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t * 
 buf, int flags)
  }

  if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -if (kvm_set_ioeventfd_mmio_long(incoming_fd, s-mmio_addr + 
 DOORBELL,
 -(incoming_posn  16) | guest_max_eventfd, 1)  0) {
 -fprintf(stderr, ivshmem: ioeventfd not available\n);
 -}
 +memory_region_add_eventfd(s-ivshmem_mmio,
 +  DOORBELL,
 +  4,
 +  true,
 +  (incoming_posn  16) | guest_max_eventfd,
 +  incoming_fd);
  }

  return;
 @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev)
  memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s,
ivshmem-mmio, IVSHMEM_REG_BAR_SIZE);

 -if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -setup_ioeventfds(s);
 -}
 -
  /* region for registers*/
  pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
   s-ivshmem_mmio);





Re: [Qemu-devel] [PATCH v2] ivshmem: add a new PIO BAR3(Doorbell) besides MMIO BAR0 to reduce notification time

2011-11-28 Thread Cam Macdonell
On Thu, Nov 17, 2011 at 10:50 PM,  zanghongy...@huawei.com wrote:
 From: Hongyong Zang zanghongy...@huawei.com

 This patch, adds a PIO BAR3 for guest notifying qemu. And we find the new 
 notification way of PIO BAR3 reduces 30% time in comparison with the original 
 MMIO BAR0 way.

Come to think of it, should we bump the PIO to BAR4 so that the shared
memory region could be made a 64-bit BAR and therefore take up BAR2
and BAR3?


 Signed-off-by: Hongyong Zang zanghongy...@huawei.com
 ---
  hw/ivshmem.c |   24 ++--
  kvm-all.c    |   23 +++
  kvm.h        |    1 +
  3 files changed, 46 insertions(+), 2 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..031cdd8 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -29,6 +29,7 @@
  #define IVSHMEM_MASTER  1

  #define IVSHMEM_REG_BAR_SIZE 0x100
 +#define IVSHIO_REG_BAR_SIZE 0x10

  //#define DEBUG_IVSHMEM
  #ifdef DEBUG_IVSHMEM
 @@ -57,8 +58,10 @@ typedef struct IVShmemState {
     CharDriverState **eventfd_chr;
     CharDriverState *server_chr;
     MemoryRegion ivshmem_mmio;
 +    MemoryRegion ivshmem_pio;

     pcibus_t mmio_addr;
 +    pcibus_t pio_addr;
     /* We might need to register the BAR before we actually have the memory.
      * So prepare a container MemoryRegion for the BAR immediately and
      * add a subregion when we have the memory.
 @@ -234,7 +237,7 @@ static uint64_t ivshmem_io_read(void *opaque, 
 target_phys_addr_t addr,
     return ret;
  }

 -static const MemoryRegionOps ivshmem_mmio_ops = {
 +static const MemoryRegionOps ivshmem_io_ops = {
     .read = ivshmem_io_read,
     .write = ivshmem_io_write,
     .endianness = DEVICE_NATIVE_ENDIAN,
 @@ -348,6 +351,8 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
     for (i = 0; i  guest_curr_max; i++) {
         kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i],
                     s-mmio_addr + DOORBELL, (posn  16) | i, 0);
 +        kvm_set_ioeventfd_pio_long(s-peers[posn].eventfds[i],
 +                    s-pio_addr + DOORBELL, (posn  16) | i, 0);
         close(s-peers[posn].eventfds[i]);
     }

 @@ -367,6 +372,12 @@ static void setup_ioeventfds(IVShmemState *s) {
                                       true,
                                       (i  16) | j,
                                       s-peers[i].eventfds[j]);
 +            memory_region_add_eventfd(s-ivshmem_pio,
 +                                      DOORBELL,
 +                                      4,
 +                                      true,
 +                                      (i  16) | j,
 +                                      s-peers[i].eventfds[j]);
         }
     }
  }
 @@ -495,6 +506,10 @@ static void ivshmem_read(void *opaque, const uint8_t * 
 buf, int flags)
                         (incoming_posn  16) | guest_max_eventfd, 1)  0) {
             fprintf(stderr, ivshmem: ioeventfd not available\n);
         }
 +        if (kvm_set_ioeventfd_pio_long(incoming_fd, s-pio_addr + DOORBELL,
 +                        (incoming_posn  16) | guest_max_eventfd, 1)  0) {
 +            fprintf(stderr, ivshmem: ioeventfd not available\n);
 +        }
     }

     return;
 @@ -656,8 +671,10 @@ static int pci_ivshmem_init(PCIDevice *dev)

     s-shm_fd = 0;

 -    memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s,
 +    memory_region_init_io(s-ivshmem_mmio, ivshmem_io_ops, s,
                           ivshmem-mmio, IVSHMEM_REG_BAR_SIZE);
 +    memory_region_init_io(s-ivshmem_pio, ivshmem_io_ops, s,
 +                          ivshmem-pio, IVSHIO_REG_BAR_SIZE);

     if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
         setup_ioeventfds(s);
 @@ -666,6 +683,8 @@ static int pci_ivshmem_init(PCIDevice *dev)
     /* region for registers*/
     pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
                      s-ivshmem_mmio);
 +    pci_register_bar(s-dev, 3, PCI_BASE_ADDRESS_SPACE_IO,
 +                     s-ivshmem_pio);

     memory_region_init(s-bar, ivshmem-bar2-container, s-ivshmem_size);

 @@ -742,6 +761,7 @@ static int pci_ivshmem_uninit(PCIDevice *dev)
     IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);

     memory_region_destroy(s-ivshmem_mmio);
 +    memory_region_destroy(s-ivshmem_pio);
     memory_region_del_subregion(s-bar, s-ivshmem);
     memory_region_destroy(s-ivshmem);
     memory_region_destroy(s-bar);
 diff --git a/kvm-all.c b/kvm-all.c
 index 5d500e1..737c2e2 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1396,6 +1396,29 @@ int kvm_set_ioeventfd_mmio_long(int fd, uint32_t addr, 
 uint32_t val, bool assign
     return 0;
  }

 +int kvm_set_ioeventfd_pio_long(int fd, uint32_t addr, uint32_t val, bool 
 assign)
 +{
 +    struct kvm_ioeventfd kick = {
 +        .datamatch = val,
 +        .addr = addr,
 +        .len = 4,
 +        .flags = KVM_IOEVENTFD_FLAG_DATAMATCH | KVM_IOEVENTFD_FLAG_PIO,
 +        .fd = fd,
 +    };
 +    int r;
 +    if (!kvm_enabled()) {
 +        return -ENOSYS;
 

Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd

2011-11-24 Thread Cam Macdonell
On Thu, Nov 24, 2011 at 3:05 AM,  zanghongy...@huawei.com wrote:
 From: Hongyong Zang zanghongy...@huawei.com

 When a guest boots with ioeventfd, an error (by gdb) occurs:
  Program received signal SIGSEGV, Segmentation fault.
  0x006009cc in setup_ioeventfds (s=0x171dc40)
      at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363
  363             for (j = 0; j  s-peers[i].nb_eventfds; j++) {
 The bug is due to accessing s-peers which is NULL.

Can you share the command-line that caused the fault?


 This patch uses the memory region API to replace the old one 
 kvm_set_ioeventfd_mmio_long().
 And this patch makes memory_region_add_eventfd() called in ivshmem_read() 
 when qemu receives
 eventfd information from ivshmem_server.

Should this patch be split into two patches, to separate the bug fix
from the other changes related to the Memory API?  Unless I
misunderstand how the two are necessarily related.

Cam


 Signed-off-by: Hongyong Zang zanghongy...@huawei.com
 ---
  hw/ivshmem.c |   41 ++---
  1 files changed, 14 insertions(+), 27 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..be26f03 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -58,7 +58,6 @@ typedef struct IVShmemState {
     CharDriverState *server_chr;
     MemoryRegion ivshmem_mmio;

 -    pcibus_t mmio_addr;
     /* We might need to register the BAR before we actually have the memory.
      * So prepare a container MemoryRegion for the BAR immediately and
      * add a subregion when we have the memory.
 @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
     guest_curr_max = s-peers[posn].nb_eventfds;

     for (i = 0; i  guest_curr_max; i++) {
 -        kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i],
 -                    s-mmio_addr + DOORBELL, (posn  16) | i, 0);
 +        if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 +            memory_region_del_eventfd(s-ivshmem_mmio,
 +                                     DOORBELL,
 +                                     4,
 +                                     true,
 +                                     (posn  16) | i,
 +                                     s-peers[posn].eventfds[i]);
 +        }
         close(s-peers[posn].eventfds[i]);
     }

 @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
     s-peers[posn].nb_eventfds = 0;
  }

 -static void setup_ioeventfds(IVShmemState *s) {
 -
 -    int i, j;
 -
 -    for (i = 0; i = s-max_peer; i++) {
 -        for (j = 0; j  s-peers[i].nb_eventfds; j++) {
 -            memory_region_add_eventfd(s-ivshmem_mmio,
 -                                      DOORBELL,
 -                                      4,
 -                                      true,
 -                                      (i  16) | j,
 -                                      s-peers[i].eventfds[j]);
 -        }
 -    }
 -}
 -
  /* this function increase the dynamic storage need to store data about other
  * guests */
  static void increase_dynamic_storage(IVShmemState *s, int new_min_size) {
 @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t * 
 buf, int flags)
     }

     if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -        if (kvm_set_ioeventfd_mmio_long(incoming_fd, s-mmio_addr + DOORBELL,
 -                        (incoming_posn  16) | guest_max_eventfd, 1)  0) {
 -            fprintf(stderr, ivshmem: ioeventfd not available\n);
 -        }
 +        memory_region_add_eventfd(s-ivshmem_mmio,
 +                                  DOORBELL,
 +                                  4,
 +                                  true,
 +                                  (incoming_posn  16) | guest_max_eventfd,
 +                                  incoming_fd);
     }

     return;
 @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev)
     memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s,
                           ivshmem-mmio, IVSHMEM_REG_BAR_SIZE);

 -    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -        setup_ioeventfds(s);
 -    }
 -
     /* region for registers*/
     pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
                      s-ivshmem_mmio);
 --
 1.7.1





Re: [Qemu-devel] [PATCH] ivshmem: use PIO for BAR0(Doorbell) instead of MMIO to reduce notification time

2011-11-15 Thread Cam Macdonell
On Sun, Nov 13, 2011 at 8:56 PM,  zanghongy...@huawei.com wrote:
 From: Hongyong Zang zanghongy...@huawei.com

 Ivshmem(nahanni) is a mechanism for sharing host memory with VMs running on 
 the same host. Currently, guest notifies qemu by reading or writing ivshmem 
 device's PCI MMIO BAR0(Doorbell).

 This patch, changes this PCI MMIO BAR0(Doorbell) to PIO. And we find guest 
 accesses PIO BAR 30% faster than MMIO BAR.

Nice work :)


 Test it with:
 Call 5,000,000 times writing PCI BAR0's DOORBELL register, we got the total 
 time as follows:
 linux command #time:
        MMIO(regular interrupt)  PIO(regular interrupt)    MMIO(msi+ioeventfd) 
  PIO(msi+ioeventfd)
 real    101.441s                 68.863s                   70.720s            
   49.521s
 user    0.391s                   0.305s                    0.404s             
   0.340s
 sys     46.308s                  30.634s                   38.740s            
   27.559s

Did you pin the VMs to cores?

You're sending between 5-10 notifications per second, did you
confirm that they are all being received?  Since eventfds do not
buffer, some may be lost at that rate.  Of course, one would expect
that a single notification should be faster based on these results,
but I'm just curious.

Do you know of any issues with mapping a PIO region to user-space with
the UIO driver framework?

Thanks,
Cam


 Signed-off-by: Hongyong Zang zanghongy...@huawei.com
 ---
  hw/ivshmem.c |   26 +-
  kvm-all.c    |   23 +++
  kvm.h        |    1 +
  3 files changed, 37 insertions(+), 13 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..e68d0a7 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -28,7 +28,7 @@
  #define IVSHMEM_PEER    0
  #define IVSHMEM_MASTER  1

 -#define IVSHMEM_REG_BAR_SIZE 0x100
 +#define IVSHMEM_REG_BAR_SIZE 0x10

  //#define DEBUG_IVSHMEM
  #ifdef DEBUG_IVSHMEM
 @@ -56,9 +56,9 @@ typedef struct IVShmemState {

     CharDriverState **eventfd_chr;
     CharDriverState *server_chr;
 -    MemoryRegion ivshmem_mmio;
 +    MemoryRegion ivshmem_pio;

 -    pcibus_t mmio_addr;
 +    pcibus_t pio_addr;
     /* We might need to register the BAR before we actually have the memory.
      * So prepare a container MemoryRegion for the BAR immediately and
      * add a subregion when we have the memory.
 @@ -234,7 +234,7 @@ static uint64_t ivshmem_io_read(void *opaque, 
 target_phys_addr_t addr,
     return ret;
  }

 -static const MemoryRegionOps ivshmem_mmio_ops = {
 +static const MemoryRegionOps ivshmem_pio_ops = {
     .read = ivshmem_io_read,
     .write = ivshmem_io_write,
     .endianness = DEVICE_NATIVE_ENDIAN,
 @@ -346,8 +346,8 @@ static void close_guest_eventfds(IVShmemState *s, int 
 posn)
     guest_curr_max = s-peers[posn].nb_eventfds;

     for (i = 0; i  guest_curr_max; i++) {
 -        kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i],
 -                    s-mmio_addr + DOORBELL, (posn  16) | i, 0);
 +        kvm_set_ioeventfd_pio_long(s-peers[posn].eventfds[i],
 +                    s-pio_addr + DOORBELL, (posn  16) | i, 0);
         close(s-peers[posn].eventfds[i]);
     }

 @@ -361,7 +361,7 @@ static void setup_ioeventfds(IVShmemState *s) {

     for (i = 0; i = s-max_peer; i++) {
         for (j = 0; j  s-peers[i].nb_eventfds; j++) {
 -            memory_region_add_eventfd(s-ivshmem_mmio,
 +            memory_region_add_eventfd(s-ivshmem_pio,
                                       DOORBELL,
                                       4,
                                       true,
 @@ -491,7 +491,7 @@ static void ivshmem_read(void *opaque, const uint8_t * 
 buf, int flags)
     }

     if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -        if (kvm_set_ioeventfd_mmio_long(incoming_fd, s-mmio_addr + DOORBELL,
 +        if (kvm_set_ioeventfd_pio_long(incoming_fd, s-pio_addr + DOORBELL,
                         (incoming_posn  16) | guest_max_eventfd, 1)  0) {
             fprintf(stderr, ivshmem: ioeventfd not available\n);
         }
 @@ -656,16 +656,16 @@ static int pci_ivshmem_init(PCIDevice *dev)

     s-shm_fd = 0;

 -    memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s,
 -                          ivshmem-mmio, IVSHMEM_REG_BAR_SIZE);
 +    memory_region_init_io(s-ivshmem_pio, ivshmem_pio_ops, s,
 +                          ivshmem-pio, IVSHMEM_REG_BAR_SIZE);

     if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
         setup_ioeventfds(s);
     }

     /* region for registers*/
 -    pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
 -                     s-ivshmem_mmio);
 +    pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_IO,
 +                     s-ivshmem_pio);

     memory_region_init(s-bar, ivshmem-bar2-container, s-ivshmem_size);

 @@ -741,7 +741,7 @@ static int pci_ivshmem_uninit(PCIDevice *dev)
  {
     IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);

 -    memory_region_destroy(s-ivshmem_mmio);
 +    

Re: [Qemu-devel] Questions regarding ivshmem spec

2011-08-30 Thread Cam Macdonell
On Thu, Aug 25, 2011 at 7:29 AM, Sasha Levin levinsasha...@gmail.com wrote:
 Hello,

 I am looking to implement an ivshmem device for KVM tools, the purpose
 is to provide same functionality as QEMU and interoperability with QEMU.

 Going through the spec (I found here:
 https://gitorious.org/nahanni/guest-code/blobs/master/device_spec.txt )
 and the code in QEMU I have gathered several questions, I'll be happy
 for some help with it.

 1. File handles and guest IDs are passed between the server and the
 peers using sockets, is the protocol itself documented anywhere? I would
 like to be able to work alongside QEMU servers/peers.

 2. The spec describes DOORBELL as an array of DWORDs, when one guest
 wants to poke a different guest it would write something into the offset
 of the other guest in the DOORBELL array.
 Looking at the implementation in QEMU, DOORBELL is one DWORD, when
 writing to it the upper WORD is the guest id and the lower WORD is the
 value.
 What am I missing here?

 3. There are 3 ways for guests to communicate between each other, and
 I'm assuming all guests using the same SHM block must use the same
 method. Is it safe to assume we'll always use ioeventfds as in the
 implementation now?

Guests can either access the shared memory region directly or use the
server, so I would count that as 2 ways to share memory via ivshmem.
Can you clarify what you mean by ways for guests to communicate?

As for ioeventfds, they are an optional optimization to eventfds.  We
use eventfds currently, but since the guest VMs only see file
descriptors (passed from the server) another mechanism could replace
eventfds if there was some reason to.

Cam



Re: [Qemu-devel] Questions regarding ivshmem spec

2011-08-29 Thread Cam Macdonell
On Mon, Aug 29, 2011 at 9:25 AM, Sasha Levin levinsasha...@gmail.com wrote:
 On Thu, 2011-08-25 at 16:29 +0300, Sasha Levin wrote:
 Hello,

 I am looking to implement an ivshmem device for KVM tools, the purpose
 is to provide same functionality as QEMU and interoperability with QEMU.

 [snip]

 1. File handles and guest IDs are passed between the server and the
 peers using sockets, is the protocol itself documented anywhere? I would
 like to be able to work alongside QEMU servers/peers.

 I'm still wondering if someone could do a quick sketch of the
 client-server protocol or possibly point me to something that documents
 it?

Hi Sasha,

I have something like that.  I'll be in touch when I find it.

Cam


 --

 Sasha.





Re: [Qemu-devel] [PATCH] vmstate: Add unmigratable flag

2011-06-21 Thread Cam Macdonell
On Mon, Jun 20, 2011 at 3:05 AM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2011-06-19 22:46, Cam Macdonell wrote:
 On Thu, Jun 9, 2011 at 2:39 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2011-06-09 22:00, Anthony Liguori wrote:
 On 06/09/2011 11:44 AM, Jan Kiszka wrote:
 A first step towards getting rid of register_device_unmigratable
 (ivshmem and lacking vmstate support in virtio are blocking this):

 Allow to register an unmigratable vmstate via qdev, i.e. tag a device
 declaratively.

 I thought part of the problem with this was that for some devices (like
 ivshmem), whether it can be migrated was dynamic.  It depends on
 configuration, state, etc.

 That only applies to ivshmem (the other user is device assignment which
 is unconditionally unmigratable). And the ivshmem issue could easily be
 solved by defining two devices, ivshmem-peer (or just ivshmem) and
 ivshmem-master, eliminating the need for the role property.

 I don't think there will ever be a use case for a transformer device
 that becomes unmigratable during runtime (would be a nightmare for
 management apps anyway).

 If breaking the user interface of ivshmem for this is OK, I'll post a patch.

 Jan



 The migratability of ivshmem is not dynamic in that it doesn't change
 at runtime, it's set when the device is created, either role=peer or
 role=master is specified.  So iiuc, this could work with ivshmem.

 So you are fine with breaking the interface? Everyone else as well? Then
 I'll cook a patch to sort at least this out for 0.15.


To be clear, this would break the interface in that a device cannot
specify whether it's migratable via a parameter?

 Jan

 --
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux





Re: [Qemu-devel] [PATCH] vmstate: Add unmigratable flag

2011-06-19 Thread Cam Macdonell
On Thu, Jun 9, 2011 at 2:39 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2011-06-09 22:00, Anthony Liguori wrote:
 On 06/09/2011 11:44 AM, Jan Kiszka wrote:
 A first step towards getting rid of register_device_unmigratable
 (ivshmem and lacking vmstate support in virtio are blocking this):

 Allow to register an unmigratable vmstate via qdev, i.e. tag a device
 declaratively.

 I thought part of the problem with this was that for some devices (like
 ivshmem), whether it can be migrated was dynamic.  It depends on
 configuration, state, etc.

 That only applies to ivshmem (the other user is device assignment which
 is unconditionally unmigratable). And the ivshmem issue could easily be
 solved by defining two devices, ivshmem-peer (or just ivshmem) and
 ivshmem-master, eliminating the need for the role property.

 I don't think there will ever be a use case for a transformer device
 that becomes unmigratable during runtime (would be a nightmare for
 management apps anyway).

 If breaking the user interface of ivshmem for this is OK, I'll post a patch.

 Jan



The migratability of ivshmem is not dynamic in that it doesn't change
at runtime, it's set when the device is created, either role=peer or
role=master is specified.  So iiuc, this could work with ivshmem.

Cam



Re: [Qemu-devel] [PATCH] RFC: delay pci_update_mappings for 64-bit BARs

2010-12-14 Thread Cam Macdonell
On Mon, Dec 13, 2010 at 8:00 PM, Isaku Yamahata yamah...@valinux.co.jp wrote:
 On Mon, Dec 13, 2010 at 03:43:44PM -0700, Cam Macdonell wrote:
 Do not call pci_update_mappings on the lower 32-bits of a 64-bit bar.  Wait 
 for the upper 32 or else Qemu will try to map on just the lower 32 which is 
 probably going to corrupt memory.

 I was encountering crashes when mapping certain PCI region sizes.  The 
 problem turns out that pci_update_mappings is being called without all 
 64-bits in the BAR.  For example when mapping to 0x18000, once the lower 
 32-bits were written the remapping happened (mapping to 0x800) which 
 would overwrite something.

 I'm not certain if this is completely correct, I'm simply testing the lower 
 4-bits to only be MEM_TYPE_64 flag.  Upper 32-bit address parts can be 
 values like 0xff which is tricky to test against.

 You're assuming that guest OS always write lower 32bit and them upper 32bit.
 Is the assumption correct?
 I found Linux does, but I don't know about other OSes.
 And I couldn't find any sentence about how to update (64bit) BAR in the specs.
 (Please correct me if I missed it)

I think you're right, we probably can't assume the order.


 Some work around would be necessary regardless of 32bit-or-64bit.
 because qemu doesn't emulate bus accurately at the moment.
 How about the followings?
 If BAR overlaps with RAM, don't map BAR.
 If BAR overlaps with other BARs, record the overlapping and
 when updating one of the BARs, update all the overlapping BARs.
 Which BAR wins depends on the order of updating, it doesn't matter because
 it's anomaly case.

But the addresses in the BARs may not overlap.  For example, Linux
allocates memory from top down, so I recently had the mapping of a BAR
to address 0xffc000

So BAR 0x18 sees 0xc004
Then BAR 0x1c sees 0xff

So if I understand what you mean by overlapping BARs, 0xc000 and
0xffc000 will not be detected as overlapping and so we can't
record it.  But, we can allow harmless mappings of the incomplete
lower-32 to proceed and then get remapped when the upper bits are
written.  (This is what happens currently, but fails when the lower-32
overwrite RAM).

Case of writing upper-then-lower (non-Linux case):
The addresses in the upper 32-bits are going to be limited to 16-bits
(at most 48-bit addresses currently) and so those shouldn't update
mappings because they will overlap with RAM.  When the lower-bits are
written, we have the full 64-bit address and can update mappings.

Case of writing lower-then-upper:
If the lower 32-bit BAR address doesn't conflict with RAM, map it.
When the upper bits are written, update to the correct mapping.

We would just have to ensure the first mapping is indeed harmless.

Would that work?
Cam


 This way, 32bit BAR case is also covered.

 thanks,


 Cam
 ---
  hw/pci.c |    5 -
  1 files changed, 4 insertions(+), 1 deletions(-)

 diff --git a/hw/pci.c b/hw/pci.c
 index 438c0d1..3b81792 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -1000,6 +1000,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
 addr, uint32_t val, int l)
  {
      int i, was_irq_disabled = pci_irq_disabled(d);
      uint32_t config_size = pci_config_size(d);
 +    int is_64 = 0;
 +
 +    is_64 = ((val  0xf) == PCI_BASE_ADDRESS_MEM_TYPE_64);

      for (i = 0; i  l  addr + i  config_size; val = 8, ++i) {
          uint8_t wmask = d-wmask[addr + i];
 @@ -1008,7 +1011,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
 addr, uint32_t val, int l)
          d-config[addr + i] = (d-config[addr + i]  ~wmask) | (val  
 wmask);
          d-config[addr + i] = ~(val  w1cmask); /* W1C: Write 1 to Clear */
      }
 -    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
 +    if ((ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24)  (!is_64)) ||
          ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
          ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
          range_covers_byte(addr, l, PCI_COMMAND))
 --
 1.7.0.4



 --
 yamahata





[Qemu-devel] [PATCH] RFC: delay pci_update_mappings for 64-bit BARs

2010-12-13 Thread Cam Macdonell
Do not call pci_update_mappings on the lower 32-bits of a 64-bit bar.  Wait for 
the upper 32 or else Qemu will try to map on just the lower 32 which is 
probably going to corrupt memory.

I was encountering crashes when mapping certain PCI region sizes.  The problem 
turns out that pci_update_mappings is being called without all 64-bits in the 
BAR.  For example when mapping to 0x18000, once the lower 32-bits were 
written the remapping happened (mapping to 0x800) which would overwrite 
something.

I'm not certain if this is completely correct, I'm simply testing the lower 
4-bits to only be MEM_TYPE_64 flag.  Upper 32-bit address parts can be values 
like 0xff which is tricky to test against.

Cam
---
 hw/pci.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 438c0d1..3b81792 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1000,6 +1000,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
addr, uint32_t val, int l)
 {
 int i, was_irq_disabled = pci_irq_disabled(d);
 uint32_t config_size = pci_config_size(d);
+int is_64 = 0;
+
+is_64 = ((val  0xf) == PCI_BASE_ADDRESS_MEM_TYPE_64);
 
 for (i = 0; i  l  addr + i  config_size; val = 8, ++i) {
 uint8_t wmask = d-wmask[addr + i];
@@ -1008,7 +1011,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
addr, uint32_t val, int l)
 d-config[addr + i] = (d-config[addr + i]  ~wmask) | (val  wmask);
 d-config[addr + i] = ~(val  w1cmask); /* W1C: Write 1 to Clear */
 }
-if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+if ((ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24)  (!is_64)) ||
 ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
 ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
 range_covers_byte(addr, l, PCI_COMMAND))
-- 
1.7.0.4




[Qemu-devel] [PATCH 1/2] add qemu_ram_free_from_ptr

2010-12-01 Thread Cam Macdonell
add function to free memory from Qemu that was added via 
qemu_ram_alloc_from_ptr.  Name is a little weird.  This is copied from 
qemu_ram_unmap from qemu-kvm.

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 cpu-common.h |1 +
 exec.c   |   13 +
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index a543b5d..3f802e1 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -44,6 +44,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const 
char *name,
 ram_addr_t size, void *host);
 ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size);
 void qemu_ram_free(ram_addr_t addr);
+void qemu_ram_free_from_ptr(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
 void *qemu_get_ram_ptr(ram_addr_t addr);
 /* This should not be used by devices.  */
diff --git a/exec.c b/exec.c
index db9ff55..1f5c8f8 100644
--- a/exec.c
+++ b/exec.c
@@ -2875,6 +2875,19 @@ ram_addr_t qemu_ram_alloc(DeviceState *dev, const char 
*name, ram_addr_t size)
 return qemu_ram_alloc_from_ptr(dev, name, size, NULL);
 }
 
+void qemu_ram_free_from_ptr(ram_addr_t addr)
+{
+RAMBlock *block;
+
+QLIST_FOREACH(block, ram_list.blocks, next) {
+if (addr == block-offset) {
+QLIST_REMOVE(block, next);
+qemu_free(block);
+return;
+}
+}
+}
+
 void qemu_ram_free(ram_addr_t addr)
 {
 RAMBlock *block;
-- 
1.7.0.4




[Qemu-devel] [PATCH 2/2] Unregister shared memory on unplug.

2010-12-01 Thread Cam Macdonell
This allows 'peer' ivshmem guests to detach from shared memory before migration
and re-attach after migration is complete.

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 hw/ivshmem.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 06dce70..9254fad 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -68,6 +68,7 @@ typedef struct IVShmemState {
 int nb_peers; /* how many guests we have space for */
 int max_peer; /* maximum numbered peer */
 
+void *shm_ptr;
 int vm_id;
 uint32_t vectors;
 uint32_t features;
@@ -365,14 +366,13 @@ static int check_shm_size(IVShmemState *s, int fd) {
  * create the BAR and map the memory immediately */
 static void create_shared_memory_BAR(IVShmemState *s, int fd) {
 
-void * ptr;
-
 s-shm_fd = fd;
 
-ptr = mmap(0, s-ivshmem_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+s-shm_ptr = mmap(0, s-ivshmem_size, PROT_READ|PROT_WRITE,
+MAP_SHARED, fd, 0);
 
 s-ivshmem_offset = qemu_ram_alloc_from_ptr(s-dev.qdev, ivshmem.bar2,
-s-ivshmem_size, ptr);
+s-ivshmem_size, s-shm_ptr);
 
 /* region for shared memory */
 pci_register_bar(s-dev, 2, s-ivshmem_size,
@@ -797,6 +797,10 @@ static int pci_ivshmem_uninit(PCIDevice *dev)
 {
 IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 
+cpu_register_physical_memory(s-shm_pci_addr, s-ivshmem_size,
+IO_MEM_UNASSIGNED);
+qemu_ram_free_from_ptr(s-ivshmem_offset);
+munmap(s-shm_ptr, s-ivshmem_size);
 cpu_unregister_io_memory(s-ivshmem_mmio_io_addr);
 unregister_savevm(dev-qdev, ivshmem, s);
 
-- 
1.7.0.4




[Qemu-devel] [PATCH 0/2] fixups for ivshmem unplug/migration

2010-11-30 Thread Cam Macdonell
We added qemu_ram_alloc_from_ptr, but we need its corresponding free 
function.  With that, we can properly remove the ivshmem memory on unplug.

Cam Macdonell (2):
  add qemu_ram_free_from_ptr
  Unregister shared memory on unplug.

 cpu-common.h |1 +
 exec.c   |   13 +
 hw/ivshmem.c |   12 
 3 files changed, 22 insertions(+), 4 deletions(-)




[Qemu-devel] Re: [PATCH] pci: allow hotplug removal of cold-plugged devices

2010-11-15 Thread Cam Macdonell
On Sun, Nov 14, 2010 at 7:18 AM, Michael S. Tsirkin m...@redhat.com wrote:
 This patch fixes 5beb8ad503c88a76f2b8106c3b74b4ce485a60e1
 which broke hotplug removal of cold plugged devices:

 - pass addition/removal state to hotplug callbacks
 - use that in piix and pcie

 This also fixes an assert on hotplug removal of coldplugged
 express devices.

 Reported-by: by Cam Macdonell c...@cs.ualberta.ca.
 Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Acked-by: Cam Macdonell c...@cs.ualberta.ca

 ---

 So I think the below would be the cleanest way
 to fix the bug as we keep the hot/cold plug logic
 in a central palce. Untested. Comments? Cam?

Yes, it seems to fix the problem.



 diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
 index 66c7885..f549089 100644
 --- a/hw/acpi_piix4.c
 +++ b/hw/acpi_piix4.c
 @@ -585,7 +585,8 @@ static void pciej_write(void *opaque, uint32_t addr, 
 uint32_t val)
     PIIX4_DPRINTF(pciej write %x == %d\n, addr, val);
  }

 -static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, int 
 state);
 +static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
 +                                PCIHotplugState state);

  static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
  {
 @@ -615,18 +616,23 @@ static void disable_device(PIIX4PMState *s, int slot)
     s-pci0_status.down |= (1  slot);
  }

 -static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, int state)
 +static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
 +                               PCIHotplugState state)
  {
     int slot = PCI_SLOT(dev-devfn);
     PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev,
                                 DO_UPCAST(PCIDevice, qdev, qdev));

 -    if (!dev-qdev.hotplugged)
 +    /* Don't send event when device is enabled during qemu machine creation:
 +     * it is present on boot, no hotplug event is necessary. We do send an
 +     * event when the device is disabled later. */
 +    if (state == PCI_COLDPLUG_ENABLED) {
         return 0;
 +    }

     s-pci0_status.up = 0;
     s-pci0_status.down = 0;
 -    if (state) {
 +    if (state == PCI_HOTPLUG_ENABLED) {
         enable_device(s, slot);
     } else {
         disable_device(s, slot);
 diff --git a/hw/pci.c b/hw/pci.c
 index 30e1603..316b24f 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -1566,8 +1566,11 @@ static int pci_qdev_init(DeviceState *qdev, DeviceInfo 
 *base)
     pci_add_option_rom(pci_dev);

     if (bus-hotplug) {
 -        /* lower layer must check qdev-hotplugged */
 -        rc = bus-hotplug(bus-hotplug_qdev, pci_dev, 1);
 +        /* Let buses differentiate between hotplug and when device is
 +         * enabled during qemu machine creation. */
 +        rc = bus-hotplug(bus-hotplug_qdev, pci_dev,
 +                          qdev-hotplugged ? PCI_HOTPLUG_ENABLED:
 +                          PCI_COLDPLUG_ENABLED);
         if (rc != 0) {
             int r = pci_unregister_device(pci_dev-qdev);
             assert(!r);
 @@ -1581,7 +1584,8 @@ static int pci_unplug_device(DeviceState *qdev)
  {
     PCIDevice *dev = DO_UPCAST(PCIDevice, qdev, qdev);

 -    return dev-bus-hotplug(dev-bus-hotplug_qdev, dev, 0);
 +    return dev-bus-hotplug(dev-bus-hotplug_qdev, dev,
 +                             PCI_HOTPLUG_DISABLED);
  }

  void pci_qdev_register(PCIDeviceInfo *info)
 diff --git a/hw/pci.h b/hw/pci.h
 index 7100804..09b3e4c 100644
 --- a/hw/pci.h
 +++ b/hw/pci.h
 @@ -214,7 +214,15 @@ int pci_device_load(PCIDevice *s, QEMUFile *f);

  typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
  typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
 -typedef int (*pci_hotplug_fn)(DeviceState *qdev, PCIDevice *pci_dev, int 
 state);
 +
 +typedef enum {
 +    PCI_HOTPLUG_DISABLED,
 +    PCI_HOTPLUG_ENABLED,
 +    PCI_COLDPLUG_ENABLED,
 +} PCIHotplugState;
 +
 +typedef int (*pci_hotplug_fn)(DeviceState *qdev, PCIDevice *pci_dev,
 +                              PCIHotplugState state);
  void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
                          const char *name, int devfn_min);
  PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min);
 diff --git a/hw/pcie.c b/hw/pcie.c
 index 35918f7..4df48b8 100644
 --- a/hw/pcie.c
 +++ b/hw/pcie.c
 @@ -192,14 +192,16 @@ static void pcie_cap_slot_event(PCIDevice *dev, 
 PCIExpressHotPlugEvent event)
  }

  static int pcie_cap_slot_hotplug(DeviceState *qdev,
 -                                 PCIDevice *pci_dev, int state)
 +                                 PCIDevice *pci_dev, PCIHotplugState state)
  {
     PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
     uint8_t *exp_cap = d-config + d-exp.exp_cap;
     uint16_t sltsta = pci_get_word(exp_cap + PCI_EXP_SLTSTA);

 -    if (!pci_dev-qdev.hotplugged) {
 -        assert(state); /* this case only happens at machine creation. */
 +    /* Don't send event when device

[Qemu-devel] Cannot not unplug cold-plugged devices

2010-11-11 Thread Cam Macdonell
Hi,

I was trying to do a device_del on my ivshmem device and it won't
work unless the device is added via hotplug.  If the device is
coldplugged (added at startup) then nothing happens.  I think I
tracked this behaviour to the patch below.

Is not allowing coldplugged devices to be unplugged the desired behaviour?

Thanks,
Cam

commit 5beb8ad503c88a76f2b8106c3b74b4ce485a60e1
Author: Isaku Yamahata yamah...@valinux.co.jp
Date:   Mon Sep 6 16:46:18 2010 +0900

pci: call hotplug callback even when not hotplug case for later use.

call hotplug callback even when not hotplug case for later use.
And move hotplug check into hotplug callback.
PCIE slot needs this for card presence detection.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
Signed-off-by: Michael S. Tsirkin m...@redhat.com

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index bfa1d9a..24dfcf2 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -611,6 +611,9 @@ static int piix4_device_hotplug(DeviceState *qdev,
PCIDevice *dev, int state)
 PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev,
 DO_UPCAST(PCIDevice, qdev, qdev));

+if (!dev-qdev.hotplugged)
+return 0;
+
 s-pci0_status.up = 0;
 s-pci0_status.down = 0;
 if (state) {



[Qemu-devel] ACPI error when mapping a 2GB BAR w/ 4GB of RAM

2010-09-17 Thread Cam Macdonell
After fixing the resource_size_t return value with
pci_resource_alignment, I see one other strange behaviour only when
using 4GB of RAM and a 2GB BAR.  I haven't found any other combination
of RAM/BAR size that triggers this bug.  I am using 2.6.36-rc3.

ACPI Error: The DSDT has been corrupted or replaced - old, new headers
below (20100702/tbutils-372)
ACPI: DSDT (null) 01F15 (v01   BXPC   BXDSDT 0001 INTL 20090123)
ACPI:  (null) 0 (v00   )
ACPI Error: Please send DMI info to linux-a...@vger.kernel.org
If system does not work as expected, please boot with acpi=copy_dsdt
(20100702/tbutils-378)
ACPI: PCI Interrupt Link [LNKC] disabled and referenced, BIOS bug
ACPI Exception: AE_AML_INVALID_RESOURCE_TYPE, Evaluating _CRS
(20100702/pci_link-283)
ACPI: Unable to set IRQ for PCI Interrupt Link [LNKC]. Try pci=noacpi
or acpi=off
virtio-pci :00:03.0: PCI INT A: no GSI - using ISA IRQ 11
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled

with acpi=off, the ACPI error output goes away, but the boot does not
get any further.

here are the PCI writes from Qemu related to this BAR's assignment in
case they are helpful,

pci_write_config: (val) 0x8004 - 0x18 (addr)
IVSHMEM: guest pci addr = 8000, guest h/w addr = 4312137728, size = 8000
pci_read_config: (val) 0x8004 - 0x18 (addr)
pci_write_config: (val) 0x1 - 0x1c (addr)
IVSHMEM: guest pci addr = 18000, guest h/w addr = 4312137728, size
= 8000
pci_read_config: (val) 0x1 - 0x1c (addr)

Any pointers are appreciated,
Cam



Re: [Qemu-devel] Re: ACPI error when mapping a 2GB BAR w/ 4GB of RAM

2010-09-17 Thread Cam Macdonell
On Fri, Sep 17, 2010 at 2:04 PM, Chris Wright chr...@redhat.com wrote:
 * Cam Macdonell (c...@cs.ualberta.ca) wrote:
 After fixing the resource_size_t return value with
 pci_resource_alignment, I see one other strange behaviour only when
 using 4GB of RAM and a 2GB BAR.  I haven't found any other combination
 of RAM/BAR size that triggers this bug.  I am using 2.6.36-rc3.

 ACPI Error: The DSDT has been corrupted or replaced - old, new headers
 below (20100702/tbutils-372)
 ACPI: DSDT (null) 01F15 (v01   BXPC   BXDSDT 0001 INTL 20090123)
 ACPI:      (null) 0 (v00                       )
 ACPI Error: Please send DMI info to linux-a...@vger.kernel.org
 If system does not work as expected, please boot with acpi=copy_dsdt
 (20100702/tbutils-378)
 ACPI: PCI Interrupt Link [LNKC] disabled and referenced, BIOS bug
 ACPI Exception: AE_AML_INVALID_RESOURCE_TYPE, Evaluating _CRS
 (20100702/pci_link-283)
 ACPI: Unable to set IRQ for PCI Interrupt Link [LNKC]. Try pci=noacpi
 or acpi=off
 virtio-pci :00:03.0: PCI INT A: no GSI - using ISA IRQ 11
 Non-volatile memory driver v1.3
 Linux agpgart interface v0.103
 Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled

 IIRC,  the pci hole is only 1.5G in the BIOS, can you verify that
 seabios is doing the right thing?

I'm not sure what the right thing for seabios to do is.  Here is the
seabios output related to the device.

PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110
region 0: 0xf204
init smm
init boot device ordering

snip

Attempting to init PCI bdf 00:04.0 (vd 1af4:1110)
Attempting to map option rom on dev 00:04.0
Option rom sizing returned 0 0
Checking rom 0x000c9800 (sig aa55 size 17)
Checking rom 0x000cc000 (sig aa55 size 2)
Checking rom 0x000c9000 (sig aa55 size 4)
Checking rom 0x000c9800 (sig aa55 size 17)
Checking rom 0x000cc000 (sig aa55 size 2)
Mapping hd drive 0x000fdb50 to 0
Running option rom at c980:0003
Running option rom at cc00:0003
pmm_malloc zone=0x000f515c handle= size=36 align=10
ret=0x000fdaf0 (detail=0x7ffefca0)
ebda moved from 9fc00 to 9f400
pmm_malloc zone=0x000f5154 handle= size=2048 align=10
ret=0x0009f800 (detail=0x7ffefb40)
finalize PMM
malloc finalize

when using a BAR of 2GB or less, there is an additional write to the
PCI space of the device, which may be from the bios

pci_write_config: (val) 0x - 0x18 (addr)
pci_read_config: (val) 0x8004 - 0x18 (addr)
pci_write_config: (val) 0x4 - 0x18 (addr)
pci_write_config: (val) 0x3 - 0x4 (addr)
pci_read_config: (val) 0x0 - 0x1c (addr)
pci_write_config: (val) 0x - 0x1c (addr)
IVSHMEM: guest pci addr = , guest h/w addr =
4312137728, size = 8000

so is it succeeding with smaller sizes ( 2GB) because it fits in the
bios' pci hole?

Cam


 thanks,
 -chris





Re: [Qemu-devel] Re: ACPI error when mapping a 2GB BAR w/ 4GB of RAM

2010-09-17 Thread Cam Macdonell
On Fri, Sep 17, 2010 at 2:52 PM, Cam Macdonell c...@cs.ualberta.ca wrote:
 On Fri, Sep 17, 2010 at 2:04 PM, Chris Wright chr...@redhat.com wrote:
 * Cam Macdonell (c...@cs.ualberta.ca) wrote:
 After fixing the resource_size_t return value with
 pci_resource_alignment, I see one other strange behaviour only when
 using 4GB of RAM and a 2GB BAR.  I haven't found any other combination
 of RAM/BAR size that triggers this bug.  I am using 2.6.36-rc3.

 ACPI Error: The DSDT has been corrupted or replaced - old, new headers
 below (20100702/tbutils-372)
 ACPI: DSDT (null) 01F15 (v01   BXPC   BXDSDT 0001 INTL 20090123)
 ACPI:      (null) 0 (v00                       )
 ACPI Error: Please send DMI info to linux-a...@vger.kernel.org
 If system does not work as expected, please boot with acpi=copy_dsdt
 (20100702/tbutils-378)
 ACPI: PCI Interrupt Link [LNKC] disabled and referenced, BIOS bug
 ACPI Exception: AE_AML_INVALID_RESOURCE_TYPE, Evaluating _CRS
 (20100702/pci_link-283)
 ACPI: Unable to set IRQ for PCI Interrupt Link [LNKC]. Try pci=noacpi
 or acpi=off
 virtio-pci :00:03.0: PCI INT A: no GSI - using ISA IRQ 11
 Non-volatile memory driver v1.3
 Linux agpgart interface v0.103
 Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled

 IIRC,  the pci hole is only 1.5G in the BIOS, can you verify that
 seabios is doing the right thing?

 I'm not sure what the right thing for seabios to do is.  Here is the
 seabios output related to the device.

 PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110
 region 0: 0xf204
 init smm
 init boot device ordering

 snip

 Attempting to init PCI bdf 00:04.0 (vd 1af4:1110)
 Attempting to map option rom on dev 00:04.0
 Option rom sizing returned 0 0
 Checking rom 0x000c9800 (sig aa55 size 17)
 Checking rom 0x000cc000 (sig aa55 size 2)
 Checking rom 0x000c9000 (sig aa55 size 4)
 Checking rom 0x000c9800 (sig aa55 size 17)
 Checking rom 0x000cc000 (sig aa55 size 2)
 Mapping hd drive 0x000fdb50 to 0
 Running option rom at c980:0003
 Running option rom at cc00:0003
 pmm_malloc zone=0x000f515c handle= size=36 align=10
 ret=0x000fdaf0 (detail=0x7ffefca0)
 ebda moved from 9fc00 to 9f400
 pmm_malloc zone=0x000f5154 handle= size=2048 align=10
 ret=0x0009f800 (detail=0x7ffefb40)
 finalize PMM
 malloc finalize

 when using a BAR of 2GB or less, there is an additional write to the
 PCI space of the device, which may be from the bios

 pci_write_config: (val) 0x - 0x18 (addr)
 pci_read_config: (val) 0x8004 - 0x18 (addr)
 pci_write_config: (val) 0x4 - 0x18 (addr)
 pci_write_config: (val) 0x3 - 0x4 (addr)
 pci_read_config: (val) 0x0 - 0x1c (addr)
 pci_write_config: (val) 0x - 0x1c (addr)
 IVSHMEM: guest pci addr = , guest h/w addr =
 4312137728, size = 8000

 so is it succeeding with smaller sizes ( 2GB) because it fits in the
 bios' pci hole?

sorry that should be  2GB.


 Cam


 thanks,
 -chris






Re: [Qemu-devel] Re: ACPI error when mapping a 2GB BAR w/ 4GB of RAM

2010-09-17 Thread Cam Macdonell
On Fri, Sep 17, 2010 at 3:15 PM, Chris Wright chr...@redhat.com wrote:
 * Cam Macdonell (c...@cs.ualberta.ca) wrote:
 On Fri, Sep 17, 2010 at 2:52 PM, Cam Macdonell c...@cs.ualberta.ca wrote:
  On Fri, Sep 17, 2010 at 2:04 PM, Chris Wright chr...@redhat.com wrote:
  * Cam Macdonell (c...@cs.ualberta.ca) wrote:
  After fixing the resource_size_t return value with
  pci_resource_alignment, I see one other strange behaviour only when
  using 4GB of RAM and a 2GB BAR.  I haven't found any other combination
  of RAM/BAR size that triggers this bug.  I am using 2.6.36-rc3.
 
  ACPI Error: The DSDT has been corrupted or replaced - old, new headers
  below (20100702/tbutils-372)
  ACPI: DSDT (null) 01F15 (v01   BXPC   BXDSDT 0001 INTL 20090123)
  ACPI:      (null) 0 (v00                       )
  ACPI Error: Please send DMI info to linux-a...@vger.kernel.org
  If system does not work as expected, please boot with acpi=copy_dsdt
  (20100702/tbutils-378)
  ACPI: PCI Interrupt Link [LNKC] disabled and referenced, BIOS bug
  ACPI Exception: AE_AML_INVALID_RESOURCE_TYPE, Evaluating _CRS
  (20100702/pci_link-283)
  ACPI: Unable to set IRQ for PCI Interrupt Link [LNKC]. Try pci=noacpi
  or acpi=off
  virtio-pci :00:03.0: PCI INT A: no GSI - using ISA IRQ 11
  Non-volatile memory driver v1.3
  Linux agpgart interface v0.103
  Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
 
  IIRC,  the pci hole is only 1.5G in the BIOS, can you verify that
  seabios is doing the right thing?
 
  I'm not sure what the right thing for seabios to do is.  Here is the
  seabios output related to the device.
 
  PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110
  region 0: 0xf204
  init smm
  init boot device ordering
 
  snip
 
  Attempting to init PCI bdf 00:04.0 (vd 1af4:1110)
  Attempting to map option rom on dev 00:04.0
  Option rom sizing returned 0 0
  Checking rom 0x000c9800 (sig aa55 size 17)
  Checking rom 0x000cc000 (sig aa55 size 2)
  Checking rom 0x000c9000 (sig aa55 size 4)
  Checking rom 0x000c9800 (sig aa55 size 17)
  Checking rom 0x000cc000 (sig aa55 size 2)
  Mapping hd drive 0x000fdb50 to 0
  Running option rom at c980:0003
  Running option rom at cc00:0003
  pmm_malloc zone=0x000f515c handle= size=36 align=10
  ret=0x000fdaf0 (detail=0x7ffefca0)
  ebda moved from 9fc00 to 9f400
  pmm_malloc zone=0x000f5154 handle= size=2048 align=10
  ret=0x0009f800 (detail=0x7ffefb40)
  finalize PMM
  malloc finalize
 
  when using a BAR of 2GB or less, there is an additional write to the
  PCI space of the device, which may be from the bios
 
  pci_write_config: (val) 0x - 0x18 (addr)
  pci_read_config: (val) 0x8004 - 0x18 (addr)
  pci_write_config: (val) 0x4 - 0x18 (addr)
  pci_write_config: (val) 0x3 - 0x4 (addr)
  pci_read_config: (val) 0x0 - 0x1c (addr)
  pci_write_config: (val) 0x - 0x1c (addr)
  IVSHMEM: guest pci addr = , guest h/w addr =
  4312137728, size = 8000
 
  so is it succeeding with smaller sizes ( 2GB) because it fits in the
  bios' pci hole?

 sorry that should be  2GB.

 It seems most likely... 2GB also means = 1GB (which would fit in
 the hole).  Although, I have to admit, I'm not sure how seabios handles
 the hole nowadays.

 What about 2GB with 32bit BAR?

The bios seems to ignore it and the guest can't map it as it cannot
find an alignment for it.  Seems strange since with 1GB of RAM and 2GB
BAR, everything should map below 4GB.


 thanks,
 -chris





Re: [Qemu-devel] Guest cannot handle a PCI BAR 1GB

2010-09-06 Thread Cam Macdonell
On Sun, Sep 5, 2010 at 10:50 AM, Avi Kivity a...@redhat.com wrote:

  On 09/04/2010 01:22 AM, Cam Macdonell wrote:

 Hi,

 I'm trying to test 2 GB (and eventually larger) BARs with ivshmem and
 I get an error in the guest that it is able to find a mem resource for
 a BAR larger than 1GB.  I'm using 64-bit BARs.

 when running with 6GB of RAM and a 1GB BAR for ivshmem, it finds a
 resource (and searches beyond 32-bit values to find it).  Here is a
 log from printfs added to the loop that searches the resources from
 find_resource() in kernel/resource.c:363.


 This is a kernel question, not a qemu issue.  Copying lkml.


  trying 'tmp.start' 1000 to
 'tmp.end' fff
 trying 'tmp.start' 9f400 to
 'tmp.end' 9f3ff
 trying 'tmp.start' a to
 'tmp.end' e
 trying 'tmp.start' 10 to
 'tmp.end' f
 trying 'tmp.start' dfffd000 to
 'tmp.end' dfffcfff
 trying 'tmp.start' e000 to
 'tmp.end' efff
 trying 'tmp.start' f200 to
 'tmp.end' f1ff
 trying 'tmp.start' f2001000 to
 'tmp.end' f200
 trying 'tmp.start' f202 to
 'tmp.end' f201
 trying 'tmp.start' f2021000 to
 'tmp.end' f202
 trying 'tmp.start' f204 to
 'tmp.end' f203
 trying 'tmp.start' f2040100 to
 'tmp.end' febf
 trying 'tmp.start' fec00400 to
 'tmp.end' fffb
 trying 'tmp.start' 1 to
 'tmp.end' 
 trying 'tmp.start' 1a000 to
 'tmp.end' 
 pci :00:04.0: BAR 2: assigned [mem 0x1c000-0x1 64bit]
 pci :00:04.0: BAR 2: set to [mem 0x1c000-0x1 64bit]
 (PCI address [0x1c000-0x1]

 and you can see the BAR is successfully assigned.

 However, with a 2GB BAR (below), the search fails, but it also never
 searches beyong 32-bits.  Again, all that's changed is the size of the
 ivshmem region.

 trying 'tmp.start' 1000 to
 'tmp.end' fff
 trying 'tmp.start' 9f400 to
 'tmp.end' 9f3ff
 trying 'tmp.start' a to
 'tmp.end' e
 trying 'tmp.start' 10 to
 'tmp.end' f
 trying 'tmp.start' dfffd000 to
 'tmp.end' dfffcfff
 pci :00:04.0: BAR 2: can't assign mem (size 0x8000)

 Is there a limit to PCI BAR sizes or resources?  Any pointers or
 further debugging tips are greatly appreciated.


 What kernel version are you looking at?


latest kvm git, 2.6.36-rc2+



 Please add printks to the loop so we can see this-start and this-end.  It
 smells like a truncation issue.


Success with a 1GB BAR

this-start 1000, this-end 9f3ff
this-start 9f400, this-end 9
this-start f, this-end f
this-start 10, this-end dfffcfff
this-start dfffd000, this-end dfff
this-start f000, this-end f1ff
this-start f200, this-end f2000fff
this-start f201, this-end f201
this-start f202, this-end f2020fff
this-start f203, this-end f203
this-start f204, this-end f20400ff
this-start fec0, this-end fec003ff
this-start fffc, this-end 
this-start 1, this-end 11fff
tmp.start 12000, tmp.end 
pci :00:04.0: BAR 2: assigned [mem 0x14000-0x17fff 64bit]

and when it fails with a 2GB BAR, the following is printed

this-start 1000, this-end 9f3ff
this-start 9f400, this-end 9
this-start f, this-end f
this-start 10, this-end dfffcfff
this-start dfffd000, this-end dfff
pci :00:04.0: BAR 2: can't assign mem (size 0x8000)

I added a few more debug statements and found that in the failure case, the
function returns that it found a region (the last one printed before the
error).  I've added printfs for the two tests in the if that determine when
a region is found:

   if (tmp.start  tmp.end  tmp.end - tmp.start = size - 1) {
new-start = tmp.start;
new-end = tmp.start + size - 1;
printk(KERN_INFO returning 0\n);
return 0;
}

this-start 1000, this-end 9f3ff
tmp.start 8000, tmp.end fff
true: 8fff = 7fff
this-start 9f400, this-end 9
tmp.start 8000, tmp.end 9f3ff
true: 8009f3ff = 7fff
this-start f, this-end f
tmp.start 8000, tmp.end e
true: 800e = 7fff
this-start 10, this-end dfffcfff
tmp.start 8000, tmp.end f
true: 800f = 7fff
this-start dfffd000, this-end dfff
tmp.start 10, tmp.end dfffcfff
true: 10  dfffcfff
true: dfefcfff = 7fff
returning 0
pci :00:04.0: BAR 2: can't assign mem (size 0x8000)







 --
 error compiling committee.c: too many arguments to function




Re: [Qemu-devel] Guest cannot handle a PCI BAR 1GB

2010-09-06 Thread Cam Macdonell
On Mon, Sep 6, 2010 at 10:37 AM, Cam Macdonell c...@cs.ualberta.ca wrote:


 On Sun, Sep 5, 2010 at 10:50 AM, Avi Kivity a...@redhat.com wrote:

  On 09/04/2010 01:22 AM, Cam Macdonell wrote:

 Hi,

 I'm trying to test 2 GB (and eventually larger) BARs with ivshmem and
 I get an error in the guest that it is able to find a mem resource for
 a BAR larger than 1GB.  I'm using 64-bit BARs.

 when running with 6GB of RAM and a 1GB BAR for ivshmem, it finds a
 resource (and searches beyond 32-bit values to find it).  Here is a
 log from printfs added to the loop that searches the resources from
 find_resource() in kernel/resource.c:363.


 This is a kernel question, not a qemu issue.  Copying lkml.

 trying 'tmp.start' 1000 to
         'tmp.end' fff
 trying 'tmp.start' 9f400 to
         'tmp.end' 9f3ff
 trying 'tmp.start' a to
         'tmp.end' e
 trying 'tmp.start' 10 to
         'tmp.end' f
 trying 'tmp.start' dfffd000 to
         'tmp.end' dfffcfff
 trying 'tmp.start' e000 to
         'tmp.end' efff
 trying 'tmp.start' f200 to
         'tmp.end' f1ff
 trying 'tmp.start' f2001000 to
         'tmp.end' f200
 trying 'tmp.start' f202 to
         'tmp.end' f201
 trying 'tmp.start' f2021000 to
         'tmp.end' f202
 trying 'tmp.start' f204 to
         'tmp.end' f203
 trying 'tmp.start' f2040100 to
         'tmp.end' febf
 trying 'tmp.start' fec00400 to
         'tmp.end' fffb
 trying 'tmp.start' 1 to
         'tmp.end' 
 trying 'tmp.start' 1a000 to
         'tmp.end' 
 pci :00:04.0: BAR 2: assigned [mem 0x1c000-0x1 64bit]
 pci :00:04.0: BAR 2: set to [mem 0x1c000-0x1 64bit]
 (PCI address [0x1c000-0x1]

 and you can see the BAR is successfully assigned.

 However, with a 2GB BAR (below), the search fails, but it also never
 searches beyong 32-bits.  Again, all that's changed is the size of the
 ivshmem region.

 trying 'tmp.start' 1000 to
         'tmp.end' fff
 trying 'tmp.start' 9f400 to
         'tmp.end' 9f3ff
 trying 'tmp.start' a to
         'tmp.end' e
 trying 'tmp.start' 10 to
         'tmp.end' f
 trying 'tmp.start' dfffd000 to
         'tmp.end' dfffcfff
 pci :00:04.0: BAR 2: can't assign mem (size 0x8000)

 Is there a limit to PCI BAR sizes or resources?  Any pointers or
 further debugging tips are greatly appreciated.


 What kernel version are you looking at?

 latest kvm git, 2.6.36-rc2+


 Please add printks to the loop so we can see this-start and this-end.  It 
 smells like a truncation issue.

 Success with a 1GB BAR
 this-start 1000, this-end 9f3ff
 this-start 9f400, this-end 9
 this-start f, this-end f
 this-start 10, this-end dfffcfff
 this-start dfffd000, this-end dfff
 this-start f000, this-end f1ff
 this-start f200, this-end f2000fff
 this-start f201, this-end f201
 this-start f202, this-end f2020fff
 this-start f203, this-end f203
 this-start f204, this-end f20400ff
 this-start fec0, this-end fec003ff
 this-start fffc, this-end 
 this-start 1, this-end 11fff
 tmp.start 12000, tmp.end 
 pci :00:04.0: BAR 2: assigned [mem 0x14000-0x17fff 64bit]
 and when it fails with a 2GB BAR, the following is printed
 this-start 1000, this-end 9f3ff
 this-start 9f400, this-end 9
 this-start f, this-end f
 this-start 10, this-end dfffcfff
 this-start dfffd000, this-end dfff
 pci :00:04.0: BAR 2: can't assign mem (size 0x8000)
 I added a few more debug statements and found that in the failure case, the 
 function returns that it found a region (the last one printed before the 
 error).  I've added printfs for the two tests in the if that determine when a 
 region is found:
        if (tmp.start  tmp.end  tmp.end - tmp.start = size - 1) {
             new-start = tmp.start;
             new-end = tmp.start + size - 1;
             printk(KERN_INFO returning 0\n);
             return 0;
         }
 this-start 1000, this-end 9f3ff
 tmp.start 8000, tmp.end fff
     true: 8fff = 7fff
 this-start 9f400, this-end 9
 tmp.start 8000, tmp.end 9f3ff
     true: 8009f3ff = 7fff
 this-start f, this-end f
 tmp.start 8000, tmp.end e
     true: 800e = 7fff
 this-start 10, this-end dfffcfff
 tmp.start 8000, tmp.end f
     true: 800f = 7fff
 this-start dfffd000, this-end dfff
 tmp.start 10, tmp.end dfffcfff
     true: 10  dfffcfff
     true: dfefcfff = 7fff
 returning 0
 pci :00:04.0: BAR 2: can't assign mem (size 0x8000)

Further to this, it seems tmp.start is getting set to zero by the ALIGN macro

2GB BAR:
this-start dfffd000, this-end dfff
tmp.start dfffd000
tmp.start 0
tmp.start 10, tmp.end dfffcfff

1GB BAR:
this-start dfffd000, this-end dfff
tmp.start dfffd000
tmp.start

Re: [Qemu-devel] virtio-serial question

2010-09-03 Thread Cam Macdonell
On Fri, Sep 3, 2010 at 2:57 AM, Vasiliy G Tolstov v.tols...@selfip.ru wrote:
 Hello. Can somebody provide minimal example code that using
 virtio-serial to communicate with guest os? (work on guest side and
 hypervisor side)

Hi Vasiliy,

There are basic examples here:

http://fedoraproject.org/wiki/Features/VirtioSerial

Cam


 --
 Vasiliy G Tolstov v.tols...@selfip.ru
 Selfip.Ru





[Qemu-devel] Guest cannot handle a PCI BAR 1GB

2010-09-03 Thread Cam Macdonell
Hi,

I'm trying to test 2 GB (and eventually larger) BARs with ivshmem and
I get an error in the guest that it is able to find a mem resource for
a BAR larger than 1GB.  I'm using 64-bit BARs.

when running with 6GB of RAM and a 1GB BAR for ivshmem, it finds a
resource (and searches beyond 32-bit values to find it).  Here is a
log from printfs added to the loop that searches the resources from
find_resource() in kernel/resource.c:363.

trying 'tmp.start' 1000 to
'tmp.end' fff
trying 'tmp.start' 9f400 to
'tmp.end' 9f3ff
trying 'tmp.start' a to
'tmp.end' e
trying 'tmp.start' 10 to
'tmp.end' f
trying 'tmp.start' dfffd000 to
'tmp.end' dfffcfff
trying 'tmp.start' e000 to
'tmp.end' efff
trying 'tmp.start' f200 to
'tmp.end' f1ff
trying 'tmp.start' f2001000 to
'tmp.end' f200
trying 'tmp.start' f202 to
'tmp.end' f201
trying 'tmp.start' f2021000 to
'tmp.end' f202
trying 'tmp.start' f204 to
'tmp.end' f203
trying 'tmp.start' f2040100 to
'tmp.end' febf
trying 'tmp.start' fec00400 to
'tmp.end' fffb
trying 'tmp.start' 1 to
'tmp.end' 
trying 'tmp.start' 1a000 to
'tmp.end' 
pci :00:04.0: BAR 2: assigned [mem 0x1c000-0x1 64bit]
pci :00:04.0: BAR 2: set to [mem 0x1c000-0x1 64bit]
(PCI address [0x1c000-0x1]

and you can see the BAR is successfully assigned.

However, with a 2GB BAR (below), the search fails, but it also never
searches beyong 32-bits.  Again, all that's changed is the size of the
ivshmem region.

trying 'tmp.start' 1000 to
'tmp.end' fff
trying 'tmp.start' 9f400 to
'tmp.end' 9f3ff
trying 'tmp.start' a to
'tmp.end' e
trying 'tmp.start' 10 to
'tmp.end' f
trying 'tmp.start' dfffd000 to
'tmp.end' dfffcfff
pci :00:04.0: BAR 2: can't assign mem (size 0x8000)

Is there a limit to PCI BAR sizes or resources?  Any pointers or
further debugging tips are greatly appreciated.

Thanks,
Cam



[Qemu-devel] Re: [PATCH] hw/ivshmem.c don't check for negative values on unsigned data types

2010-08-31 Thread Cam Macdonell
On Mon, Aug 30, 2010 at 4:31 AM,  jes.soren...@redhat.com wrote:
 From: Jes Sorensen jes.soren...@redhat.com

 There is no need to check for dest  0 or vector = 0 as both are
 uint16_t.

 This should fix problems with broken build with aggressive compiler
 flags. Reported by Xudong Hao xudong@intel.com

 Signed-off-by: Jes Sorensen jes.soren...@redhat.com
 ---
  hw/ivshmem.c |    4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index bbb5cba..afebbc3 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -199,13 +199,13 @@ static void ivshmem_io_writel(void *opaque, 
 target_phys_addr_t addr,

         case DOORBELL:
             /* check that dest VM ID is reasonable */
 -            if ((dest  0) || (dest  s-max_peer)) {
 +            if (dest  s-max_peer) {
                 IVSHMEM_DPRINTF(Invalid destination VM ID (%d)\n, dest);
                 break;
             }

             /* check doorbell range */
 -            if ((vector = 0)  (vector  s-peers[dest].nb_eventfds)) {
 +            if (vector  s-peers[dest].nb_eventfds) {
                 IVSHMEM_DPRINTF(Writing % PRId64  to VM %d on vector %d\n,
                                                     write_one, dest, vector);
                 if (write(s-peers[dest].eventfds[vector],
 --
 1.7.2.2



Acked-by: Cam Macdonell c...@cs.ualberta.ca

I sent a patch yesterday that addressed it, but it is probably better
to remove the tests that switched to signed 16-bit ints.

Anthony, please apply this one.

Cam



[Qemu-devel] Re: [PATCH] hw/ivshmem.c don't check for negative values on unsigned data types

2010-08-31 Thread Cam Macdonell
On Tue, Aug 31, 2010 at 6:51 PM, Hao, Xudong xudong@intel.com wrote:
 Hao, Xudong wrote:
 jes.soren...@redhat.com wrote:
 From: Jes Sorensen jes.soren...@redhat.com

 There is no need to check for dest  0 or vector = 0 as both are
 uint16_t.

 This should fix problems with broken build with aggressive compiler
 flags. Reported by Xudong Hao xudong@intel.com

 Signed-off-by: Jes Sorensen jes.soren...@redhat.com ---
  hw/ivshmem.c |    4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index bbb5cba..afebbc3 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -199,13 +199,13 @@ static void ivshmem_io_writel(void *opaque,
 target_phys_addr_t addr,

          case DOORBELL:
              /* check that dest VM ID is reasonable */
 -            if ((dest  0) || (dest  s-max_peer)) {
 +            if (dest  s-max_peer) {
                  IVSHMEM_DPRINTF(Invalid destination VM ID (%d)\n,
                  dest); break;
              }

              /* check doorbell range */
 -            if ((vector = 0)  (vector 
 s-peers[dest].nb_eventfds)) { +            if (vector 
                  s-peers[dest].nb_eventfds) {

                  IVSHMEM_DPRINTF(Writing % PRId64  to VM %d on
 vector %d\n, write_one, dest, vector); if
 (write(s-peers[dest].eventfds[vector],

 This patch works for me.


 Jes, correct result is this patch works for me on x86_64 system. However, in 
 i386 system, there is another bug:

 ...
  CC    x86_64-softmmu/ivshmem.o
  CC    x86_64-softmmu/fpu/softfloat-native.o
  CC    x86_64-softmmu/op_helper.o
 cc1: warnings being treated as errors
 /home/build/gitrepo/qemu/hw/ivshmem.c: In function 'check_shm_size.
 /home/build/gitrepo/qemu/hw/ivshmem.c:357: warning: format '%ld' expects type 
 'long int', but argt 5 has type '__off64_t
 make[1]: *** [ivshmem.o] Error 1
 make[1]: *** Waiting for unfinished jobs
 make: *** [subdir-x86_64-softmmu] Error 2

Hmm, that was causing problems on 32-bit systems, not 64-bit.  Is this
with gcc 4.1.2?  Please the try the following patch from Avi that
should add the necessary cast.

http://www.mail-archive.com/qemu-devel@nongnu.org/msg40715.html

Cam



[Qemu-devel] Re: [PATCH] hw/ivshmem.c don't check for negative values on unsigned data types

2010-08-31 Thread Cam Macdonell
On Tue, Aug 31, 2010 at 9:56 PM, Cam Macdonell c...@cs.ualberta.ca wrote:
 On Tue, Aug 31, 2010 at 6:51 PM, Hao, Xudong xudong@intel.com wrote:
 Hao, Xudong wrote:
 jes.soren...@redhat.com wrote:
 From: Jes Sorensen jes.soren...@redhat.com

 There is no need to check for dest  0 or vector = 0 as both are
 uint16_t.

 This should fix problems with broken build with aggressive compiler
 flags. Reported by Xudong Hao xudong@intel.com

 Signed-off-by: Jes Sorensen jes.soren...@redhat.com ---
  hw/ivshmem.c |    4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index bbb5cba..afebbc3 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -199,13 +199,13 @@ static void ivshmem_io_writel(void *opaque,
 target_phys_addr_t addr,

          case DOORBELL:
              /* check that dest VM ID is reasonable */
 -            if ((dest  0) || (dest  s-max_peer)) {
 +            if (dest  s-max_peer) {
                  IVSHMEM_DPRINTF(Invalid destination VM ID (%d)\n,
                  dest); break;
              }

              /* check doorbell range */
 -            if ((vector = 0)  (vector 
 s-peers[dest].nb_eventfds)) { +            if (vector 
                  s-peers[dest].nb_eventfds) {

                  IVSHMEM_DPRINTF(Writing % PRId64  to VM %d on
 vector %d\n, write_one, dest, vector); if
 (write(s-peers[dest].eventfds[vector],

 This patch works for me.


 Jes, correct result is this patch works for me on x86_64 system. However, in 
 i386 system, there is another bug:

 ...
  CC    x86_64-softmmu/ivshmem.o
  CC    x86_64-softmmu/fpu/softfloat-native.o
  CC    x86_64-softmmu/op_helper.o
 cc1: warnings being treated as errors
 /home/build/gitrepo/qemu/hw/ivshmem.c: In function 'check_shm_size.
 /home/build/gitrepo/qemu/hw/ivshmem.c:357: warning: format '%ld' expects 
 type 'long int', but argt 5 has type '__off64_t
 make[1]: *** [ivshmem.o] Error 1
 make[1]: *** Waiting for unfinished jobs
 make: *** [subdir-x86_64-softmmu] Error 2

 Hmm, that was causing problems on 32-bit systems, not 64-bit.  Is this
 with gcc 4.1.2?

Oh sorry, I read too fast, you did say it was on 32-bit.  Avi's patch
will do the trick.

Cam



[Qemu-devel] [PATCH] Use signed 16-bit values for ivshmem register writes

2010-08-30 Thread Cam Macdonell
fixes gcc 4.1 warning

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 hw/ivshmem.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index bbb5cba..fa9c684 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -181,8 +181,8 @@ static void ivshmem_io_writel(void *opaque, 
target_phys_addr_t addr,
 IVShmemState *s = opaque;
 
 uint64_t write_one = 1;
-uint16_t dest = val  16;
-uint16_t vector = val  0xff;
+int16_t dest = val  16;
+int16_t vector = val  0xff;
 
 addr = 0xfc;
 
-- 
1.6.6.1




Re: [Qemu-devel] Template for developing a Qemu device with PCIe and MSI-X

2010-08-27 Thread Cam Macdonell
On Wed, Aug 25, 2010 at 4:39 PM, Adnan Khaleel ad...@khaleel.us wrote:
 Hi Isaku,

 I've made some progress in coding the device template but its no where near
 complete.

 I've created some files and am attaching it to this note. Based on what I
 could gather from the pcie source files I've made a stab at creating a
 simple model. I've also attached a file for a simple pci device that works
 under regular Qemu. I would like to duplicate its functionality in your pcie
 environment for starters.

 Could you please take a look at the files I've created and tell me if I've
 understood your pcie model correctly. Any help will be truly appreciated.

 Adnan

Hi Adnan,

There is a fairly simple device I've created called ivshmem that is
in the qemu git tree.  It is a regular PCI device that exports a
shared memory object via a BAR and supports a few registers and
optional MSI-X interrupts (I had to pick through the virtio code to
get MSI-X working, so looking at ivshmem might save you some effort).
My device is somewhat similar to a graphics card actually which I
recall is your goal.   The purpose of ivshmem is to support sharing
memory between multiple guests running on the same host.  It follows
the qdev model which you will need to do.

Cam


 The five files I've modified from your git repository are as follows

 hw/pci_ids.h    // Added vendor id defines
 hw/pc_q35.c    // Device instantiation
 hw/pcie_msix_template.h  // Device header file
 hw/pcie_msix_template.c  // Device file
 Makefile.objs   // Added pcie_msix_template.o to list of
 objects being built

 Everything should compile without any warnings or errors.

 The last file:
 sc_link_pci.c
 Is the original PCI device that I'm trying to convert into being PCIe and
 MSI-X and is included merely for reference to help you understand what I'd
 like to achieve in your environment.


 
 From: Isaku Yamahata [mailto:yamah...@valinux.co.jp]
 To: Adnan Khaleel [mailto:ad...@khaleel.us]
 Cc: qemu-devel@nongnu.org
 Sent: Wed, 18 Aug 2010 22:19:04 -0500
 Subject: Re: [Qemu-devel] Template for developing a Qemu device with PCIe
 and MSI-X

 On Wed, Aug 18, 2010 at 02:10:10PM -0500, Adnan Khaleel wrote:
 Hello Qemu developers,

 I'm interested in developing a device model that plugs into Qemu that is
 based
 on a PCIe interface and uses MSI-X. My goal is to ultimately attach a GPU
 simulator to this PCIe interface and use the entire platfom (Qemu + GPU
 simulator) for studying cpu, gpu interactions.

 I'm not terribly familiar with the Qemu device model and I'm looking for
 some
 assistance, perhaps a starting template for pcie and msi-x that would
 offer the
 basic functionality that I could then build upon.

 I have looked at the various devices that already modelled that are
 included
 with Qemu (v0.12.5 at least) and I've noticed several a few pci devices,
 eg;
 ne2k and cirrus-pci etc, however only one device truly seems to utilize
 both
 the technologies that I'm interested in and that is the virtio-pci.c

 I'm not sure what virtio-pci does so I'm not sure if that is a suitable
 starting point for me.

 Any help, suggestions etc would be extremely helpful and much appreciated.

 Qemu doesn't support pcie at the moment.
 Only partial patches have been merged, still more patches have to
 be merged for pcie to fully work. The following repo is available.

 git clone http://people.valinux.co.jp/~yamahata/qemu/q35/qemu
 git clone http://people.valinux.co.jp/~yamahata/qemu/q35/seabios
 git clone http://people.valinux.co.jp/~yamahata/qemu/q35/vgabios

 Note: patched seabios and vgabios are needed, you have to pass ACPI DSDT
 for q35.
 example:
 qemu-system-x86_64 -M pc_q35 -acpitable
 load_header,data=roms/seabios/src/q35-acpi-dsdt.aml

 This repo is for those who want to try/develop pcie support,
 not for upstream merge. So they include patches unsuitable for upstream.
 The repo includes pcie port switch emulator which utilize pcie and
 MSI(not MSI-X).

 The difference between PCI device and PCIe device is configuration
 space size.
 By setting PCIDeviceInfo::is_express = 1, you'll get 4K configuration
 space. Helper functions for pcie are found in qemu/hw/pcie.c
 For msi-x, see qemu/hw/msix.c.

 Thanks,
 --
 yamahata




Re: [Qemu-devel] Re: Unusual physical address when using 64-bit BAR

2010-08-24 Thread Cam Macdonell
On Tue, Jul 20, 2010 at 9:49 PM, Isaku Yamahata yamah...@valinux.co.jp wrote:
 Added Cc: seab...@seabios.org

 On Wed, Jul 21, 2010 at 06:31:01AM +0300, Michael S. Tsirkin wrote:
 On Tue, Jul 20, 2010 at 06:52:23PM +0900, Isaku Yamahata wrote:
  On Wed, Jul 14, 2010 at 09:10:28AM -0600, Cam Macdonell wrote:
   On Tue, Jul 13, 2010 at 8:52 PM, Isaku Yamahata yamah...@valinux.co.jp 
   wrote:
On Tue, Jul 13, 2010 at 04:48:19PM -0600, Cam Macdonell wrote:
On Tue, Jul 13, 2010 at 2:41 PM, Isaku Yamahata 
yamah...@valinux.co.jp wrote:
 On Tue, Jul 13, 2010 at 02:05:51PM -0600, Cam Macdonell wrote:
   Seabios completely ignore the 64-bitness of the BAR. ?Looks 
   like it also
   thinks the second half of the BAR is an I/O region instead of 
   memory (hence
   the c200, that's part of the pci portio region.
 
  I've sent the patches to address it. But they haven't been 
  merged yet.
  seabios doesn't map BARs beyond 4GB.
  If bar is mapped beyond 4GB, guest BIOS does it.

 Have those patches been merged yet?

 They have been merged into seabios upstream now.
 qemu seabios fork hasn't pulled for a while, though.


  To see how seabios works, it would help to increase 
  CONFIG_DEBUG_LEVEL
  in config.h of seabios

 Where does the output from seabios end up? ?Inside dmesg?

 It outputs them to the serial console which qemu emulates.
 seabios is out of kernel control, so dmesg doesn't show it.


  pci_read_config: (val) 0x0 - 0x1c (addr)
  pci_write_config: (val) 0x0 - 0x1c (addr)
  pci_read_config: (val) 0x - 0x1c (addr)
  pci_write_config: (val) 0x0 - 0x1c (addr)
  pci_read_config: (val) 0x0 - 0x1c (addr)
  pci_write_config: (val) 0x0 - 0x1c (addr)
 
  seabios BAR3. Not sure how it is mapped from this
  message.

 Isn't the BAR3 from the fact that a 64-bit BAR would use both BAR2 
 and
 BAR3 to store all 64-bits?

 Yes. Seabios misbehaves. 64bit bar is(was) a missing feature.
 --
 yamahata


   
With the latest seabios git passed via -bios, I no longer see the
48-bit address, but instead a 32-bit address and then
. ?This guest has 1gb of RAM so the address isn't be
mapped beyond 4g.
   
Can I see the debug log like before?
(hopefully seabios with CONFIG_DEBUG_LEVEL enabled.)
  
   Here's the dump from SeaBIOS in the region related to the PCI devices.
    The SeaBIOS output is identical whether the BAR is 32-bit or 64-bit.
  
   PCI: bus=0 devfn=0x10: vendor_id=0x1013 device_id=0x00b8
   region 0: 0xf000
   region 1: 0xf200
   region 6: 0xf201
   PCI: bus=0 devfn=0x18: vendor_id=0x1af4 device_id=0x1000
   region 0: 0xc020
   region 1: 0xf202
   region 6: 0xf203
   PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110
   region 0: 0xf204
   region 1: 0xf2041000
   region 2: 0x
 
  Is this region (region 2 of devfn=0x20: vendor_id=0x1af4 device_id=0x1110)
  the BAR in quistion?
  The value 0 seems odd. Probably BAR address calculation overflowed.
  Currently seabios doesn't check overflow. I attached the patch.
 
 
Do you know who sets the BAR to ?
  
   Here are the config reads/writes related to the 0x18/1c, the 'IVSHMEM'
   lines are from the map function passed to pci_register_bar().  It
   looks like SeaBIOS sets the address to 0 and then the potentially
   useful e000 address gets mangled into 00.
 
  There is something wrong with the debug message of write case, I suppose.
  All written value are 0, but the resulted effect doesn't seems so.
 
  
   IVSHMEM: guest pci addr = 0, guest h/w addr = 1090912256, size = 
   536870912
  
   ...snip...
  
   pci_read_config: (val) 0x4 - 0x18 (addr)
   pci_write_config: (val) 0x0 - 0x18 (addr)
   IVSHMEM: guest pci addr = e000, guest h/w addr = 1090912256, size = 
   2000
 
  If 0 is written to 0x18, the bar address should be 0, but it says e000.
 
   pci_read_config: (val) 0xe004 - 0x18 (addr)
 
  The read value isn't 0. and so on...
 
   pci_write_config: (val) 0x0 - 0x18 (addr)
   pci_read_config: (val) 0x0 - 0x1c (addr)
   pci_write_config: (val) 0x0 - 0x1c (addr)
   IVSHMEM: guest pci addr = , guest h/w addr =
   1090912256, size = 2000
   pci_read_config: (val) 0x - 0x1c (addr)
   pci_write_config: (val) 0x0 - 0x1c (addr)
  
   and with the 64-bit guest I get this error as well (recall the guest
   fails to boot on 64-bit)
  
   BUG: kvm_dirty_pages_log_change: invalid parameters
   f000-f0ff
 
 
  diff --git a/src/pciinit.c b/src/pciinit.c
  index b110531..6eca2ce 100644
  --- a/src/pciinit.c
  +++ b/src/pciinit.c
  @@ -90,7 +90,8 @@ static int pci_bios_allocate_region(u16 bdf, int 
  region_num)
                    /* If pci_bios_prefmem_addr == 0, keep old behaviour

[Qemu-devel] Re: [qemu-kvm PATCH 0/3] small qemu-kvm cleanups

2010-08-23 Thread Cam Macdonell
On Mon, Aug 23, 2010 at 12:49 AM, Avi Kivity a...@redhat.com wrote:
  On 08/23/2010 09:45 AM, Paolo Bonzini wrote:

 On 08/17/2010 01:29 PM, Avi Kivity wrote:

 On 08/12/2010 06:29 PM, Paolo Bonzini wrote:

 Nothing earth shattering. :)

 Paolo Bonzini (3):
 move kvm_set_irqfd to kvm-stub.c

 This touches kvm-all.c, so should be against uq/master.

 kvm_set_irqfd is not in upstream qemu, should I add it there even though
 it is unused?

 I think Cam (copied) wants it for nahanni devices, so it's a good
 opportunity.

Yes, I have a patch to make use of kvm_set_irqfd in nahanni, so it
would be great to have kvm_set_irqfd in qemu.

Cam


 --
 I have a truly marvellous patch that fixes the bug which this
 signature is too narrow to contain.





Re: [Qemu-devel] Re: Latest version in Git doesn't link - any ideas?

2010-08-14 Thread Cam Macdonell
Yup, it was my patch.  Sorry about that.  Patches are in the list,
just need to be merged.

Cam

On Sat, Aug 14, 2010 at 4:18 PM, Stefan Weil w...@mail.berlios.de wrote:
 Am 14.08.2010 23:35, schrieb Nigel Horne:

 I've seen some activity in this area of late, but the code still fails to
 link.  Any clues anyone?

 make distclean
 ./configure --enable-linux-aio --enable-io-thread --enable-kvm
 make
 ...
 LINK  arm-softmmu/qemu-system-arm
 ivshmem.o: In function `ivshmem_mmio_map':
 ivshmem.c:(.text+0x80f): undefined reference to
 `kvm_set_ioeventfd_mmio_long'
 ivshmem.o: In function `ivshmem_read':
 ivshmem.c:(.text+0x9fc): undefined reference to
 `kvm_set_ioeventfd_mmio_long'
 ivshmem.c:(.text+0xa6c): undefined reference to
 `kvm_set_ioeventfd_mmio_long'
 collect2: ld returned 1 exit status
 make[1]: *** [qemu-system-arm] Error 1
 make: *** [subdir-arm-softmmu] Error 2

 -Nigel




 Try changing the entry for ivshmem.o (which is not needed for targets
 without kvm) in Makefile.target like this:

 obj-$(CONFIG_KVM) += ivshmem.o

 Regards
 Stefan





[Qemu-devel] [PATCH 1/2] RESEND: Add kvm_set_ioeventfd_mmio_long definition for non-KVM systems

2010-08-14 Thread Cam Macdonell
Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 kvm-stub.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kvm-stub.c b/kvm-stub.c
index 3378bd3..d45f9fa 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -136,3 +136,8 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, 
uint16_t val, bool assign)
 {
 return -ENOSYS;
 }
+
+int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool 
assign)
+{
+return -ENOSYS;
+}
-- 
1.6.2.5




[Qemu-devel] [PATCH 2/2] RESEND: Disable build of ivshmem on non-KVM systems

2010-08-14 Thread Cam Macdonell
Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 Makefile.target |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index b791492..c8281e9 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -191,7 +191,7 @@ obj-y += rtl8139.o
 obj-y += e1000.o
 
 # Inter-VM PCI shared memory
-obj-y += ivshmem.o
+obj-$(CONFIG_KVM) += ivshmem.o
 
 # Hardware support
 obj-i386-y += vga.o
-- 
1.6.2.5




Re: [Qemu-devel] [PATCH 1/2] Add kvm_set_ioeventfd_mmio_long definition for non-KVM systems

2010-08-14 Thread Cam Macdonell
On Sat, Aug 14, 2010 at 11:24 AM, Andreas Färber andreas.faer...@web.de wrote:
 Am 11.08.2010 um 20:16 schrieb Cam Macdonell:

 ---
 kvm-stub.c |    5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

 diff --git a/kvm-stub.c b/kvm-stub.c
 index 3378bd3..d45f9fa 100644
 --- a/kvm-stub.c
 +++ b/kvm-stub.c
 @@ -136,3 +136,8 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr,
 uint16_t val, bool assign)
 {
    return -ENOSYS;
 }
 +
 +int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool
 assign)
 +{
 +    return -ENOSYS;
 +}
 --
 1.6.2.5

 Acked-by: Andreas Färber andreas.faer...@web.de

 This fixes linking [1] on Mac OS X.

 The patch is missing a Signed-off-by though, same for 2/2.

I've re-sent with the sign-off.

Cam


 Andreas

 [1]
  LINK  i386-softmmu/qemu
 Undefined symbols:
  _kvm_set_ioeventfd_mmio_long, referenced from:
     _ivshmem_read in ivshmem.o
     _ivshmem_read in ivshmem.o
     _ivshmem_mmio_map in ivshmem.o
 ld: symbol(s) not found
 collect2: ld returned 1 exit status
 make[1]: *** [qemu] Error 1
 make: *** [subdir-i386-softmmu] Error 2




Re: [Qemu-devel] [PATCH v8 5/5] RESEND: Inter-VM shared memory PCI device

2010-08-11 Thread Cam Macdonell
On Wed, Aug 11, 2010 at 4:35 AM, Stefan Weil w...@mail.berlios.de wrote:
 Am 27.07.2010 18:54, schrieb Cam Macdonell:

 resend for bug fix related to removal of irqfd

 Support an inter-vm shared memory device that maps a shared-memory object
 as a
 PCI device in the guest. This patch also supports interrupts between guest
 by
 communicating over a unix domain socket. This patch applies to the
 qemu-kvm
 repository.

 -device ivshmem,size=size in format accepted by -m[,shm=shm name]

 Interrupts are supported between multiple VMs by using a shared memory
 server
 by using a chardev socket.

 -device ivshmem,size=size in format accepted by -m[,shm=shm name]
 [,chardev=id][,msi=on][,ioeventfd=on][,vectors=n][,role=peer|master]
 -chardev socket,path=path,id=id

 The shared memory server, sample programs and init scripts are in a git
 repo here:

 www.gitorious.org/nahanni

 Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
 ---
 Makefile.target | 3 +
 hw/ivshmem.c | 828 +++
 qemu-char.c | 6 +
 qemu-char.h | 3 +
 qemu-doc.texi | 43 +++
 5 files changed, 883 insertions(+), 0 deletions(-)
 create mode 100644 hw/ivshmem.c


 Hi,

 hw/ivshmem.c breaks compilation on 32 bit hosts, for targets without kvm
 support
 and for win32 environments.

 I sent patches to qemu-devel which fix the first two problems.

 The win32 problems (missing mmap, maybe more) remain.
 Could you please fix them?

Could that be accomplished with excluding it on Windows on the makefiles?

Thanks,
Cam


 Regards
 Stefan





[Qemu-devel] Re: [PATCH 2/2] ivshmem: Fix compilation without kvm

2010-08-11 Thread Cam Macdonell
On Wed, Aug 11, 2010 at 1:05 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 08/11/2010 03:38 AM, Stefan Weil wrote:

 kvm_set_ioeventfd_mmio_long is only available with CONFIG_KVM.

 We should just disable ivshmem for non-KVM

 diff --git a/Makefile.target b/Makefile.target
 index b791492..c8281e9 100644
 --- a/Makefile.target
 +++ b/Makefile.target
 @@ -191,7 +191,7 @@ obj-y += rtl8139.o
  obj-y += e1000.o

  # Inter-VM PCI shared memory
 -obj-y += ivshmem.o
 +obj-$(CONFIG_KVM) += ivshmem.o

  # Hardware support
  obj-i386-y += vga.o

 because it is also breaking Windows builds.

 Alternatively, the right way to do what this patch does, is to add
 kvm_set_ioeventfd_mmio_long to kvm-stub.c, and to use obj-$(CONFIG_POSIX)
 += ivshmem.o in the makefile to work around the Windows build problems.


I think we decided to disable it on non-KVM systems to avoid people
stumbling into atomicity issues with emulation.

 Paolo





[Qemu-devel] [PATCH 1/2] Add kvm_set_ioeventfd_mmio_long definition for non-KVM systems

2010-08-11 Thread Cam Macdonell
---
 kvm-stub.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kvm-stub.c b/kvm-stub.c
index 3378bd3..d45f9fa 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -136,3 +136,8 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, 
uint16_t val, bool assign)
 {
 return -ENOSYS;
 }
+
+int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool 
assign)
+{
+return -ENOSYS;
+}
-- 
1.6.2.5




[Qemu-devel] [PATCH 2/2] Disable build of ivshmem on non-KVM systems

2010-08-11 Thread Cam Macdonell
---
 Makefile.target |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index b791492..c8281e9 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -191,7 +191,7 @@ obj-y += rtl8139.o
 obj-y += e1000.o
 
 # Inter-VM PCI shared memory
-obj-y += ivshmem.o
+obj-$(CONFIG_KVM) += ivshmem.o
 
 # Hardware support
 obj-i386-y += vga.o
-- 
1.6.2.5




Re: [Qemu-devel] [PATCH 1/2] Add kvm_set_ioeventfd_mmio_long definition for non-KVM systems

2010-08-11 Thread Cam Macdonell
On Wed, Aug 11, 2010 at 2:28 PM, Stefan Weil w...@mail.berlios.de wrote:
 Am 11.08.2010 20:16, schrieb Cam Macdonell:

 ---
 kvm-stub.c | 5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

 diff --git a/kvm-stub.c b/kvm-stub.c
 index 3378bd3..d45f9fa 100644
 --- a/kvm-stub.c
 +++ b/kvm-stub.c
 @@ -136,3 +136,8 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr,
 uint16_t val, bool assign)
 {
 return -ENOSYS;
 }
 +
 +int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool
 assign)
 +{
 + return -ENOSYS;
 +}

 Your 2nd patch disables build of ivshmem.o on non-kvm systems.
 Only ivshmem.c was using kvm_set_ioeventfd_mmio_long, so
 up to now, no dummy function in kvm-stub.c is needed.

Right. It can be left out for now if that's preferred.

Cam



[Qemu-devel] [PATCH v8 5/5] RESEND: Inter-VM shared memory PCI device

2010-07-27 Thread Cam Macdonell
resend for bug fix related to removal of irqfd

Support an inter-vm shared memory device that maps a shared-memory object as a
PCI device in the guest.  This patch also supports interrupts between guest by
communicating over a unix domain socket.  This patch applies to the qemu-kvm
repository.

-device ivshmem,size=size in format accepted by -m[,shm=shm name]

Interrupts are supported between multiple VMs by using a shared memory server
by using a chardev socket.

-device ivshmem,size=size in format accepted by -m[,shm=shm name]
   
[,chardev=id][,msi=on][,ioeventfd=on][,vectors=n][,role=peer|master]
-chardev socket,path=path,id=id

The shared memory server, sample programs and init scripts are in a git repo 
here:

www.gitorious.org/nahanni

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 Makefile.target |3 +
 hw/ivshmem.c|  828 +++
 qemu-char.c |6 +
 qemu-char.h |3 +
 qemu-doc.texi   |   43 +++
 5 files changed, 883 insertions(+), 0 deletions(-)
 create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index 8a9c427..b791492 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -190,6 +190,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
 obj-y += rtl8139.o
 obj-y += e1000.o
 
+# Inter-VM PCI shared memory
+obj-y += ivshmem.o
+
 # Hardware support
 obj-i386-y += vga.o
 obj-i386-y += mc146818rtc.o i8259.o pc.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..bbb5cba
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,828 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell c...@cs.ualberta.ca
+ *
+ * Based On: cirrus_vga.c
+ *  Copyright (c) 2004 Fabrice Bellard
+ *  Copyright (c) 2004 Makoto Suzuki (suzu)
+ *
+ *  and rtl8139.c
+ *  Copyright (c) 2006 Igor Kovalenko
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+#include hw.h
+#include pc.h
+#include pci.h
+#include msix.h
+#include kvm.h
+
+#include sys/mman.h
+#include sys/types.h
+
+#define IVSHMEM_IOEVENTFD   0
+#define IVSHMEM_MSI 1
+
+#define IVSHMEM_PEER0
+#define IVSHMEM_MASTER  1
+
+#define IVSHMEM_REG_BAR_SIZE 0x100
+
+//#define DEBUG_IVSHMEM
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, ...)\
+do {printf(IVSHMEM:  fmt, ## __VA_ARGS__); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, ...)
+#endif
+
+typedef struct Peer {
+int nb_eventfds;
+int *eventfds;
+} Peer;
+
+typedef struct EventfdEntry {
+PCIDevice *pdev;
+int vector;
+} EventfdEntry;
+
+typedef struct IVShmemState {
+PCIDevice dev;
+uint32_t intrmask;
+uint32_t intrstatus;
+uint32_t doorbell;
+
+CharDriverState **eventfd_chr;
+CharDriverState *server_chr;
+int ivshmem_mmio_io_addr;
+
+pcibus_t mmio_addr;
+pcibus_t shm_pci_addr;
+uint64_t ivshmem_offset;
+uint64_t ivshmem_size; /* size of shared memory region */
+int shm_fd; /* shared memory file descriptor */
+
+Peer *peers;
+int nb_peers; /* how many guests we have space for */
+int max_peer; /* maximum numbered peer */
+
+int vm_id;
+uint32_t vectors;
+uint32_t features;
+EventfdEntry *eventfd_table;
+
+char * shmobj;
+char * sizearg;
+char * role;
+int role_val;   /* scalar to avoid multiple string comparisons */
+} IVShmemState;
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+INTRMASK = 0,
+INTRSTATUS = 4,
+IVPOSITION = 8,
+DOORBELL = 12,
+};
+
+static inline uint32_t ivshmem_has_feature(IVShmemState *ivs,
+unsigned int feature) {
+return (ivs-features  (1  feature));
+}
+
+static inline bool is_power_of_two(uint64_t x) {
+return (x  (x - 1)) == 0;
+}
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+pcibus_t addr, pcibus_t size, int type)
+{
+IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev);
+
+s-shm_pci_addr = addr;
+
+if (s-ivshmem_offset  0) {
+cpu_register_physical_memory(s-shm_pci_addr, s-ivshmem_size,
+s-ivshmem_offset);
+}
+
+IVSHMEM_DPRINTF(guest pci addr = % FMT_PCIBUS , guest h/w addr = %
+PRIu64 , size = % FMT_PCIBUS \n, addr, s-ivshmem_offset, size);
+
+}
+
+/* accessing registers - based on rtl8139 */
+static void ivshmem_update_irq(IVShmemState *s, int val)
+{
+int isr;
+isr = (s-intrstatus  s-intrmask)  0x;
+
+/* don't print ISR resets */
+if (isr) {
+IVSHMEM_DPRINTF(Set IRQ to %d (%04x %04x)\n,
+   isr ? 1 : 0, s-intrstatus, s-intrmask);
+}
+
+qemu_set_irq(s-dev.irq[0], (isr != 0));
+}
+
+static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
+{
+IVSHMEM_DPRINTF(IntrMask write(w) val = 0x%04x\n, val);
+
+s-intrmask = val;
+
+ivshmem_update_irq(s, val

Re: [Qemu-devel] [PATCH v7 0/4] Inter-VM shared memory device

2010-07-26 Thread Cam Macdonell
On Mon, Jul 26, 2010 at 7:48 AM, Anthony Liguori anth...@codemonkey.ws wrote:
 On 06/15/2010 03:23 PM, Cam Macdonell wrote:

 Latest patch for PCI shared memory device that maps a host shared memory
 object
 to be shared between guests


 Is this against qemu.git or qemu-kvm.git?  It depends on functions like
 qemu_ram_map() which are not present in qemu.git (and are present in
 qemu-kvm.git).

It is against qemu-kvm.git.  Is qemu_ram_map() going into qemu.git?
Another function I use (and virtio in qemu-kvm uses as well) is
kvm_set_irqfd() that is not qemu.git either.  Do I need ifdef these
functions with CONFIG_KVM?

Please advise as to how to handle these two functions and then I can
rebase against qemu.git.

Thanks,
Cam


 Regards,

 Anthony Liguori

 new in this series

     - replace marking memory from v6 with marking device as unmigratable
 indicating
       that it should be unplugged before migration and re-added after.
     - 'peer' case changed to require removal before migration, only
 'master'
       devices can be migrated while attached.

     v6
     - migration support with 'master' and 'peer' roles for guest to
 determine
       who owns memory
     - modified phys_ram_dirty array for marking memory as not to be
 migrated

     v5:
     - fixed segfault for non-server case
     - code style fixes
     - removed limit on the number of guests
     - shared memory server is now in qemu.git/contrib
     - made ioeventfd setup function generic
     - removed interrupts when guest joined (let application handle it)

     v4:
     - moved to single Doorbell register and use datamatch to trigger
 different
       VMs rather than one register per eventfd
     - remove writing arbitrary values to eventfds.  Only values of 1 are
 now
       written to ensure correct usage

 Cam Macdonell (4):
   Device specification for shared memory PCI device
   Add function to assign ioeventfd to MMIO.
   Support marking a device as non-migratable
   Inter-VM shared memory PCI device

  Makefile.target                    |    3 +
  docs/specs/ivshmem_device_spec.txt |   96 +
  hw/hw.h                            |    1 +
  hw/ivshmem.c                       |  823
 
  kvm-all.c                          |   32 ++
  kvm.h                              |    1 +
  qemu-char.c                        |    6 +
  qemu-char.h                        |    3 +
  qemu-doc.texi                      |   43 ++
  savevm.c                           |   32 ++-
  10 files changed, 1037 insertions(+), 3 deletions(-)
  create mode 100644 docs/specs/ivshmem_device_spec.txt
  create mode 100644 hw/ivshmem.c









Re: [Qemu-devel] [PATCH v7 0/4] Inter-VM shared memory device

2010-07-26 Thread Cam Macdonell
On Mon, Jul 26, 2010 at 1:51 PM, Avi Kivity a...@redhat.com wrote:
  On 07/26/2010 10:01 PM, Cam Macdonell wrote:

 Is this against qemu.git or qemu-kvm.git?  It depends on functions like
 qemu_ram_map() which are not present in qemu.git (and are present in
 qemu-kvm.git).

 It is against qemu-kvm.git.  Is qemu_ram_map() going into qemu.git?
 Another function I use (and virtio in qemu-kvm uses as well) is
 kvm_set_irqfd() that is not qemu.git either.  Do I need ifdef these
 functions with CONFIG_KVM?

 Please advise as to how to handle these two functions and then I can
 rebase against qemu.git.

 Please add qemu_ram_map() as a separate patch to avoid interdependencies.
  Try to keep it at the same place etc., that will reduce merge difficulties
 later.

qemu_ram_map() isn't my patch, Marcelo pushed it into qemu-kvm, so I
dropped my version.

Marcelo's version is commit c15414b9 in qemu-kvm.git


 wrt kvm_set_irqfd(), its usage is optional, yes?  if so I recommend just
 dropping support for it temporarily.  I'll work at upstreaming
 kvm_set_irqfd() so you can re-add this functionality.

Yes, kvm_set_irqfd() is optional.  I can drop it for now.


 --
 I have a truly marvellous patch that fixes the bug which this
 signature is too narrow to contain.





Re: [Qemu-devel] [PATCH v7 0/4] Inter-VM shared memory device

2010-07-26 Thread Cam Macdonell
On Mon, Jul 26, 2010 at 2:11 PM, Avi Kivity a...@redhat.com wrote:
  On 07/26/2010 11:03 PM, Cam Macdonell wrote:

 Please add qemu_ram_map() as a separate patch to avoid interdependencies.
  Try to keep it at the same place etc., that will reduce merge
 difficulties
 later.

 qemu_ram_map() isn't my patch, Marcelo pushed it into qemu-kvm, so I
 dropped my version.


 You can pick that code and put it in front of your patch set as a patch, or
 risk more months of delay while Marcelo, Anthony and myself try to figure
 out who was supposed to be doing what.

 Marcelo's version is commit c15414b9 in qemu-kvm.git

 That appears to be a merge.

Ah, the header changed in a merge, the original patch is commit
00d53f24.  I'll pick it as part of my patch set as you suggest.


 --
 I have a truly marvellous patch that fixes the bug which this
 signature is too narrow to contain.





Re: [Qemu-devel] [PATCH v7 0/4] Inter-VM shared memory device

2010-07-26 Thread Cam Macdonell
On Mon, Jul 26, 2010 at 2:41 PM, Anthony Liguori anth...@codemonkey.ws wrote:
 On 07/26/2010 03:27 PM, Avi Kivity wrote:

  On 07/26/2010 10:41 PM, Anthony Liguori wrote:

 kvm_set_irqfd() is fine, it just needs to be ported.  It should be there
 due to vhost though?


 It should, but isn't.

 qemu_ram_map() is more difficult.  I would think the better approach
 would be to invert things.  Instead of a give me a stable mapping that is
 shared atomically with a guest to this ram region, I would go with go
 create a ram region with this preallocated memory and then just assume that
 afterwards, you can make use of it and it's exposed as atomic memory.

 Isn't that what qemu_ram_map() does?

 Yup, name is misleading.  I thought it was the equivalent of
 cpu_physical_memory_map() but took a ram_addr_t instead of a target_ulong.

If I add it to my patch set, should I change the name to the suggested
qemu_ram_alloc_from_ptr() or keep it as is for consistency?

Also, the current version of qemu_ram_map() in qemu-kvm.git uses the
DeviceState parameter.  Are Alex's DeviceState changes going into 0.13
for the qemu_ram_*() functions??

Cam


 Regards,

 Anthony Liguori


 ram_addr_t qemu_ram_map(DeviceState *dev, const char *name,
                        ram_addr_t size, void *host)

 'host' is your this preallocated memory, no?






[Qemu-devel] [PATCH v8 3/5] Add function to assign ioeventfd to MMIO.

2010-07-26 Thread Cam Macdonell

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 kvm-all.c |   32 
 kvm.h |1 +
 2 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 7635f2f..d9a5dd0 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1241,6 +1241,38 @@ int kvm_set_signal_mask(CPUState *env, const sigset_t 
*sigset)
 return r;
 }
 
+int kvm_set_ioeventfd_mmio_long(int fd, uint32_t addr, uint32_t val, bool 
assign)
+{
+#ifdef KVM_IOEVENTFD
+int ret;
+struct kvm_ioeventfd iofd;
+
+iofd.datamatch = val;
+iofd.addr = addr;
+iofd.len = 4;
+iofd.flags = KVM_IOEVENTFD_FLAG_DATAMATCH;
+iofd.fd = fd;
+
+if (!kvm_enabled()) {
+return -ENOSYS;
+}
+
+if (!assign) {
+iofd.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
+}
+
+ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, iofd);
+
+if (ret  0) {
+return -errno;
+}
+
+return 0;
+#else
+return -ENOSYS;
+#endif
+}
+
 int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool 
assign)
 {
 #ifdef KVM_IOEVENTFD
diff --git a/kvm.h b/kvm.h
index 93f8187..50b6c01 100644
--- a/kvm.h
+++ b/kvm.h
@@ -175,6 +175,7 @@ static inline void cpu_synchronize_post_init(CPUState *env)
 }
 
 #endif
+int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool 
assign);
 
 int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool 
assign);
 #endif
-- 
1.6.3.2.198.g6096d




[Qemu-devel] [PATCH v8 1/5] Add qemu_ram_alloc_from_ptr function

2010-07-26 Thread Cam Macdonell
Provide a function to add an allocated region of memory to the qemu RAM.

This patch is copied from Marcelo's qemu_ram_map() in qemu-kvm and given the
clearer name qemu_ram_alloc_from_ptr().

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 cpu-common.h |2 ++
 exec.c   |   43 +++
 2 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index 71e7933..0426bc8 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -40,6 +40,8 @@ static inline void 
cpu_register_physical_memory(target_phys_addr_t start_addr,
 }
 
 ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
+ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
+ram_addr_t size, void *host);
 ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
diff --git a/exec.c b/exec.c
index 868cd7f..8636316 100644
--- a/exec.c
+++ b/exec.c
@@ -2808,6 +2808,49 @@ static ram_addr_t last_ram_offset(void)
 return last;
 }
 
+ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
+ram_addr_t size, void *host)
+{
+RAMBlock *new_block, *block;
+
+size = TARGET_PAGE_ALIGN(size);
+new_block = qemu_mallocz(sizeof(*new_block));
+
+if (dev  dev-parent_bus  dev-parent_bus-info-get_dev_path) {
+char *id = dev-parent_bus-info-get_dev_path(dev);
+if (id) {
+snprintf(new_block-idstr, sizeof(new_block-idstr), %s/, id);
+qemu_free(id);
+}
+}
+pstrcat(new_block-idstr, sizeof(new_block-idstr), name);
+
+QLIST_FOREACH(block, ram_list.blocks, next) {
+if (!strcmp(block-idstr, new_block-idstr)) {
+fprintf(stderr, RAMBlock \%s\ already registered, abort!\n,
+new_block-idstr);
+abort();
+}
+}
+
+new_block-host = host;
+
+new_block-offset = find_ram_offset(size);
+new_block-length = size;
+
+QLIST_INSERT_HEAD(ram_list.blocks, new_block, next);
+
+ram_list.phys_dirty = qemu_realloc(ram_list.phys_dirty,
+   last_ram_offset()  TARGET_PAGE_BITS);
+memset(ram_list.phys_dirty + (new_block-offset  TARGET_PAGE_BITS),
+   0xff, size  TARGET_PAGE_BITS);
+
+if (kvm_enabled())
+kvm_setup_guest_memory(new_block-host, size);
+
+return new_block-offset;
+}
+
 ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size)
 {
 RAMBlock *new_block, *block;
-- 
1.6.3.2.198.g6096d




[Qemu-devel] [PATCH v8 0/5] Inter-VM Shared Memory Device

2010-07-26 Thread Cam Macdonell
Latest patch for PCI shared memory device that maps a host shared memory object
to be shared between guests

new in this series

v8
- rebased to qemu.git
- added qemu_ram_alloc_from_ptr() function
- removed irqfd support as support functions are not yet in qemu

v7
- replace marking memory from v6 with marking device as unmigratable 
indicating
  that it should be unplugged before migration and re-added after.
- 'peer' case changed to require removal before migration, only 'master'
  devices can be migrated while attached.

v6
- migration support with 'master' and 'peer' roles for guest to determine
  who owns memory
- modified phys_ram_dirty array for marking memory as not to be migrated

v5:
- fixed segfault for non-server case
- code style fixes
- removed limit on the number of guests
- shared memory server is now in qemu.git/contrib
- made ioeventfd setup function generic
- removed interrupts when guest joined (let application handle it)

v4:
- moved to single Doorbell register and use datamatch to trigger different
  VMs rather than one register per eventfd
- remove writing arbitrary values to eventfds.  Only values of 1 are now
  written to ensure correct usage

Cam Macdonell (5):
  Provide a function to add an allocated region of memory to the qemu
RAM.
  Device specification for shared memory PCI device
  Add function to assign ioeventfd to MMIO.
  Support marking a device as non-migratable
  Inter-VM shared memory PCI device

 Makefile.target|3 +
 cpu-common.h   |2 +
 docs/specs/ivshmem_device_spec.txt |   96 
 exec.c |   43 ++
 hw/hw.h|2 +
 hw/ivshmem.c   |  834 
 kvm-all.c  |   32 ++
 kvm.h  |1 +
 qemu-char.c|6 +
 qemu-char.h|3 +
 qemu-doc.texi  |   43 ++
 savevm.c   |   44 ++-
 12 files changed, 1106 insertions(+), 3 deletions(-)
 create mode 100644 docs/specs/ivshmem_device_spec.txt
 create mode 100644 hw/ivshmem.c




[Qemu-devel] [PATCH v8 4/5] Support marking a device as non-migratable

2010-07-26 Thread Cam Macdonell
A non-migratable device should be removed before migration and re-added after.

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 hw/hw.h  |2 ++
 savevm.c |   44 +---
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index c2de6fe..28b4204 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -264,6 +264,8 @@ int register_savevm_live(DeviceState *dev,
  void *opaque);
 
 void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque);
+void register_device_unmigratable(DeviceState *dev, const char *idstr,
+void *opaque);
 
 typedef void QEMUResetHandler(void *opaque);
 
diff --git a/savevm.c b/savevm.c
index ee27989..6d55d28 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1005,6 +1005,7 @@ typedef struct SaveStateEntry {
 const VMStateDescription *vmsd;
 void *opaque;
 CompatEntry *compat;
+int no_migrate;
 } SaveStateEntry;
 
 
@@ -1068,6 +1069,7 @@ int register_savevm_live(DeviceState *dev,
 se-load_state = load_state;
 se-opaque = opaque;
 se-vmsd = NULL;
+se-no_migrate = 0;
 
 if (dev  dev-parent_bus  dev-parent_bus-info-get_dev_path) {
 char *id = dev-parent_bus-info-get_dev_path(dev);
@@ -1131,6 +1133,31 @@ void unregister_savevm(DeviceState *dev, const char 
*idstr, void *opaque)
 }
 }
 
+/* mark a device as not to be migrated, that is the device should be
+   unplugged before migration */
+void register_device_unmigratable(DeviceState *dev, const char *idstr,
+void *opaque)
+{
+SaveStateEntry *se;
+char id[256] = ;
+
+if (dev  dev-parent_bus  dev-parent_bus-info-get_dev_path) {
+char *path = dev-parent_bus-info-get_dev_path(dev);
+if (path) {
+pstrcpy(id, sizeof(id), path);
+pstrcat(id, sizeof(id), /);
+qemu_free(path);
+}
+}
+pstrcat(id, sizeof(id), idstr);
+
+QTAILQ_FOREACH(se, savevm_handlers, entry) {
+if (strcmp(se-idstr, id) == 0  se-opaque == opaque) {
+se-no_migrate = 1;
+}
+}
+}
+
 int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
const VMStateDescription *vmsd,
void *opaque, int alias_id,
@@ -1323,13 +1350,19 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry 
*se, int version_id)
 return vmstate_load_state(f, se-vmsd, se-opaque, version_id);
 }
 
-static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
+static int vmstate_save(QEMUFile *f, SaveStateEntry *se)
 {
+if (se-no_migrate) {
+return -1;
+}
+
 if (!se-vmsd) { /* Old style */
 se-save_state(f, se-opaque);
-return;
+return 0;
 }
 vmstate_save_state(f,se-vmsd, se-opaque);
+
+return 0;
 }
 
 #define QEMU_VM_FILE_MAGIC   0x5145564d
@@ -1423,6 +1456,7 @@ int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
 {
 SaveStateEntry *se;
+int r;
 
 cpu_synchronize_all_states();
 
@@ -1455,7 +1489,11 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
 qemu_put_be32(f, se-instance_id);
 qemu_put_be32(f, se-version_id);
 
-vmstate_save(f, se);
+r = vmstate_save(f, se);
+if (r  0) {
+monitor_printf(mon, cannot migrate with device '%s'\n, 
se-idstr);
+return r;
+}
 }
 
 qemu_put_byte(f, QEMU_VM_EOF);
-- 
1.6.3.2.198.g6096d




[Qemu-devel] [PATCH v8 2/5] Device specification for shared memory PCI device

2010-07-26 Thread Cam Macdonell

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 docs/specs/ivshmem_device_spec.txt |   96 
 1 files changed, 96 insertions(+), 0 deletions(-)
 create mode 100644 docs/specs/ivshmem_device_spec.txt

diff --git a/docs/specs/ivshmem_device_spec.txt 
b/docs/specs/ivshmem_device_spec.txt
new file mode 100644
index 000..23dd2ba
--- /dev/null
+++ b/docs/specs/ivshmem_device_spec.txt
@@ -0,0 +1,96 @@
+
+Device Specification for Inter-VM shared memory device
+--
+
+The Inter-VM shared memory device is designed to share a region of memory to
+userspace in multiple virtual guests.  The memory region does not belong to any
+guest, but is a POSIX memory object on the host.  Optionally, the device may
+support sending interrupts to other guests sharing the same memory region.
+
+
+The Inter-VM PCI device
+---
+
+*BARs*
+
+The device supports three BARs.  BAR0 is a 1 Kbyte MMIO region to support
+registers.  BAR1 is used for MSI-X when it is enabled in the device.  BAR2 is
+used to map the shared memory object from the host.  The size of BAR2 is
+specified when the guest is started and must be a power of 2 in size.
+
+*Registers*
+
+The device currently supports 4 registers of 32-bits each.  Registers
+are used for synchronization between guests sharing the same memory object when
+interrupts are supported (this requires using the shared memory server).
+
+The server assigns each VM an ID number and sends this ID number to the Qemu
+process when the guest starts.
+
+enum ivshmem_registers {
+IntrMask = 0,
+IntrStatus = 4,
+IVPosition = 8,
+Doorbell = 12
+};
+
+The first two registers are the interrupt mask and status registers.  Mask and
+status are only used with pin-based interrupts.  They are unused with MSI
+interrupts.
+
+Status Register: The status register is set to 1 when an interrupt occurs.
+
+Mask Register: The mask register is bitwise ANDed with the interrupt status
+and the result will raise an interrupt if it is non-zero.  However, since 1 is
+the only value the status will be set to, it is only the first bit of the mask
+that has any effect.  Therefore interrupts can be masked by setting the first
+bit to 0 and unmasked by setting the first bit to 1.
+
+IVPosition Register: The IVPosition register is read-only and reports the
+guest's ID number.  The guest IDs are non-negative integers.  When using the
+server, since the server is a separate process, the VM ID will only be set when
+the device is ready (shared memory is received from the server and accessible 
via
+the device).  If the device is not ready, the IVPosition will return -1.
+Applications should ensure that they have a valid VM ID before accessing the
+shared memory.
+
+Doorbell Register:  To interrupt another guest, a guest must write to the
+Doorbell register.  The doorbell register is 32-bits, logically divided into
+two 16-bit fields.  The high 16-bits are the guest ID to interrupt and the low
+16-bits are the interrupt vector to trigger.  The semantics of the value
+written to the doorbell depends on whether the device is using MSI or a regular
+pin-based interrupt.  In short, MSI uses vectors while regular interrupts set 
the
+status register.
+
+Regular Interrupts
+
+If regular interrupts are used (due to either a guest not supporting MSI or the
+user specifying not to use them on startup) then the value written to the lower
+16-bits of the Doorbell register results is arbitrary and will trigger an
+interrupt in the destination guest.
+
+Message Signalled Interrupts
+
+A ivshmem device may support multiple MSI vectors.  If so, the lower 16-bits
+written to the Doorbell register must be between 0 and the maximum number of
+vectors the guest supports.  The lower 16 bits written to the doorbell is the
+MSI vector that will be raised in the destination guest.  The number of MSI
+vectors is configurable but it is set when the VM is started.
+
+The important thing to remember with MSI is that it is only a signal, no status
+is set (since MSI interrupts are not shared).  All information other than the
+interrupt itself should be communicated via the shared memory region.  Devices
+supporting multiple MSI vectors can use different vectors to indicate different
+events have occurred.  The semantics of interrupt vectors are left to the
+user's discretion.
+
+
+Usage in the Guest
+--
+
+The shared memory device is intended to be used with the provided UIO driver.
+Very little configuration is needed.  The guest should map BAR0 to access the
+registers (an array of 32-bit ints allows simple writing) and map BAR2 to
+access the shared memory region itself.  The size of the shared memory region
+is specified when the guest (or shared memory server) is started.  A guest may
+map the whole shared memory region or only part of it.
-- 
1.6.3.2.198.g6096d




[Qemu-devel] [PATCH v8 5/5] Inter-VM shared memory PCI device

2010-07-26 Thread Cam Macdonell
Support an inter-vm shared memory device that maps a shared-memory object as a
PCI device in the guest.  This patch also supports interrupts between guest by
communicating over a unix domain socket.  This patch applies to the qemu-kvm
repository.

-device ivshmem,size=size in format accepted by -m[,shm=shm name]

Interrupts are supported between multiple VMs by using a shared memory server
by using a chardev socket.

-device ivshmem,size=size in format accepted by -m[,shm=shm name]
   
[,chardev=id][,msi=on][,ioeventfd=on][,vectors=n][,role=peer|master]
-chardev socket,path=path,id=id

The shared memory server, sample programs and init scripts are in a git repo 
here:

www.gitorious.org/nahanni

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 Makefile.target |3 +
 hw/ivshmem.c|  834 +++
 qemu-char.c |6 +
 qemu-char.h |3 +
 qemu-doc.texi   |   43 +++
 5 files changed, 889 insertions(+), 0 deletions(-)
 create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index 3ef4666..4cb3a00 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -188,6 +188,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
 obj-y += rtl8139.o
 obj-y += e1000.o
 
+# Inter-VM PCI shared memory
+obj-y += ivshmem.o
+
 # Hardware support
 obj-i386-y += vga.o
 obj-i386-y += mc146818rtc.o i8259.o pc.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..3f197ae
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,834 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell c...@cs.ualberta.ca
+ *
+ * Based On: cirrus_vga.c
+ *  Copyright (c) 2004 Fabrice Bellard
+ *  Copyright (c) 2004 Makoto Suzuki (suzu)
+ *
+ *  and rtl8139.c
+ *  Copyright (c) 2006 Igor Kovalenko
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+#include hw.h
+#include pc.h
+#include pci.h
+#include msix.h
+#include kvm.h
+
+#include sys/mman.h
+#include sys/types.h
+
+#define IVSHMEM_IOEVENTFD   0
+#define IVSHMEM_MSI 1
+
+#define IVSHMEM_PEER0
+#define IVSHMEM_MASTER  1
+
+#define IVSHMEM_REG_BAR_SIZE 0x100
+
+//#define DEBUG_IVSHMEM
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, ...)\
+do {printf(IVSHMEM:  fmt, ## __VA_ARGS__); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, ...)
+#endif
+
+typedef struct Peer {
+int nb_eventfds;
+int *eventfds;
+} Peer;
+
+typedef struct EventfdEntry {
+PCIDevice *pdev;
+int vector;
+} EventfdEntry;
+
+typedef struct IVShmemState {
+PCIDevice dev;
+uint32_t intrmask;
+uint32_t intrstatus;
+uint32_t doorbell;
+
+CharDriverState **eventfd_chr;
+CharDriverState *server_chr;
+int ivshmem_mmio_io_addr;
+
+pcibus_t mmio_addr;
+pcibus_t shm_pci_addr;
+uint64_t ivshmem_offset;
+uint64_t ivshmem_size; /* size of shared memory region */
+int shm_fd; /* shared memory file descriptor */
+
+Peer *peers;
+int nb_peers; /* how many guests we have space for */
+int max_peer; /* maximum numbered peer */
+
+int vm_id;
+uint32_t vectors;
+uint32_t features;
+EventfdEntry *eventfd_table;
+
+char * shmobj;
+char * sizearg;
+char * role;
+int role_val;   /* scalar to avoid multiple string comparisons */
+} IVShmemState;
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+INTRMASK = 0,
+INTRSTATUS = 4,
+IVPOSITION = 8,
+DOORBELL = 12,
+};
+
+static inline uint32_t ivshmem_has_feature(IVShmemState *ivs,
+unsigned int feature) {
+return (ivs-features  (1  feature));
+}
+
+static inline bool is_power_of_two(uint64_t x) {
+return (x  (x - 1)) == 0;
+}
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+pcibus_t addr, pcibus_t size, int type)
+{
+IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev);
+
+s-shm_pci_addr = addr;
+
+if (s-ivshmem_offset  0) {
+cpu_register_physical_memory(s-shm_pci_addr, s-ivshmem_size,
+s-ivshmem_offset);
+}
+
+IVSHMEM_DPRINTF(guest pci addr = % FMT_PCIBUS , guest h/w addr = %
+PRIu64 , size = % FMT_PCIBUS \n, addr, s-ivshmem_offset, size);
+
+}
+
+/* accessing registers - based on rtl8139 */
+static void ivshmem_update_irq(IVShmemState *s, int val)
+{
+int isr;
+isr = (s-intrstatus  s-intrmask)  0x;
+
+/* don't print ISR resets */
+if (isr) {
+IVSHMEM_DPRINTF(Set IRQ to %d (%04x %04x)\n,
+   isr ? 1 : 0, s-intrstatus, s-intrmask);
+}
+
+qemu_set_irq(s-dev.irq[0], (isr != 0));
+}
+
+static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
+{
+IVSHMEM_DPRINTF(IntrMask write(w) val = 0x%04x\n, val);
+
+s-intrmask = val;
+
+ivshmem_update_irq(s, val);
+}
+
+static uint32_t ivshmem_IntrMask_read

Re: [Qemu-devel] virtio-9p is not working

2010-07-21 Thread Cam Macdonell
On Wed, Jul 21, 2010 at 2:27 AM, Dallas Lee mswpl...@gmail.com wrote:
 Hi,
 I have trying to use the virtio-9p for my linux in QEMU, but without
 success.
 Here is my option for booting my qemu:
 i386-softmmu/qemu -kernel bzImage -append console=ttyS0
 video=uvesafb:ywrap,overlay:rgb16,480x800...@60 root=/dev/nfs rw
 nfsroot=10.0.2.2:/root,udp ip=10.0.2.16:eth0:none 5 -net
  nic,model=virtio -net user -soundhw all -usb -serial
 telnet:localhost:1200,server -vga std -m 512 -L ./pc-bios -bios bios.bin
 -virtfs
 local,path=/home/dallas/nfs,security_model=passthrough,mount_tag=v_tmp

 The virtio network is working, I could mount the nfs through virio net.
 And in the guest linux, I tried to mount v9fs by using following command:
 mount -t 9p -o trans=virtio -o debug=0x v_tmp /mnt
 but unfortunately I got the error:
 mount: mounting v_tmp on /mnt failed: No such device
 And I can't find the v_tmp neither in /sys/devices/virtio-pci/virtio1/ nor
 in /sys/bus/virtio/drivers/9pnet_virtio/virtio1/
 And before building the kernel, I enabled the Plan 9 Ressource Sharing
 Support under File System/Network File System, I also enabled the following
 configures:
 PARAVIRT_GUEST:
         - Processor type and features
                 - Paravirtualized guest support
 LGUEST_GUEST:
         - Processor type and features
                 - Paravirtualized guest support
                         - Lguest guest support
 VIRTIO_PCI:
         - Virtualization (VIRTUALIZATION [=y])
                 - PCI driver for virtio devices
 VIRTIO_BLK:
         - Device Drivers
                 - Block devices (BLK_DEV [=y])
                         - Virtio block driver
 VIRTIO_NET:
         - Device Drivers
                 - Network device support (NETDEVICES [=y])
                         - Virtio network driver
 Would you please help me to find out the problem why I couldn't mount the
 v9fs?
 Thank you very much!
 BR,
 Dallas

Hi Dallas,

what does 'lspci -vv' in the guest show?  Is there a device for virtio_9p?

do you have

CONFIG_NET_9P_VIRTIO=y

in your kernel's .config?

Cam



Re: [Qemu-devel] Re: Unusual physical address when using 64-bit BAR

2010-07-14 Thread Cam Macdonell
On Tue, Jul 13, 2010 at 8:52 PM, Isaku Yamahata yamah...@valinux.co.jp wrote:
 On Tue, Jul 13, 2010 at 04:48:19PM -0600, Cam Macdonell wrote:
 On Tue, Jul 13, 2010 at 2:41 PM, Isaku Yamahata yamah...@valinux.co.jp 
 wrote:
  On Tue, Jul 13, 2010 at 02:05:51PM -0600, Cam Macdonell wrote:
Seabios completely ignore the 64-bitness of the BAR. ?Looks like it 
also
thinks the second half of the BAR is an I/O region instead of memory 
(hence
the c200, that's part of the pci portio region.
  
   I've sent the patches to address it. But they haven't been merged yet.
   seabios doesn't map BARs beyond 4GB.
   If bar is mapped beyond 4GB, guest BIOS does it.
 
  Have those patches been merged yet?
 
  They have been merged into seabios upstream now.
  qemu seabios fork hasn't pulled for a while, though.
 
 
   To see how seabios works, it would help to increase CONFIG_DEBUG_LEVEL
   in config.h of seabios
 
  Where does the output from seabios end up? ?Inside dmesg?
 
  It outputs them to the serial console which qemu emulates.
  seabios is out of kernel control, so dmesg doesn't show it.
 
 
   pci_read_config: (val) 0x0 - 0x1c (addr)
   pci_write_config: (val) 0x0 - 0x1c (addr)
   pci_read_config: (val) 0x - 0x1c (addr)
   pci_write_config: (val) 0x0 - 0x1c (addr)
   pci_read_config: (val) 0x0 - 0x1c (addr)
   pci_write_config: (val) 0x0 - 0x1c (addr)
  
   seabios BAR3. Not sure how it is mapped from this
   message.
 
  Isn't the BAR3 from the fact that a 64-bit BAR would use both BAR2 and
  BAR3 to store all 64-bits?
 
  Yes. Seabios misbehaves. 64bit bar is(was) a missing feature.
  --
  yamahata
 
 

 With the latest seabios git passed via -bios, I no longer see the
 48-bit address, but instead a 32-bit address and then
 .  This guest has 1gb of RAM so the address isn't be
 mapped beyond 4g.

 Can I see the debug log like before?
 (hopefully seabios with CONFIG_DEBUG_LEVEL enabled.)

Here's the dump from SeaBIOS in the region related to the PCI devices.
 The SeaBIOS output is identical whether the BAR is 32-bit or 64-bit.

PCI: bus=0 devfn=0x10: vendor_id=0x1013 device_id=0x00b8
region 0: 0xf000
region 1: 0xf200
region 6: 0xf201
PCI: bus=0 devfn=0x18: vendor_id=0x1af4 device_id=0x1000
region 0: 0xc020
region 1: 0xf202
region 6: 0xf203
PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110
region 0: 0xf204
region 1: 0xf2041000
region 2: 0x


 Do you know who sets the BAR to ?

Here are the config reads/writes related to the 0x18/1c, the 'IVSHMEM'
lines are from the map function passed to pci_register_bar().  It
looks like SeaBIOS sets the address to 0 and then the potentially
useful e000 address gets mangled into 00.

IVSHMEM: guest pci addr = 0, guest h/w addr = 1090912256, size = 536870912

...snip...

pci_read_config: (val) 0x4 - 0x18 (addr)
pci_write_config: (val) 0x0 - 0x18 (addr)
IVSHMEM: guest pci addr = e000, guest h/w addr = 1090912256, size = 2000
pci_read_config: (val) 0xe004 - 0x18 (addr)
pci_write_config: (val) 0x0 - 0x18 (addr)
pci_read_config: (val) 0x0 - 0x1c (addr)
pci_write_config: (val) 0x0 - 0x1c (addr)
IVSHMEM: guest pci addr = , guest h/w addr =
1090912256, size = 2000
pci_read_config: (val) 0x - 0x1c (addr)
pci_write_config: (val) 0x0 - 0x1c (addr)

and with the 64-bit guest I get this error as well (recall the guest
fails to boot on 64-bit)

BUG: kvm_dirty_pages_log_change: invalid parameters
f000-f0ff

 If it's seabios, does the following patch help?

The patch doesn't make a difference.  Perhaps it's Qemu then?


 diff --git a/src/pciinit.c b/src/pciinit.c
 index b110531..ce07709 100644
 --- a/src/pciinit.c
 +++ b/src/pciinit.c
 @@ -117,11 +117,7 @@ static int pci_bios_allocate_region(u16 bdf, int 
 region_num)
     int is_64bit = !(val  PCI_BASE_ADDRESS_SPACE_IO) 
         (val  PCI_BASE_ADDRESS_MEM_TYPE_MASK) == 
 PCI_BASE_ADDRESS_MEM_TYPE_64;
     if (is_64bit) {
 -        if (size  0) {
 -            pci_config_writel(bdf, ofs + 4, 0);
 -        } else {
 -            pci_config_writel(bdf, ofs + 4, ~0);
 -        }
 +        pci_config_writel(bdf, ofs + 4, 0);
     }
     return is_64bit;
  }



 IVSHMEM: guest pci addr = e000, guest h/w addr = 1090912256, size = 
 2000
 IVSHMEM: guest pci addr = , guest h/w addr =
 1090912256, size = 2000

 However, booting fails when I use a 64-bit BAR.  Booting is fine with
 a 32-bit BAR.

 HPET: 1 timers in total, 0 timers will be used for per-cpu timer
 divide error:  [#1] SMP last sysfs file:
 CPU 0
 Modules linked in:

 Pid: 1, comm: swapper Not tainted 2.6.35-rc1 #280 /Bochs
 RIP: 0010:[812a801b]  [812a801b] hpet_alloc+0x12c/0x35b
 RSP: 0018:88003e61fd80  EFLAGS: 00010246
 RAX: 00038d7ea4c68000 RBX: 88003e6efd80 RCX: 
 RDX:  RSI:  RDI: 817b8520
 RBP

Re: [Qemu-devel] Re: Unusual physical address when using 64-bit BAR

2010-07-13 Thread Cam Macdonell
On Tue, Jun 29, 2010 at 9:29 PM, Isaku Yamahata yamah...@valinux.co.jp wrote:
 On Tue, Jun 29, 2010 at 11:48:13AM -0600, Cam Macdonell wrote:
 On Tue, Jun 29, 2010 at 12:50 AM, Avi Kivity a...@redhat.com wrote:
  On 06/28/2010 11:38 PM, Cam Macdonell wrote:
 
 
  Is this really the address the guest programmed, or is qemu
  misinterpreting
  it?
 
 
 
  Well, what's the answer?
 
 
  You're going to have to give me a hint on how to determine that.
 
  lspci in the guest shows the following
 
  Memory at c200 (64-bit, non-prefetchable) [size=1024M]
 
  does that demonstrate a guest generated address?
 
 
  That's the result of a round trip: the guest programmed the address and 
  then
  read it back. ?It could have been screwed up in the first place, or perhaps
  qemu screwed it up.
 
  Add a printf() to the config space handlers in qemu (likely in your own
  code) on writes and reads, and show the relevant writes (and reads) for 
  this
  BAR.
 
  That's the theory of deductive debugging; however browsing the code shows
  the guest is at fault:
 
  ? ? ? ?for (i = 0; i  PCI_NUM_REGIONS; i++) {
  ? ? ? ? ? ?int ofs;
  ? ? ? ? ? ?if (i == PCI_ROM_SLOT)
  ? ? ? ? ? ? ? ?ofs = PCI_ROM_ADDRESS;
  ? ? ? ? ? ?else
  ? ? ? ? ? ? ? ?ofs = PCI_BASE_ADDRESS_0 + i * 4;
 
  ? ? ? ? ? ?u32 old = pci_config_readl(bdf, ofs);
  ? ? ? ? ? ?u32 mask;
  ? ? ? ? ? ?if (i == PCI_ROM_SLOT) {
  ? ? ? ? ? ? ? ?mask = PCI_ROM_ADDRESS_MASK;
  ? ? ? ? ? ? ? ?pci_config_writel(bdf, ofs, mask);
  ? ? ? ? ? ?} else {
  ? ? ? ? ? ? ? ?if (old  PCI_BASE_ADDRESS_SPACE_IO)
  ? ? ? ? ? ? ? ? ? ?mask = PCI_BASE_ADDRESS_IO_MASK;
  ? ? ? ? ? ? ? ?else
  ? ? ? ? ? ? ? ? ? ?mask = PCI_BASE_ADDRESS_MEM_MASK;
  ? ? ? ? ? ? ? ?pci_config_writel(bdf, ofs, ~0);
  ? ? ? ? ? ?}
  ? ? ? ? ? ?u32 val = pci_config_readl(bdf, ofs);
  ? ? ? ? ? ?pci_config_writel(bdf, ofs, old);
 
  ? ? ? ? ? ?if (val != 0) {
  ? ? ? ? ? ? ? ?u32 size = (~(val  mask)) + 1;
  ? ? ? ? ? ? ? ?if (val  PCI_BASE_ADDRESS_SPACE_IO)
  ? ? ? ? ? ? ? ? ? ?paddr = pci_bios_io_addr;
  ? ? ? ? ? ? ? ?else
  ? ? ? ? ? ? ? ? ? ?paddr = pci_bios_mem_addr;
  ? ? ? ? ? ? ? ?*paddr = ALIGN(*paddr, size);
  ? ? ? ? ? ? ? ?pci_set_io_region_addr(bdf, i, *paddr);
  ? ? ? ? ? ? ? ?*paddr += size;
  ? ? ? ? ? ?}
  ? ? ? ?}
  ? ? ? ?break;
  ? ?}
 
  Seabios completely ignore the 64-bitness of the BAR. ?Looks like it also
  thinks the second half of the BAR is an I/O region instead of memory (hence
  the c200, that's part of the pci portio region.

 I've sent the patches to address it. But they haven't been merged yet.
 seabios doesn't map BARs beyond 4GB.
 If bar is mapped beyond 4GB, guest BIOS does it.

Have those patches been merged yet?



 To see how seabios works, it would help to increase CONFIG_DEBUG_LEVEL
 in config.h of seabios

Where does the output from seabios end up?  Inside dmesg?



  Do post those reads and writes, I think there's more than one thing wrong
  here.

 Here it is, I added the debug statements to pci_read_config and
 pci_default_write_config.

 here are the reads and writes to offsets 0x18 and 0x1c where a 64-bit
 BAR2 config would be configured

 It seems that 0x1c is accessed as BAR3.
 The write value in the log is always 0.

 Some comment in the log.


 pci_read_config: (val) 0x4 - 0x18 (addr)
 pci_write_config: (val) 0x0 - 0x18 (addr)
 pci_read_config: (val) 0xc004 - 0x18 (addr)
 pci_write_config: (val) 0x0 - 0x18 (addr)
 pci_read_config: (val) 0x4 - 0x18 (addr)
 pci_write_config: (val) 0x0 - 0x18 (addr)
 pci_read_config: (val) 0x0 - 0x1c (addr)
 pci_write_config: (val) 0x0 - 0x1c (addr)
 pci_read_config: (val) 0x - 0x1c (addr)
 pci_write_config: (val) 0x0 - 0x1c (addr)
 pci_read_config: (val) 0x0 - 0x1c (addr)
 pci_write_config: (val) 0x0 - 0x1c (addr)
 pci_read_config: (val) 0x4 - 0x18 (addr)
 pci_write_config: (val) 0x0 - 0x18 (addr)
 pci_read_config: (val) 0xc004 - 0x18 (addr)
 pci_write_config: (val) 0x0 - 0x18 (addr)
 pci_read_config: (val) 0xc040 - 0x1c (addr)
 pci_write_config: (val) 0x0 - 0x1c (addr)
 pci_read_config: (val) 0x - 0x1c (addr)
 pci_write_config: (val) 0x0 - 0x1c (addr)

 the complete read/write profile is below along with debug statements
 from the map functions for the BARs (prefixed with IVSHMEM)

 pci_read_config: (val) 0x1af4 - 0x0 (addr)
 pci_read_config: (val) 0x0 - 0xe (addr)
 pci_read_config: (val) 0x1af4 - 0x0 (addr)
 pci_read_config: (val) 0x1110 - 0x2 (addr)
 pci_read_config: (val) 0x0 - 0xe (addr)
 pci_read_config: (val) 0x1af4 - 0x0 (addr)
 pci_read_config: (val) 0x0 - 0xe (addr)
 pci_read_config: (val) 0x500 - 0xa (addr)
 pci_read_config: (val) 0x1af4 - 0x0 (addr)
 pci_read_config: (val) 0x1110 - 0x2 (addr)
 pci_read_config: (val) 0x0 - 0x10 (addr)
 pci_write_config: (val) 0x0 - 0x10 (addr)
 pci_read_config: (val) 0xff00 - 0x10 (addr)
 pci_write_config: (val) 0x0 - 0x10 (addr)
 pci_read_config: (val) 0x0 - 0x10 (addr)
 pci_write_config: (val) 0x0 - 0x10 (addr)
 pci_read_config: (val) 0x0

Re: [Qemu-devel] Re: Unusual physical address when using 64-bit BAR

2010-07-13 Thread Cam Macdonell
On Tue, Jul 13, 2010 at 2:41 PM, Isaku Yamahata yamah...@valinux.co.jp wrote:
 On Tue, Jul 13, 2010 at 02:05:51PM -0600, Cam Macdonell wrote:
   Seabios completely ignore the 64-bitness of the BAR. ?Looks like it also
   thinks the second half of the BAR is an I/O region instead of memory 
   (hence
   the c200, that's part of the pci portio region.
 
  I've sent the patches to address it. But they haven't been merged yet.
  seabios doesn't map BARs beyond 4GB.
  If bar is mapped beyond 4GB, guest BIOS does it.

 Have those patches been merged yet?

 They have been merged into seabios upstream now.
 qemu seabios fork hasn't pulled for a while, though.


  To see how seabios works, it would help to increase CONFIG_DEBUG_LEVEL
  in config.h of seabios

 Where does the output from seabios end up?  Inside dmesg?

 It outputs them to the serial console which qemu emulates.
 seabios is out of kernel control, so dmesg doesn't show it.


  pci_read_config: (val) 0x0 - 0x1c (addr)
  pci_write_config: (val) 0x0 - 0x1c (addr)
  pci_read_config: (val) 0x - 0x1c (addr)
  pci_write_config: (val) 0x0 - 0x1c (addr)
  pci_read_config: (val) 0x0 - 0x1c (addr)
  pci_write_config: (val) 0x0 - 0x1c (addr)
 
  seabios BAR3. Not sure how it is mapped from this
  message.

 Isn't the BAR3 from the fact that a 64-bit BAR would use both BAR2 and
 BAR3 to store all 64-bits?

 Yes. Seabios misbehaves. 64bit bar is(was) a missing feature.
 --
 yamahata



With the latest seabios git passed via -bios, I no longer see the
48-bit address, but instead a 32-bit address and then
.  This guest has 1gb of RAM so the address isn't be
mapped beyond 4g.

IVSHMEM: guest pci addr = e000, guest h/w addr = 1090912256, size = 2000
IVSHMEM: guest pci addr = , guest h/w addr =
1090912256, size = 2000

However, booting fails when I use a 64-bit BAR.  Booting is fine with
a 32-bit BAR.

HPET: 1 timers in total, 0 timers will be used for per-cpu timer
divide error:  [#1] SMP last sysfs file:
CPU 0
Modules linked in:

Pid: 1, comm: swapper Not tainted 2.6.35-rc1 #280 /Bochs
RIP: 0010:[812a801b]  [812a801b] hpet_alloc+0x12c/0x35b
RSP: 0018:88003e61fd80  EFLAGS: 00010246
RAX: 00038d7ea4c68000 RBX: 88003e6efd80 RCX: 
RDX:  RSI:  RDI: 817b8520
RBP: 88003e61fdc0 R08: 80d0 R09: c900
R10: 88003f9ac600 R11:  R12: 88003e61fdd0
R13: c900 R14:  R15: 817a00a9
FS:  () GS:88000200() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2:  CR3: 01a42000 CR4: 06f0DR0:
 DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process swapper (pid: 1, threadinfo 88003e61e000, task 88003e62)
Stack:
 88003f43ab90 88003f43ab90 81ca3174 81b1d596
0  0100 0100 
0 88003e61fe80 8102945d fed0 c900
Call Trace:
 [81b1d596] ? hpet_late_init+0x0/0xea
 [8102945d] hpet_reserve_platform_timers+0x10b/0x115
 [81b1d596] ? hpet_late_init+0x0/0xea
 [81b1d601] hpet_late_init+0x6b/0xea
 [81b1d596] ? hpet_late_init+0x0/0xea
 [81002069] do_one_initcall+0x5e/0x159
 [81b0b71e] kernel_init+0x18e/0x21c
 [8100aa24] kernel_thread_helper+0x4/0x10
 [81b0b590] ? kernel_init+0x0/0x21c
 [8100aa20] ? kernel_thread_helper+0x0/0x10
Code: 89 1d 8a 92 b3 00 48 c1 ea 21 8b 73 34 49 c7 c7 a9 00 7a 81 48
8d 04 02 4c 89 f2 48 c7 c7 20 85 7b 81 48 c1 ea 20 48 89 d1 31 d2 48
f7 f1 83 7b 30 01 48
c7 c1 cd fa 7c 81 49 0f 46 cf 48 89 43
RIP  [812a801b] hpet_alloc+0x12c/0x35b
 RSP 88003e61fd80
---[ end trace a7919e7f17c0a725 ]---
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: swapper Tainted: G  D 2.6.35-rc1 #280
Call Trace:
 [8145b870] panic+0x8b/0x10b
 [81056a93] ? exit_ptrace+0x38/0x121
 [8104f9e8] do_exit+0x7a/0x722
 [8104c3bd] ? spin_unlock_irqrestore+0xe/0x10
 [8104cfd8] ? kmsg_dump+0x12b/0x145
 [8145eaa9] oops_end+0xbf/0xc7
 [8100d299] die+0x5a/0x63
 [8145e4b3] do_trap+0x121/0x130
 [8100b560] do_divide_error+0x96/0x9f
 [812a801b] ? hpet_alloc+0x12c/0x35b
 [8100a83b] divide_error+0x1b/0x20
 [812a801b] ? hpet_alloc+0x12c/0x35b
 [81b1d596] ? hpet_late_init+0x0/0xea
 [8102945d] hpet_reserve_platform_timers+0x10b/0x115
 [81b1d596] ? hpet_late_init+0x0/0xea
 [81b1d601] hpet_late_init+0x6b/0xea
 [81b1d596] ? hpet_late_init+0x0/0xea
 [81002069] do_one_initcall+0x5e/0x159
 [81b0b71e] kernel_init+0x18e/0x21c
 [8100aa24] kernel_thread_helper+0x4/0x10

[Qemu-devel] [PATCH v7 RESEND 4/4] Inter-VM shared memory PCI device

2010-07-08 Thread Cam Macdonell
Resent (again): Some lines were over 80 characters and debugging is now off.

Support an inter-vm shared memory device that maps a shared-memory object as a
PCI device in the guest.  This patch also supports interrupts between guest by
communicating over a unix domain socket.  This patch applies to the qemu-kvm
repository.

-device ivshmem,size=size in format accepted by -m[,shm=shm name]

Interrupts are supported between multiple VMs by using a shared memory server
by using a chardev socket.

-device ivshmem,size=size in format accepted by -m[,shm=shm name]
   [,chardev=id][,msi=on][,irqfd=on][,vectors=n][,role=peer|master]
-chardev socket,path=path,id=id

The shared memory server, sample programs and init scripts are in a git repo 
here:

www.gitorious.org/nahanni

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 Makefile.target |3 +
 hw/ivshmem.c|  842 +++
 qemu-char.c |6 +
 qemu-char.h |3 +
 qemu-doc.texi   |   43 +++
 5 files changed, 897 insertions(+), 0 deletions(-)
 create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index a0e9747..1e99ec8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -203,6 +203,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
 obj-y += rtl8139.o
 obj-y += e1000.o
 
+# Inter-VM PCI shared memory
+obj-y += ivshmem.o
+
 # Hardware support
 obj-i386-y += vga.o
 obj-i386-y += mc146818rtc.o i8259.o pc.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..763b9c2
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,842 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell c...@cs.ualberta.ca
+ *
+ * Based On: cirrus_vga.c
+ *  Copyright (c) 2004 Fabrice Bellard
+ *  Copyright (c) 2004 Makoto Suzuki (suzu)
+ *
+ *  and rtl8139.c
+ *  Copyright (c) 2006 Igor Kovalenko
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+#include hw.h
+#include pc.h
+#include pci.h
+#include msix.h
+#include kvm.h
+
+#include sys/mman.h
+#include sys/types.h
+
+#define IVSHMEM_IRQFD   0
+#define IVSHMEM_MSI 1
+
+#define IVSHMEM_PEER0
+#define IVSHMEM_MASTER  1
+
+#define IVSHMEM_REG_BAR_SIZE 0x100
+
+//#define DEBUG_IVSHMEM
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, ...)\
+do {printf(IVSHMEM:  fmt, ## __VA_ARGS__); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, ...)
+#endif
+
+typedef struct Peer {
+int nb_eventfds;
+int *eventfds;
+} Peer;
+
+typedef struct EventfdEntry {
+PCIDevice *pdev;
+int vector;
+} EventfdEntry;
+
+typedef struct IVShmemState {
+PCIDevice dev;
+uint32_t intrmask;
+uint32_t intrstatus;
+uint32_t doorbell;
+
+CharDriverState **eventfd_chr;
+CharDriverState *server_chr;
+int ivshmem_mmio_io_addr;
+
+pcibus_t mmio_addr;
+pcibus_t shm_pci_addr;
+uint64_t ivshmem_offset;
+uint64_t ivshmem_size; /* size of shared memory region */
+int shm_fd; /* shared memory file descriptor */
+
+Peer *peers;
+int nb_peers; /* how many guests we have space for */
+int max_peer; /* maximum numbered peer */
+
+int vm_id;
+uint32_t vectors;
+uint32_t features;
+EventfdEntry *eventfd_table;
+
+char * shmobj;
+char * sizearg;
+char * role;
+int role_val;   /* scalar to avoid multiple string comparisons */
+} IVShmemState;
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+INTRMASK = 0,
+INTRSTATUS = 4,
+IVPOSITION = 8,
+DOORBELL = 12,
+};
+
+static inline uint32_t ivshmem_has_feature(IVShmemState *ivs,
+unsigned int feature) {
+return (ivs-features  (1  feature));
+}
+
+static inline bool is_power_of_two(uint64_t x) {
+return (x  (x - 1)) == 0;
+}
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+pcibus_t addr, pcibus_t size, int type)
+{
+IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev);
+
+s-shm_pci_addr = addr;
+
+if (s-ivshmem_offset  0) {
+cpu_register_physical_memory(s-shm_pci_addr, s-ivshmem_size,
+s-ivshmem_offset);
+}
+
+IVSHMEM_DPRINTF(guest pci addr = % FMT_PCIBUS , guest h/w addr = %
+PRIu64 , size = % FMT_PCIBUS \n, addr, s-ivshmem_offset, size);
+
+}
+
+/* accessing registers - based on rtl8139 */
+static void ivshmem_update_irq(IVShmemState *s, int val)
+{
+int isr;
+isr = (s-intrstatus  s-intrmask)  0x;
+
+/* don't print ISR resets */
+if (isr) {
+IVSHMEM_DPRINTF(Set IRQ to %d (%04x %04x)\n,
+   isr ? 1 : 0, s-intrstatus, s-intrmask);
+}
+
+qemu_set_irq(s-dev.irq[0], (isr != 0));
+}
+
+static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
+{
+IVSHMEM_DPRINTF(IntrMask write(w) val = 0x%04x\n, val);
+
+s-intrmask = val

Re: [Qemu-devel] KVM call minutes for July 6

2010-07-06 Thread Cam Macdonell
On Tue, Jul 6, 2010 at 8:46 AM, Juan Quintela quint...@redhat.com wrote:

 Today was a short call as Anthony didn't attend.

 - Jes reminded people that Linux Plumbers deadline is July 19th
 - Qemu 0.13 (from the agenda, we didn't discuss it without Anthony).

 List of patches missing commit/comment/review from Anthony.  We decided
 to send to the list an initial list, and anybody that has anything empty
 just anwer to this email.  Juan (i.e. me) was voluntered to keep that
 list.  In this list there is:

 - migration subsections (juan) will resend today
 - migration ide: we are in a mess, will resend with subsections.
 - migration memory with names.  Allows migration with hotplug/hotunplug
  (Alex), it is waiting for comments.

- shared memory PCI device is waiting for commit or further comment/review



[Qemu-devel] Re: [PATCH v7 0/4] Inter-VM shared memory device

2010-07-02 Thread Cam Macdonell
On Tue, Jun 15, 2010 at 2:23 PM, Cam Macdonell c...@cs.ualberta.ca wrote:
 Latest patch for PCI shared memory device that maps a host shared memory 
 object
 to be shared between guests

 new in this series

    - replace marking memory from v6 with marking device as unmigratable 
 indicating
      that it should be unplugged before migration and re-added after.
    - 'peer' case changed to require removal before migration, only 'master'
      devices can be migrated while attached.

    v6
    - migration support with 'master' and 'peer' roles for guest to determine
      who owns memory
    - modified phys_ram_dirty array for marking memory as not to be migrated

    v5:
    - fixed segfault for non-server case
    - code style fixes
    - removed limit on the number of guests
    - shared memory server is now in qemu.git/contrib
    - made ioeventfd setup function generic
    - removed interrupts when guest joined (let application handle it)

    v4:
    - moved to single Doorbell register and use datamatch to trigger different
      VMs rather than one register per eventfd
    - remove writing arbitrary values to eventfds.  Only values of 1 are now
      written to ensure correct usage

 Cam Macdonell (4):
  Device specification for shared memory PCI device
  Add function to assign ioeventfd to MMIO.
  Support marking a device as non-migratable
  Inter-VM shared memory PCI device

  Makefile.target                    |    3 +
  docs/specs/ivshmem_device_spec.txt |   96 +
  hw/hw.h                            |    1 +
  hw/ivshmem.c                       |  823 
 
  kvm-all.c                          |   32 ++
  kvm.h                              |    1 +
  qemu-char.c                        |    6 +
  qemu-char.h                        |    3 +
  qemu-doc.texi                      |   43 ++
  savevm.c                           |   32 ++-
  10 files changed, 1037 insertions(+), 3 deletions(-)
  create mode 100644 docs/specs/ivshmem_device_spec.txt
  create mode 100644 hw/ivshmem.c



Hi,

Are there outstanding concerns with this patchset?  Can it be merged?
I can rebase if necessary.

Thanks,
Cam



[Qemu-devel] Re: Unusual physical address when using 64-bit BAR

2010-06-29 Thread Cam Macdonell
On Tue, Jun 29, 2010 at 12:50 AM, Avi Kivity a...@redhat.com wrote:
 On 06/28/2010 11:38 PM, Cam Macdonell wrote:


 Is this really the address the guest programmed, or is qemu
 misinterpreting
 it?



 Well, what's the answer?


 You're going to have to give me a hint on how to determine that.

 lspci in the guest shows the following

 Memory at c200 (64-bit, non-prefetchable) [size=1024M]

 does that demonstrate a guest generated address?


 That's the result of a round trip: the guest programmed the address and then
 read it back.  It could have been screwed up in the first place, or perhaps
 qemu screwed it up.

 Add a printf() to the config space handlers in qemu (likely in your own
 code) on writes and reads, and show the relevant writes (and reads) for this
 BAR.

 That's the theory of deductive debugging; however browsing the code shows
 the guest is at fault:

        for (i = 0; i  PCI_NUM_REGIONS; i++) {
            int ofs;
            if (i == PCI_ROM_SLOT)
                ofs = PCI_ROM_ADDRESS;
            else
                ofs = PCI_BASE_ADDRESS_0 + i * 4;

            u32 old = pci_config_readl(bdf, ofs);
            u32 mask;
            if (i == PCI_ROM_SLOT) {
                mask = PCI_ROM_ADDRESS_MASK;
                pci_config_writel(bdf, ofs, mask);
            } else {
                if (old  PCI_BASE_ADDRESS_SPACE_IO)
                    mask = PCI_BASE_ADDRESS_IO_MASK;
                else
                    mask = PCI_BASE_ADDRESS_MEM_MASK;
                pci_config_writel(bdf, ofs, ~0);
            }
            u32 val = pci_config_readl(bdf, ofs);
            pci_config_writel(bdf, ofs, old);

            if (val != 0) {
                u32 size = (~(val  mask)) + 1;
                if (val  PCI_BASE_ADDRESS_SPACE_IO)
                    paddr = pci_bios_io_addr;
                else
                    paddr = pci_bios_mem_addr;
                *paddr = ALIGN(*paddr, size);
                pci_set_io_region_addr(bdf, i, *paddr);
                *paddr += size;
            }
        }
        break;
    }

 Seabios completely ignore the 64-bitness of the BAR.  Looks like it also
 thinks the second half of the BAR is an I/O region instead of memory (hence
 the c200, that's part of the pci portio region.

 Do post those reads and writes, I think there's more than one thing wrong
 here.

Here it is, I added the debug statements to pci_read_config and
pci_default_write_config.

here are the reads and writes to offsets 0x18 and 0x1c where a 64-bit
BAR2 config would be configured

pci_read_config: (val) 0x4 - 0x18 (addr)
pci_write_config: (val) 0x0 - 0x18 (addr)
pci_read_config: (val) 0xc004 - 0x18 (addr)
pci_write_config: (val) 0x0 - 0x18 (addr)
pci_read_config: (val) 0x4 - 0x18 (addr)
pci_write_config: (val) 0x0 - 0x18 (addr)
pci_read_config: (val) 0x0 - 0x1c (addr)
pci_write_config: (val) 0x0 - 0x1c (addr)
pci_read_config: (val) 0x - 0x1c (addr)
pci_write_config: (val) 0x0 - 0x1c (addr)
pci_read_config: (val) 0x0 - 0x1c (addr)
pci_write_config: (val) 0x0 - 0x1c (addr)
pci_read_config: (val) 0x4 - 0x18 (addr)
pci_write_config: (val) 0x0 - 0x18 (addr)
pci_read_config: (val) 0xc004 - 0x18 (addr)
pci_write_config: (val) 0x0 - 0x18 (addr)
pci_read_config: (val) 0xc040 - 0x1c (addr)
pci_write_config: (val) 0x0 - 0x1c (addr)
pci_read_config: (val) 0x - 0x1c (addr)
pci_write_config: (val) 0x0 - 0x1c (addr)

the complete read/write profile is below along with debug statements
from the map functions for the BARs (prefixed with IVSHMEM)

pci_read_config: (val) 0x1af4 - 0x0 (addr)
pci_read_config: (val) 0x0 - 0xe (addr)
pci_read_config: (val) 0x1af4 - 0x0 (addr)
pci_read_config: (val) 0x1110 - 0x2 (addr)
pci_read_config: (val) 0x0 - 0xe (addr)
pci_read_config: (val) 0x1af4 - 0x0 (addr)
pci_read_config: (val) 0x0 - 0xe (addr)
pci_read_config: (val) 0x500 - 0xa (addr)
pci_read_config: (val) 0x1af4 - 0x0 (addr)
pci_read_config: (val) 0x1110 - 0x2 (addr)
pci_read_config: (val) 0x0 - 0x10 (addr)
pci_write_config: (val) 0x0 - 0x10 (addr)
pci_read_config: (val) 0xff00 - 0x10 (addr)
pci_write_config: (val) 0x0 - 0x10 (addr)
pci_read_config: (val) 0x0 - 0x10 (addr)
pci_write_config: (val) 0x0 - 0x10 (addr)
pci_read_config: (val) 0x0 - 0x14 (addr)
pci_write_config: (val) 0x0 - 0x14 (addr)
pci_read_config: (val) 0xf000 - 0x14 (addr)
pci_write_config: (val) 0x0 - 0x14 (addr)
pci_read_config: (val) 0x0 - 0x14 (addr)
pci_write_config: (val) 0x0 - 0x14 (addr)
pci_read_config: (val) 0x4 - 0x18 (addr)
pci_write_config: (val) 0x0 - 0x18 (addr)
pci_read_config: (val) 0xc004 - 0x18 (addr)
pci_write_config: (val) 0x0 - 0x18 (addr)
pci_read_config: (val) 0x4 - 0x18 (addr)
pci_write_config: (val) 0x0 - 0x18 (addr)
pci_read_config: (val) 0x0 - 0x1c (addr)
pci_write_config: (val) 0x0 - 0x1c (addr)
pci_read_config: (val) 0x - 0x1c (addr)
pci_write_config: (val) 0x0 - 0x1c (addr)
pci_read_config: (val) 0x0 - 0x1c (addr)
pci_write_config: (val) 0x0 - 0x1c

[Qemu-devel] Re: Unusual physical address when using 64-bit BAR

2010-06-28 Thread Cam Macdonell
On Sun, Jun 27, 2010 at 2:39 AM, Avi Kivity a...@redhat.com wrote:
 On 06/25/2010 12:51 AM, Cam Macdonell wrote:

 On Tue, Jun 15, 2010 at 5:04 AM, Avi Kivitya...@redhat.com  wrote:


 On 06/11/2010 08:31 PM, Cam Macdonell wrote:


 On Mon, Apr 19, 2010 at 10:41 AM, Cam Macdonellc...@cs.ualberta.ca
  wrote:



 Hi,

 I'm trying to use a 64-bit BAR for my shared memory device.  In simply
 changing the memory type in pci_register_bar() to
 PCI_BASE_ADDRESS_MEM_TYPE_64 I get an unusual physical address for
 that BAR (and my driver crashes in pci_ioremap).

 from lspci:

 00:04.0 RAM memory: Qumranet, Inc. Device 1110
        Subsystem: Qumranet, Inc. Device 1100
        Flags: fast devsel, IRQ 10
        Memory at f102 (32-bit, non-prefetchable) [size=1K]
        Memory at f1021000 (32-bit, non-prefetchable) [size=4K]
        Memory at c200 (64-bit, non-prefetchable) [size=1024M]
        Capabilities:access denied
 00: f4 1a 10 11 03 00 10 00 00 00 00 05 00 00 00 00
 10: 00 00 02 f1 00 10 02 f1 04 00 00 00 00 c2 00 00
 20: 00 00 00 00 00 00 00 00 00 00 00 00 f4 1a 00 11
 30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00

 with DEBUG_MEMREG, I see

 kvm_unregister_memory_area:666 Unregistering memory region
 c200 (4000)
 kvm_destroy_phys_mem:649 slot 7 start c200 len 0 flags 0
 IVSHMEM: addr = 3221225472 size = 1073741824
 kvm_register_phys_mem:605 memory: gpa: c200c000, size: 4000,
 uaddr: 7f6dd7ffe000, slot: 7, flags: 0
 kvm_unregister_memory_area:666 Unregistering memory region
 c200c000 (4000)
 kvm_destroy_phys_mem:649 slot 7 start c200c000 len 0 flags 0
 IVSHMEM: addr = 0 size = 1073741824
 kvm_register_phys_mem:605 memory: gpa: c200, size: 4000,
 uaddr: 7f6dd7ffe000, slot: 7, flags: 0
 kvm_unregister_memory_area:666 Unregistering memory region
 c200 (4000)
 kvm_destroy_phys_mem:649 slot 7 start c200 len 0 flags 0
 IVSHMEM: addr = 0 size = 1073741824
 kvm_register_phys_mem:605 memory: gpa: , size:
 4000, uaddr: 7f6dd7ffe000, slot: 7, flags: 0
 kvm_unregister_memory_area:666 Unregistering memory region
  (4000)
 kvm_destroy_phys_mem:649 slot 7 start  len 0 flags 0
 IVSHMEM: addr = 0 size = 1073741824
 kvm_register_phys_mem:605 memory: gpa: c200, size: 4000,
 uaddr: 7f6dd7ffe000, slot: 7, flags: 0

 (the IVSHMEM lines are my debug statements)

 address sizes   : 40 bits physical, 48 bits virtual  (guest)
 address sizes   : 38 bits physical, 48 bits virtual  (host)




 Hi, I happened to run into this problem again when trying to use a
 64-bit BAR.  I did a bit more digging and the test that is failing is
 called from arch/x86/mm/ioremap.c in the guest and here it is.

 static inline int phys_addr_valid(resource_size_t addr)
 {
 #ifdef CONFIG_PHYS_ADDR_T_64BIT
        return !(addr    boot_cpu_data.x86_phys_bits);
 #else
        return 1;
 #endif
 }

 the value of addr (in this case the 48-bit virtual address
 c200) is shifted to the right shift by
 boot_cpu_data.x86_phys_bits (which is 40 bits, the physical address
 size), so a non-zero value is returned which causes the test to fail
 and generates the invalid physical address error in the guest.

 Any help is appreciated as to whether this is a Qemu or guest kernel
 issue.



 The guest kernel should never have generated an address that is bigger
 than
 cpu_phys_bits in the first place.  What's the value for cpu_phys_bits in
 the
 guest? (/proc/cpuinfo, 'address sizes :' line).


 Sorry I missed your reply until now.  The guest address sizes are as
 follows:

 address sizes   : 40 bits physical, 48 bits virtual


 So the address c200 is illegal.

 Is this really the address the guest programmed, or is qemu
 misinterpreting
 it?


 Well, what's the answer?

You're going to have to give me a hint on how to determine that.

lspci in the guest shows the following

Memory at c200 (64-bit, non-prefetchable) [size=1024M]

does that demonstrate a guest generated address?

Cam


 --
 error compiling committee.c: too many arguments to function





[Qemu-devel] Re: Unusual physical address when using 64-bit BAR

2010-06-24 Thread Cam Macdonell
On Tue, Jun 15, 2010 at 5:04 AM, Avi Kivity a...@redhat.com wrote:
 On 06/11/2010 08:31 PM, Cam Macdonell wrote:

 On Mon, Apr 19, 2010 at 10:41 AM, Cam Macdonellc...@cs.ualberta.ca
  wrote:


 Hi,

 I'm trying to use a 64-bit BAR for my shared memory device.  In simply
 changing the memory type in pci_register_bar() to
 PCI_BASE_ADDRESS_MEM_TYPE_64 I get an unusual physical address for
 that BAR (and my driver crashes in pci_ioremap).

 from lspci:

 00:04.0 RAM memory: Qumranet, Inc. Device 1110
        Subsystem: Qumranet, Inc. Device 1100
        Flags: fast devsel, IRQ 10
        Memory at f102 (32-bit, non-prefetchable) [size=1K]
        Memory at f1021000 (32-bit, non-prefetchable) [size=4K]
        Memory at c200 (64-bit, non-prefetchable) [size=1024M]
        Capabilities:access denied
 00: f4 1a 10 11 03 00 10 00 00 00 00 05 00 00 00 00
 10: 00 00 02 f1 00 10 02 f1 04 00 00 00 00 c2 00 00
 20: 00 00 00 00 00 00 00 00 00 00 00 00 f4 1a 00 11
 30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00

 with DEBUG_MEMREG, I see

 kvm_unregister_memory_area:666 Unregistering memory region
 c200 (4000)
 kvm_destroy_phys_mem:649 slot 7 start c200 len 0 flags 0
 IVSHMEM: addr = 3221225472 size = 1073741824
 kvm_register_phys_mem:605 memory: gpa: c200c000, size: 4000,
 uaddr: 7f6dd7ffe000, slot: 7, flags: 0
 kvm_unregister_memory_area:666 Unregistering memory region
 c200c000 (4000)
 kvm_destroy_phys_mem:649 slot 7 start c200c000 len 0 flags 0
 IVSHMEM: addr = 0 size = 1073741824
 kvm_register_phys_mem:605 memory: gpa: c200, size: 4000,
 uaddr: 7f6dd7ffe000, slot: 7, flags: 0
 kvm_unregister_memory_area:666 Unregistering memory region
 c200 (4000)
 kvm_destroy_phys_mem:649 slot 7 start c200 len 0 flags 0
 IVSHMEM: addr = 0 size = 1073741824
 kvm_register_phys_mem:605 memory: gpa: , size:
 4000, uaddr: 7f6dd7ffe000, slot: 7, flags: 0
 kvm_unregister_memory_area:666 Unregistering memory region
  (4000)
 kvm_destroy_phys_mem:649 slot 7 start  len 0 flags 0
 IVSHMEM: addr = 0 size = 1073741824
 kvm_register_phys_mem:605 memory: gpa: c200, size: 4000,
 uaddr: 7f6dd7ffe000, slot: 7, flags: 0

 (the IVSHMEM lines are my debug statements)

 address sizes   : 40 bits physical, 48 bits virtual  (guest)
 address sizes   : 38 bits physical, 48 bits virtual  (host)



 Hi, I happened to run into this problem again when trying to use a
 64-bit BAR.  I did a bit more digging and the test that is failing is
 called from arch/x86/mm/ioremap.c in the guest and here it is.

 static inline int phys_addr_valid(resource_size_t addr)
 {
 #ifdef CONFIG_PHYS_ADDR_T_64BIT
        return !(addr  boot_cpu_data.x86_phys_bits);
 #else
        return 1;
 #endif
 }

 the value of addr (in this case the 48-bit virtual address
 c200) is shifted to the right shift by
 boot_cpu_data.x86_phys_bits (which is 40 bits, the physical address
 size), so a non-zero value is returned which causes the test to fail
 and generates the invalid physical address error in the guest.

 Any help is appreciated as to whether this is a Qemu or guest kernel
 issue.


 The guest kernel should never have generated an address that is bigger than
 cpu_phys_bits in the first place.  What's the value for cpu_phys_bits in the
 guest? (/proc/cpuinfo, 'address sizes :' line).

Sorry I missed your reply until now.  The guest address sizes are as follows:

address sizes   : 40 bits physical, 48 bits virtual


 Is this really the address the guest programmed, or is qemu misinterpreting
 it?

 --
 error compiling committee.c: too many arguments to function





[Qemu-devel] [PATCH v7 RESEND 4/4] Inter-VM shared memory PCI device

2010-06-23 Thread Cam Macdonell
resent for some printf macros and one addition

Support an inter-vm shared memory device that maps a shared-memory object as a
PCI device in the guest.  This patch also supports interrupts between guest by
communicating over a unix domain socket.  This patch applies to the qemu-kvm
repository.

-device ivshmem,size=size in format accepted by -m[,shm=shm name]

Interrupts are supported between multiple VMs by using a shared memory server
by using a chardev socket.

-device ivshmem,size=size in format accepted by -m[,shm=shm name]
   [,chardev=id][,msi=on][,irqfd=on][,vectors=n][,role=peer|master]
-chardev socket,path=path,id=id

The shared memory server, sample programs and init scripts are in a git repo 
here:

www.gitorious.org/nahanni

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 Makefile.target |3 +
 hw/ivshmem.c|  828 +++
 qemu-char.c |6 +
 qemu-char.h |3 +
 qemu-doc.texi   |   43 +++
 5 files changed, 883 insertions(+), 0 deletions(-)
 create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index a0e9747..1e99ec8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -203,6 +203,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
 obj-y += rtl8139.o
 obj-y += e1000.o
 
+# Inter-VM PCI shared memory
+obj-y += ivshmem.o
+
 # Hardware support
 obj-i386-y += vga.o
 obj-i386-y += mc146818rtc.o i8259.o pc.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..8534e50
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,828 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell c...@cs.ualberta.ca
+ *
+ * Based On: cirrus_vga.c
+ *  Copyright (c) 2004 Fabrice Bellard
+ *  Copyright (c) 2004 Makoto Suzuki (suzu)
+ *
+ *  and rtl8139.c
+ *  Copyright (c) 2006 Igor Kovalenko
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+#include hw.h
+#include pc.h
+#include pci.h
+#include msix.h
+#include kvm.h
+
+#include sys/mman.h
+#include sys/types.h
+
+#define IVSHMEM_IRQFD   0
+#define IVSHMEM_MSI 1
+
+#define IVSHMEM_PEER0
+#define IVSHMEM_MASTER  1
+
+#define IVSHMEM_REG_BAR_SIZE 0x100
+
+#define DEBUG_IVSHMEM
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, ...)\
+do {printf(IVSHMEM:  fmt, ## __VA_ARGS__); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, ...)
+#endif
+
+typedef struct Peer {
+int nb_eventfds;
+int *eventfds;
+} Peer;
+
+typedef struct EventfdEntry {
+PCIDevice *pdev;
+int vector;
+} EventfdEntry;
+
+typedef struct IVShmemState {
+PCIDevice dev;
+uint32_t intrmask;
+uint32_t intrstatus;
+uint32_t doorbell;
+
+CharDriverState **eventfd_chr;
+CharDriverState *server_chr;
+int ivshmem_mmio_io_addr;
+
+pcibus_t mmio_addr;
+pcibus_t shm_pci_addr;
+uint64_t ivshmem_offset;
+uint64_t ivshmem_size; /* size of shared memory region */
+int shm_fd; /* shared memory file descriptor */
+
+Peer *peers;
+int nb_peers; /* how many guests we have space for */
+int max_peer; /* maximum numbered peer */
+
+int vm_id;
+uint32_t vectors;
+uint32_t features;
+EventfdEntry *eventfd_table;
+
+char * shmobj;
+char * sizearg;
+char * role;
+int role_val;   /* scalar to avoid multiple string comparisons */
+} IVShmemState;
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+INTRMASK = 0,
+INTRSTATUS = 4,
+IVPOSITION = 8,
+DOORBELL = 12,
+};
+
+static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, unsigned int 
feature) {
+return (ivs-features  (1  feature));
+}
+
+static inline bool is_power_of_two(uint64_t x) {
+return (x  (x - 1)) == 0;
+}
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+pcibus_t addr, pcibus_t size, int type)
+{
+IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev);
+
+s-shm_pci_addr = addr;
+
+if (s-ivshmem_offset  0) {
+cpu_register_physical_memory(s-shm_pci_addr, s-ivshmem_size,
+s-ivshmem_offset);
+}
+
+IVSHMEM_DPRINTF(guest pci addr = % FMT_PCIBUS , guest h/w addr = % 
PRIu64
+, size = % FMT_PCIBUS \n, addr, s-ivshmem_offset, 
size);
+
+}
+
+/* accessing registers - based on rtl8139 */
+static void ivshmem_update_irq(IVShmemState *s, int val)
+{
+int isr;
+isr = (s-intrstatus  s-intrmask)  0x;
+
+/* don't print ISR resets */
+if (isr) {
+IVSHMEM_DPRINTF(Set IRQ to %d (%04x %04x)\n,
+   isr ? 1 : 0, s-intrstatus, s-intrmask);
+}
+
+qemu_set_irq(s-dev.irq[0], (isr != 0));
+}
+
+static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
+{
+IVSHMEM_DPRINTF(IntrMask write(w) val = 0x%04x\n, val);
+
+s-intrmask = val;
+
+ivshmem_update_irq(s, val);
+}
+
+static uint32_t ivshmem_IntrMask_read

Re: [Qemu-devel] Re: [PATCH RFC] Mark a device as non-migratable

2010-06-16 Thread Cam Macdonell
On Wed, Jun 16, 2010 at 6:34 AM, Anthony Liguori anth...@codemonkey.ws wrote:
 On 06/16/2010 12:05 AM, Cam Macdonell wrote:

 On Tue, Jun 15, 2010 at 4:33 PM, Anthony Liguorianth...@codemonkey.ws
  wrote:


 On 06/15/2010 05:26 PM, Cam Macdonell wrote:


 On Tue, Jun 15, 2010 at 10:32 AM, Anthony Liguorianth...@codemonkey.ws
  wrote:



 On 06/15/2010 11:16 AM, Cam Macdonell wrote:



 How does this look for marking the device as non-migratable?  It adds
 a
 field
 'no_migrate' to the SaveStateEntry and tests for it in vmstate_save.
  This
 would
 replace anything that touches memory.

 Cam

 ---
  hw/hw.h  |    1 +
  savevm.c |   32 +---
  2 files changed, 30 insertions(+), 3 deletions(-)

 diff --git a/hw/hw.h b/hw/hw.h
 index d78d814..7c93f08 100644
 --- a/hw/hw.h
 +++ b/hw/hw.h
 @@ -263,6 +263,7 @@ int register_savevm_live(const char *idstr,
                           void *opaque);

  void unregister_savevm(const char *idstr, void *opaque);
 +void mark_no_migrate(const char *idstr, void *opaque);




 I'm not thrilled with the name but the functionality is spot on.  I
 lack
 the
 creativity to offer a better name suggestion :-)

 Regards,

 Anthony Liguori



 Hmmm, in working on this it seems that the memory (from
 qemu_ram_map()) is still attached even when the device is removed
 (which causes migration to fail because there is an unexpected
 memory).

 Is something like cpu_unregister_physical_memory()/qemu_ram_free()
 needed?



 Yes.  You need to unregister any memory that you have registered upon
 device
 removal.


 Is there an established way to achieve this?  I can't seem find
 another device that unregisters memory registered with
 cpu_register_physical_memory().  Is something like
 cpu_unregister_physical_memory() needed?


 cpu_register_physical_memory(IO_MEM_UNASSIGNED).

 If you look at pci.c, you'll see that it automatically unregisters any
 mapped io regions on remove.


It appears that the 'peer' migration won't work until memory hotplug
is supported, correct?  AFAICT the memory sizes will not match between
the source and destination VMs after the device is removed and the
memory system currently doesn't support gaps.  A technique similar to
my patch for non-migratable memory would be needed to mark free'd
memory pages without Alex's patches in.

For the purposes of my patch, can it be merged without the 'peer' case
(pending Alex's patches and hotplug)?

Thanks,
Cam



[Qemu-devel] [PATCH RFC] Mark a device as non-migratable

2010-06-15 Thread Cam Macdonell
How does this look for marking the device as non-migratable?  It adds a field
'no_migrate' to the SaveStateEntry and tests for it in vmstate_save.  This would
replace anything that touches memory.

Cam

---
 hw/hw.h  |1 +
 savevm.c |   32 +---
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index d78d814..7c93f08 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -263,6 +263,7 @@ int register_savevm_live(const char *idstr,
  void *opaque);
 
 void unregister_savevm(const char *idstr, void *opaque);
+void mark_no_migrate(const char *idstr, void *opaque);
 
 typedef void QEMUResetHandler(void *opaque);
 
diff --git a/savevm.c b/savevm.c
index 017695b..2642a9c 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1023,6 +1023,7 @@ typedef struct SaveStateEntry {
 LoadStateHandler *load_state;
 const VMStateDescription *vmsd;
 void *opaque;
+int no_migrate;
 } SaveStateEntry;
 
 
@@ -1069,6 +1070,7 @@ int register_savevm_live(const char *idstr,
 se-load_state = load_state;
 se-opaque = opaque;
 se-vmsd = NULL;
+se-no_migrate = 0;
 
 if (instance_id == -1) {
 se-instance_id = calculate_new_instance_id(idstr);
@@ -1103,6 +1105,19 @@ void unregister_savevm(const char *idstr, void *opaque)
 }
 }
 
+/* mark a device as not to be migrated, that is the device should be
+   unplugged before migration */
+void mark_no_migrate(const char *idstr, void *opaque)
+{
+SaveStateEntry *se;
+
+QTAILQ_FOREACH(se, savevm_handlers, entry) {
+if (strcmp(se-idstr, idstr) == 0  se-opaque == opaque) {
+se-no_migrate = 1;
+}
+}
+}
+
 int vmstate_register_with_alias_id(int instance_id,
const VMStateDescription *vmsd,
void *opaque, int alias_id,
@@ -1277,13 +1292,19 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry 
*se, int version_id)
 return vmstate_load_state(f, se-vmsd, se-opaque, version_id);
 }
 
-static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
+static int vmstate_save(QEMUFile *f, SaveStateEntry *se)
 {
+if (se-no_migrate) {
+return -1;
+}
+
 if (!se-vmsd) { /* Old style */
 se-save_state(f, se-opaque);
-return;
+return 0;
 }
 vmstate_save_state(f,se-vmsd, se-opaque);
+
+return 0;
 }
 
 #define QEMU_VM_FILE_MAGIC   0x5145564d
@@ -1377,6 +1398,7 @@ int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
 {
 SaveStateEntry *se;
+int r;
 
 cpu_synchronize_all_states();
 
@@ -1409,7 +1431,11 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
 qemu_put_be32(f, se-instance_id);
 qemu_put_be32(f, se-version_id);
 
-vmstate_save(f, se);
+r = vmstate_save(f, se);
+if (r  0) {
+monitor_printf(mon, cannot migrate with device '%s'\n, 
se-idstr);
+return r;
+}
 }
 
 qemu_put_byte(f, QEMU_VM_EOF);
-- 
1.6.3.2.198.g6096d




[Qemu-devel] [PATCH v7 0/4] Inter-VM shared memory device

2010-06-15 Thread Cam Macdonell
Latest patch for PCI shared memory device that maps a host shared memory object
to be shared between guests

new in this series

- replace marking memory from v6 with marking device as unmigratable 
indicating
  that it should be unplugged before migration and re-added after.
- 'peer' case changed to require removal before migration, only 'master'
  devices can be migrated while attached.

v6
- migration support with 'master' and 'peer' roles for guest to determine
  who owns memory
- modified phys_ram_dirty array for marking memory as not to be migrated

v5:
- fixed segfault for non-server case
- code style fixes
- removed limit on the number of guests
- shared memory server is now in qemu.git/contrib
- made ioeventfd setup function generic
- removed interrupts when guest joined (let application handle it)

v4:
- moved to single Doorbell register and use datamatch to trigger different
  VMs rather than one register per eventfd
- remove writing arbitrary values to eventfds.  Only values of 1 are now
  written to ensure correct usage

Cam Macdonell (4):
  Device specification for shared memory PCI device
  Add function to assign ioeventfd to MMIO.
  Support marking a device as non-migratable
  Inter-VM shared memory PCI device

 Makefile.target|3 +
 docs/specs/ivshmem_device_spec.txt |   96 +
 hw/hw.h|1 +
 hw/ivshmem.c   |  823 
 kvm-all.c  |   32 ++
 kvm.h  |1 +
 qemu-char.c|6 +
 qemu-char.h|3 +
 qemu-doc.texi  |   43 ++
 savevm.c   |   32 ++-
 10 files changed, 1037 insertions(+), 3 deletions(-)
 create mode 100644 docs/specs/ivshmem_device_spec.txt
 create mode 100644 hw/ivshmem.c




[Qemu-devel] [PATCH v7 3/4] Support marking a device as non-migratable

2010-06-15 Thread Cam Macdonell
A non-migratable device should be removed before migration and re-added after.

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 hw/hw.h  |1 +
 savevm.c |   32 +---
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index d78d814..894e6b0 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -263,6 +263,7 @@ int register_savevm_live(const char *idstr,
  void *opaque);
 
 void unregister_savevm(const char *idstr, void *opaque);
+void register_device_unmigratable(const char *idstr, void *opaque);
 
 typedef void QEMUResetHandler(void *opaque);
 
diff --git a/savevm.c b/savevm.c
index ada5c63..ac1d70d 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1024,6 +1024,7 @@ typedef struct SaveStateEntry {
 LoadStateHandler *load_state;
 const VMStateDescription *vmsd;
 void *opaque;
+int no_migrate;
 } SaveStateEntry;
 
 
@@ -1070,6 +1071,7 @@ int register_savevm_live(const char *idstr,
 se-load_state = load_state;
 se-opaque = opaque;
 se-vmsd = NULL;
+se-no_migrate = 0;
 
 if (instance_id == -1) {
 se-instance_id = calculate_new_instance_id(idstr);
@@ -1104,6 +1106,19 @@ void unregister_savevm(const char *idstr, void *opaque)
 }
 }
 
+/* mark a device as not to be migrated, that is the device should be
+   unplugged before migration */
+void register_device_unmigratable(const char *idstr, void *opaque)
+{
+SaveStateEntry *se;
+
+QTAILQ_FOREACH(se, savevm_handlers, entry) {
+if (strcmp(se-idstr, idstr) == 0  se-opaque == opaque) {
+se-no_migrate = 1;
+}
+}
+}
+
 int vmstate_register_with_alias_id(int instance_id,
const VMStateDescription *vmsd,
void *opaque, int alias_id,
@@ -1278,13 +1293,19 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry 
*se, int version_id)
 return vmstate_load_state(f, se-vmsd, se-opaque, version_id);
 }
 
-static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
+static int vmstate_save(QEMUFile *f, SaveStateEntry *se)
 {
+if (se-no_migrate) {
+return -1;
+}
+
 if (!se-vmsd) { /* Old style */
 se-save_state(f, se-opaque);
-return;
+return 0;
 }
 vmstate_save_state(f,se-vmsd, se-opaque);
+
+return 0;
 }
 
 #define QEMU_VM_FILE_MAGIC   0x5145564d
@@ -1378,6 +1399,7 @@ int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
 {
 SaveStateEntry *se;
+int r;
 
 cpu_synchronize_all_states();
 
@@ -1410,7 +1432,11 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
 qemu_put_be32(f, se-instance_id);
 qemu_put_be32(f, se-version_id);
 
-vmstate_save(f, se);
+r = vmstate_save(f, se);
+if (r  0) {
+monitor_printf(mon, cannot migrate with device '%s'\n, 
se-idstr);
+return r;
+}
 }
 
 qemu_put_byte(f, QEMU_VM_EOF);
-- 
1.6.3.2.198.g6096d




[Qemu-devel] [PATCH v7 2/4] Add function to assign ioeventfd to MMIO.

2010-06-15 Thread Cam Macdonell

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 kvm-all.c |   32 
 kvm.h |1 +
 2 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 47f58a6..2982631 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1257,6 +1257,38 @@ int kvm_set_signal_mask(CPUState *env, const sigset_t 
*sigset)
 return r;
 }
 
+int kvm_set_ioeventfd_mmio_long(int fd, uint32_t addr, uint32_t val, bool 
assign)
+{
+#ifdef KVM_IOEVENTFD
+int ret;
+struct kvm_ioeventfd iofd;
+
+iofd.datamatch = val;
+iofd.addr = addr;
+iofd.len = 4;
+iofd.flags = KVM_IOEVENTFD_FLAG_DATAMATCH;
+iofd.fd = fd;
+
+if (!kvm_enabled()) {
+return -ENOSYS;
+}
+
+if (!assign) {
+iofd.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
+}
+
+ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, iofd);
+
+if (ret  0) {
+return -errno;
+}
+
+return 0;
+#else
+return -ENOSYS;
+#endif
+}
+
 int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool 
assign)
 {
 #ifdef KVM_IOEVENTFD
diff --git a/kvm.h b/kvm.h
index aab5118..52e3a7f 100644
--- a/kvm.h
+++ b/kvm.h
@@ -181,6 +181,7 @@ static inline void cpu_synchronize_post_init(CPUState *env)
 }
 
 #endif
+int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool 
assign);
 
 #if defined(KVM_IRQFD)  defined(CONFIG_KVM)
 int kvm_set_irqfd(int gsi, int fd, bool assigned);
-- 
1.6.6.1




[Qemu-devel] [PATCH v7 4/4] Inter-VM shared memory PCI device

2010-06-15 Thread Cam Macdonell
Support an inter-vm shared memory device that maps a shared-memory object as a
PCI device in the guest.  This patch also supports interrupts between guest by
communicating over a unix domain socket.  This patch applies to the qemu-kvm
repository.

-device ivshmem,size=size in format accepted by -m[,shm=shm name]

Interrupts are supported between multiple VMs by using a shared memory server
by using a chardev socket.

-device ivshmem,size=size in format accepted by -m[,shm=shm name]
   [,chardev=id][,msi=on][,irqfd=on][,vectors=n][,role=peer|master]
-chardev socket,path=path,id=id

The shared memory server, sample programs and init scripts are in a git repo 
here:

www.gitorious.org/nahanni

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 Makefile.target |3 +
 hw/ivshmem.c|  823 +++
 qemu-char.c |6 +
 qemu-char.h |3 +
 qemu-doc.texi   |   43 +++
 5 files changed, 878 insertions(+), 0 deletions(-)
 create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index a0e9747..1e99ec8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -203,6 +203,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
 obj-y += rtl8139.o
 obj-y += e1000.o
 
+# Inter-VM PCI shared memory
+obj-y += ivshmem.o
+
 # Hardware support
 obj-i386-y += vga.o
 obj-i386-y += mc146818rtc.o i8259.o pc.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..af035c3
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,823 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell c...@cs.ualberta.ca
+ *
+ * Based On: cirrus_vga.c
+ *  Copyright (c) 2004 Fabrice Bellard
+ *  Copyright (c) 2004 Makoto Suzuki (suzu)
+ *
+ *  and rtl8139.c
+ *  Copyright (c) 2006 Igor Kovalenko
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+#include hw.h
+#include pc.h
+#include pci.h
+#include msix.h
+#include kvm.h
+
+#include sys/mman.h
+#include sys/types.h
+
+#define IVSHMEM_IRQFD   0
+#define IVSHMEM_MSI 1
+
+#define IVSHMEM_PEER0
+#define IVSHMEM_MASTER  1
+
+#define IVSHMEM_REG_BAR_SIZE 0x100
+
+//#define DEBUG_IVSHMEM
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, ...)\
+do {printf(IVSHMEM:  fmt, ## __VA_ARGS__); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, ...)
+#endif
+
+typedef struct Peer {
+int nb_eventfds;
+int *eventfds;
+} Peer;
+
+typedef struct EventfdEntry {
+PCIDevice *pdev;
+int vector;
+} EventfdEntry;
+
+typedef struct IVShmemState {
+PCIDevice dev;
+uint32_t intrmask;
+uint32_t intrstatus;
+uint32_t doorbell;
+
+CharDriverState **eventfd_chr;
+CharDriverState *server_chr;
+int ivshmem_mmio_io_addr;
+
+pcibus_t mmio_addr;
+pcibus_t shm_pci_addr;
+uint64_t ivshmem_offset;
+uint64_t ivshmem_size; /* size of shared memory region */
+int shm_fd; /* shared memory file descriptor */
+
+Peer *peers;
+int nb_peers; /* how many guests we have space for */
+int max_peer; /* maximum numbered peer */
+
+int vm_id;
+uint32_t vectors;
+uint32_t features;
+EventfdEntry *eventfd_table;
+
+char * shmobj;
+char * sizearg;
+char * role;
+int role_val;   /* scalar to avoid multiple string comparisons */
+} IVShmemState;
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+INTRMASK = 0,
+INTRSTATUS = 4,
+IVPOSITION = 8,
+DOORBELL = 12,
+};
+
+static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, unsigned int 
feature) {
+return (ivs-features  (1  feature));
+}
+
+static inline bool is_power_of_two(uint64_t x) {
+return (x  (x - 1)) == 0;
+}
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+pcibus_t addr, pcibus_t size, int type)
+{
+IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev);
+
+s-shm_pci_addr = addr;
+
+if (s-ivshmem_offset  0) {
+cpu_register_physical_memory(s-shm_pci_addr, s-ivshmem_size,
+s-ivshmem_offset);
+}
+
+IVSHMEM_DPRINTF(guest pci addr = % FMT_PCIBUS ,guest h/w addr = % 
PRIu64
+size = % FMT_PCIBUS \n, addr, s-ivshmem_offset, size);
+
+}
+
+/* accessing registers - based on rtl8139 */
+static void ivshmem_update_irq(IVShmemState *s, int val)
+{
+int isr;
+isr = (s-intrstatus  s-intrmask)  0x;
+
+/* don't print ISR resets */
+if (isr) {
+IVSHMEM_DPRINTF(Set IRQ to %d (%04x %04x)\n,
+   isr ? 1 : 0, s-intrstatus, s-intrmask);
+}
+
+qemu_set_irq(s-dev.irq[0], (isr != 0));
+}
+
+static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
+{
+IVSHMEM_DPRINTF(IntrMask write(w) val = 0x%04x\n, val);
+
+s-intrmask = val;
+
+ivshmem_update_irq(s, val);
+}
+
+static uint32_t ivshmem_IntrMask_read(IVShmemState *s)
+{
+uint32_t ret = s-intrmask

[Qemu-devel] [PATCH v7 1/4] Device specification for shared memory PCI device

2010-06-15 Thread Cam Macdonell

Signed-off-by: Cam Macdonell c...@cs.ualberta.ca
---
 docs/specs/ivshmem_device_spec.txt |   96 
 1 files changed, 96 insertions(+), 0 deletions(-)
 create mode 100644 docs/specs/ivshmem_device_spec.txt

diff --git a/docs/specs/ivshmem_device_spec.txt 
b/docs/specs/ivshmem_device_spec.txt
new file mode 100644
index 000..23dd2ba
--- /dev/null
+++ b/docs/specs/ivshmem_device_spec.txt
@@ -0,0 +1,96 @@
+
+Device Specification for Inter-VM shared memory device
+--
+
+The Inter-VM shared memory device is designed to share a region of memory to
+userspace in multiple virtual guests.  The memory region does not belong to any
+guest, but is a POSIX memory object on the host.  Optionally, the device may
+support sending interrupts to other guests sharing the same memory region.
+
+
+The Inter-VM PCI device
+---
+
+*BARs*
+
+The device supports three BARs.  BAR0 is a 1 Kbyte MMIO region to support
+registers.  BAR1 is used for MSI-X when it is enabled in the device.  BAR2 is
+used to map the shared memory object from the host.  The size of BAR2 is
+specified when the guest is started and must be a power of 2 in size.
+
+*Registers*
+
+The device currently supports 4 registers of 32-bits each.  Registers
+are used for synchronization between guests sharing the same memory object when
+interrupts are supported (this requires using the shared memory server).
+
+The server assigns each VM an ID number and sends this ID number to the Qemu
+process when the guest starts.
+
+enum ivshmem_registers {
+IntrMask = 0,
+IntrStatus = 4,
+IVPosition = 8,
+Doorbell = 12
+};
+
+The first two registers are the interrupt mask and status registers.  Mask and
+status are only used with pin-based interrupts.  They are unused with MSI
+interrupts.
+
+Status Register: The status register is set to 1 when an interrupt occurs.
+
+Mask Register: The mask register is bitwise ANDed with the interrupt status
+and the result will raise an interrupt if it is non-zero.  However, since 1 is
+the only value the status will be set to, it is only the first bit of the mask
+that has any effect.  Therefore interrupts can be masked by setting the first
+bit to 0 and unmasked by setting the first bit to 1.
+
+IVPosition Register: The IVPosition register is read-only and reports the
+guest's ID number.  The guest IDs are non-negative integers.  When using the
+server, since the server is a separate process, the VM ID will only be set when
+the device is ready (shared memory is received from the server and accessible 
via
+the device).  If the device is not ready, the IVPosition will return -1.
+Applications should ensure that they have a valid VM ID before accessing the
+shared memory.
+
+Doorbell Register:  To interrupt another guest, a guest must write to the
+Doorbell register.  The doorbell register is 32-bits, logically divided into
+two 16-bit fields.  The high 16-bits are the guest ID to interrupt and the low
+16-bits are the interrupt vector to trigger.  The semantics of the value
+written to the doorbell depends on whether the device is using MSI or a regular
+pin-based interrupt.  In short, MSI uses vectors while regular interrupts set 
the
+status register.
+
+Regular Interrupts
+
+If regular interrupts are used (due to either a guest not supporting MSI or the
+user specifying not to use them on startup) then the value written to the lower
+16-bits of the Doorbell register results is arbitrary and will trigger an
+interrupt in the destination guest.
+
+Message Signalled Interrupts
+
+A ivshmem device may support multiple MSI vectors.  If so, the lower 16-bits
+written to the Doorbell register must be between 0 and the maximum number of
+vectors the guest supports.  The lower 16 bits written to the doorbell is the
+MSI vector that will be raised in the destination guest.  The number of MSI
+vectors is configurable but it is set when the VM is started.
+
+The important thing to remember with MSI is that it is only a signal, no status
+is set (since MSI interrupts are not shared).  All information other than the
+interrupt itself should be communicated via the shared memory region.  Devices
+supporting multiple MSI vectors can use different vectors to indicate different
+events have occurred.  The semantics of interrupt vectors are left to the
+user's discretion.
+
+
+Usage in the Guest
+--
+
+The shared memory device is intended to be used with the provided UIO driver.
+Very little configuration is needed.  The guest should map BAR0 to access the
+registers (an array of 32-bit ints allows simple writing) and map BAR2 to
+access the shared memory region itself.  The size of the shared memory region
+is specified when the guest (or shared memory server) is started.  A guest may
+map the whole shared memory region or only part of it.
-- 
1.6.6.1




[Qemu-devel] Re: [PATCH RFC] Mark a device as non-migratable

2010-06-15 Thread Cam Macdonell
On Tue, Jun 15, 2010 at 10:32 AM, Anthony Liguori anth...@codemonkey.ws wrote:
 On 06/15/2010 11:16 AM, Cam Macdonell wrote:

 How does this look for marking the device as non-migratable?  It adds a
 field
 'no_migrate' to the SaveStateEntry and tests for it in vmstate_save.  This
 would
 replace anything that touches memory.

 Cam

 ---
  hw/hw.h  |    1 +
  savevm.c |   32 +---
  2 files changed, 30 insertions(+), 3 deletions(-)

 diff --git a/hw/hw.h b/hw/hw.h
 index d78d814..7c93f08 100644
 --- a/hw/hw.h
 +++ b/hw/hw.h
 @@ -263,6 +263,7 @@ int register_savevm_live(const char *idstr,
                           void *opaque);

  void unregister_savevm(const char *idstr, void *opaque);
 +void mark_no_migrate(const char *idstr, void *opaque);


 I'm not thrilled with the name but the functionality is spot on.  I lack the
 creativity to offer a better name suggestion :-)

 Regards,

 Anthony Liguori

Hmmm, in working on this it seems that the memory (from
qemu_ram_map()) is still attached even when the device is removed
(which causes migration to fail because there is an unexpected
memory).

Is something like cpu_unregister_physical_memory()/qemu_ram_free() needed?

Cam



Re: [Qemu-devel] Re: [PATCH RFC] Mark a device as non-migratable

2010-06-15 Thread Cam Macdonell
On Tue, Jun 15, 2010 at 4:33 PM, Anthony Liguori anth...@codemonkey.ws wrote:
 On 06/15/2010 05:26 PM, Cam Macdonell wrote:

 On Tue, Jun 15, 2010 at 10:32 AM, Anthony Liguorianth...@codemonkey.ws
  wrote:


 On 06/15/2010 11:16 AM, Cam Macdonell wrote:


 How does this look for marking the device as non-migratable?  It adds a
 field
 'no_migrate' to the SaveStateEntry and tests for it in vmstate_save.
  This
 would
 replace anything that touches memory.

 Cam

 ---
  hw/hw.h  |    1 +
  savevm.c |   32 +---
  2 files changed, 30 insertions(+), 3 deletions(-)

 diff --git a/hw/hw.h b/hw/hw.h
 index d78d814..7c93f08 100644
 --- a/hw/hw.h
 +++ b/hw/hw.h
 @@ -263,6 +263,7 @@ int register_savevm_live(const char *idstr,
                           void *opaque);

  void unregister_savevm(const char *idstr, void *opaque);
 +void mark_no_migrate(const char *idstr, void *opaque);



 I'm not thrilled with the name but the functionality is spot on.  I lack
 the
 creativity to offer a better name suggestion :-)

 Regards,

 Anthony Liguori


 Hmmm, in working on this it seems that the memory (from
 qemu_ram_map()) is still attached even when the device is removed
 (which causes migration to fail because there is an unexpected
 memory).

 Is something like cpu_unregister_physical_memory()/qemu_ram_free() needed?


 Yes.  You need to unregister any memory that you have registered upon device
 removal.

Is there an established way to achieve this?  I can't seem find
another device that unregisters memory registered with
cpu_register_physical_memory().  Is something like
cpu_unregister_physical_memory() needed?

Thanks,
Cam


 Regards,

 Anthony Liguori

 Cam







  1   2   >