Re: [Qemu-devel] [RFC v2 0/6] qtest unit test framework

2011-12-04 Thread Dor Laor

On 12/01/2011 08:43 PM, Anthony Liguori wrote:

This series is still pretty rough but I wanted to get an idea of what people
thought about it before polishing it.

The general idea is outlined in the first test.  The main advantage of this
type of test framework compared to something like kvm-unit-test is that you
don't need a build environment for what you're trying to test.


Luckily, with QEMU CPU emulation and a few images, it can be set up once and
stay there forever.


The advantage of kvm-unit-test is that the code actually runs. So we
can test IRQ injection, in-kernel I/O and MMIO, dirty-bit tracking
and more, all together.




Since your tests also link against the host environment, it potentially makes
tests much simpler to write (as you aren't reinventing an OS).  I think this
makes this style of test more appropriate for something like QEMU.
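To make the "no guest build environment" point concrete, here is a small sketch of what such a test looks like: an ordinary host C program that drives device registers through port I/O helpers. Everything here is hypothetical. The mock_outb()/mock_inb() helpers talk to an in-process fake of the PC CMOS/RTC index (0x70) and data (0x71) ports instead of a real QEMU process, and are not the API of this RFC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the emulated RTC: port 0x70 selects a CMOS
 * register, port 0x71 reads or writes the selected register. */
typedef struct {
    uint8_t index;
    uint8_t regs[128];
} MockRTC;

static MockRTC rtc;

/* Hypothetical port I/O helpers. In a real harness these would be
 * forwarded to the emulator instead of touching a local struct. */
static void mock_outb(uint16_t port, uint8_t val)
{
    if (port == 0x70) {
        rtc.index = val & 0x7f;      /* bit 7 is the NMI-disable bit */
    } else if (port == 0x71) {
        rtc.regs[rtc.index] = val;
    }
}

static uint8_t mock_inb(uint16_t port)
{
    if (port == 0x71) {
        return rtc.regs[rtc.index];
    }
    return 0xff;                     /* other ports read as all-ones here */
}

static uint8_t cmos_read(uint8_t reg)
{
    mock_outb(0x70, reg);
    return mock_inb(0x71);
}

static void cmos_write(uint8_t reg, uint8_t val)
{
    mock_outb(0x70, reg);
    mock_outb(0x71, val);
}

/* The test itself is plain host C: no guest OS, no cross toolchain. */
static int test_rtc_seconds_roundtrip(void)
{
    memset(&rtc, 0, sizeof(rtc));
    cmos_write(0x00, 0x42);          /* CMOS register 0: seconds */
    return cmos_read(0x00) == 0x42;
}
```

The trade-off Dor raises still applies: a test like this exercises the device model, not in-kernel paths such as IRQ injection.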

Anthony Liguori (6):
   qtest: add test framework
   qtest: add support for target-i386 -M pc
   Add core python test framework
   Add uart test case
   Add RTC test case
   Add C version of rtc-test

  Makefile        |    4 +
  Makefile.objs   |    2 +
  hw/pc.c         |    7 +-
  hw/pc_piix.c    |    9 +-
  qemu-options.hx |    8 ++
  qtest.c         |  357 +++
  qtest.h         |   37 ++
  qtest.py        |   69 +++
  rtc-test.c      |  201 +++
  rtc-test.py     |  105 
  serial-test.py  |   24 
  vl.c            |    8 ++
  12 files changed, 827 insertions(+), 4 deletions(-)
  create mode 100644 qtest.c
  create mode 100644 qtest.h
  create mode 100644 qtest.py
  create mode 100644 rtc-test.c
  create mode 100644 rtc-test.py
  create mode 100644 serial-test.py






Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state

2011-12-04 Thread Michael S. Tsirkin
On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote:
 Based on a git bisect, this patch breaks msi-x interrupt delivery in
 the ivshmem device.
 
 On Mon, Nov 21, 2011 at 9:57 AM, Michael S. Tsirkin m...@redhat.com wrote:
  Only go over the table when function is masked.
  This is not really important for qemu.git but helps
  fix a bug in qemu-kvm.git.
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   hw/msix.c |   21 ++++++++++++++-------
   hw/pci.h  |    2 ++
   2 files changed, 16 insertions(+), 7 deletions(-)
 
  diff --git a/hw/msix.c b/hw/msix.c
  index b15bafc..63b41b9 100644
  --- a/hw/msix.c
  +++ b/hw/msix.c
  @@ -79,6 +79,7 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
      /* Make flags bit writable. */
      pdev->wmask[config_offset + MSIX_CONTROL_OFFSET] |= MSIX_ENABLE_MASK |
             MSIX_MASKALL_MASK;
  +    pdev->msix_function_masked = true;
      return 0;
 
 iiuc, this masks the msix by default.

Yes, because msi-x is disabled by default, that's
in the pci spec.

   }
 
  @@ -117,16 +118,11 @@ static void msix_clr_pending(PCIDevice *dev, int vector)
      *msix_pending_byte(dev, vector) &= ~msix_pending_mask(vector);
   }
 
  -static int msix_function_masked(PCIDevice *dev)
  -{
  -    return dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] & MSIX_MASKALL_MASK;
  -}
  -
   static int msix_is_masked(PCIDevice *dev, int vector)
   {
      unsigned offset =
          vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
  -    return msix_function_masked(dev) ||
  +    return dev->msix_function_masked ||
            dev->msix_table_page[offset] & PCI_MSIX_ENTRY_CTRL_MASKBIT;
   }
 
  @@ -138,24 +134,34 @@ static void msix_handle_mask_update(PCIDevice *dev, int vector)
      }
   }
 
  +static void msix_update_function_masked(PCIDevice *dev)
  +{
  +    dev->msix_function_masked = !msix_enabled(dev) ||
  +        (dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] & MSIX_MASKALL_MASK);
  +}
  +
   /* Handle MSI-X capability config write. */
   void msix_write_config(PCIDevice *dev, uint32_t addr,
                         uint32_t val, int len)
   {
      unsigned enable_pos = dev->msix_cap + MSIX_CONTROL_OFFSET;
      int vector;
  +    bool was_masked;
 
      if (!range_covers_byte(addr, len, enable_pos)) {
          return;
      }
 
  +    was_masked = dev->msix_function_masked;
  +    msix_update_function_masked(dev);
  +
      if (!msix_enabled(dev)) {
          return;
      }
 
      pci_device_deassert_intx(dev);
 
  -    if (msix_function_masked(dev)) {
  +    if (dev->msix_function_masked == was_masked) {
          return;
      }
 
 So I believe my bug is due to the fact the new logic included in this
 patch requires msix_write_config() to be called to unmask the vectors.

Not exactly, to enable msi-x really.

  Virtio-pci calls msix_write_config(), but ivshmem does not (nor does
 PCIe so I'm not sure if it's also affected).

At this point PCIe is a stub.

 I haven't been able to fix the bug yet, but I wanted to make sure I
 was looking in the correct place.  Any help or further explanation of
 this patch would be greatly appreciated.
 
 Sincerely,
 Cam

So I think you just need to call msix_write_config,
otherwise msix is not getting enabled.
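A toy model may help show why ivshmem broke: the cached flag starts out true after reset and is only recomputed on a config write, so a device that never forwards config writes to msix_write_config() keeps every vector masked. MiniMSIX, write_control() and the other names below are hypothetical simplifications, not QEMU code; the bit values assume the upper byte of the MSI-X Message Control register (bit 15 = MSI-X Enable, bit 14 = Function Mask in the PCI spec).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper byte of MSI-X Message Control (assumed layout, see above). */
#define MSIX_ENABLE_MASK   0x80
#define MSIX_MASKALL_MASK  0x40
/* Per-vector control word: bit 0 is the vector's Mask Bit. */
#define ENTRY_CTRL_MASKBIT 0x1
#define NVECTORS 4

typedef struct {
    uint8_t control_hi;              /* copy of the control byte */
    uint32_t vector_ctrl[NVECTORS];  /* simplified MSI-X table */
    bool function_masked;            /* cached, like msix_function_masked */
} MiniMSIX;

/* Masked if MSI-X is disabled or the function-wide mask is set. */
static void update_function_masked(MiniMSIX *d)
{
    bool enabled = d->control_hi & MSIX_ENABLE_MASK;
    d->function_masked = !enabled || (d->control_hi & MSIX_MASKALL_MASK);
}

static bool vector_is_masked(const MiniMSIX *d, unsigned vector)
{
    assert(vector < NVECTORS);
    return d->function_masked || (d->vector_ctrl[vector] & ENTRY_CTRL_MASKBIT);
}

/* Config write path: the only place the cached flag is refreshed. */
static void write_control(MiniMSIX *d, uint8_t val)
{
    d->control_hi = val;
    update_function_masked(d);
}

static void reset(MiniMSIX *d)
{
    *d = (MiniMSIX){0};
    d->function_masked = true;       /* MSI-X is disabled after reset */
}
```

If the guest's enable write never reaches write_control(), function_masked stays true and vector_is_masked() reports every vector as masked, which matches the symptom Cam bisected.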

BTW looking at the ivshmem code, this bit looks wrong:

pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY;

I think the spec says IO/MEMORY must be disabled at init time since BARs
are not yet set to anything reasonable.

 
  @@ -300,6 +306,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
      msix_free_irq_entries(dev);
      qemu_get_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
      qemu_get_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
  +    msix_update_function_masked(dev);
   }
 
   /* Does device support MSI-X? */
  diff --git a/hw/pci.h b/hw/pci.h
  index 4b2e785..625e717 100644
  --- a/hw/pci.h
  +++ b/hw/pci.h
  @@ -178,6 +178,8 @@ struct PCIDevice {
      unsigned *msix_entry_used;
      /* Region including the MSI-X table */
      uint32_t msix_bar_size;
  +    /* MSIX function mask set or MSIX disabled */
  +    bool msix_function_masked;
      /* Version id needed for VMState */
      int32_t version_id;
 
  --
  1.7.5.53.gc233e
 
 



Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state

2011-12-04 Thread Michael S. Tsirkin
On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote:
 Based on a git bisect, this patch breaks msi-x interrupt delivery in
 the ivshmem device.

I think the following should fix it. Compiled-only -
could you pls check? If yes let's apply to the stable branch.

--

ivshmem: add missing msix calls

ivshmem used msix but didn't call it on either the reset or the
config write path. This used to partially work since
guests don't use all of the msi-x configuration fields,
and reset is rarely used, but the patch 'msix: track function masked
in pci device state' broke that. Fix by adding appropriate calls.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

--

diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 242fbea..3680c0f 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -505,6 +505,7 @@ static void ivshmem_reset(DeviceState *d)
 IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d);
 
 s->intrstatus = 0;
+msix_reset(&s->dev);
 return;
 }
 
@@ -610,6 +611,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
 return 0;
 }
 
+static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
+uint32_t val, int len)
+{
+pci_default_write_config(pci_dev, address, val, len);
+msix_write_config(pci_dev, address, val, len);
+}
+
 static int pci_ivshmem_init(PCIDevice *dev)
 {
 IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
@@ -734,6 +742,8 @@ static int pci_ivshmem_init(PCIDevice *dev)
 
 }
 
+s->dev.config_write = ivshmem_write_config;
+
 return 0;
 }
 



Re: [Qemu-devel] [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-12-04 Thread Avi Kivity
On 12/03/2011 06:37 AM, Takuya Yoshikawa wrote:
 Avi Kivity a...@redhat.com wrote:
  That's true.  But some applications do require low latency, and the
  current code can impose a lot of time with the mmu spinlock held.
  
  The total amount of work actually increases slightly, from O(N) to O(N
  log N), but since the tree is so wide, the overhead is small.
  

 Controlling the latency can be achieved by making the user space limit
 the number of dirty pages to scan without hacking the core mmu code.

   The fact that we cannot transfer so many pages on the network at
   once suggests this is reasonable.

That is true.  Write protecting everything at once means that there is a
large window between sampling the dirty log and transferring the
page.  Any writes within that window cause a re-transfer, even when they
should not.


 With the rmap write protection method in KVM, the only thing we need is
 a new GET_DIRTY_LOG api which takes the [gfn_start, gfn_end] to scan,
 or max_write_protections optionally.

Right.
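The ranged interface under discussion can be sketched in plain C over a byte-per-page dirty map. This is a hypothetical model, not the KVM ioctl: Slot, get_dirty_log_range() and the resume cursor are invented for illustration, the range is half-open [gfn_start, gfn_end) rather than the inclusive range written above, and "write protection" is just a flag rather than real page-table manipulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *dirty;   /* 1 = frame written since the last scan */
    uint8_t *wprot;   /* 1 = frame re-protected (next write would fault) */
    size_t npages;
} Slot;

/* Scan only [gfn_start, gfn_end), report at most max_protect dirty
 * frames in out[], re-protect exactly those, and store in *next the gfn
 * where a follow-up call should resume. Returns the number reported. */
static size_t get_dirty_log_range(Slot *s, size_t gfn_start, size_t gfn_end,
                                  size_t max_protect, size_t *out, size_t *next)
{
    size_t n = 0, gfn;

    assert(gfn_start <= gfn_end && gfn_end <= s->npages);
    for (gfn = gfn_start; gfn < gfn_end && n < max_protect; gfn++) {
        if (s->dirty[gfn]) {
            s->dirty[gfn] = 0;
            s->wprot[gfn] = 1;       /* protect just this frame */
            out[n++] = gfn;
        }
    }
    *next = gfn;
    return n;
}
```

Because each call does bounded work, the time spent under a lock per call is bounded too, and separate threads could each own a disjoint range.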


   I remember that someone suggested splitting the slot at KVM forum.
   Same effect with less effort.

 QEMU can also avoid unwanted page faults by using this api wisely.

   E.g. you can use this for the "Interactivity improvements" TODO on
   the KVM wiki, I think.

 Furthermore, QEMU may be able to use multiple threads for the memory
 copy task.

   Each thread has its own range of memory to copy, and does
   GET_DIRTY_LOG independently.  This will make it easy to
   add further optimizations in QEMU.

 In summary, my impression is that the main cause of the current latency
 problem is not the write protection of KVM but the strategy of trying
 to process the whole large slot in one go.

 What do you think?

I agree.  Maybe O(1) write protection has a place, but it is secondary
to fine-grained dirty logging, and if we implement it, it should be
after your idea, and further measurements.

-- 
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC][PATCH 01/16] msi: Generalize msix_supported to msi_supported

2011-12-04 Thread Michael S. Tsirkin
On Sat, Dec 03, 2011 at 12:17:26PM +0100, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 Rename msix_supported to msi_supported and control MSI and MSI-X
 activation this way. That was likely the original intention for this
 flag, but MSI support came after MSI-X.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Acked-by: Michael S. Tsirkin m...@redhat.com

This patch should go into qemu.git, right?

 ---
  hw/msi.c  |    8 ++++++++
  hw/msi.h  |    2 ++
  hw/msix.c |    9 ++++-----
  hw/msix.h |    2 --
  hw/pc.c   |    4 ++--
  5 files changed, 16 insertions(+), 9 deletions(-)
 
 diff --git a/hw/msi.c b/hw/msi.c
 index f214fcf..5d6ceb6 100644
 --- a/hw/msi.c
 +++ b/hw/msi.c
 @@ -36,6 +36,9 @@
  
  #define PCI_MSI_VECTORS_MAX 32
  
 +/* Flag for interrupt controller to declare MSI/MSI-X support */
 +bool msi_supported;
 +
  /* If we get rid of cap allocator, we won't need this. */
  static inline uint8_t msi_cap_sizeof(uint16_t flags)
  {
 @@ -116,6 +119,11 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
  uint16_t flags;
  uint8_t cap_size;
  int config_offset;
 +
 +if (!msi_supported) {
 +return -ENOTSUP;
 +}
 +
  MSI_DEV_PRINTF(dev,
 "init offset: 0x%"PRIx8" vector: %"PRId8
  " 64bit %d mask %d\n",
 diff --git a/hw/msi.h b/hw/msi.h
 index 5766018..3040bb0 100644
 --- a/hw/msi.h
 +++ b/hw/msi.h
 @@ -24,6 +24,8 @@
  #include "qemu-common.h"
  #include "pci.h"
  
 +extern bool msi_supported;
 +
  bool msi_enabled(const PCIDevice *dev);
  int msi_init(struct PCIDevice *dev, uint8_t offset,
   unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
 diff --git a/hw/msix.c b/hw/msix.c
 index b15bafc..8850fbd 100644
 --- a/hw/msix.c
 +++ b/hw/msix.c
 @@ -12,6 +12,7 @@
   */
  
  #include "hw.h"
 +#include "msi.h"
  #include "msix.h"
  #include "pci.h"
  #include "range.h"
 @@ -32,9 +33,6 @@
  #define MSIX_MAX_ENTRIES 32
  
  
 -/* Flag for interrupt controller to declare MSI-X support */
 -int msix_supported;
 -
  /* Add MSI-X capability to the config space for the device. */
  /* Given a bar and its size, add MSI-X table on top of it
   * and fill MSI-X capability in the config space.
 @@ -212,10 +210,11 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
unsigned bar_nr, unsigned bar_size)
  {
  int ret;
 +
  /* Nothing to do if MSI is not supported by interrupt controller */
 -if (!msix_supported)
 +if (!msi_supported) {
  return -ENOTSUP;
 -
 +}
  if (nentries > MSIX_MAX_ENTRIES)
  return -EINVAL;
  
 diff --git a/hw/msix.h b/hw/msix.h
 index 7e04336..5aba22b 100644
 --- a/hw/msix.h
 +++ b/hw/msix.h
 @@ -29,6 +29,4 @@ void msix_notify(PCIDevice *dev, unsigned vector);
  
  void msix_reset(PCIDevice *dev);
  
 -extern int msix_supported;
 -
  #endif
 diff --git a/hw/pc.c b/hw/pc.c
 index 9328ee5..5225d5b 100644
 --- a/hw/pc.c
 +++ b/hw/pc.c
 @@ -36,7 +36,7 @@
  #include "elf.h"
  #include "multiboot.h"
  #include "mc146818rtc.h"
 -#include "msix.h"
 +#include "msi.h"
  #include "sysbus.h"
  #include "sysemu.h"
  #include "blockdev.h"
 @@ -896,7 +896,7 @@ static DeviceState *apic_init(void *env, uint8_t apic_id)
  apic_mapped = 1;
  }
  
 -msix_supported = 1;
 +msi_supported = true;
  
  return dev;
  }
 -- 
 1.7.3.4
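The effect of the rename can be shown with a minimal sketch: one platform-wide flag, set by the interrupt controller model, now gates both MSI and MSI-X initialization, so a machine that leaves it false gets -ENOTSUP from msi_init() as well as from msix_init(). mini_msi_init() and mini_msix_init() are hypothetical stand-ins that keep only the early-return checks from the patch; the limit of 32 mirrors MSIX_MAX_ENTRIES above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Set by the interrupt controller model (apic_init() in the patch). */
static bool msi_supported;

static int mini_msi_init(void)
{
    if (!msi_supported) {
        return -ENOTSUP;
    }
    return 0;                    /* real code would add the capability */
}

static int mini_msix_init(unsigned nentries)
{
    if (!msi_supported) {
        return -ENOTSUP;
    }
    if (nentries > 32) {         /* MSIX_MAX_ENTRIES */
        return -EINVAL;
    }
    return 0;
}
```

Callers are expected to check the return value, so a device on such a machine simply ends up without the capability.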



Re: [Qemu-devel] [RFC][PATCH 01/16] msi: Generalize msix_supported to msi_supported

2011-12-04 Thread Jan Kiszka
On 2011-12-04 11:42, Michael S. Tsirkin wrote:
 On Sat, Dec 03, 2011 at 12:17:26PM +0100, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 Rename msix_supported to msi_supported and control MSI and MSI-X
 activation this way. That was likely the original intention for this
 flag, but MSI support came after MSI-X.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
 Acked-by: Michael S. Tsirkin m...@redhat.com
 
 This patch should go into qemu.git, right?

Right. It was just that this series depends on it. Feel free to pick it
up earlier.

Jan





Re: [Qemu-devel] [RFC][PATCH 02/16] kvm: Move kvmclock into hw/kvm folder

2011-12-04 Thread Avi Kivity
On 12/04/2011 12:33 AM, Jan Kiszka wrote:
 Do we have a convention that every include in <> is considered a system
 header? Should probably be documented then (and code should be converted
 gradually).

It's documented in The C Programming Language, by K&R.





Re: [Qemu-devel] [RFC][PATCH 02/16] kvm: Move kvmclock into hw/kvm folder

2011-12-04 Thread Jan Kiszka
On 2011-12-04 11:43, Avi Kivity wrote:
 On 12/04/2011 12:33 AM, Jan Kiszka wrote:
 Do we have a convention that every include in <> is considered a system
 header? Should probably be documented then (and code should be converted
 gradually).
 
 It's documented in The C Programming Language, by K&R.

It's just a convention, nothing more. If you consider certain parts of
QEMU's API as system (e.g. the parts that may one day make up our modular
API), it makes some sense to use <> for them. Right now this happens for some
parts of the hw API, but inconsistently.

Jan





Re: [Qemu-devel] Improve QEMU performance with LLVM codegen and other techniques

2011-12-04 Thread Alexander Graf

On 04.12.2011, at 07:14, 陳韋任 wrote:

 3. Then a trace composed of TCG blocks is sent to an LLVM translator. The translator
  generates the host binary for the trace into an LLVM code cache, and patch the
 
 I don't fully understand this part. Do you disassemble the x86 blob that TCG emitted?
 
  We ask TCG to disassemble the guest binary where the trace begins, _again_,
 to get a set of TCG blocks, then send them to the LLVM translator.

So you have two TCG backends? One to generate real host code and one that goes
into your LLVM generator?

 
 the moment (make the situation simpler), I think we still don't have to check
 the blocks' hflags and segment descriptors in the trace to see if they match.
 
 Yeah. You only need to be sync'ed with the invalidation then. And make sure 
 you patch the TB atomically, so you don't have a separate thread 
 accidentally run half your code and half the old code.
 
  Sync'ed with the invalidation means tb_flush, cpu_unlink and tb_phys_invalidate?

Yup :)
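The "patch atomically" requirement can be illustrated with C11 atomics. This is an analogy under stated assumptions, not how TCG works: real TCG rewrites a branch in generated host code, whereas here a hypothetical entry point is an atomically swapped function pointer. Executing threads load it once and call it, so they run either the old block or the new trace, never a mix; the run counters exist only so the behaviour can be checked.

```c
#include <assert.h>
#include <stdatomic.h>

typedef int (*trace_fn)(int);

static int tcg_runs, llvm_runs;      /* instrumentation for the example */

static int run_tcg_block(int x)  { tcg_runs++;  return x + 1; }  /* old code */
static int run_llvm_trace(int x) { llvm_runs++; return x + 1; }  /* same semantics */

static _Atomic(trace_fn) entry = run_tcg_block;

/* Execute through a single acquire load of the current entry point. */
static int execute(int x)
{
    trace_fn fn = atomic_load_explicit(&entry, memory_order_acquire);
    return fn(x);
}

/* The optimized trace must be fully emitted before this release store
 * publishes it to other threads. */
static void install_trace(trace_fn fn)
{
    atomic_store_explicit(&entry, fn, memory_order_release);
}

/* On invalidation (tb_flush, tb_phys_invalidate, ...) fall back. */
static void invalidate_trace(void)
{
    atomic_store_explicit(&entry, run_tcg_block, memory_order_release);
}
```

The swap alone does not make invalidation safe: a thread that already loaded the old pointer may still be running it, which is why the trace code also has to be kept in sync with the invalidation paths named above.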


Alex




Re: [Qemu-devel] sub-page-sized mmio regions and address passed to read/write fns

2011-12-04 Thread Avi Kivity
On 12/02/2011 04:49 PM, Peter Maydell wrote:
 Hi; I was working on a refactoring of the ARM 11MPCore/A9MP private
 peripherals and encountered something odd. Rather than having a single
 large mmio region, I tried splitting into several regions, like this:

 memory_region_init(&s->container, "a9mp-priv-container", 0x2000);
 memory_region_init_io(&s->scu_iomem, &a9_scu_ops, s, "a9mp-scu", 0x100);
 memory_region_init_io(&s->gic_cpu_iomem, &a9_gic_cpu_ops, s,
   "a9mp-gic-cpu", 0x100);
 memory_region_init_io(&s->ptimer_iomem, &a9_ptimer_ops, s,
   "a9mp-ptimer", 0x100);
 memory_region_add_subregion(&s->container, 0, &s->scu_iomem);
 memory_region_add_subregion(&s->container, 0x100, &s->gic_cpu_iomem);
 memory_region_add_subregion(&s->container, 0x600, &s->ptimer_iomem);
 memory_region_add_subregion(&s->container, 0x1000, &s->gic.iomem);
 sysbus_init_mmio_region(dev, &s->container);

Good practice IMO, will become more important when we introduce a
Register class.

 However what I found is that the addresses passed to the read/write
 functions aren't what I would expect. For instance if the board
 maps the container at address 0x1e00, then a read from 0x1e000100
 goes to the functions given by a9_gic_cpu_ops, as it should. However,
 the offset parameter that the read function is passed is not 0x0
 (offset from the start of the a9mp-gic-cpu region) but 0x100 (offset
 from the start of the page, I think).

 Is this expected behaviour? I certainly wasn't expecting it...

A while ago this was the behaviour across the board.  Then 8da3ff1809747
changed addresses to be relative, but apparently missed the subpage case.

 I looked through the code that's getting called for reads, and
 it looks to me like exec.c:subpage_readlen() is causing this.
 We look up the subpage_t based on the address within the page,
 but we don't then adjust the address we pass to io_mem_read
 (except by region_offset, which I take from the comment at the
 top of cpu_register_physical_memory_log() to be for something
 else.)


I think you can use subpage_t's region_offset array for this (adding
into it, of course, so the original value remains).
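A sketch of that suggestion: if each subpage slot stores an adjustment of -start on top of its original region_offset, the handler receives an offset relative to its own region instead of the page. Subpage, subpage_register() and the 256-byte slot granularity are hypothetical simplifications loosely modelled on exec.c's subpage_t, not its actual code; the 0x1e000100 address comes from Peter's example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 0x1000u
#define SUBPAGE_SLOTS   16
#define SLOT_SIZE       (MODEL_PAGE_SIZE / SUBPAGE_SLOTS)   /* 0x100 */

typedef uint64_t (*read_fn)(void *opaque, uint64_t offset);

typedef struct {
    read_fn read[SUBPAGE_SLOTS];
    void *opaque[SUBPAGE_SLOTS];
    uint64_t region_offset[SUBPAGE_SLOTS];
} Subpage;

/* Register a handler for [start, start + size) inside the page. The
 * stored adjustment is orig_offset - start (mod 2^64), i.e. -start is
 * added on top of the original value. */
static void subpage_register(Subpage *sp, uint64_t start, uint64_t size,
                             read_fn fn, void *opaque, uint64_t orig_offset)
{
    assert(start % SLOT_SIZE == 0 && size % SLOT_SIZE == 0);
    assert(start + size <= MODEL_PAGE_SIZE);
    for (uint64_t a = start; a < start + size; a += SLOT_SIZE) {
        unsigned idx = (unsigned)(a / SLOT_SIZE);
        sp->read[idx] = fn;
        sp->opaque[idx] = opaque;
        sp->region_offset[idx] = orig_offset - start;
    }
}

static uint64_t subpage_read(const Subpage *sp, uint64_t addr)
{
    uint64_t in_page = addr & (MODEL_PAGE_SIZE - 1);
    unsigned idx = (unsigned)(in_page / SLOT_SIZE);

    assert(sp->read[idx] != NULL);   /* slot must have been registered */
    return sp->read[idx](sp->opaque[idx], in_page + sp->region_offset[idx]);
}

/* Example handler that reports the offset it was given. */
static uint64_t echo_offset(void *opaque, uint64_t offset)
{
    (void)opaque;
    return offset;
}
```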





Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state

2011-12-04 Thread Jan Kiszka
On 2011-12-04 11:20, Michael S. Tsirkin wrote:
 On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote:
 Based on a git bisect, this patch breaks msi-x interrupt delivery in
 the ivshmem device.
 
 I think the following should fix it. Compiled-only -
 could you pls check? If yes let's apply to the stable branch.
 
 --
 
 ivshmem: add missing msix calls
 
 ivshmem used msix but didn't call it on either reset or
 config write paths. This used to partially work since
 guests don't use all of msi-x configuration fields,
 and reset is rarely used, but the patch 'msix: track function masked
 in pci device state' broke that. Fix by adding appropriate calls.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 --
 
 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..3680c0f 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -505,6 +505,7 @@ static void ivshmem_reset(DeviceState *d)
  IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d);
  
  s->intrstatus = 0;
 +msix_reset(&s->dev);
  return;
  }
  
 @@ -610,6 +611,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
  return 0;
  }
  
 +static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
 +  uint32_t val, int len)
 +{
 +pci_default_write_config(pci_dev, address, val, len);
 +msix_write_config(pci_dev, address, val, len);
 +}
 +
  static int pci_ivshmem_init(PCIDevice *dev)
  {
  IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 @@ -734,6 +742,8 @@ static int pci_ivshmem_init(PCIDevice *dev)
  
  }
  
 +s->dev.config_write = ivshmem_write_config;
 +
  return 0;
  }
  
 
 

But please fix this for real and merge [1][2] (with depending patches)
into master. The above is just boilerplate code from device POV.

Jan

[1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/80240
[2] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/80244





Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state

2011-12-04 Thread Michael S. Tsirkin
On Sun, Dec 04, 2011 at 01:35:03PM +0100, Jan Kiszka wrote:
 On 2011-12-04 11:20, Michael S. Tsirkin wrote:
  On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote:
  Based on a git bisect, this patch breaks msi-x interrupt delivery in
  the ivshmem device.
  
  I think the following should fix it. Compiled-only -
  could you pls check? If yes let's apply to the stable branch.
  
  --
  
  ivshmem: add missing msix calls
  
  ivshmem used msix but didn't call it on either reset or
  config write paths. This used to partially work since
  guests don't use all of msi-x configuration fields,
  and reset is rarely used, but the patch 'msix: track function masked
  in pci device state' broke that. Fix by adding appropriate calls.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  
  --
  
  diff --git a/hw/ivshmem.c b/hw/ivshmem.c
  index 242fbea..3680c0f 100644
  --- a/hw/ivshmem.c
  +++ b/hw/ivshmem.c
  @@ -505,6 +505,7 @@ static void ivshmem_reset(DeviceState *d)
   IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d);
   
   s->intrstatus = 0;
  +msix_reset(&s->dev);
   return;
   }
   
  @@ -610,6 +611,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
   return 0;
   }
   
  +static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
  +uint32_t val, int len)
  +{
  +pci_default_write_config(pci_dev, address, val, len);
  +msix_write_config(pci_dev, address, val, len);
  +}
  +
   static int pci_ivshmem_init(PCIDevice *dev)
   {
   IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
  @@ -734,6 +742,8 @@ static int pci_ivshmem_init(PCIDevice *dev)
   
   }
   
  +s->dev.config_write = ivshmem_write_config;
  +
   return 0;
   }
   
  
  
 
 But please fix this for real and merge [1][2] (with depending patches)
 into master. The above is just boilerplate code from device POV.
 
 Jan
 
 [1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/80240
 [2] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/80244
 

Yes, I agree we should make it easier for devices.
What annoyed me was the need to put msix in save/load.

And that is because of the need to do this in a specific
order.  I hope to switch to an unordered format and
then this will become straight-forward.

-- 
MST



Re: [Qemu-devel] [RFC][PATCH 01/16] msi: Generalize msix_supported to msi_supported

2011-12-04 Thread Avi Kivity
On 12/03/2011 01:17 PM, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 Rename msix_supported to msi_supported and control MSI and MSI-X
 activation this way. That was likely the original intention for this
 flag, but MSI support came after MSI-X.

'and' is a dangerous word in a changelog entry.


 +
 +if (!msi_supported) {
 +return -ENOTSUP;
 +}
 +


This changes behaviour.  qemu 1.0 -M pc-1.0 and qemu-1.1 -M pc-1.0 will
be different after this, no?





Re: [Qemu-devel] [RFC][PATCH 01/16] msi: Generalize msix_supported to msi_supported

2011-12-04 Thread Jan Kiszka
On 2011-12-04 14:12, Avi Kivity wrote:
 On 12/03/2011 01:17 PM, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 Rename msix_supported to msi_supported and control MSI and MSI-X
 activation this way. That was likely the original intention for this
 flag, but MSI support came after MSI-X.
 
 'and' is a dangerous word in a changelog entry.

This patch hardly qualifies for two IMHO.

 

 +
 +if (!msi_supported) {
 +return -ENOTSUP;
 +}
 +

 
 This changes behaviour.  qemu 1.0 -M pc-1.0 and qemu-1.1 -M pc-1.0 will
 be different after this, no?
 

Only isapc had msix_supported = 0, and I doubt we got there (msi_init)
for that machine. Or am I missing something?

Jan





Re: [Qemu-devel] [RFC][PATCH 10/16] memory: Introduce memory_region_init_reservation

2011-12-04 Thread Avi Kivity
On 12/03/2011 01:17 PM, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 Introduce a memory region type that can reserve I/O space. Such regions
 are useful for modeling I/O that is only handled outside of QEMU, i.e.
 in the context of an accelerator like KVM. Any access to such a region
 from QEMU is a bug and will be reported as such.

This is guest triggerable (DMA into the region), so abort() is too drastic.

 +void memory_region_init_reservation(MemoryRegion *mr,
 +const char *name,
 +uint64_t size)
 +{
 +memory_region_init(mr, name, size);
 +mr->ops = &reservation_ops;
 +mr->opaque = mr;
 +mr->terminates = true;
 +mr->backend_registered = false;
 +}

Just calling memory_region_init_io() is simpler, no?
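Since a guest can reach such a region (e.g. via DMA), the handlers could log and count the unexpected access and carry on rather than abort(). The sketch below is hypothetical: Ops only loosely mirrors the shape of MemoryRegionOps, Reservation and the counter are invented, and returning 0 on reads is an assumption. Ops like these could then be handed to memory_region_init_io(), as suggested above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t (*read)(void *opaque, uint64_t addr, unsigned size);
    void (*write)(void *opaque, uint64_t addr, uint64_t data, unsigned size);
} Ops;

typedef struct {
    const char *name;
    unsigned bad_accesses;       /* accesses QEMU should never have seen */
} Reservation;

/* Report and continue: a guest-triggerable condition must not kill QEMU. */
static uint64_t reservation_read(void *opaque, uint64_t addr, unsigned size)
{
    Reservation *r = opaque;

    r->bad_accesses++;
    fprintf(stderr, "%s: unexpected %u-byte read at 0x%" PRIx64 "\n",
            r->name, size, addr);
    return 0;                    /* assumption: reads return 0 */
}

static void reservation_write(void *opaque, uint64_t addr, uint64_t data,
                              unsigned size)
{
    Reservation *r = opaque;

    (void)data;                  /* the write is dropped */
    r->bad_accesses++;
    fprintf(stderr, "%s: unexpected %u-byte write at 0x%" PRIx64 "\n",
            r->name, size, addr);
}

static const Ops reservation_ops = {
    .read = reservation_read,
    .write = reservation_write,
};
```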






[Qemu-devel] [PATCH 2/6] msi: Guard msi_reset with msi_present

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/msi.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index 541e4e1..612b168 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -183,6 +183,10 @@ void msi_reset(PCIDevice *dev)
 uint16_t flags;
 bool msi64bit;
 
+if (!msi_present(dev)) {
+return;
+}
+
 flags = pci_get_word(dev->config + msi_flags_off(dev));
 flags &= ~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
 msi64bit = flags & PCI_MSI_FLAGS_64BIT;
-- 
1.7.3.4
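What the guarded reset does can be modelled in a few lines: it clears the Multiple Message Enable field and the MSI Enable bit while keeping read-only capability bits such as 64BIT, and it now returns early on devices without an MSI capability. MiniDev and mini_msi_reset() are hypothetical simplifications (cap_present stands in for msi_present(), and the flags word is kept in the struct rather than in config space); the flag values are those of Linux's pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_MSI_FLAGS_ENABLE 0x0001   /* MSI enable */
#define PCI_MSI_FLAGS_QSIZE  0x0070   /* multiple message enable */
#define PCI_MSI_FLAGS_64BIT  0x0080   /* 64-bit capable (read-only) */

typedef struct {
    bool cap_present;                 /* stands in for msi_present() */
    uint16_t msi_flags;
} MiniDev;

static void mini_msi_reset(MiniDev *d)
{
    if (!d->cap_present) {
        return;                       /* safe on devices without MSI */
    }
    d->msi_flags &= (uint16_t)~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
}
```

The early return is what lets the PCI core call the reset for every device later in the series.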




[Qemu-devel] [PATCH 0/6] msi: Small refactoring

2011-12-04 Thread Jan Kiszka
Collection of patches to improve MSI[X] usability in device models,
clean up some minor bits, and help kvm irqchip introduction.

CC: Alexander Graf ag...@suse.de
CC: Gerd Hoffmann kra...@redhat.com
CC: Isaku Yamahata yamah...@valinux.co.jp

Jan Kiszka (6):
  msi: Guard msi/msix_write_config with msi_present
  msi: Guard msi_reset with msi_present
  msi: Use msi/msix_present more consistently
  msi: Invoke msi/msix_reset from PCI core
  msi: Invoke msi/msix_write_config from PCI core
  msi: Generalize msix_supported to msi_supported

 hw/ide/ich.c            |    8 
 hw/intel-hda.c          |   12 
 hw/ioh3420.c            |    3 +--
 hw/msi.c                |   19 ---
 hw/msi.h                |    2 ++
 hw/msix.c               |   24 +---
 hw/msix.h               |    2 --
 hw/pc.c                 |    4 ++--
 hw/pci.c                |    8 
 hw/pci_bridge.c         |    4 
 hw/virtio-pci.c         |    3 ---
 hw/xio3130_downstream.c |    3 +--
 hw/xio3130_upstream.c   |    2 --
 13 files changed, 47 insertions(+), 47 deletions(-)

-- 
1.7.3.4
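The series' main usability point, that the PCI core rather than each device forwards config writes to the MSI/MSI-X code, can be sketched as follows. All names are hypothetical miniatures, not QEMU's API: the core's default handler stores the bytes and then calls both hooks, and each hook returns early when its capability is absent, so a device needs no config_write wrapper of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct MiniPCI MiniPCI;

struct MiniPCI {
    uint8_t config[256];
    bool has_msi, has_msix;
    unsigned msi_writes, msix_writes;   /* instrumentation for the example */
    void (*config_write)(MiniPCI *d, uint32_t addr, uint32_t val, int len);
};

static void mini_msi_write_config(MiniPCI *d, uint32_t addr, uint32_t val, int len)
{
    (void)addr; (void)val; (void)len;
    if (!d->has_msi) {
        return;                         /* like the msi_present() guard */
    }
    d->msi_writes++;
}

static void mini_msix_write_config(MiniPCI *d, uint32_t addr, uint32_t val, int len)
{
    (void)addr; (void)val; (void)len;
    if (!d->has_msix) {
        return;                         /* like the msix_present() guard */
    }
    d->msix_writes++;
}

/* Core default: store the bytes (little-endian), then let MSI/MSI-X react. */
static void mini_default_write_config(MiniPCI *d, uint32_t addr, uint32_t val, int len)
{
    assert(len >= 1 && len <= 4 && addr + len <= sizeof(d->config));
    for (int i = 0; i < len; i++) {
        d->config[addr + i] = (uint8_t)(val >> (8 * i));
    }
    mini_msi_write_config(d, addr, val, len);
    mini_msix_write_config(d, addr, val, len);
}

static void mini_pci_init(MiniPCI *d, bool msi, bool msix)
{
    *d = (MiniPCI){ .has_msi = msi, .has_msix = msix,
                    .config_write = mini_default_write_config };
}
```

A device that forgets to hook config writes, as ivshmem did, would then get MSI-X handling for free.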




[Qemu-devel] [PATCH 6/6] msi: Generalize msix_supported to msi_supported

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Rename msix_supported to msi_supported and control MSI and MSI-X
activation this way. That was likely the original intention for this
flag, but MSI support came after MSI-X.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/msi.c  |    8 ++++++++
 hw/msi.h  |    2 ++
 hw/msix.c |    9 ++++-----
 hw/msix.h |    2 --
 hw/pc.c   |    4 ++--
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index c4e8a6e..5233204 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -36,6 +36,9 @@
 
 #define PCI_MSI_VECTORS_MAX 32
 
+/* Flag for interrupt controller to declare MSI/MSI-X support */
+bool msi_supported;
+
 /* If we get rid of cap allocator, we won't need this. */
 static inline uint8_t msi_cap_sizeof(uint16_t flags)
 {
@@ -116,6 +119,11 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
 uint16_t flags;
 uint8_t cap_size;
 int config_offset;
+
+if (!msi_supported) {
+return -ENOTSUP;
+}
+
 MSI_DEV_PRINTF(dev,
"init offset: 0x%"PRIx8" vector: %"PRId8
 " 64bit %d mask %d\n",
diff --git a/hw/msi.h b/hw/msi.h
index 5766018..3040bb0 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -24,6 +24,8 @@
 #include "qemu-common.h"
 #include "pci.h"
 
+extern bool msi_supported;
+
 bool msi_enabled(const PCIDevice *dev);
 int msi_init(struct PCIDevice *dev, uint8_t offset,
  unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
diff --git a/hw/msix.c b/hw/msix.c
index 876793a..4897c58 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -12,6 +12,7 @@
  */
 
 #include "hw.h"
+#include "msi.h"
 #include "msix.h"
 #include "pci.h"
 #include "range.h"
@@ -32,9 +33,6 @@
 #define MSIX_MAX_ENTRIES 32
 
 
-/* Flag for interrupt controller to declare MSI-X support */
-int msix_supported;
-
 /* Add MSI-X capability to the config space for the device. */
 /* Given a bar and its size, add MSI-X table on top of it
  * and fill MSI-X capability in the config space.
@@ -235,10 +233,11 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
   unsigned bar_nr, unsigned bar_size)
 {
 int ret;
+
 /* Nothing to do if MSI is not supported by interrupt controller */
-if (!msix_supported)
+if (!msi_supported) {
 return -ENOTSUP;
-
+}
 if (nentries > MSIX_MAX_ENTRIES)
 return -EINVAL;
 
diff --git a/hw/msix.h b/hw/msix.h
index 7e04336..5aba22b 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -29,6 +29,4 @@ void msix_notify(PCIDevice *dev, unsigned vector);
 
 void msix_reset(PCIDevice *dev);
 
-extern int msix_supported;
-
 #endif
diff --git a/hw/pc.c b/hw/pc.c
index 33778fe..7e40031 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -36,7 +36,7 @@
 #include "elf.h"
 #include "multiboot.h"
 #include "mc146818rtc.h"
-#include "msix.h"
+#include "msi.h"
 #include "sysbus.h"
 #include "sysemu.h"
 #include "blockdev.h"
@@ -896,7 +896,7 @@ static DeviceState *apic_init(void *env, uint8_t apic_id)
 apic_mapped = 1;
 }
 
-msix_supported = 1;
+msi_supported = true;
 
 return dev;
 }
-- 
1.7.3.4




[Qemu-devel] [PATCH 1/6] msi: Guard msi/msix_write_config with msi_present

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Terminate msi/msix_write_config early if support is not enabled. This
allows removing the checks at the call sites when MSI is optional.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/msi.c  |    3 ++-
 hw/msix.c |    2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index f214fcf..541e4e1 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -264,7 +264,8 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
 unsigned int vector;
 uint32_t pending;
 
-if (!ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) {
+if (!msi_present(dev) ||
+!ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) {
 return;
 }
 
diff --git a/hw/msix.c b/hw/msix.c
index 149eed2..32fd9b2 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -156,7 +156,7 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
 int vector;
 bool was_masked;
 
-if (!range_covers_byte(addr, len, enable_pos)) {
+if (!msix_present(dev) || !range_covers_byte(addr, len, enable_pos)) {
 return;
 }
 
-- 
1.7.3.4




[Qemu-devel] [PATCH 5/6] msi: Invoke msi/msix_write_config from PCI core

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

This function, too, is better invoked by the core than by each and every
device. This allows dropping the config_write callbacks from ich and
intel-hda.

CC: Alexander Graf ag...@suse.de
CC: Gerd Hoffmann kra...@redhat.com
CC: Isaku Yamahata yamah...@valinux.co.jp
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/ide/ich.c            |    8 --------
 hw/intel-hda.c          |   12 ------------
 hw/ioh3420.c            |    1 -
 hw/msi.c                |    2 +-
 hw/pci.c                |    3 +++
 hw/virtio-pci.c         |    2 --
 hw/xio3130_downstream.c |    1 -
 hw/xio3130_upstream.c   |    1 -
 8 files changed, 4 insertions(+), 26 deletions(-)

diff --git a/hw/ide/ich.c b/hw/ide/ich.c
index 3f7510f..a470c01 100644
--- a/hw/ide/ich.c
+++ b/hw/ide/ich.c
@@ -139,13 +139,6 @@ static int pci_ich9_uninit(PCIDevice *dev)
 return 0;
 }
 
-static void pci_ich9_write_config(PCIDevice *pci, uint32_t addr,
-  uint32_t val, int len)
-{
-pci_default_write_config(pci, addr, val, len);
-msi_write_config(pci, addr, val, len);
-}
-
 static PCIDeviceInfo ich_ahci_info[] = {
 {
 .qdev.name = "ich9-ahci",
@@ -154,7 +147,6 @@ static PCIDeviceInfo ich_ahci_info[] = {
 .qdev.vmsd = &vmstate_ahci,
 .init = pci_ich9_ahci_init,
 .exit = pci_ich9_uninit,
-.config_write = pci_ich9_write_config,
 .vendor_id= PCI_VENDOR_ID_INTEL,
 .device_id= PCI_DEVICE_ID_INTEL_82801IR,
 .revision = 0x02,
diff --git a/hw/intel-hda.c b/hw/intel-hda.c
index 10769e0..995d895 100644
--- a/hw/intel-hda.c
+++ b/hw/intel-hda.c
@@ -1158,17 +1158,6 @@ static int intel_hda_exit(PCIDevice *pci)
 return 0;
 }
 
-static void intel_hda_write_config(PCIDevice *pci, uint32_t addr,
-   uint32_t val, int len)
-{
-IntelHDAState *d = DO_UPCAST(IntelHDAState, pci, pci);
-
-pci_default_write_config(pci, addr, val, len);
-if (d->msi) {
-msi_write_config(pci, addr, val, len);
-}
-}
-
 static int intel_hda_post_load(void *opaque, int version)
 {
 IntelHDAState* d = opaque;
@@ -1252,7 +1241,6 @@ static PCIDeviceInfo intel_hda_info = {
 .qdev.reset   = intel_hda_reset,
 .init = intel_hda_init,
 .exit = intel_hda_exit,
-.config_write = intel_hda_write_config,
 .vendor_id= PCI_VENDOR_ID_INTEL,
 .device_id= 0x2668,
 .revision = 1,
diff --git a/hw/ioh3420.c b/hw/ioh3420.c
index fc2fb3b..886ede8 100644
--- a/hw/ioh3420.c
+++ b/hw/ioh3420.c
@@ -71,7 +71,6 @@ static void ioh3420_write_config(PCIDevice *d,
 pci_get_long(d->config + d->exp.aer_cap + PCI_ERR_ROOT_COMMAND);
 
 pci_bridge_write_config(d, address, val, len);
-msi_write_config(d, address, val, len);
 ioh3420_aer_vector_update(d);
 pcie_cap_slot_write_config(d, address, val, len);
 pcie_aer_write_config(d, address, val, len);
diff --git a/hw/msi.c b/hw/msi.c
index 137dba0..c4e8a6e 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -256,7 +256,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
 stl_le_phys(address, data);
 }
 
-/* call this function after updating configs by pci_default_write_config(). */
+/* Normally called by pci_default_write_config(). */
 void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
 {
 uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
diff --git a/hw/pci.c b/hw/pci.c
index 5d5829d..8c814cd 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1056,6 +1056,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 
 if (range_covers_byte(addr, l, PCI_COMMAND))
 pci_update_irq_disabled(d, was_irq_disabled);
+
+msi_write_config(d, addr, val, l);
+msix_write_config(d, addr, val, l);
 }
 
 /***/
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 16a5b08..d21a7ee 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -492,8 +492,6 @@ static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
 virtio_set_status(proxy->vdev,
   proxy->vdev->status & ~VIRTIO_CONFIG_S_DRIVER_OK);
 }
-
-msix_write_config(pci_dev, address, val, len);
 }
 
 static unsigned virtio_pci_get_features(void *opaque)
diff --git a/hw/xio3130_downstream.c b/hw/xio3130_downstream.c
index 464eefa..8e9117d 100644
--- a/hw/xio3130_downstream.c
+++ b/hw/xio3130_downstream.c
@@ -41,7 +41,6 @@ static void xio3130_downstream_write_config(PCIDevice *d, uint32_t address,
 pci_bridge_write_config(d, address, val, len);
 pcie_cap_flr_write_config(d, address, val, len);
 pcie_cap_slot_write_config(d, address, val, len);
-msi_write_config(d, address, val, len);
 pcie_aer_write_config(d, address, val, len);
 }
 
diff --git a/hw/xio3130_upstream.c b/hw/xio3130_upstream.c
index 0d8d254..707401e 100644
--- 
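The idea behind moving the call into the core can be modeled as follows: once the generic config-write path notifies the MSI/MSI-X handlers itself, a device needs a private `config_write` callback only for device-specific work, and even then it just chains to the default. All names here (`SketchDev`, `core_default_write_config`, `sketch_config_write`, `bridge_like_write_config`) are hypothetical; the two counters merely stand in for `msi_write_config()` and `msix_write_config()`.

```c
#include <stdint.h>

typedef struct SketchDev SketchDev;
typedef void (*SketchConfigWrite)(SketchDev *d, uint32_t addr,
                                  uint32_t val, int len);

struct SketchDev {
    uint8_t config[256];            /* PCI configuration space */
    int msi_notified;               /* stand-in for msi_write_config() */
    int msix_notified;              /* stand-in for msix_write_config() */
    int private_hook_ran;           /* device-specific side effect */
    SketchConfigWrite config_write; /* optional per-device override */
};

/* Generic path: update config space (little-endian), then let the
 * capability code react. Each capability handler is assumed to return
 * early when its capability is absent. */
static void core_default_write_config(SketchDev *d, uint32_t addr,
                                      uint32_t val, int len)
{
    for (int i = 0; i < len && addr + i < sizeof(d->config); i++) {
        d->config[addr + i] = (uint8_t)(val >> (8 * i));
    }
    d->msi_notified++;
    d->msix_notified++;
}

/* Entry point used by the bus: prefer the device hook, else the default. */
static void sketch_config_write(SketchDev *d, uint32_t addr,
                                uint32_t val, int len)
{
    if (d->config_write) {
        d->config_write(d, addr, val, len);
    } else {
        core_default_write_config(d, addr, val, len);
    }
}

/* Example device hook: does its own work and chains to the default,
 * without having to remember the MSI/MSI-X calls. */
static void bridge_like_write_config(SketchDev *d, uint32_t addr,
                                     uint32_t val, int len)
{
    core_default_write_config(d, addr, val, len);
    d->private_hook_ran++;
}
```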

Re: [Qemu-devel] [RFC][PATCH 11/16] kvm: Introduce core services for in-kernel irqchip support

2011-12-04 Thread Avi Kivity
On 12/03/2011 01:17 PM, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 Add the basic infrastructure to active in-kernel irqchip support, inject
 interrupts into these models, and maintain IRQ routes.

 Routing is optional and depends on the host arch supporting
 KVM_CAP_IRQ_ROUTING. When it's not available on x86, we loose the HPET

lose

 as we can't route GSI0 to IOAPIC pin 2.

 In-kernel irqchip support will once be controlled by the machine
 property 'kernel_irqchip', but this is not yet wired up.



-- 
error compiling committee.c: too many arguments to function




[Qemu-devel] [PATCH 3/6] msi: Use msi/msix_present more consistently

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Replace some open-coded msi/msix_present checks and drop redundant
msix_supported tests (present implies supported).

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/msi.c  |2 +-
 hw/msix.c |   13 -
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index 612b168..137dba0 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -167,7 +167,7 @@ void msi_uninit(struct PCIDevice *dev)
 uint16_t flags;
 uint8_t cap_size;
 
-if (!(dev->cap_present & QEMU_PCI_CAP_MSI)) {
+if (!msi_present(dev)) {
 return;
 }
 flags = pci_get_word(dev->config + msi_flags_off(dev));
diff --git a/hw/msix.c b/hw/msix.c
index 32fd9b2..876793a 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -283,8 +283,9 @@ static void msix_free_irq_entries(PCIDevice *dev)
 /* Clean up resources for the device. */
 int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
 {
-if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
+if (!msix_present(dev)) {
 return 0;
+}
 pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
 dev->msix_cap = 0;
 msix_free_irq_entries(dev);
@@ -303,7 +304,7 @@ void msix_save(PCIDevice *dev, QEMUFile *f)
 {
 unsigned n = dev->msix_entries_nr;
 
-if (!(dev->cap_present & QEMU_PCI_CAP_MSIX)) {
+if (!msix_present(dev)) {
 return;
 }
 
@@ -316,7 +317,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
 {
 unsigned n = dev->msix_entries_nr;
 
-if (!(dev->cap_present & QEMU_PCI_CAP_MSIX)) {
+if (!msix_present(dev)) {
 return;
 }
 
@@ -368,8 +369,9 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 
 void msix_reset(PCIDevice *dev)
 {
-if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
+if (!msix_present(dev)) {
 return;
+}
 msix_free_irq_entries(dev);
 dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &=
 ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET];
@@ -408,7 +410,8 @@ void msix_vector_unuse(PCIDevice *dev, unsigned vector)
 
 void msix_unuse_all_vectors(PCIDevice *dev)
 {
-if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
+if (!msix_present(dev)) {
 return;
+}
 msix_free_irq_entries(dev);
 }
-- 
1.7.3.4
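The replacement in this patch is meant to be purely mechanical: a presence helper tests the same bit as the open-coded `dev->cap_present & QEMU_PCI_CAP_MSIX`. The sketch below checks that equivalence exhaustively over a small bitmask space; the flag value and the helper names are illustrative assumptions, not QEMU's actual constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag value; QEMU defines its own QEMU_PCI_CAP_* bits. */
#define SKETCH_CAP_MSIX (1u << 1)

typedef struct {
    uint32_t cap_present;
} SketchPCIDevice;

/* Helper in the style of msix_present(): returning bool keeps callers
 * from depending on the raw flag value. */
static bool sketch_msix_present(const SketchPCIDevice *dev)
{
    return (dev->cap_present & SKETCH_CAP_MSIX) != 0;
}

/* Returns true if the helper and the open-coded test agree for every
 * combination of the low 'bits' capability bits. */
static bool helper_matches_open_coded(unsigned bits)
{
    for (uint32_t mask = 0; mask < (1u << bits); mask++) {
        SketchPCIDevice dev = { .cap_present = mask };
        bool open_coded = !(dev.cap_present & SKETCH_CAP_MSIX);
        if (open_coded != !sketch_msix_present(&dev)) {
            return false;
        }
    }
    return true;
}
```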




[Qemu-devel] [PATCH 4/6] msi: Invoke msi/msix_reset from PCI core

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

There is no point in pushing this burden onto the devices; they may
simply forget to call these functions (as intel-hda and ahci currently do).
Instead, the reset functions are now called from pci_device_reset and
pci_bridge_reset. They do nothing if MSI/MSI-X is not in use.

CC: Alexander Graf ag...@suse.de
CC: Gerd Hoffmann kra...@redhat.com
CC: Isaku Yamahata yamah...@valinux.co.jp
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/ioh3420.c|2 +-
 hw/pci.c|5 +
 hw/pci_bridge.c |4 
 hw/virtio-pci.c |1 -
 hw/xio3130_downstream.c |2 +-
 hw/xio3130_upstream.c   |1 -
 6 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/ioh3420.c b/hw/ioh3420.c
index a6bfbb9..fc2fb3b 100644
--- a/hw/ioh3420.c
+++ b/hw/ioh3420.c
@@ -81,7 +81,7 @@ static void ioh3420_write_config(PCIDevice *d,
 static void ioh3420_reset(DeviceState *qdev)
 {
 PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
-msi_reset(d);
+
 ioh3420_aer_vector_update(d);
 pcie_cap_root_reset(d);
 pcie_cap_deverr_reset(d);
diff --git a/hw/pci.c b/hw/pci.c
index 399227f..5d5829d 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -31,6 +31,8 @@
 #include "loader.h"
 #include "range.h"
 #include "qmp-commands.h"
+#include "msi.h"
+#include "msix.h"
 
 //#define DEBUG_PCI
 #ifdef DEBUG_PCI
@@ -191,6 +193,9 @@ void pci_device_reset(PCIDevice *dev)
 }
 }
 pci_update_mappings(dev);
+
+msi_reset(dev);
+msix_reset(dev);
 }
 
 /*
diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
index 650d165..6799978 100644
--- a/hw/pci_bridge.c
+++ b/hw/pci_bridge.c
@@ -32,6 +32,8 @@
 #include "pci_bridge.h"
 #include "pci_internals.h"
 #include "range.h"
+#include "msi.h"
+#include "msix.h"
 
 /* PCI bridge subsystem vendor ID helper functions */
 #define PCI_SSVID_SIZEOF        8
@@ -296,6 +298,8 @@ void pci_bridge_reset(DeviceState *qdev)
 {
 PCIDevice *dev = DO_UPCAST(PCIDevice, qdev, qdev);
 pci_bridge_reset_reg(dev);
+msi_reset(dev);
+msix_reset(dev);
 }
 
 /* default qdev initialization function for PCI-to-PCI bridge */
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 64c6a94..16a5b08 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -271,7 +271,6 @@ static void virtio_pci_reset(DeviceState *d)
 VirtIOPCIProxy *proxy = container_of(d, VirtIOPCIProxy, pci_dev.qdev);
 virtio_pci_stop_ioeventfd(proxy);
 virtio_reset(proxy->vdev);
-msix_reset(&proxy->pci_dev);
 proxy->flags &= ~VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
 }
 
diff --git a/hw/xio3130_downstream.c b/hw/xio3130_downstream.c
index d3c387d..464eefa 100644
--- a/hw/xio3130_downstream.c
+++ b/hw/xio3130_downstream.c
@@ -48,7 +48,7 @@ static void xio3130_downstream_write_config(PCIDevice *d, uint32_t address,
 static void xio3130_downstream_reset(DeviceState *qdev)
 {
 PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
-msi_reset(d);
+
 pcie_cap_deverr_reset(d);
 pcie_cap_slot_reset(d);
 pcie_cap_ari_reset(d);
diff --git a/hw/xio3130_upstream.c b/hw/xio3130_upstream.c
index 8283695..0d8d254 100644
--- a/hw/xio3130_upstream.c
+++ b/hw/xio3130_upstream.c
@@ -47,7 +47,6 @@ static void xio3130_upstream_write_config(PCIDevice *d, 
uint32_t address,
 static void xio3130_upstream_reset(DeviceState *qdev)
 {
 PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
-msi_reset(d);
 pci_bridge_reset(qdev);
 pcie_cap_deverr_reset(d);
 }
-- 
1.7.3.4
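With the reset hooks in the core, every device reset runs the MSI-X reset, which is a no-op for devices without the capability. The sketch below models that. The struct layout, `SKETCH_MSIX_CONTROL_OFFSET` and the bit values are assumptions for illustration, but the masking step mirrors what `msix_reset()` does with the control register: it clears every bit the guest is allowed to write (`&= ~wmask`) and leaves read-only bits alone.

```c
#include <stdint.h>

#define SKETCH_CAP_MSIX            (1u << 1)
#define SKETCH_MSIX_CONTROL_OFFSET 3     /* assumed: high byte of Message Control */
#define SKETCH_MSIX_ENABLE         0x80  /* assumed: enable bit in that byte */
#define SKETCH_MSIX_FUNC_MASK      0x40  /* assumed: function-mask bit */

typedef struct {
    uint32_t cap_present;
    uint8_t msix_cap;        /* offset of the MSI-X capability */
    uint8_t config[256];     /* current config space contents */
    uint8_t wmask[256];      /* guest-writable bits per byte */
} SketchPCIDevice;

static void sketch_msix_reset(SketchPCIDevice *dev)
{
    if (!(dev->cap_present & SKETCH_CAP_MSIX)) {
        return;              /* nothing to do: capability absent */
    }
    /* Drop every guest-writable control bit back to zero. */
    dev->config[dev->msix_cap + SKETCH_MSIX_CONTROL_OFFSET] &=
        ~dev->wmask[dev->msix_cap + SKETCH_MSIX_CONTROL_OFFSET];
}

/* Core reset: generic register reset, then capability resets, so
 * individual devices can no longer forget the MSI-X part. */
static void sketch_pci_device_reset(SketchPCIDevice *dev)
{
    /* ... generic command/BAR reset would happen here ... */
    sketch_msix_reset(dev);
}
```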




Re: [Qemu-devel] [RFC][PATCH 10/16] memory: Introduce memory_region_init_reservation

2011-12-04 Thread Jan Kiszka
On 2011-12-04 14:20, Avi Kivity wrote:
 On 12/03/2011 01:17 PM, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 Introduce a memory region type that can reserve I/O space. Such regions
 are useful for modeling I/O that is only handled outside of QEMU, i.e.
 in the context of an accelerator like KVM. Any access to such a region
 from QEMU is a bug and will be reported as such.
 
 This is guest triggerable (DMA into the region), so abort() is too drastic.

Mmh, true. Will turn it into a print-once warning.

 
 +void memory_region_init_reservation(MemoryRegion *mr,
 +const char *name,
 +uint64_t size)
 +{
 +memory_region_init(mr, name, size);
 +mr->ops = &reservation_ops;
 +mr->opaque = mr;
 +mr->terminates = true;
 +mr->backend_registered = false;
 +}
 
 Just calling memory_region_init_io() is simpler, no?

Yep.

Thanks,
Jan
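A print-once warning for accesses to a reserved region could look like the sketch below. It follows the shape of a MemoryRegionOps read callback (`opaque`, `addr`, `size`), but `SketchReservation`, `sketch_reservation_read()` and the value returned for the access are assumptions for illustration only; the reworked patch may behave differently.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    const char *name;
    bool warned;        /* set after the first report */
    int warnings;       /* number of messages actually printed */
} SketchReservation;

/* Guest-triggerable (e.g. via DMA), so report instead of abort(). */
static uint64_t sketch_reservation_read(void *opaque, uint64_t addr,
                                        unsigned size)
{
    SketchReservation *r = opaque;

    if (!r->warned) {
        fprintf(stderr, "warning: access to reserved region '%s' "
                "at 0x%" PRIx64 " (size %u) not handled by QEMU\n",
                r->name, addr, size);
        r->warned = true;
        r->warnings++;
    }
    return 0;   /* assumed value for unhandled reads */
}
```

Keeping the flag per region, rather than in one global, means each distinct reservation still gets reported once.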





Re: [Qemu-devel] [RFC][PATCH 13/16] kvm: x86: Add user space part for in-kernel APIC

2011-12-04 Thread Avi Kivity
On 12/03/2011 01:17 PM, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 This introduces the alternative APIC model 'kvm-apic' which makes use of
 KVM's in-kernel device model. MSI is not yet supported, so we disable
 this when the in-kernel model is in use.

  
 -dev = qdev_create(NULL, "apic");
 +if (kvm_enabled() && kvm_irqchip_in_kernel()) {
 +dev = qdev_create(NULL, "kvm-apic");
 +} else {
 +dev = qdev_create(NULL, "apic");
 +}

Is there anything that makes those two devices incompatible?






Re: [Qemu-devel] [RFC][PATCH 01/16] msi: Generalize msix_supported to msi_supported

2011-12-04 Thread Avi Kivity
On 12/04/2011 03:16 PM, Jan Kiszka wrote:
 On 2011-12-04 14:12, Avi Kivity wrote:
  On 12/03/2011 01:17 PM, Jan Kiszka wrote:
  From: Jan Kiszka jan.kis...@siemens.com
 
  Rename msix_supported to msi_supported and control MSI and MSI-X
  activation this way. That was likely to original intention for this
  flag, but MSI support came after MSI-X.
  
  'and' is a dangerous word in a changelog entry.

 This patch hardly qualifies for two IMHO.

If we don't have to change it, no.


  
 
  +
  +if (!msi_supported) {
  +return -ENOTSUP;
  +}
  +
 
  
  This changes behaviour.  qemu 1.0 -M pc-1.0 and qemu-1.1 -M pc-1.0 will
  be different after this, no?
  

 Only isapc had msix_supported = 0, and I doubt we got there (msi_init)
 for that machine. Or am I missing something?


Ah, I thought it was a user-settable property, but it isn't.
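The behaviour discussed here can be modeled as a machine-wide flag consulted by the init helper: if the machine does not support MSI, init fails with `-ENOTSUP` and the device is expected to fall back to pin-based interrupts. Per the thread, only isapc cleared the flag and it is not user-settable; `sketch_msi_init()`, `sketch_device_init()` and the fallback logic are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CAP_MSI (1u << 0)

/* Machine-wide switch; assumed to be cleared only by machine setup code. */
static bool msi_supported = true;

typedef struct {
    uint32_t cap_present;
    bool uses_intx;      /* device fell back to legacy INTx */
} SketchPCIDevice;

static int sketch_msi_init(SketchPCIDevice *dev)
{
    if (!msi_supported) {
        return -ENOTSUP;
    }
    dev->cap_present |= SKETCH_CAP_MSI;
    return 0;
}

/* Typical device-side handling: treat MSI as optional. */
static void sketch_device_init(SketchPCIDevice *dev)
{
    dev->uses_intx = sketch_msi_init(dev) < 0;
}
```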





Re: [Qemu-devel] [RFC][PATCH 11/16] kvm: Introduce core services for in-kernel irqchip support

2011-12-04 Thread Jan Kiszka
On 2011-12-04 14:23, Avi Kivity wrote:
 On 12/03/2011 01:17 PM, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 Add the basic infrastructure to active in-kernel irqchip support, inject
 interrupts into these models, and maintain IRQ routes.

 Routing is optional and depends on the host arch supporting
 KVM_CAP_IRQ_ROUTING. When it's not available on x86, we loose the HPET
 
 lose

/me is still looking for a semantic proofreader plugin...

 
 as we can't route GSI0 to IOAPIC pin 2.

 In-kernel irqchip support will once be controlled by the machine
 property 'kernel_irqchip', but this is not yet wired up.


 






Re: [Qemu-devel] [RFC][PATCH 11/16] kvm: Introduce core services for in-kernel irqchip support

2011-12-04 Thread Avi Kivity
On 12/04/2011 03:27 PM, Jan Kiszka wrote:
 On 2011-12-04 14:23, Avi Kivity wrote:
  On 12/03/2011 01:17 PM, Jan Kiszka wrote:
  From: Jan Kiszka jan.kis...@siemens.com
 
  Add the basic infrastructure to active in-kernel irqchip support, inject
  interrupts into these models, and maintain IRQ routes.
 
  Routing is optional and depends on the host arch supporting
  KVM_CAP_IRQ_ROUTING. When it's not available on x86, we loose the HPET
  
  lose

 /me is still looking for a semantic proofreader plugin...


Well, I have to comment on something.  If you don't want spelling
corrections, leave some trailing whitespace.





Re: [Qemu-devel] [RFC][PATCH 11/16] kvm: Introduce core services for in-kernel irqchip support

2011-12-04 Thread Jan Kiszka
On 2011-12-04 14:28, Avi Kivity wrote:
 On 12/04/2011 03:27 PM, Jan Kiszka wrote:
 On 2011-12-04 14:23, Avi Kivity wrote:
 On 12/03/2011 01:17 PM, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 Add the basic infrastructure to active in-kernel irqchip support, inject
 interrupts into these models, and maintain IRQ routes.

 Routing is optional and depends on the host arch supporting
 KVM_CAP_IRQ_ROUTING. When it's not available on x86, we loose the HPET

 lose

 /me is still looking for a semantic proofreader plugin...

 
 Well, I have to comment on something.  If you don't want spelling
 corrections, leave some trailing whitespace.

I could create a messpatch.pl...

Jan





Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-04 Thread Avi Kivity
On 12/03/2011 01:17 PM, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 Introduce the alternative 'kvm-i8259' device model that exploits KVM
 in-kernel acceleration.

 The PIIX3 initialization code is furthermore extended by KVM specific
 IRQ route setup. Moreover, GSI injection differs in KVM mode from the
 user space model. As we can dispatch ISA-range IRQs to both IOAPIC and
 PIC inside the kernel, we do not need to inject them separately. This is
 reflected by a KVM-specific GSI handler.

 +
 +qemu_irq *kvm_i8259_init(void)
 +{
 +ISADevice *dev;
 +
 +dev = isa_create("kvm-i8259");


Same issue.  Is this a different device, or a different implementation
of the same device?

We're forcing migration from 1.0 to 1.1 to disable in-kernel irqchip on
the target.  For qemu itself, that's no issue.  But for qemu-kvm, it
will result in loss of performance, or hacks to alias the two back together.
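The difference in GSI injection described in the quoted commit message can be sketched as two handlers. In the user-space model an ISA-range GSI (below 16) must be raised on both the PIC and the IOAPIC; with the in-kernel irqchip a single injection suffices because the kernel dispatches it to both. The level arrays and counters stand in for the real injection paths, and all names are hypothetical.

```c
#define SKETCH_ISA_NUM_IRQS    16
#define SKETCH_IOAPIC_NUM_PINS 24

typedef struct {
    int pic_level[SKETCH_ISA_NUM_IRQS];        /* last level seen by the PIC */
    int ioapic_level[SKETCH_IOAPIC_NUM_PINS];  /* last level seen by the IOAPIC */
    int kernel_level[SKETCH_IOAPIC_NUM_PINS];  /* last level handed to the kernel */
    int user_injections;    /* controller updates done in user space */
    int kernel_injections;  /* calls into the in-kernel irqchip */
} SketchGSIState;

/* User-space models: ISA-range GSIs go to both the PIC and the IOAPIC.
 * n is assumed to be below SKETCH_IOAPIC_NUM_PINS. */
static void sketch_gsi_handler(SketchGSIState *s, int n, int level)
{
    if (n < SKETCH_ISA_NUM_IRQS) {
        s->pic_level[n] = level;
        s->user_injections++;
    }
    s->ioapic_level[n] = level;
    s->user_injections++;
}

/* In-kernel irqchip: one injection per GSI; the kernel delivers it to
 * both controllers itself. */
static void sketch_kvm_gsi_handler(SketchGSIState *s, int n, int level)
{
    s->kernel_level[n] = level;
    s->kernel_injections++;
}
```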





Re: [Qemu-devel] [RFC][PATCH 11/16] kvm: Introduce core services for in-kernel irqchip support

2011-12-04 Thread Avi Kivity
On 12/04/2011 03:30 PM, Jan Kiszka wrote:
  Well, I have to comment on something.  If you don't want spelling
  corrections, leave some trailing whitespace.

 I could create a messpatch.pl...

Ah, and with a --reverse flag we could go through the motions of patch
review without requiring a repost.





Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-04 Thread Jan Kiszka
On 2011-12-04 14:31, Avi Kivity wrote:
 On 12/03/2011 01:17 PM, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 Introduce the alternative 'kvm-i8259' device model that exploits KVM
 in-kernel acceleration.

 The PIIX3 initialization code is furthermore extended by KVM specific
 IRQ route setup. Moreover, GSI injection differs in KVM mode from the
 user space model. As we can dispatch ISA-range IRQs to both IOAPIC and
 PIC inside the kernel, we do not need to inject them separately. This is
 reflected by a KVM-specific GSI handler.

 +
 +qemu_irq *kvm_i8259_init(void)
 +{
 +ISADevice *dev;
 +
 +dev = isa_create("kvm-i8259");

 
 Same issue.  Is this a different device, or an different implementation
 of the same device?

They are theoretically the same from guest perspective (therefore you
can migrate between machines that differ in this).

 
 We're forcing migration from 1.0 to 1.1 to disable in-kernel irqchip on
 the target.  For qemu itself, that's no issue.  But for qemu-kvm, it
 will result in loss of performance, or hacks to alias the two back together.

Why should this happen with qemu-kvm? The vmstates are compatible, so
you can migrate from old qemu-kvm in-kernel devices to the new kvm-*
ones (once they are feature-equivalent). I'm not sure how many hacks this
may require in qemu-kvm, but I don't think it should make the situation
worse for that tree.

Jan





Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-04 Thread Avi Kivity
On 12/04/2011 03:42 PM, Jan Kiszka wrote:
 On 2011-12-04 14:31, Avi Kivity wrote:
  On 12/03/2011 01:17 PM, Jan Kiszka wrote:
  From: Jan Kiszka jan.kis...@siemens.com
 
  Introduce the alternative 'kvm-i8259' device model that exploits KVM
  in-kernel acceleration.
 
  The PIIX3 initialization code is furthermore extended by KVM specific
  IRQ route setup. Moreover, GSI injection differs in KVM mode from the
  user space model. As we can dispatch ISA-range IRQs to both IOAPIC and
  PIC inside the kernel, we do not need to inject them separately. This is
  reflected by a KVM-specific GSI handler.
 
  +
  +qemu_irq *kvm_i8259_init(void)
  +{
  +ISADevice *dev;
  +
  +dev = isa_create("kvm-i8259");
 
  
  Same issue.  Is this a different device, or an different implementation
  of the same device?

 They are theoretically the same from guest perspective (therefore you
 can migrate between machines that differ in this).

But the name becomes part of the save/restore ABI, so you can't.

  
  We're forcing migration from 1.0 to 1.1 to disable in-kernel irqchip on
  the target.  For qemu itself, that's no issue.  But for qemu-kvm, it
  will result in loss of performance, or hacks to alias the two back together.

 We should this happen with qemu-kvm? The vmstates are compatible, thus
 you can migration from old qemu-kvm in-kernel devices to the new kvm-*
 ones (once they are feature-equivalent). Not sure how much hacks this
 may require to qemu-kvm, but I don't think it should make the situation
 worse for that tree.

They aren't compatible due to the name clash.  The hack won't be large
(add an alias for the name), but just one hack is enough to keep the
tree alive for a long while.  Better not to add it in the first place.





Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-04 Thread Jan Kiszka
On 2011-12-04 14:49, Avi Kivity wrote:
 On 12/04/2011 03:42 PM, Jan Kiszka wrote:
 On 2011-12-04 14:31, Avi Kivity wrote:
 On 12/03/2011 01:17 PM, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 Introduce the alternative 'kvm-i8259' device model that exploits KVM
 in-kernel acceleration.

 The PIIX3 initialization code is furthermore extended by KVM specific
 IRQ route setup. Moreover, GSI injection differs in KVM mode from the
 user space model. As we can dispatch ISA-range IRQs to both IOAPIC and
 PIC inside the kernel, we do not need to inject them separately. This is
 reflected by a KVM-specific GSI handler.

 +
 +qemu_irq *kvm_i8259_init(void)
 +{
 +ISADevice *dev;
 +
 +dev = isa_create("kvm-i8259");


 Same issue.  Is this a different device, or an different implementation
 of the same device?

 They are theoretically the same from guest perspective (therefore you
 can migrate between machines that differ in this).
 
 But the name becomes part of the save/restore ABI, so you can't.

Nope, the vmstate names are identical. That would ruin migration
otherwise. It's just the output of info qtree & co. that changes.

Jan
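The point that the qdev names differ while the vmstate names match can be modeled with a simplified compatibility check: incoming state is matched by section name and accepted if its version lies within the destination's supported range, while the qdev name plays no role. The device and section names and the version numbers below are illustrative assumptions, not taken from the patches.

```c
#include <stdbool.h>
#include <string.h>

typedef struct {
    const char *qdev_name;      /* what 'info qtree' shows */
    const char *vmsd_name;      /* section name in the migration stream */
    int version_id;             /* version the device saves */
    int minimum_version_id;     /* oldest version it can load */
} SketchDeviceInfo;

/* Hypothetical user-space and in-kernel flavors of the same PIC. */
static const SketchDeviceInfo user_pic = { "isa-i8259", "i8259", 1, 1 };
static const SketchDeviceInfo kvm_pic  = { "kvm-i8259", "i8259", 1, 1 };

/* Simplified: only the section name and the version window matter. */
static bool can_migrate(const SketchDeviceInfo *src,
                        const SketchDeviceInfo *dst)
{
    return strcmp(src->vmsd_name, dst->vmsd_name) == 0 &&
           src->version_id >= dst->minimum_version_id &&
           src->version_id <= dst->version_id;
}
```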





Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-04 Thread Avi Kivity
On 12/04/2011 03:51 PM, Jan Kiszka wrote:
  
  But the name becomes part of the save/restore ABI, so you can't.

 Nope, the vmstate names are identical. That would ruin migration
 otherwise. It's just the output of info qtree & co. that changes.

Oh, okay.  I still think it's wrong, but now it's just a matter of
taste, and I can live with it.





Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-04 Thread Jan Kiszka
On 2011-12-04 15:04, Avi Kivity wrote:
 On 12/04/2011 03:51 PM, Jan Kiszka wrote:

 But the name becomes part of the save/restore ABI, so you can't.

 Nope, the vmstate names are identical. That would ruin migration
 otherwise. It's just the output of info qtree & co. that changes.
 
 Oh, okay.  I still think it's wrong, but now it's just a matter of
 taste, and I can live with it.

Wrong in what sense?

I think the way of merging kvm support into the user space models in
qemu-kvm is not particularly beautiful. But that's my taste, and
therefore I modeled the upstream proposal differently. :)

Jan





Re: [Qemu-devel] [PATCH 4/6] msi: Invoke msi/msix_reset from PCI core

2011-12-04 Thread Michael S. Tsirkin
On Sun, Dec 04, 2011 at 02:22:12PM +0100, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 There is no point in pushing this burden to the devices, they may rather
 forget to call them (like intel-hda and ahci ATM). Instead, reset
 functions are now called from pci_device_reset and pci_bridge_reset.
 They do nothing if the MSI/MSI-X is not in use.
 
 CC: Alexander Graf ag...@suse.de
 CC: Gerd Hoffmann kra...@redhat.com
 CC: Isaku Yamahata yamah...@valinux.co.jp
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

What makes me unhappy with this proposal is that msix_write_config, for
example, becomes in fact an internal interface. So devices should be
calling some functions like msix_init from msix.h, but not others like
msix_write_config.

It used to be simple: devices should call msix_.
Now, how are devices to figure it out?

E.g. the comment near msix_write_config says:
/* Handle MSI-X capability config write. */

This puts it at level 11 on Rusty's misuse scale:
Read the documentation and you will get it wrong.

So I tried writing a wrapper, something like pci_capability.h, that would
hide the details and handle all capabilities seamlessly. Where I got
stuck, though, was migration: the format is ordered, so we can't just move
the fields around. So I decided to wait until we switch to an unordered
format; then it'll become easy.

Thoughts?
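One way to read the wrapper idea is a table of per-capability hooks that the PCI core walks, so devices only register capabilities and never call capability internals directly. This is purely a hypothetical sketch (`SketchCapOps`, `sketch_add_capability()` and the other names are invented here), and it deliberately ignores the migration-ordering problem described above.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct SketchDev SketchDev;

/* Hypothetical per-capability hook table. */
typedef struct {
    const char *name;
    void (*write_config)(SketchDev *d, uint32_t addr, uint32_t val, int len);
    void (*reset)(SketchDev *d);
} SketchCapOps;

struct SketchDev {
    const SketchCapOps *caps[4];  /* capabilities registered at init */
    size_t ncaps;
    int writes_seen;
    int resets_seen;
};

/* Trivial hooks that only count invocations. */
static void count_write(SketchDev *d, uint32_t addr, uint32_t val, int len)
{
    (void)addr; (void)val; (void)len;
    d->writes_seen++;
}

static void count_reset(SketchDev *d)
{
    d->resets_seen++;
}

static const SketchCapOps sketch_msi_ops  = { "msi",  count_write, count_reset };
static const SketchCapOps sketch_msix_ops = { "msix", count_write, count_reset };

static int sketch_add_capability(SketchDev *d, const SketchCapOps *ops)
{
    if (d->ncaps >= sizeof(d->caps) / sizeof(d->caps[0])) {
        return -1;  /* table full */
    }
    d->caps[d->ncaps++] = ops;
    return 0;
}

/* Core entry points: fan out to every registered capability. */
static void sketch_caps_write_config(SketchDev *d, uint32_t addr,
                                     uint32_t val, int len)
{
    for (size_t i = 0; i < d->ncaps; i++) {
        if (d->caps[i]->write_config) {
            d->caps[i]->write_config(d, addr, val, len);
        }
    }
}

static void sketch_caps_reset(SketchDev *d)
{
    for (size_t i = 0; i < d->ncaps; i++) {
        if (d->caps[i]->reset) {
            d->caps[i]->reset(d);
        }
    }
}
```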




Re: [Qemu-devel] [PATCH 4/6] msi: Invoke msi/msix_reset from PCI core

2011-12-04 Thread Jan Kiszka
On 2011-12-04 15:24, Michael S. Tsirkin wrote:
 On Sun, Dec 04, 2011 at 02:22:12PM +0100, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 There is no point in pushing this burden to the devices, they may rather
 forget to call them (like intel-hda and ahci ATM). Instead, reset
 functions are now called from pci_device_reset and pci_bridge_reset.
 They do nothing if the MSI/MSI-X is not in use.

 CC: Alexander Graf ag...@suse.de
 CC: Gerd Hoffmann kra...@redhat.com
 CC: Isaku Yamahata yamah...@valinux.co.jp
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
 What makes me unhappy with this proposal is that msix_write_config, for
 example, becomes in fact an internal interface. So devices should be
 calling some functions like msix_init from msix.h, but not others like
 msix_write_config.
 
 It used to be simple: devices should call msix_.
 Now, how are devices to figure it out?
 
 E.g. the comment near msix_write_config says:
 /* Handle MSI-X capability config write. */

That should be aligned to msi_write_config's comment.

My goal is to reduce the number of calls devices have to do in order to
use MSI. We have quite a few correct examples by now, so it should not
be too hard to figure out what to do to use standard MSI[X] services.

Maybe a PCI skeleton device model would help further. Or up-to-date
documentation, though that may be even harder. ;)

 
 This puts it at level 11 on Rusty's misuse scale:
 Read the documentation and you will get it wrong.
 
 So I tried writing a wapper, something like pci_capability.h, that would
 hide the detail and handle all capabilities seamlessly.  Where I got
 stuck was migration though, format is ordered so we can't just move the
 fields around.  So I decided to wait until we switch to an unordered
 format, then it'll become easy.
 
 Thoughts?

MSI-X save/restore is, well, unfortunate. Just like the whole PCI layer
in this regard. But I don't think that should block this particular step
as it frees device models from an unneeded burden.

Jan





[Qemu-devel] [PATCH v2 00/16] uq/master: Introduce basic irqchip support

2011-12-04 Thread Jan Kiszka
This is v2, addressing the feedback comments provided so far, namely:
 - dropped #include  conversions
 - do not abort() on reserved memory region accesses but only warn once
 - use memory_region_init_io in memory_region_init_reservation

Patch 1 of this series has meanwhile been posted for direct upstream
inclusion, see http://thread.gmane.org/gmane.comp.emulators.qemu/127308.
I'm keeping it here to ease testing, but it should likely not go via
uq/master. The same may apply to other patches too, e.g. 10.

--- original series description ---

A few weeks ago I posted my MSI rework for qemu-kvm, which is meant to
eventually help integrate those bits upstream. After that I wondered what
a rewritten in-kernel irqchip model could look like and how it could make
use of this. But then I realized that there is actually no technical need
to roll out a first version of kvm irqchips that already supports MSI. As
the MSI work will likely take a few more iterations, I decided to push
forward with basic kvm irqchip support for QEMU upstream. Here we go.

My idea was always to create proper alternatives to the existing user
space device models while keeping the vmstates 100% compatible. I think
I succeeded in this; tests have worked fine so far. The kvm and the user
space models now have a common core where they share logic and specific
code modules where they differ. Also, I moved all kvm devices into
hw/kvm.

The in-kernel irqchip support can be controlled via a machine property
(-machine ...,kernel_irqchip=on), in contrast to qemu-kvm's dedicated
command line switch. This series keeps the support off by default
because we still lack the MSI bits, as explained above. Also, the in-kernel
PIT and TPR patching/VAPIC (for Windows guests) are not yet implemented.

The merge story would basically look similar to what we did before with
the clean-room reimplementation of kvm for QEMU: Merge into upstream,
merge back into qemu-kvm, disabling the new bits for now, then gradually
switching over to the new services, specifically once they are
feature-equivalent. Of course, I will support these steps as usual.

So, feedback and review welcome!

Jan Kiszka (16):
  msi: Generalize msix_supported to msi_supported
  kvm: Move kvmclock into hw/kvm folder
  apic: Stop timer on reset
  apic: Factor out core for KVM reuse
  apic: Open-code timer save/restore
  i8259: Factor out core for KVM reuse
  ioapic: Convert to memory API
  ioapic: Reject non-dword accesses to IOWIN register
  ioapic: Factor out core for KVM reuse
  memory: Introduce memory_region_init_reservation
  kvm: Introduce core services for in-kernel irqchip support
  kvm: x86: Establish IRQ0 override control
  kvm: x86: Add user space part for in-kernel APIC
  kvm: x86: Add user space part for in-kernel i8259
  kvm: x86: Add user space part for in-kernel IOAPIC
  kvm: Arm in-kernel irqchip support

 Makefile.objs  |2 +-
 Makefile.target|6 +-
 configure  |1 +
 hw/apic.c  |  288 
 hw/apic_common.c   |  262 
 hw/apic_internal.h |  111 +++
 hw/i8259.c |   78 +---
 hw/i8259_common.c  |  103 ++
 hw/i8259_internal.h|   67 +
 hw/ioapic.c|  136 +++
 hw/ioapic_common.c |   89 
 hw/ioapic_internal.h   |   94 +
 hw/kvm/apic.c  |  147 
 hw/{kvmclock.c => kvm/clock.c} |4 +-
 hw/{kvmclock.h => kvm/clock.h} |0
 hw/kvm/i8259.c |  154 +
 hw/kvm/ioapic.c|  120 +
 hw/msi.c   |8 +
 hw/msi.h   |2 +
 hw/msix.c  |9 +-
 hw/msix.h  |2 -
 hw/pc.c|   20 ++-
 hw/pc.h|1 +
 hw/pc_piix.c   |   67 +-
 kvm-all.c  |  154 +
 kvm-stub.c |5 +
 kvm.h  |   13 ++
 memory.c   |   36 +
 memory.h   |   16 +++
 qemu-config.c  |4 +
 qemu-options.hx|5 +-
 sysemu.h   |1 -
 target-i386/kvm.c  |   19 +++
 trace-events   |2 +-
 vl.c   |1 -
 35 files changed, 1547 insertions(+), 480 deletions(-)
 create mode 100644 hw/apic_common.c
 create mode 100644 hw/apic_internal.h
 create mode 100644 hw/i8259_common.c
 create mode 100644 hw/i8259_internal.h
 create mode 100644 hw/ioapic_common.c
 create mode 100644 hw/ioapic_internal.h
 create mode 100644 hw/kvm/apic.c
 rename hw/{kvmclock.c => kvm/clock.c} (98%)
 rename hw/{kvmclock.h => kvm/clock.h} (100%)
 create mode 100644 hw/kvm/i8259.c
 create mode 

[Qemu-devel] [PATCH v2 03/16] apic: Stop timer on reset

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

All LVTs are masked on reset, so the timer becomes ineffective. Letting
it tick nevertheless is harmless, but will at least create a spurious
trace event.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/apic.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 8289eef..2644a82 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -528,6 +528,8 @@ void apic_init_reset(DeviceState *d)
     s->initial_count_load_time = 0;
     s->next_time = 0;
     s->wait_for_sipi = 1;
+
+    qemu_del_timer(s->timer);
 }
 
 static void apic_startup(APICState *s, int vector_num)
-- 
1.7.3.4
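To make the reasoning of this patch concrete, here is a toy model using hypothetical types and names (not QEMU's APICState): reset sets the mask bit in every LVT entry, so a timer left armed could no longer deliver an interrupt, and disarming it at reset keeps a stale expiry from firing. The mask bit (bit 16) and the timer LVT at index 0 follow the usual local APIC layout; the entry count of six is an assumption meant to mirror QEMU's APIC_LVT_NB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_LVT_NB     6          /* assumed to mirror QEMU's APIC_LVT_NB */
#define TOY_LVT_TIMER  0          /* timer LVT index */
#define TOY_LVT_MASKED (1u << 16) /* LVT mask bit */

/* Hypothetical toy state, not QEMU's APICState. */
struct toy_apic {
    uint32_t lvt[TOY_LVT_NB];
    bool timer_armed;   /* stands in for a pending QEMU timer */
};

/* Reset masks every LVT, which makes the timer ineffective; disarming the
 * timer as well keeps a stale expiry from firing after reset. */
static void toy_apic_reset(struct toy_apic *s)
{
    for (int i = 0; i < TOY_LVT_NB; i++) {
        s->lvt[i] = TOY_LVT_MASKED;
    }
    s->timer_armed = false;
}

/* A timer expiry can only deliver an interrupt if armed and unmasked. */
static bool toy_timer_can_deliver(const struct toy_apic *s)
{
    return s->timer_armed && !(s->lvt[TOY_LVT_TIMER] & TOY_LVT_MASKED);
}
```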




[Qemu-devel] [PATCH v2 07/16] ioapic: Convert to memory API

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

This maintains the old imprecise access size handling.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/ioapic.c |   28 +++-
 1 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 61991d7..56b1612 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -86,6 +86,7 @@ typedef struct IOAPICState IOAPICState;
 
 struct IOAPICState {
     SysBusDevice busdev;
+    MemoryRegion io_memory;
     uint8_t id;
     uint8_t ioregsel;
     uint32_t irr;
@@ -195,7 +196,8 @@ void ioapic_eoi_broadcast(int vector)
 }
 }
 
-static uint32_t ioapic_mem_readl(void *opaque, target_phys_addr_t addr)
+static uint64_t
+ioapic_mem_read(void *opaque, target_phys_addr_t addr, unsigned int size)
 {
     IOAPICState *s = opaque;
     int index;
@@ -234,7 +236,8 @@ static uint32_t ioapic_mem_readl(void *opaque, target_phys_addr_t addr)
 }
 
 static void
-ioapic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
+ioapic_mem_write(void *opaque, target_phys_addr_t addr, uint64_t val,
+                 unsigned int size)
 {
     IOAPICState *s = opaque;
     int index;
@@ -309,32 +312,23 @@ static void ioapic_reset(DeviceState *d)
 }
 }
 
-static CPUReadMemoryFunc * const ioapic_mem_read[3] = {
-    ioapic_mem_readl,
-    ioapic_mem_readl,
-    ioapic_mem_readl,
-};
-
-static CPUWriteMemoryFunc * const ioapic_mem_write[3] = {
-    ioapic_mem_writel,
-    ioapic_mem_writel,
-    ioapic_mem_writel,
+static const MemoryRegionOps ioapic_io_ops = {
+    .read = ioapic_mem_read,
+    .write = ioapic_mem_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static int ioapic_init1(SysBusDevice *dev)
 {
     IOAPICState *s = FROM_SYSBUS(IOAPICState, dev);
-    int io_memory;
     static int ioapic_no;
 
     if (ioapic_no >= MAX_IOAPICS) {
         return -1;
     }
 
-    io_memory = cpu_register_io_memory(ioapic_mem_read,
-                                       ioapic_mem_write, s,
-                                       DEVICE_NATIVE_ENDIAN);
-    sysbus_init_mmio(dev, 0x1000, io_memory);
+    memory_region_init_io(&s->io_memory, &ioapic_io_ops, s, "ioapic", 0x1000);
+    sysbus_init_mmio_region(dev, &s->io_memory);
 
     qdev_init_gpio_in(&dev->qdev, ioapic_set_irq, IOAPIC_NUM_PINS);
 
-- 
1.7.3.4
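A minimal sketch of what this conversion changes, using hypothetical stand-in types rather than QEMU's real CPUReadMemoryFunc and MemoryRegionOps: the old API registered a three-entry handler table (byte, word, long) whose entries all pointed at the same 32-bit function, while the new API takes one callback that is told the access size. Like the converted IOAPIC, the sketch ignores that size, which is the "imprecise access size handling" the commit message says is maintained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not QEMU's real types. */
typedef uint32_t (*toy_old_read_fn)(void *opaque, uint64_t addr);

struct toy_region_ops {
    uint64_t (*read)(void *opaque, uint64_t addr, unsigned size);
};

struct toy_dev {
    uint32_t reg;
};

/* One 32-bit handler that knows nothing about access sizes. */
static uint32_t toy_readl(void *opaque, uint64_t addr)
{
    const struct toy_dev *d = opaque;

    (void)addr;
    return d->reg;
}

/* Old style: a byte/word/long table whose entries all point at toy_readl. */
static toy_old_read_fn const toy_old_read[3] = {
    toy_readl, toy_readl, toy_readl,
};

/* New style: a single callback is passed the size but ignores it, so every
 * access width sees the full 32-bit value, just as with the old table. */
static uint64_t toy_read(void *opaque, uint64_t addr, unsigned size)
{
    (void)size;
    return toy_readl(opaque, addr);
}

static const struct toy_region_ops toy_ops = {
    .read = toy_read,
};
```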




Re: [Qemu-devel] [PATCH 4/6] msi: Invoke msi/msix_reset from PCI core

2011-12-04 Thread Michael S. Tsirkin
On Sun, Dec 04, 2011 at 03:35:38PM +0100, Jan Kiszka wrote:
> On 2011-12-04 15:24, Michael S. Tsirkin wrote:
> > On Sun, Dec 04, 2011 at 02:22:12PM +0100, Jan Kiszka wrote:
> > > From: Jan Kiszka jan.kis...@siemens.com
> > >
> > > There is no point in pushing this burden to the devices, they may rather
> > > forget to call them (like intel-hda and ahci ATM). Instead, reset
> > > functions are now called from pci_device_reset and pci_bridge_reset.
> > > They do nothing if the MSI/MSI-X is not in use.
> > >
> > > CC: Alexander Graf ag...@suse.de
> > > CC: Gerd Hoffmann kra...@redhat.com
> > > CC: Isaku Yamahata yamah...@valinux.co.jp
> > > Signed-off-by: Jan Kiszka jan.kis...@siemens.com
> >
> > What makes me unhappy with this proposal is that msix_write_config, for
> > example, becomes in fact an internal interface. So devices should be
> > calling some functions like msix_init from msix.h, but not others like
> > msix_write_config.
> >
> > It used to be simple: devices should call msix_*.
> > Now, how are devices to figure it out?
> >
> > E.g. the comment near msix_write_config says:
> > /* Handle MSI-X capability config write. */
>
> That should be aligned to msi_write_config's comment.
>
> My goal is to reduce the number of calls devices have to do in order to
> use MSI. We have quite a few correct examples by now, so it should not
> be too hard to figure out what to do to use standard MSI[X] services.
>
> Maybe a PCI skeleton device model would help further. Or up-to-date
> documentation, though that may be even harder. ;)

Maybe it's time to move code into hw/pci/ ?
Then we could have private interfaces without
kludges like pci_internals.h ...

-- 
MST



[Qemu-devel] [PATCH v2 13/16] kvm: x86: Add user space part for in-kernel APIC

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

This introduces the alternative APIC model 'kvm-apic' which makes use of
KVM's in-kernel device model. MSI is not yet supported, so we disable
MSI when the in-kernel model is in use.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Makefile.target   |2 +-
 hw/kvm/apic.c |  147 +
 hw/pc.c   |   15 --
 kvm.h |3 +
 target-i386/kvm.c |8 +++
 5 files changed, 169 insertions(+), 6 deletions(-)
 create mode 100644 hw/kvm/apic.c

diff --git a/Makefile.target b/Makefile.target
index 4cd3c0e..66b42d5 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -231,7 +231,7 @@ obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
-obj-i386-$(CONFIG_KVM) += kvm/clock.o
+obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
diff --git a/hw/kvm/apic.c b/hw/kvm/apic.c
new file mode 100644
index 000..be6f401
--- /dev/null
+++ b/hw/kvm/apic.c
@@ -0,0 +1,147 @@
+/*
+ * KVM in-kernel APIC support
+ *
+ * Copyright (c) 2011 Siemens AG
+ *
+ * Authors:
+ *  Jan Kiszka  jan.kis...@siemens.com
+ *
+ * This work is licensed under the terms of the GNU GPL version 2.
+ * See the COPYING file in the top-level directory.
+ */
+#include "hw/apic_internal.h"
+#include "kvm.h"
+
+static inline void kvm_apic_set_reg(struct kvm_lapic_state *kapic,
+                                    int reg_id, uint32_t val)
+{
+    *((uint32_t *)(kapic->regs + (reg_id << 4))) = val;
+}
+
+static inline uint32_t kvm_apic_get_reg(struct kvm_lapic_state *kapic,
+                                        int reg_id)
+{
+    return *((uint32_t *)(kapic->regs + (reg_id << 4)));
+}
+
+int kvm_put_apic(CPUState *env)
+{
+    APICState *s = DO_UPCAST(APICState, busdev.qdev, env->apic_state);
+    struct kvm_lapic_state kapic;
+    int i;
+
+    if (s && kvm_enabled() && kvm_irqchip_in_kernel()) {
+        memset(&kapic, 0, sizeof(kapic));
+        kvm_apic_set_reg(&kapic, 0x2, s->id << 24);
+        kvm_apic_set_reg(&kapic, 0x8, s->tpr);
+        kvm_apic_set_reg(&kapic, 0xd, s->log_dest << 24);
+        kvm_apic_set_reg(&kapic, 0xe, s->dest_mode << 28 | 0x0fff);
+        kvm_apic_set_reg(&kapic, 0xf, s->spurious_vec);
+        for (i = 0; i < 8; i++) {
+            kvm_apic_set_reg(&kapic, 0x10 + i, s->isr[i]);
+            kvm_apic_set_reg(&kapic, 0x18 + i, s->tmr[i]);
+            kvm_apic_set_reg(&kapic, 0x20 + i, s->irr[i]);
+        }
+        kvm_apic_set_reg(&kapic, 0x28, s->esr);
+        kvm_apic_set_reg(&kapic, 0x30, s->icr[0]);
+        kvm_apic_set_reg(&kapic, 0x31, s->icr[1]);
+        for (i = 0; i < APIC_LVT_NB; i++) {
+            kvm_apic_set_reg(&kapic, 0x32 + i, s->lvt[i]);
+        }
+        kvm_apic_set_reg(&kapic, 0x38, s->initial_count);
+        kvm_apic_set_reg(&kapic, 0x3e, s->divide_conf);
+
+        return kvm_vcpu_ioctl(env, KVM_SET_LAPIC, &kapic);
+    }
+
+    return 0;
+}
+
+int kvm_get_apic(CPUState *env)
+{
+    APICState *s = DO_UPCAST(APICState, busdev.qdev, env->apic_state);
+    struct kvm_lapic_state kapic;
+    int ret, i, v;
+
+    if (s && kvm_enabled() && kvm_irqchip_in_kernel()) {
+        ret = kvm_vcpu_ioctl(env, KVM_GET_LAPIC, &kapic);
+        if (ret < 0) {
+            return ret;
+        }
+
+        s->id = kvm_apic_get_reg(&kapic, 0x2) >> 24;
+        s->tpr = kvm_apic_get_reg(&kapic, 0x8);
+        s->arb_id = kvm_apic_get_reg(&kapic, 0x9);
+        s->log_dest = kvm_apic_get_reg(&kapic, 0xd) >> 24;
+        s->dest_mode = kvm_apic_get_reg(&kapic, 0xe) >> 28;
+        s->spurious_vec = kvm_apic_get_reg(&kapic, 0xf);
+        for (i = 0; i < 8; i++) {
+            s->isr[i] = kvm_apic_get_reg(&kapic, 0x10 + i);
+            s->tmr[i] = kvm_apic_get_reg(&kapic, 0x18 + i);
+            s->irr[i] = kvm_apic_get_reg(&kapic, 0x20 + i);
+        }
+        s->esr = kvm_apic_get_reg(&kapic, 0x28);
+        s->icr[0] = kvm_apic_get_reg(&kapic, 0x30);
+        s->icr[1] = kvm_apic_get_reg(&kapic, 0x31);
+        for (i = 0; i < APIC_LVT_NB; i++) {
+            s->lvt[i] = kvm_apic_get_reg(&kapic, 0x32 + i);
+        }
+        s->initial_count = kvm_apic_get_reg(&kapic, 0x38);
+        s->divide_conf = kvm_apic_get_reg(&kapic, 0x3e);
+
+        v = (s->divide_conf & 3) | ((s->divide_conf >> 1) & 4);
+        s->count_shift = (v + 1) & 7;
+
+        s->initial_count_load_time = qemu_get_clock_ns(vm_clock);
+        apic_next_timer(s, s->initial_count_load_time);
+    }
+    return 0;
+}
+
+static void kvm_apic_set_base(APICState *s, uint64_t val)
+{
+    s->apicbase = val;
+}
+
+static void kvm_apic_set_tpr(APICState *s, uint8_t val)
+{
+    s->tpr = (val & 0x0f) << 4;
+}
+
+static int kvm_apic_init(SysBusDevice *dev)
+{
+    APICState *s = FROM_SYSBUS(APICState, dev);
+
+    memory_region_init_reservation(&s->io_memory, "kvm-apic-msi",
+                                   MSI_SPACE_SIZE);
+
+if 

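The register accessors in hw/kvm/apic.c rely on the layout of struct kvm_lapic_state: a 1 KiB image of the APIC register page in which each 32-bit register starts a 16-byte slot, so register index N (the MMIO offset divided by 16) lives at byte N << 4. The sketch below reproduces that addressing together with the count_shift formula from kvm_get_apic(), which turns the Divide Configuration Register into log2 of the timer divisor. The toy_* names are hypothetical, and the sketch uses memcpy instead of the patch's pointer cast to sidestep alignment and aliasing questions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match struct kvm_lapic_state: a 1 KiB register page image. */
#define TOY_APIC_REG_SIZE 0x400

struct toy_lapic_state {
    char regs[TOY_APIC_REG_SIZE];
};

/* Register index N (MMIO offset / 16) starts at byte N << 4. */
static void toy_apic_set_reg(struct toy_lapic_state *k, int reg_id, uint32_t val)
{
    assert((reg_id << 4) + 4 <= TOY_APIC_REG_SIZE);
    memcpy(k->regs + (reg_id << 4), &val, sizeof(val));
}

static uint32_t toy_apic_get_reg(const struct toy_lapic_state *k, int reg_id)
{
    uint32_t val;

    assert((reg_id << 4) + 4 <= TOY_APIC_REG_SIZE);
    memcpy(&val, k->regs + (reg_id << 4), sizeof(val));
    return val;
}

/* Divide Configuration Register: bits 0, 1 and 3 encode the divisor.
 * Same formula as count_shift in kvm_get_apic(); returns log2(divisor). */
static int toy_count_shift(uint32_t divide_conf)
{
    int v = (divide_conf & 3) | ((divide_conf >> 1) & 4);

    return (v + 1) & 7;
}
```

For example, a divide configuration of 0xb (binary 1011) selects divide-by-1 and yields a shift of 0, while 0xa selects divide-by-128 and yields 7.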
[Qemu-devel] [PATCH v2 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Introduce the alternative 'kvm-i8259' device model that exploits KVM
in-kernel acceleration.

The PIIX3 initialization code is furthermore extended by KVM-specific
IRQ route setup. Moreover, GSI injection differs in KVM mode from the
user space model. As we can dispatch ISA-range IRQs to both IOAPIC and
PIC inside the kernel, we do not need to inject them separately. This is
reflected by a KVM-specific GSI handler.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Makefile.target |2 +-
 hw/kvm/i8259.c  |  154 +++
 hw/pc.h |1 +
 hw/pc_piix.c|   50 --
 4 files changed, 202 insertions(+), 5 deletions(-)
 create mode 100644 hw/kvm/i8259.c

diff --git a/Makefile.target b/Makefile.target
index 66b42d5..850b80f 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -231,7 +231,7 @@ obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
-obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o
+obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
diff --git a/hw/kvm/i8259.c b/hw/kvm/i8259.c
new file mode 100644
index 000..f3994cb
--- /dev/null
+++ b/hw/kvm/i8259.c
@@ -0,0 +1,154 @@
+/*
+ * KVM in-kernel PIC (i8259) support
+ *
+ * Copyright (c) 2011 Siemens AG
+ *
+ * Authors:
+ *  Jan Kiszka  jan.kis...@siemens.com
+ *
+ * This work is licensed under the terms of the GNU GPL version 2.
+ * See the COPYING file in the top-level directory.
+ */
+#include "hw/i8259_internal.h"
+#include "hw/apic_internal.h"
+#include "kvm.h"
+
+static void kvm_pic_get(PicState *s)
+{
+    struct kvm_irqchip chip;
+    struct kvm_pic_state *kpic;
+    int ret;
+
+    chip.chip_id = s->master ? KVM_IRQCHIP_PIC_MASTER : KVM_IRQCHIP_PIC_SLAVE;
+    ret = kvm_vm_ioctl(kvm_state, KVM_GET_IRQCHIP, &chip);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_GET_IRQCHIP failed: %s\n", strerror(-ret));
+        abort();
+    }
+
+    kpic = &chip.chip.pic;
+
+    s->last_irr = kpic->last_irr;
+    s->irr = kpic->irr;
+    s->imr = kpic->imr;
+    s->isr = kpic->isr;
+    s->priority_add = kpic->priority_add;
+    s->irq_base = kpic->irq_base;
+    s->read_reg_select = kpic->read_reg_select;
+    s->poll = kpic->poll;
+    s->special_mask = kpic->special_mask;
+    s->init_state = kpic->init_state;
+    s->auto_eoi = kpic->auto_eoi;
+    s->rotate_on_auto_eoi = kpic->rotate_on_auto_eoi;
+    s->special_fully_nested_mode = kpic->special_fully_nested_mode;
+    s->init4 = kpic->init4;
+    s->elcr = kpic->elcr;
+    s->elcr_mask = kpic->elcr_mask;
+}
+
+static void kvm_pic_put(PicState *s)
+{
+    struct kvm_irqchip chip;
+    struct kvm_pic_state *kpic;
+    int ret;
+
+    chip.chip_id = s->master ? KVM_IRQCHIP_PIC_MASTER : KVM_IRQCHIP_PIC_SLAVE;
+
+    kpic = &chip.chip.pic;
+
+    kpic->last_irr = s->last_irr;
+    kpic->irr = s->irr;
+    kpic->imr = s->imr;
+    kpic->isr = s->isr;
+    kpic->priority_add = s->priority_add;
+    kpic->irq_base = s->irq_base;
+    kpic->read_reg_select = s->read_reg_select;
+    kpic->poll = s->poll;
+    kpic->special_mask = s->special_mask;
+    kpic->init_state = s->init_state;
+    kpic->auto_eoi = s->auto_eoi;
+    kpic->rotate_on_auto_eoi = s->rotate_on_auto_eoi;
+    kpic->special_fully_nested_mode = s->special_fully_nested_mode;
+    kpic->init4 = s->init4;
+    kpic->elcr = s->elcr;
+    kpic->elcr_mask = s->elcr_mask;
+
+    ret = kvm_vm_ioctl(kvm_state, KVM_SET_IRQCHIP, &chip);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_SET_IRQCHIP failed: %s\n", strerror(-ret));
+        abort();
+    }
+}
+
+static void kvm_pic_reset(DeviceState *dev)
+{
+    PicState *s = container_of(dev, PicState, dev.qdev);
+
+    pic_reset_internal(s);
+    s->elcr = 0;
+
+    kvm_pic_put(s);
+}
+
+static void kvm_pic_set_irq(void *opaque, int irq, int level)
+{
+    int delivered;
+
+    delivered = kvm_irqchip_set_irq(kvm_state, irq, level);
+    apic_set_irq_delivered(delivered);
+}
+
+static int kvm_pic_init(ISADevice *dev)
+{
+    PicState *s = DO_UPCAST(PicState, dev, dev);
+
+    memory_region_init_reservation(&s->base_io, "kvm-pic", 2);
+    memory_region_init_reservation(&s->elcr_io, "kvm-elcr", 1);
+
+    pic_init_common(s);
+
+    s->pre_save = kvm_pic_get;
+    s->post_load = kvm_pic_put;
+
+    return 0;
+}
+
+qemu_irq *kvm_i8259_init(void)
+{
+    ISADevice *dev;
+
+    dev = isa_create("kvm-i8259");
+    qdev_prop_set_uint32(&dev->qdev, "iobase", 0x20);
+    qdev_prop_set_uint32(&dev->qdev, "elcr_addr", 0x4d0);
+    qdev_prop_set_bit(&dev->qdev, "master", true);
+    qdev_init_nofail(&dev->qdev);
+
+    dev = isa_create("kvm-i8259");
+    qdev_prop_set_uint32(&dev->qdev, "iobase", 0xa0);
+    qdev_prop_set_uint32(&dev->qdev, "elcr_addr", 0x4d1);
+    qdev_init_nofail(&dev->qdev);
+
+    return qemu_allocate_irqs(kvm_pic_set_irq, NULL, ISA_NUM_IRQS);
+}
+
+static ISADeviceInfo kvm_i8259_info = {
+
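The GSI handling described in the i8259 commit message above can be sketched as follows. With the user-space models, an ISA-range GSI (below 16) has to be raised on both the PIC and the IOAPIC; with the in-kernel irqchip a single injection per GSI suffices because the kernel fans the line out to both chips. Counters stand in for the real injection paths, and all names are hypothetical rather than QEMU's actual GSI handler code.

```c
#include <assert.h>

#define TOY_ISA_NUM_IRQS 16

/* Hypothetical counters standing in for the real injection paths. */
struct toy_injections {
    int pic;     /* raised on the user-space PIC model */
    int ioapic;  /* raised on the user-space IOAPIC model */
    int kernel;  /* handed to the in-kernel irqchip */
};

/* User-space models: an ISA-range GSI has to reach both chips separately. */
static void toy_gsi_handler_user(struct toy_injections *t, int gsi)
{
    if (gsi < TOY_ISA_NUM_IRQS) {
        t->pic++;
    }
    t->ioapic++;
}

/* In-kernel irqchip: one injection per GSI; the kernel fans ISA-range
 * lines out to the PIC and the IOAPIC itself. */
static void toy_gsi_handler_kvm(struct toy_injections *t, int gsi)
{
    (void)gsi;
    t->kernel++;
}
```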

Re: [Qemu-devel] [PATCH 4/6] msi: Invoke msi/msix_reset from PCI core

2011-12-04 Thread Jan Kiszka
On 2011-12-04 15:48, Michael S. Tsirkin wrote:
> On Sun, Dec 04, 2011 at 03:35:38PM +0100, Jan Kiszka wrote:
> > On 2011-12-04 15:24, Michael S. Tsirkin wrote:
> > > On Sun, Dec 04, 2011 at 02:22:12PM +0100, Jan Kiszka wrote:
> > > > From: Jan Kiszka jan.kis...@siemens.com
> > > >
> > > > There is no point in pushing this burden to the devices, they may rather
> > > > forget to call them (like intel-hda and ahci ATM). Instead, reset
> > > > functions are now called from pci_device_reset and pci_bridge_reset.
> > > > They do nothing if the MSI/MSI-X is not in use.
> > > >
> > > > CC: Alexander Graf ag...@suse.de
> > > > CC: Gerd Hoffmann kra...@redhat.com
> > > > CC: Isaku Yamahata yamah...@valinux.co.jp
> > > > Signed-off-by: Jan Kiszka jan.kis...@siemens.com
> > >
> > > What makes me unhappy with this proposal is that msix_write_config, for
> > > example, becomes in fact an internal interface. So devices should be
> > > calling some functions like msix_init from msix.h, but not others like
> > > msix_write_config.
> > >
> > > It used to be simple: devices should call msix_*.
> > > Now, how are devices to figure it out?
> > >
> > > E.g. the comment near msix_write_config says:
> > > /* Handle MSI-X capability config write. */
> >
> > That should be aligned to msi_write_config's comment.
> >
> > My goal is to reduce the number of calls devices have to do in order to
> > use MSI. We have quite a few correct examples by now, so it should not
> > be too hard to figure out what to do to use standard MSI[X] services.
> >
> > Maybe a PCI skeleton device model would help further. Or up-to-date
> > documentation, though that may be even harder. ;)
>
> Maybe it's time to move code into hw/pci/ ?
> Then we could have private interfaces without
> kludges like pci_internals.h ...
>

Sounds reasonable.

Jan





[Qemu-devel] [PATCH v2 10/16] memory: Introduce memory_region_init_reservation

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Introduce a memory region type that can reserve I/O space. Such regions
are useful for modeling I/O that is only handled outside of QEMU, i.e.
in the context of an accelerator like KVM.

Any access to such a region from QEMU is a bug, but could theoretically
be triggered by guest code (DMA to a reserved region). So we only warn
about such events once and then ignore them.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 memory.c |   36 
 memory.h |   16 
 2 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index dc5e35d..6d55cf6 100644
--- a/memory.c
+++ b/memory.c
@@ -1003,6 +1003,42 @@ void memory_region_init_rom_device(MemoryRegion *mr,
     mr->backend_registered = true;
 }
 
+static uint64_t invalid_read(void *opaque, target_phys_addr_t addr,
+                             unsigned size)
+{
+    MemoryRegion *mr = opaque;
+
+    if (!mr->warning_printed) {
+        fprintf(stderr, "Invalid read from memory region %s\n", mr->name);
+        mr->warning_printed = true;
+    }
+    return -1U;
+}
+
+static void invalid_write(void *opaque, target_phys_addr_t addr, uint64_t data,
+                          unsigned size)
+{
+    MemoryRegion *mr = opaque;
+
+    if (!mr->warning_printed) {
+        fprintf(stderr, "Invalid write to memory region %s\n", mr->name);
+        mr->warning_printed = true;
+    }
+}
+
+static const MemoryRegionOps reservation_ops = {
+    .read = invalid_read,
+    .write = invalid_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+void memory_region_init_reservation(MemoryRegion *mr,
+                                    const char *name,
+                                    uint64_t size)
+{
+    memory_region_init_io(mr, &reservation_ops, mr, name, size);
+}
+
 void memory_region_destroy(MemoryRegion *mr)
 {
     assert(QTAILQ_EMPTY(&mr->subregions));
diff --git a/memory.h b/memory.h
index d5b47da..b479350 100644
--- a/memory.h
+++ b/memory.h
@@ -115,6 +115,7 @@ struct MemoryRegion {
     bool terminates;
     bool readable;
     bool readonly; /* For RAM regions */
+    bool warning_printed; /* For reservations */
     MemoryRegion *alias;
     target_phys_addr_t alias_offset;
     unsigned priority;
@@ -242,6 +243,21 @@ void memory_region_init_rom_device(MemoryRegion *mr,
                                    uint64_t size);
 
 /**
+ * memory_region_init_reservation: Initialize a memory region that reserves
+ * I/O space.
+ *
+ * A reservation region primarily serves debugging purposes.  It claims I/O
+ * space that is not supposed to be handled by QEMU itself.  Any access via
+ * the memory API is reported with a one-time warning and otherwise ignored.
+ *
+ * @mr: the #MemoryRegion to be initialized
+ * @name: used for debugging; not visible to the user or ABI
+ * @size: size of the region.
+ */
+void memory_region_init_reservation(MemoryRegion *mr,
+                                    const char *name,
+                                    uint64_t size);
+/**
  * memory_region_destroy: Destroy a memory region and relaim all resources.
  *
  * @mr: the region to be destroyed.  May not currently be a subregion
-- 
1.7.3.4
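A self-contained sketch of the warn-once behaviour that reservation_ops implement, with a hypothetical toy_region standing in for MemoryRegion and an extra counter so the behaviour can be checked. It also illustrates a detail of the patch's `return -1U;`: because -1U is an unsigned int, the 64-bit value returned is 0x00000000ffffffff rather than all 64 bits set (assuming a 32-bit unsigned int).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the MemoryRegion fields the patch touches. */
struct toy_region {
    const char *name;
    bool warning_printed;
    int warnings;   /* extra counter so the behaviour can be checked */
};

static void toy_warn_once(struct toy_region *mr, const char *what)
{
    if (!mr->warning_printed) {
        fprintf(stderr, "Invalid %s memory region %s\n", what, mr->name);
        mr->warning_printed = true;
        mr->warnings++;
    }
}

static uint64_t toy_invalid_read(void *opaque, uint64_t addr, unsigned size)
{
    (void)addr;
    (void)size;
    toy_warn_once(opaque, "read from");
    return -1U;   /* unsigned int, so this widens to 0x00000000ffffffff */
}

static void toy_invalid_write(void *opaque, uint64_t addr, uint64_t data,
                              unsigned size)
{
    (void)addr;
    (void)data;
    (void)size;
    toy_warn_once(opaque, "write to");
}
```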




[Qemu-devel] [PATCH v2 12/16] kvm: x86: Establish IRQ0 override control

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

KVM is forced to disable the IRQ0 override when we run with the in-kernel
irqchip but without the kernel's IRQ routing support. Set the fw_cfg
value accordingly. This aligns us with qemu-kvm.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/pc.c|3 ++-
 kvm-all.c  |5 +
 kvm-stub.c |5 +
 kvm.h  |2 ++
 sysemu.h   |1 -
 vl.c   |1 -
 6 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 5225d5b..715cc63 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -39,6 +39,7 @@
 #include "msi.h"
 #include "sysbus.h"
 #include "sysemu.h"
+#include "kvm.h"
 #include "blockdev.h"
 #include "ui/qemu-spice.h"
 #include "memory.h"
@@ -609,7 +610,7 @@ static void *bochs_bios_init(void)
     fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
     fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables,
                      acpi_tables_len);
-    fw_cfg_add_bytes(fw_cfg, FW_CFG_IRQ0_OVERRIDE, &irq0override, 1);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, kvm_allows_irq0_override());
 
     smbios_table = smbios_get_table(&smbios_len);
     if (smbios_table)
diff --git a/kvm-all.c b/kvm-all.c
index a85e14f..665455c 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1260,6 +1260,11 @@ int kvm_has_gsi_routing(void)
     return kvm_check_extension(kvm_state, KVM_CAP_IRQ_ROUTING);
 }
 
+int kvm_allows_irq0_override(void)
+{
+    return !kvm_enabled() || !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
     if (!kvm_has_sync_mmu()) {
diff --git a/kvm-stub.c b/kvm-stub.c
index 06064b9..6c2b06b 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -78,6 +78,11 @@ int kvm_has_many_ioeventfds(void)
     return 0;
 }
 
+int kvm_allows_irq0_override(void)
+{
+    return 1;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 }
diff --git a/kvm.h b/kvm.h
index 0d6c453..a3c87af 100644
--- a/kvm.h
+++ b/kvm.h
@@ -53,6 +53,8 @@ int kvm_has_xcrs(void);
 int kvm_has_many_ioeventfds(void);
 int kvm_has_gsi_routing(void);
 
+int kvm_allows_irq0_override(void);
+
 #ifdef NEED_CPU_H
 int kvm_init_vcpu(CPUState *env);
 
diff --git a/sysemu.h b/sysemu.h
index 22cd720..3bd896e 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -102,7 +102,6 @@ extern int vga_interface_type;
 extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
-extern uint8_t irq0override;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
diff --git a/vl.c b/vl.c
index fcce25f..22d02b9 100644
--- a/vl.c
+++ b/vl.c
@@ -218,7 +218,6 @@ int no_reboot = 0;
 int no_shutdown = 0;
 int cursor_hide = 1;
 int graphic_rotate = 0;
-uint8_t irq0override = 1;
 const char *watchdog;
 QEMUOptionRom option_rom[MAX_OPTION_ROMS];
 int nb_option_roms;
-- 
1.7.3.4
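The decision implemented by kvm_allows_irq0_override() is a plain boolean expression. The sketch below passes the three KVM queries in as parameters (an assumption made so it runs without KVM; the toy_* names are hypothetical) to show that the override is withheld in exactly one case: KVM enabled, irqchip in the kernel, and no GSI routing.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the expression in kvm_allows_irq0_override(); the three
 * inputs stand in for kvm_enabled(), kvm_irqchip_in_kernel() and
 * kvm_has_gsi_routing(). */
static bool toy_allows_irq0_override(bool kvm_enabled, bool irqchip_in_kernel,
                                     bool has_gsi_routing)
{
    return !kvm_enabled || !irqchip_in_kernel || has_gsi_routing;
}

/* Counts the input combinations for which the override must be disabled. */
static int toy_count_disallowed(void)
{
    int n = 0;

    for (int mask = 0; mask < 8; mask++) {
        if (!toy_allows_irq0_override(mask & 1, mask & 2, mask & 4)) {
            n++;
        }
    }
    return n;
}
```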




[Qemu-devel] [PATCH v2 02/16] kvm: Move kvmclock into hw/kvm folder

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

More KVM-specific devices will come, so let's start with moving the
kvmclock into a dedicated folder.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Makefile.target|4 ++--
 configure  |1 +
 hw/{kvmclock.c => kvm/clock.c} |4 ++--
 hw/{kvmclock.h => kvm/clock.h} |0
 hw/pc_piix.c   |2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)
 rename hw/{kvmclock.c => kvm/clock.c} (98%)
 rename hw/{kvmclock.h => kvm/clock.h} (100%)

diff --git a/Makefile.target b/Makefile.target
index 1e90df7..3a9e95d 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -231,7 +231,7 @@ obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
-obj-i386-$(CONFIG_KVM) += kvmclock.o
+obj-i386-$(CONFIG_KVM) += kvm/clock.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
@@ -421,7 +421,7 @@ qmp-commands-old.h: $(SRC_PATH)/qmp-commands.hx
 
 clean:
rm -f *.o *.a *~ $(PROGS) nwfpe/*.o fpu/*.o
-   rm -f *.d */*.d tcg/*.o ide/*.o 9pfs/*.o
+   rm -f *.d */*.d tcg/*.o ide/*.o 9pfs/*.o kvm/*.o
rm -f hmp-commands.h qmp-commands-old.h gdbstub-xml.c
 ifdef CONFIG_TRACE_SYSTEMTAP
rm -f *.stp
diff --git a/configure b/configure
index 4f87e0a..d768e44 100755
--- a/configure
+++ b/configure
@@ -3220,6 +3220,7 @@ mkdir -p $target_dir/fpu
 mkdir -p $target_dir/tcg
 mkdir -p $target_dir/ide
 mkdir -p $target_dir/9pfs
+mkdir -p $target_dir/kvm
 if test $target = arm-linux-user -o $target = armeb-linux-user -o $target = arm-bsd-user -o $target = armeb-bsd-user ; then
   mkdir -p $target_dir/nwfpe
 fi
diff --git a/hw/kvmclock.c b/hw/kvm/clock.c
similarity index 98%
rename from hw/kvmclock.c
rename to hw/kvm/clock.c
index 5388bc4..5983271 100644
--- a/hw/kvmclock.c
+++ b/hw/kvm/clock.c
@@ -13,9 +13,9 @@
 
 #include "qemu-common.h"
 #include "sysemu.h"
-#include "sysbus.h"
 #include "kvm.h"
-#include "kvmclock.h"
+#include "hw/sysbus.h"
+#include "hw/kvm/clock.h"
 
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
diff --git a/hw/kvmclock.h b/hw/kvm/clock.h
similarity index 100%
rename from hw/kvmclock.h
rename to hw/kvm/clock.h
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index c89042f..22997b0 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -34,7 +34,7 @@
 #include "boards.h"
 #include "ide.h"
 #include "kvm.h"
-#include "kvmclock.h"
+#include "kvm/clock.h"
 #include "sysemu.h"
 #include "sysbus.h"
 #include "arch_init.h"
-- 
1.7.3.4




Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-04 Thread Avi Kivity
On 12/04/2011 04:06 PM, Jan Kiszka wrote:
> On 2011-12-04 15:04, Avi Kivity wrote:
> > On 12/04/2011 03:51 PM, Jan Kiszka wrote:
> > >
> > > > But the name becomes part of the save/restore ABI, so you can't.
> > >
> > > Nope, the vmstate names are identical. That would ruin migration
> > > otherwise. It's just the output of info qtree & co. that changes.
> >
> > Oh, okay.  I still think it's wrong, but now it's just a matter of
> > taste, and I can live with it.
>
> Wrong in what sense?

In the sense that kernel-apic is just an accelerated apic.  From the
guest point of view, there's no difference, and that should be reflected
in the device model.

If I'm reading an apic register, either from the guest or via a monitor
debug interface, I shouldn't care whether it's accelerated or not.  The
guest part already holds, of course.

> I think the way of merging kvm support into the user space models in
> qemu-kvm is not particularly beautiful. But that's my taste, and
> therefore I modeled the upstream proposal differently. :)

Oh, qemu-kvm was not meant to be an example of engineering elegance,
just minimal changes.

-- 
error compiling committee.c: too many arguments to function




[Qemu-devel] [PATCH v2 11/16] kvm: Introduce core services for in-kernel irqchip support

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

Add the basic infrastructure to activate in-kernel irqchip support, inject
interrupts into these models, and maintain IRQ routes.

Routing is optional and depends on the host arch supporting
KVM_CAP_IRQ_ROUTING. When it's not available on x86, we lose the HPET as
we can't route GSI0 to IOAPIC pin 2.

In-kernel irqchip support will eventually be controlled by the machine
property 'kernel_irqchip', but this is not yet wired up.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 kvm-all.c |  149 +
 kvm.h |8 +++
 target-i386/kvm.c |   11 
 3 files changed, 168 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index e7faf5c..a85e14f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -76,6 +76,13 @@ struct KVMState
     int pit_in_kernel;
     int xsave, xcrs;
     int many_ioeventfds;
+    int irqchip_inject_ioctl;
+#ifdef KVM_CAP_IRQ_ROUTING
+    struct kvm_irq_routing *irq_routes;
+    int nr_allocated_irq_routes;
+    uint32_t *used_gsi_bitmap;
+    unsigned int max_gsi;
+#endif
 };
 
 KVMState *kvm_state;
@@ -692,6 +699,138 @@ static void kvm_handle_interrupt(CPUState *env, int mask)
 }
 }
 
+int kvm_irqchip_set_irq(KVMState *s, int irq, int level)
+{
+    struct kvm_irq_level event;
+    int ret;
+
+    assert(s->irqchip_in_kernel);
+
+    event.level = level;
+    event.irq = irq;
+    ret = kvm_vm_ioctl(s, s->irqchip_inject_ioctl, &event);
+    if (ret < 0) {
+        perror("kvm_set_irqchip_line");
+        abort();
+    }
+
+    return (s->irqchip_inject_ioctl == KVM_IRQ_LINE) ? 1 : event.status;
+}
+
+#ifdef KVM_CAP_IRQ_ROUTING
+static void set_gsi(KVMState *s, unsigned int gsi)
+{
+    assert(gsi < s->max_gsi);
+
+    s->used_gsi_bitmap[gsi / 32] |= 1U << (gsi % 32);
+}
+
+static void kvm_init_irq_routing(KVMState *s)
+{
+    int gsi_count;
+
+    gsi_count = kvm_check_extension(s, KVM_CAP_IRQ_ROUTING);
+    if (gsi_count > 0) {
+        unsigned int gsi_bits, i;
+
+        /* Round up so we can search ints using ffs */
+        gsi_bits = (gsi_count + 31) / 32 * 32;
+        s->used_gsi_bitmap = g_malloc0(gsi_bits / 8);
+        s->max_gsi = gsi_bits;
+
+        /* Mark any over-allocated bits as already in use */
+        for (i = gsi_count; i < gsi_bits; i++) {
+            set_gsi(s, i);
+        }
+    }
+
+    s->irq_routes = g_malloc0(sizeof(*s->irq_routes));
+    s->nr_allocated_irq_routes = 0;
+
+    kvm_arch_init_irq_routing(s);
+}
+
+static void kvm_add_routing_entry(KVMState *s,
+                                  struct kvm_irq_routing_entry *entry)
+{
+    struct kvm_irq_routing_entry *new;
+    int n, size;
+
+    if (s->irq_routes->nr == s->nr_allocated_irq_routes) {
+        n = s->nr_allocated_irq_routes * 2;
+        if (n < 64) {
+            n = 64;
+        }
+        size = sizeof(struct kvm_irq_routing);
+        size += n * sizeof(*new);
+        s->irq_routes = g_realloc(s->irq_routes, size);
+        s->nr_allocated_irq_routes = n;
+    }
+    n = s->irq_routes->nr++;
+    new = &s->irq_routes->entries[n];
+    memset(new, 0, sizeof(*new));
+    new->gsi = entry->gsi;
+    new->type = entry->type;
+    new->flags = entry->flags;
+    new->u = entry->u;
+
+    set_gsi(s, entry->gsi);
+}
+
+void kvm_irqchip_add_route(KVMState *s, int irq, int irqchip, int pin)
+{
+    struct kvm_irq_routing_entry e;
+
+    e.gsi = irq;
+    e.type = KVM_IRQ_ROUTING_IRQCHIP;
+    e.flags = 0;
+    e.u.irqchip.irqchip = irqchip;
+    e.u.irqchip.pin = pin;
+    kvm_add_routing_entry(s, &e);
+}
+
+int kvm_irqchip_commit_routes(KVMState *s)
+{
+    s->irq_routes->flags = 0;
+    return kvm_vm_ioctl(s, KVM_SET_GSI_ROUTING, s->irq_routes);
+}
+
+#else /* !KVM_CAP_IRQ_ROUTING */
+
+static void kvm_init_irq_routing(KVMState *s)
+{
+}
+#endif /* !KVM_CAP_IRQ_ROUTING */
+
+static int kvm_irqchip_create(KVMState *s)
+{
+    QemuOptsList *list = qemu_find_opts("machine");
+    int ret;
+
+    if (QTAILQ_EMPTY(&list->head) ||
+        !qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
+                           "kernel_irqchip", false) ||
+        !kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
+        return 0;
+    }
+
+    ret = kvm_vm_ioctl(s, KVM_CREATE_IRQCHIP);
+    if (ret < 0) {
+        fprintf(stderr, "Create kernel irqchip failed\n");
+        return ret;
+    }
+
+    s->irqchip_inject_ioctl = KVM_IRQ_LINE;
+    if (kvm_check_extension(s, KVM_CAP_IRQ_INJECT_STATUS)) {
+        s->irqchip_inject_ioctl = KVM_IRQ_LINE_STATUS;
+    }
+    s->irqchip_in_kernel = 1;
+
+    kvm_init_irq_routing(s);
+
+    return 0;
+}
+
 int kvm_init(void)
 {
 static const char upgrade_note[] =
@@ -786,6 +925,11 @@ int kvm_init(void)
         goto err;
     }
 
+    ret = kvm_irqchip_create(s);
+    if (ret < 0) {
+        goto err;
+    }
+
     kvm_state = s;
     cpu_register_phys_memory_client(&kvm_cpu_phys_memory_client);
 
@@ -,6 +1255,11 @@ int kvm_has_many_ioeventfds(void)
     return kvm_state->many_ioeventfds;
 }
 
+int kvm_has_gsi_routing(void)

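The GSI bookkeeping in kvm_init_irq_routing() and kvm_add_routing_entry() can be exercised outside QEMU with a sketch like the one below. It assumes the bitmap should cover gsi_count rounded up to a whole number of 32-bit words, marks the padding bits as used so they are never handed out, and reproduces the route-table growth policy (double the capacity, starting at 64 entries). calloc replaces g_malloc0, only the capacity arithmetic of the route table is modelled, and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the routing fields of KVMState. */
struct toy_routing {
    uint32_t *used_gsi_bitmap;
    unsigned int max_gsi;   /* number of bits in the bitmap */
    int nr;                 /* routes in use */
    int nr_allocated;       /* routes the table has room for */
};

static void toy_set_gsi(struct toy_routing *s, unsigned int gsi)
{
    assert(gsi < s->max_gsi);
    s->used_gsi_bitmap[gsi / 32] |= 1U << (gsi % 32);
}

static int toy_gsi_used(const struct toy_routing *s, unsigned int gsi)
{
    return (s->used_gsi_bitmap[gsi / 32] >> (gsi % 32)) & 1;
}

static void toy_init_gsi_bitmap(struct toy_routing *s, unsigned int gsi_count)
{
    /* Round the bit count up to whole 32-bit words. */
    unsigned int gsi_bits = (gsi_count + 31) / 32 * 32;

    s->used_gsi_bitmap = calloc(gsi_bits / 32, sizeof(uint32_t));
    s->max_gsi = gsi_bits;

    /* Padding bits beyond gsi_count are marked used so they are never handed out. */
    for (unsigned int i = gsi_count; i < gsi_bits; i++) {
        toy_set_gsi(s, i);
    }
}

/* Growth policy of the route table: double the capacity, minimum 64.
 * Only the arithmetic is modelled; the real code reallocates here. */
static void toy_add_route(struct toy_routing *s)
{
    if (s->nr == s->nr_allocated) {
        int n = s->nr_allocated * 2;

        if (n < 64) {
            n = 64;
        }
        s->nr_allocated = n;
    }
    s->nr++;
}
```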
[Qemu-devel] [PATCH v2 15/16] kvm: x86: Add user space part for in-kernel IOAPIC

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

This introduces the KVM-accelerated IOAPIC model 'kvm-ioapic' and
extends the IRQ routing setup by the 0->2 redirection when needed.

The kvm-ioapic model has a property that allows defining its GSI base
for injecting interrupts into the kernel model. This will allow
disentangling PIC and IOAPIC pins for chipsets that support more
sophisticated IRQ routes than the PIIX3. So far the base is kept at 0,
i.e. PIC and IOAPIC share pins 0..15.
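As an illustration of the 0->2 redirection and the GSI base, here is a sketch that builds IOAPIC routes under two assumptions: the GSI for IOAPIC pin p is gsi_base + p, and with the override active, the first GSI (the PIT line) is routed to IOAPIC pin 2 while GSI 2 gets no IOAPIC route of its own. The route struct and function names are hypothetical, not KVM's kvm_irq_routing_entry or QEMU's actual setup code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an IOAPIC routing entry. */
struct toy_route {
    unsigned int gsi;
    unsigned int ioapic_pin;
};

#define TOY_IOAPIC_NUM_PINS 24

/* Fill 'routes' (room for TOY_IOAPIC_NUM_PINS entries) and return the count.
 * Assumes GSI = gsi_base + pin; with irq0_override, the first GSI (the PIT
 * line) is sent to pin 2 and GSI 2 gets no IOAPIC route. */
static int toy_ioapic_routes(struct toy_route *routes, unsigned int gsi_base,
                             bool irq0_override)
{
    int n = 0;

    for (unsigned int pin = 0; pin < TOY_IOAPIC_NUM_PINS; pin++) {
        if (irq0_override && pin == 0) {
            routes[n++] = (struct toy_route){ gsi_base, 2 };
        } else if (irq0_override && pin == 2) {
            continue;
        } else {
            routes[n++] = (struct toy_route){ gsi_base + pin, pin };
        }
    }
    return n;
}

/* Return the IOAPIC pin routed for 'gsi', or -1 if there is none. */
static int toy_pin_for_gsi(const struct toy_route *routes, int n, unsigned int gsi)
{
    for (int i = 0; i < n; i++) {
        if (routes[i].gsi == gsi) {
            return (int)routes[i].ioapic_pin;
        }
    }
    return -1;
}
```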

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Makefile.target  |2 +-
 hw/ioapic_internal.h |1 +
 hw/kvm/ioapic.c  |  120 ++
 hw/pc_piix.c |   15 ++-
 4 files changed, 136 insertions(+), 2 deletions(-)
 create mode 100644 hw/kvm/ioapic.c

diff --git a/Makefile.target b/Makefile.target
index 850b80f..2f3407b 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -231,7 +231,7 @@ obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
-obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o
+obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
diff --git a/hw/ioapic_internal.h b/hw/ioapic_internal.h
index bda3608..7d5f735 100644
--- a/hw/ioapic_internal.h
+++ b/hw/ioapic_internal.h
@@ -83,6 +83,7 @@ struct IOAPICState {
 
     void (*pre_save)(IOAPICState *s);
     void (*post_load)(IOAPICState *s);
+    uint32_t kvm_gsi_base;
 };
 
 extern const VMStateDescription vmstate_ioapic;
diff --git a/hw/kvm/ioapic.c b/hw/kvm/ioapic.c
new file mode 100644
index 000..1040e29
--- /dev/null
+++ b/hw/kvm/ioapic.c
@@ -0,0 +1,120 @@
+/*
+ * KVM in-kernel IOAPIC support
+ *
+ * Copyright (c) 2011 Siemens AG
+ *
+ * Authors:
+ *  Jan Kiszka          <jan.kis...@siemens.com>
+ *
+ * This work is licensed under the terms of the GNU GPL version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "hw/pc.h"
+#include "hw/ioapic_internal.h"
+#include "hw/apic_internal.h"
+#include "kvm.h"
+
+static void kvm_ioapic_get(IOAPICState *s)
+{
+    struct kvm_irqchip chip;
+    struct kvm_ioapic_state *kioapic;
+    int ret, i;
+
+    chip.chip_id = KVM_IRQCHIP_IOAPIC;
+    ret = kvm_vm_ioctl(kvm_state, KVM_GET_IRQCHIP, &chip);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_GET_IRQCHIP failed: %s\n", strerror(-ret));
+        abort();
+    }
+
+    kioapic = &chip.chip.ioapic;
+
+    s->id = kioapic->id;
+    s->ioregsel = kioapic->ioregsel;
+    s->irr = kioapic->irr;
+    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+        s->ioredtbl[i] = kioapic->redirtbl[i].bits;
+    }
+}
+
+static void kvm_ioapic_put(IOAPICState *s)
+{
+    struct kvm_irqchip chip;
+    struct kvm_ioapic_state *kioapic;
+    int ret, i;
+
+    chip.chip_id = KVM_IRQCHIP_IOAPIC;
+    kioapic = &chip.chip.ioapic;
+
+    kioapic->id = s->id;
+    kioapic->ioregsel = s->ioregsel;
+    kioapic->base_address = s->busdev.mmio[0].addr;
+    kioapic->irr = s->irr;
+    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+        kioapic->redirtbl[i].bits = s->ioredtbl[i];
+    }
+
+    ret = kvm_vm_ioctl(kvm_state, KVM_SET_IRQCHIP, &chip);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_SET_IRQCHIP failed: %s\n", strerror(-ret));
+        abort();
+    }
+}
+
+static void kvm_ioapic_reset(DeviceState *d)
+{
+    IOAPICState *s = DO_UPCAST(IOAPICState, busdev.qdev, d);
+
+    ioapic_reset_internal(s);
+
+    kvm_ioapic_put(s);
+}
+
+static void kvm_ioapic_set_irq(void *opaque, int irq, int level)
+{
+    IOAPICState *s = opaque;
+    int delivered;
+
+    delivered = kvm_irqchip_set_irq(kvm_state, s->kvm_gsi_base + irq, level);
+    apic_set_irq_delivered(delivered);
+}
+
+static int kvm_ioapic_init(SysBusDevice *dev)
+{
+    IOAPICState *s = FROM_SYSBUS(IOAPICState, dev);
+
+    memory_region_init_reservation(&s->io_memory, "kvm-ioapic", 0x1000);
+
+    if (ioapic_init_common(s) < 0) {
+        memory_region_destroy(&s->io_memory);
+        return -1;
+    }
+
+    s->pre_save = kvm_ioapic_get;
+    s->post_load = kvm_ioapic_put;
+
+    qdev_init_gpio_in(&dev->qdev, kvm_ioapic_set_irq, IOAPIC_NUM_PINS);
+
+    return 0;
+}
+
+static SysBusDeviceInfo kvm_ioapic_info = {
+    .init = kvm_ioapic_init,
+    .qdev.name = "kvm-ioapic",
+    .qdev.size = sizeof(IOAPICState),
+    .qdev.vmsd = &vmstate_ioapic,
+    .qdev.reset = kvm_ioapic_reset,
+    .qdev.no_user = 1,
+    .qdev.props = (Property[]) {
+        DEFINE_PROP_UINT32("gsi_base", IOAPICState, kvm_gsi_base, 0),
+        DEFINE_PROP_END_OF_LIST(),
+    }
+};
+
+static void kvm_ioapic_register_devices(void)
+{
+    sysbus_register_withprop(&kvm_ioapic_info);
+}
+
+device_init(kvm_ioapic_register_devices)
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 351b032..624aecd 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -68,6 +68,15 @@ static void kvm_piix3_setup_irq_routing(bool pci_enabled)
 for (i = 8; 

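The commit message describes a routing table in which PIC and IOAPIC share GSIs 0..15 (GSI base 0) and ISA IRQ 0 is redirected to IOAPIC pin 2. The pc_piix.c hunk is truncated here, so the following is only a sketch of that scheme under stated assumptions: GSIs 0-7 go to the master PIC (skipping cascade input 2), GSIs 8-15 go to the slave PIC, and, when an IOAPIC is present, GSI i goes to IOAPIC pin i, except that GSI 0 lands on pin 2 and pin 2 is not routed on its own. The types and the add_route() helper are hypothetical stand-ins for KVM routing entries. A single GSI may appear on more than one chip, as in the real routing table.

```c
#include <stddef.h>

/* Hypothetical chip identifiers, modelled on KVM's irqchip numbering. */
enum route_chip { CHIP_PIC_MASTER, CHIP_PIC_SLAVE, CHIP_IOAPIC };

/* Hypothetical routing entry: GSI -> (chip, pin). */
struct gsi_route {
    unsigned gsi;
    enum route_chip chip;
    unsigned pin;
};

#define IOAPIC_PINS 24

/* Append one route if there is room; returns the new entry count. */
static size_t add_route(struct gsi_route *tbl, size_t n, size_t max,
                        unsigned gsi, enum route_chip chip, unsigned pin)
{
    if (n < max) {
        tbl[n].gsi = gsi;
        tbl[n].chip = chip;
        tbl[n].pin = pin;
        n++;
    }
    return n;
}

/*
 * PIIX3-style table with the IOAPIC GSI base at 0, so PIC and IOAPIC
 * share GSIs 0..15.  GSI 2 is the PIC cascade and is not routed to the
 * master PIC; GSI 0 (the PIT) is redirected to IOAPIC pin 2.
 */
static size_t build_piix3_routes(struct gsi_route *tbl, size_t max,
                                 int with_ioapic)
{
    size_t n = 0;
    unsigned i;

    for (i = 0; i < 8; i++) {
        if (i != 2) {
            n = add_route(tbl, n, max, i, CHIP_PIC_MASTER, i);
        }
    }
    for (i = 8; i < 16; i++) {
        n = add_route(tbl, n, max, i, CHIP_PIC_SLAVE, i - 8);
    }
    if (with_ioapic) {
        for (i = 0; i < IOAPIC_PINS; i++) {
            if (i == 0) {
                n = add_route(tbl, n, max, 0, CHIP_IOAPIC, 2); /* 0->2 */
            } else if (i != 2) {
                n = add_route(tbl, n, max, i, CHIP_IOAPIC, i);
            }
        }
    }
    return n;
}
```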
[Qemu-devel] [PATCH v2 05/16] apic: Open-code timer save/restore

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

To enable migration between accelerated and non-accelerated APIC models,
we will need to handle the timer saving and restoring specially and can
no longer rely on the automatic handling of VMSTATE_TIMER. Specifically,
the accelerated model will not start any QEMUTimer.

This patch therefore factors out the generic bits into apic_next_timer
and introduces a post-load callback that can be implemented differently
by both models.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 hw/apic.c  |   30 --
 hw/apic_common.c   |   51 +--
 hw/apic_internal.h |3 +++
 3 files changed, 64 insertions(+), 20 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 27b18d6..9b83c0c 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -516,25 +516,9 @@ static uint32_t apic_get_current_count(APICState *s)
 
 static void apic_timer_update(APICState *s, int64_t current_time)
 {
-    int64_t next_time, d;
-
-    if (!(s->lvt[APIC_LVT_TIMER] & APIC_LVT_MASKED)) {
-        d = (current_time - s->initial_count_load_time) >>
-            s->count_shift;
-        if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_TIMER_PERIODIC) {
-            if (!s->initial_count)
-                goto no_timer;
-            d = ((d / ((uint64_t)s->initial_count + 1)) + 1) * ((uint64_t)s->initial_count + 1);
-        } else {
-            if (d >= s->initial_count)
-                goto no_timer;
-            d = (uint64_t)s->initial_count + 1;
-        }
-        next_time = s->initial_count_load_time + (d << s->count_shift);
-        qemu_mod_timer(s->timer, next_time);
-        s->next_time = next_time;
+    if (apic_next_timer(s, current_time)) {
+        qemu_mod_timer(s->timer, s->next_time);
     } else {
-    no_timer:
         qemu_del_timer(s->timer);
     }
 }
@@ -756,6 +740,15 @@ static const MemoryRegionOps apic_io_ops = {
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
+static void apic_post_load(APICState *s)
+{
+    if (s->timer_expiry != -1) {
+        qemu_mod_timer(s->timer, s->timer_expiry);
+    } else {
+        qemu_del_timer(s->timer);
+    }
+}
+
 static int apic_init(SysBusDevice *dev)
 {
     APICState *s = FROM_SYSBUS(APICState, dev);
@@ -772,6 +765,7 @@ static int apic_init(SysBusDevice *dev)
     s->timer = qemu_new_timer_ns(vm_clock, apic_timer, s);
     s->set_base = apic_set_base;
     s->set_tpr = apic_set_tpr;
+    s->post_load = apic_post_load;
     local_apics[s->idx] = s;
     return 0;
 }
diff --git a/hw/apic_common.c b/hw/apic_common.c
index 7d30356..84a3a27 100644
--- a/hw/apic_common.c
+++ b/hw/apic_common.c
@@ -80,6 +80,39 @@ int apic_get_irq_delivered(void)
     return apic_irq_delivered;
 }
 
+bool apic_next_timer(APICState *s, int64_t current_time)
+{
+    int64_t d;
+
+    /* We need to store the timer state separately to support APIC
+     * implementations that maintain a non-QEMU timer, e.g. inside the
+     * host kernel. This open-coded state allows us to migrate between
+     * both models. */
+    s->timer_expiry = -1;
+
+    if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_MASKED) {
+        return false;
+    }
+
+    d = (current_time - s->initial_count_load_time) >> s->count_shift;
+
+    if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_TIMER_PERIODIC) {
+        if (!s->initial_count) {
+            return false;
+        }
+        d = ((d / ((uint64_t)s->initial_count + 1)) + 1) *
+            ((uint64_t)s->initial_count + 1);
+    } else {
+        if (d >= s->initial_count) {
+            return false;
+        }
+        d = (uint64_t)s->initial_count + 1;
+    }
+    s->next_time = s->initial_count_load_time + (d << s->count_shift);
+    s->timer_expiry = s->next_time;
+    return true;
+}
+
 void apic_init_reset(DeviceState *d)
 {
     APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
@@ -107,7 +140,10 @@ void apic_init_reset(DeviceState *d)
     s->next_time = 0;
     s->wait_for_sipi = 1;
 
-    qemu_del_timer(s->timer);
+    if (s->timer) {
+        qemu_del_timer(s->timer);
+    }
+    s->timer_expiry = -1;
 }
 
 void apic_reset(DeviceState *d)
@@ -172,12 +208,23 @@ static int apic_load_old(QEMUFile *f, void *opaque, int version_id)
     return 0;
 }
 
+static int apic_dispatch_post_load(void *opaque, int version_id)
+{
+    APICState *s = opaque;
+
+    if (s->post_load) {
+        s->post_load(s);
+    }
+    return 0;
+}
+
 const VMStateDescription vmstate_apic = {
     .name = "apic",
     .version_id = 3,
     .minimum_version_id = 3,
     .minimum_version_id_old = 1,
     .load_state_old = apic_load_old,
+    .post_load = apic_dispatch_post_load,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(apicbase, APICState),
         VMSTATE_UINT8(id, APICState),
@@ -197,7 +244,7 @@ const VMStateDescription vmstate_apic = {
         VMSTATE_UINT32(initial_count, APICState),
         VMSTATE_INT64(initial_count_load_time, APICState),
         VMSTATE_INT64(next_time, APICState),
-        VMSTATE_TIMER(timer, APICState),
+        VMSTATE_INT64(timer_expiry,

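apic_next_timer() keeps the expiry arithmetic in one place so that both APIC models can record timer_expiry for migration. Below is a stand-alone transcription of that arithmetic. The struct is a hypothetical subset of APICState with the LVT timer bits flattened into two booleans, and no QEMUTimer is involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down timer state; field names follow APICState. */
struct apic_timer_state {
    bool masked;                  /* LVT timer masked bit */
    bool periodic;                /* LVT timer periodic mode */
    uint32_t initial_count;
    int count_shift;              /* log2 of the divide configuration */
    int64_t initial_count_load_time;
    int64_t next_time;
    int64_t timer_expiry;         /* -1 means "no timer pending" */
};

static bool next_timer(struct apic_timer_state *s, int64_t current_time)
{
    int64_t d;

    s->timer_expiry = -1;
    if (s->masked) {
        return false;
    }

    /* Elapsed time in timer ticks since the count was loaded. */
    d = (current_time - s->initial_count_load_time) >> s->count_shift;

    if (s->periodic) {
        if (!s->initial_count) {
            return false;
        }
        /* Round up to the end of the current period. */
        d = ((d / ((uint64_t)s->initial_count + 1)) + 1) *
            ((uint64_t)s->initial_count + 1);
    } else {
        if (d >= s->initial_count) {
            return false;         /* one-shot timer has already expired */
        }
        d = (uint64_t)s->initial_count + 1;
    }
    s->next_time = s->initial_count_load_time + (d << s->count_shift);
    s->timer_expiry = s->next_time;
    return true;
}
```

A one-shot timer that has already counted down leaves timer_expiry at -1. That value is what apic_post_load() checks to choose between qemu_mod_timer() and qemu_del_timer().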
Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-04 Thread Jan Kiszka
On 2011-12-04 16:12, Avi Kivity wrote:
> On 12/04/2011 04:06 PM, Jan Kiszka wrote:
>> On 2011-12-04 15:04, Avi Kivity wrote:
>>> On 12/04/2011 03:51 PM, Jan Kiszka wrote:
>>>>
>>>>> But the name becomes part of the save/restore ABI, so you can't.
>>>>
>>>> Nope, the vmstate names are identical. That would ruin migration
>>>> otherwise. It's just the output of info qtree & co. that changes.
>>>
>>> Oh, okay.  I still think it's wrong, but now it's just a matter of
>>> taste, and I can live with it.
>>
>> Wrong in what sense?
>
> In the sense that kernel-apic is just an accelerated apic.  From the
> guest point of view, there's no difference, and that should be reflected
> in the device model.

That was my goal as well: The guest should not notice the difference,
but the admin on the host side should still be able to tell both
internally fairly different models apart. Plus the code should be
clearly split where there are differences and explicitly shared where
there aren't.

>
> If I'm reading an apic register, either from the guest or via a monitor
> debug interface, I shouldn't care whether it's accelerated or not.  The
> guest part already holds, of course.

Specifically for the debug scenario, I'd prefer the clear
differentiation by name as there can always remain subtle differences in
the implementation of kernel vs. user space. Someone debugging the guest
and/or qemu/kvm should remain aware of this.

Jan





[Qemu-devel] [PATCH v2 06/16] i8259: Factor out core for KVM reuse

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Analogously to the APIC, we will reuse some parts of the user space
i8259 model for KVM. In this case these are the PicState, the vmstate
description, a reset core and some init bits.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Makefile.objs   |2 +-
 hw/i8259.c  |   78 +-
 hw/i8259_common.c   |  103 +++
 hw/i8259_internal.h |   67 +
 4 files changed, 174 insertions(+), 76 deletions(-)
 create mode 100644 hw/i8259_common.c
 create mode 100644 hw/i8259_internal.h

diff --git a/Makefile.objs b/Makefile.objs
index 01587c8..5372eec 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -220,7 +220,7 @@ hw-obj-$(CONFIG_APPLESMC) += applesmc.o
 hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o
 hw-obj-$(CONFIG_SMARTCARD_NSS) += ccid-card-emulated.o
 hw-obj-$(CONFIG_USB_REDIR) += usb-redir.o
-hw-obj-$(CONFIG_I8259) += i8259.o
+hw-obj-$(CONFIG_I8259) += i8259_common.o i8259.o
 
 # PPC devices
 hw-obj-$(CONFIG_PREP_PCI) += prep_pci.o
diff --git a/hw/i8259.c b/hw/i8259.c
index ab519de..e8a6a9a 100644
--- a/hw/i8259.c
+++ b/hw/i8259.c
@@ -26,6 +26,7 @@
 #include "isa.h"
 #include "monitor.h"
 #include "qemu-timer.h"
+#include "i8259_internal.h"
 
 /* debug PIC */
 //#define DEBUG_PIC
@@ -40,33 +41,6 @@
 //#define DEBUG_IRQ_LATENCY
 //#define DEBUG_IRQ_COUNT
 
-struct PicState {
-    ISADevice dev;
-    uint8_t last_irr; /* edge detection */
-    uint8_t irr; /* interrupt request register */
-    uint8_t imr; /* interrupt mask register */
-    uint8_t isr; /* interrupt service register */
-    uint8_t priority_add; /* highest irq priority */
-    uint8_t irq_base;
-    uint8_t read_reg_select;
-    uint8_t poll;
-    uint8_t special_mask;
-    uint8_t init_state;
-    uint8_t auto_eoi;
-    uint8_t rotate_on_auto_eoi;
-    uint8_t special_fully_nested_mode;
-    uint8_t init4; /* true if 4 byte init */
-    uint8_t single_mode; /* true if slave pic is not initialized */
-    uint8_t elcr; /* PIIX edge/trigger selection*/
-    uint8_t elcr_mask;
-    qemu_irq int_out[1];
-    uint32_t master; /* reflects /SP input pin */
-    uint32_t iobase;
-    uint32_t elcr_addr;
-    MemoryRegion base_io;
-    MemoryRegion elcr_io;
-};
-
 #if defined(DEBUG_PIC) || defined(DEBUG_IRQ_COUNT)
 static int irq_level[16];
 #endif
@@ -248,22 +222,7 @@ int pic_read_irq(PicState *s)
 
 static void pic_init_reset(PicState *s)
 {
-    s->last_irr = 0;
-    s->irr = 0;
-    s->imr = 0;
-    s->isr = 0;
-    s->priority_add = 0;
-    s->irq_base = 0;
-    s->read_reg_select = 0;
-    s->poll = 0;
-    s->special_mask = 0;
-    s->init_state = 0;
-    s->auto_eoi = 0;
-    s->rotate_on_auto_eoi = 0;
-    s->special_fully_nested_mode = 0;
-    s->init4 = 0;
-    s->single_mode = 0;
-    /* Note: ELCR is not reset */
+    pic_reset_internal(s);
     pic_update_irq(s);
 }
 
@@ -418,32 +377,6 @@ static uint64_t elcr_ioport_read(void *opaque, target_phys_addr_t addr,
     return s->elcr;
 }
 
-static const VMStateDescription vmstate_pic = {
-    .name = "i8259",
-    .version_id = 1,
-    .minimum_version_id = 1,
-    .minimum_version_id_old = 1,
-    .fields = (VMStateField[]) {
-        VMSTATE_UINT8(last_irr, PicState),
-        VMSTATE_UINT8(irr, PicState),
-        VMSTATE_UINT8(imr, PicState),
-        VMSTATE_UINT8(isr, PicState),
-        VMSTATE_UINT8(priority_add, PicState),
-        VMSTATE_UINT8(irq_base, PicState),
-        VMSTATE_UINT8(read_reg_select, PicState),
-        VMSTATE_UINT8(poll, PicState),
-        VMSTATE_UINT8(special_mask, PicState),
-        VMSTATE_UINT8(init_state, PicState),
-        VMSTATE_UINT8(auto_eoi, PicState),
-        VMSTATE_UINT8(rotate_on_auto_eoi, PicState),
-        VMSTATE_UINT8(special_fully_nested_mode, PicState),
-        VMSTATE_UINT8(init4, PicState),
-        VMSTATE_UINT8(single_mode, PicState),
-        VMSTATE_UINT8(elcr, PicState),
-        VMSTATE_END_OF_LIST()
-    }
-};
-
 static const MemoryRegionOps pic_base_ioport_ops = {
     .read = pic_ioport_read,
     .write = pic_ioport_write,
@@ -469,16 +402,11 @@ static int pic_initfn(ISADevice *dev)
     memory_region_init_io(&s->base_io, &pic_base_ioport_ops, s, "pic", 2);
     memory_region_init_io(&s->elcr_io, &pic_elcr_ioport_ops, s, "elcr", 1);
 
-    isa_register_ioport(NULL, &s->base_io, s->iobase);
-    if (s->elcr_addr != -1) {
-        isa_register_ioport(NULL, &s->elcr_io, s->elcr_addr);
-    }
+    pic_init_common(s);
 
     qdev_init_gpio_out(&dev->qdev, s->int_out, ARRAY_SIZE(s->int_out));
     qdev_init_gpio_in(&dev->qdev, pic_set_irq, 8);
 
-    qdev_set_legacy_instance_id(&dev->qdev, s->iobase, 1);
-
     return 0;
 }
 
diff --git a/hw/i8259_common.c b/hw/i8259_common.c
new file mode 100644
index 000..9d2fbc3
--- /dev/null
+++ b/hw/i8259_common.c
@@ -0,0 +1,103 @@
+/*
+ * QEMU 8259 - common bits of emulated and KVM kernel model
+ *
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ * Copyright (c) 

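The reset core that moves into i8259_common.c is meant to clear the same fields as the removed pic_init_reset() body, while ELCR survives a reset. The sketch below shows that behaviour on a hypothetical subset of PicState. The real pic_reset_internal() body is not visible in this truncated hunk, so the field list is an assumption drawn from the removed lines.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of PicState: the registers cleared by the reset
 * core plus the two ELCR fields that must survive it. */
struct pic_state_subset {
    uint8_t last_irr, irr, imr, isr;
    uint8_t priority_add, irq_base, read_reg_select, poll;
    uint8_t special_mask, init_state, auto_eoi, rotate_on_auto_eoi;
    uint8_t special_fully_nested_mode, init4, single_mode;
    uint8_t elcr;       /* PIIX edge/level selection, not reset */
    uint8_t elcr_mask;  /* also untouched by the old pic_init_reset() */
};

/* Clear every register except the ELCR fields, as the removed code did. */
static void pic_reset_core(struct pic_state_subset *s)
{
    uint8_t elcr = s->elcr;
    uint8_t elcr_mask = s->elcr_mask;

    memset(s, 0, sizeof(*s));
    s->elcr = elcr;
    s->elcr_mask = elcr_mask;
}
```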
[Qemu-devel] [PATCH v2 09/16] ioapic: Factor out core for KVM reuse

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

KVM will share the IOAPICState, the vmstate, the reset logic and certain
init parts with the user space model.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Makefile.target  |2 +-
 hw/ioapic.c  |  108 -
 hw/ioapic_common.c   |   89 +
 hw/ioapic_internal.h |   93 +++
 4 files changed, 192 insertions(+), 100 deletions(-)
 create mode 100644 hw/ioapic_common.c
 create mode 100644 hw/ioapic_internal.h

diff --git a/Makefile.target b/Makefile.target
index 7bb6b13..4cd3c0e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -226,7 +226,7 @@ obj-$(CONFIG_IVSHMEM) += ivshmem.o
 # Hardware support
 obj-i386-y += vga.o
 obj-i386-y += mc146818rtc.o pc.o
-obj-i386-y += cirrus_vga.o sga.o apic_common.o apic.o ioapic.o piix_pci.o
+obj-i386-y += cirrus_vga.o sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o
 obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
diff --git a/hw/ioapic.c b/hw/ioapic.c
index eb75766..8876d5d 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -24,9 +24,7 @@
 #include "pc.h"
 #include "apic.h"
 #include "ioapic.h"
-#include "qemu-timer.h"
-#include "host-utils.h"
-#include "sysbus.h"
+#include "ioapic_internal.h"
 
 //#define DEBUG_IOAPIC
 
@@ -37,62 +35,6 @@
 #define DPRINTF(fmt, ...)
 #endif
 
-#define MAX_IOAPICS                     1
-
-#define IOAPIC_VERSION                  0x11
-
-#define IOAPIC_LVT_DEST_SHIFT           56
-#define IOAPIC_LVT_MASKED_SHIFT         16
-#define IOAPIC_LVT_TRIGGER_MODE_SHIFT   15
-#define IOAPIC_LVT_REMOTE_IRR_SHIFT     14
-#define IOAPIC_LVT_POLARITY_SHIFT       13
-#define IOAPIC_LVT_DELIV_STATUS_SHIFT   12
-#define IOAPIC_LVT_DEST_MODE_SHIFT      11
-#define IOAPIC_LVT_DELIV_MODE_SHIFT     8
-
-#define IOAPIC_LVT_MASKED               (1 << IOAPIC_LVT_MASKED_SHIFT)
-#define IOAPIC_LVT_REMOTE_IRR           (1 << IOAPIC_LVT_REMOTE_IRR_SHIFT)
-
-#define IOAPIC_TRIGGER_EDGE             0
-#define IOAPIC_TRIGGER_LEVEL            1
-
-/*io{apic,sapic} delivery mode*/
-#define IOAPIC_DM_FIXED                 0x0
-#define IOAPIC_DM_LOWEST_PRIORITY       0x1
-#define IOAPIC_DM_PMI                   0x2
-#define IOAPIC_DM_NMI                   0x4
-#define IOAPIC_DM_INIT                  0x5
-#define IOAPIC_DM_SIPI                  0x6
-#define IOAPIC_DM_EXTINT                0x7
-#define IOAPIC_DM_MASK                  0x7
-
-#define IOAPIC_VECTOR_MASK              0xff
-
-#define IOAPIC_IOREGSEL                 0x00
-#define IOAPIC_IOWIN                    0x10
-
-#define IOAPIC_REG_ID                   0x00
-#define IOAPIC_REG_VER                  0x01
-#define IOAPIC_REG_ARB                  0x02
-#define IOAPIC_REG_REDTBL_BASE          0x10
-#define IOAPIC_ID                       0x00
-
-#define IOAPIC_ID_SHIFT                 24
-#define IOAPIC_ID_MASK                  0xf
-
-#define IOAPIC_VER_ENTRIES_SHIFT        16
-
-typedef struct IOAPICState IOAPICState;
-
-struct IOAPICState {
-    SysBusDevice busdev;
-    MemoryRegion io_memory;
-    uint8_t id;
-    uint8_t ioregsel;
-    uint32_t irr;
-    uint64_t ioredtbl[IOAPIC_NUM_PINS];
-};
-
 static IOAPICState *ioapics[MAX_IOAPICS];
 
 static void ioapic_service(IOAPICState *s)
@@ -278,44 +220,11 @@ ioapic_mem_write(void *opaque, target_phys_addr_t addr, uint64_t val,
     }
 }
 
-static int ioapic_post_load(void *opaque, int version_id)
-{
-    IOAPICState *s = opaque;
-
-    if (version_id == 1) {
-        /* set sane value */
-        s->irr = 0;
-    }
-    return 0;
-}
-
-static const VMStateDescription vmstate_ioapic = {
-    .name = "ioapic",
-    .version_id = 3,
-    .post_load = ioapic_post_load,
-    .minimum_version_id = 1,
-    .minimum_version_id_old = 1,
-    .fields = (VMStateField[]) {
-        VMSTATE_UINT8(id, IOAPICState),
-        VMSTATE_UINT8(ioregsel, IOAPICState),
-        VMSTATE_UNUSED_V(2, 8), /* to account for qemu-kvm's v2 format */
-        VMSTATE_UINT32_V(irr, IOAPICState, 2),
-        VMSTATE_UINT64_ARRAY(ioredtbl, IOAPICState, IOAPIC_NUM_PINS),
-        VMSTATE_END_OF_LIST()
-    }
-};
-
 static void ioapic_reset(DeviceState *d)
 {
     IOAPICState *s = DO_UPCAST(IOAPICState, busdev.qdev, d);
-    int i;
 
-    s->id = 0;
-    s->ioregsel = 0;
-    s->irr = 0;
-    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
-        s->ioredtbl[i] = 1 << IOAPIC_LVT_MASKED_SHIFT;
-    }
+    ioapic_reset_internal(s);
 }
 
 static const MemoryRegionOps ioapic_io_ops = {
@@ -327,18 +236,19 @@ static const MemoryRegionOps ioapic_io_ops = {
 static int ioapic_init1(SysBusDevice *dev)
 {
     IOAPICState *s = FROM_SYSBUS(IOAPICState, dev);
-    static int ioapic_no;
+    int ioapic_no;
 
-    if (ioapic_no >= MAX_IOAPICS) {
+    memory_region_init_io(&s->io_memory, &ioapic_io_ops, s, "ioapic", 0x1000);
+
+    ioapic_no = ioapic_init_common(s);
+    if 

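Two pieces of the shared IOAPIC core can be illustrated in isolation. The first is the reset logic from the removed ioapic_reset(), which leaves every redirection entry masked. The second is the instance counting bounded by MAX_IOAPICS that the old static ioapic_no performed. The struct and both function names are hypothetical. How ioapic_init_common() really signals failure is not visible in the truncated hunk, so the -1 return is an assumption.

```c
#include <stdint.h>

#define IOAPIC_NUM_PINS          24
#define IOAPIC_LVT_MASKED_SHIFT  16
#define MAX_IOAPICS              1

/* Hypothetical subset of IOAPICState. */
struct ioapic_subset {
    uint8_t id;
    uint8_t ioregsel;
    uint32_t irr;
    uint64_t ioredtbl[IOAPIC_NUM_PINS];
};

/* Reset as in the removed ioapic_reset(): every redirection entry comes
 * out of reset masked, so no pin fires before the guest programs it. */
static void ioapic_reset_core(struct ioapic_subset *s)
{
    int i;

    s->id = 0;
    s->ioregsel = 0;
    s->irr = 0;
    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
        s->ioredtbl[i] = 1ULL << IOAPIC_LVT_MASKED_SHIFT;
    }
}

/* Hypothetical allocator: returns the new instance number, or -1 once
 * MAX_IOAPICS instances exist. */
static int ioapic_alloc_index(void)
{
    static int ioapic_no;

    if (ioapic_no >= MAX_IOAPICS) {
        return -1;
    }
    return ioapic_no++;
}
```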
[Qemu-devel] [PATCH v2 16/16] kvm: Arm in-kernel irqchip support

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Make the basic in-kernel irqchip support selectable via
-machine ...,kernel_irqchip=on. Leave it off by default until it can
fully replace user space models.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 qemu-config.c   |4 
 qemu-options.hx |5 -
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/qemu-config.c b/qemu-config.c
index 90b6b3e..fc25115 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -483,6 +483,10 @@ static QemuOptsList qemu_machine_opts = {
             .name = "accel",
             .type = QEMU_OPT_STRING,
             .help = "accelerator list",
+        }, {
+            .name = "kernel_irqchip",
+            .type = QEMU_OPT_BOOL,
+            .help = "use KVM in-kernel irqchip",
         },
         { /* End of list */ }
     },
diff --git a/qemu-options.hx b/qemu-options.hx
index 5d2a776..e10186b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -31,7 +31,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "-machine [type=]name[,prop[=value][,...]]\n"
     "                selects emulated machine (-machine ? for list)\n"
     "                property accel=accel1[:accel2[:...]] selects accelerator\n"
-    "                supported accelerators are kvm, xen, tcg (default: tcg)\n",
+    "                supported accelerators are kvm, xen, tcg (default: tcg)\n"
+    "                kernel_irqchip=on|off controls accelerated irqchip support\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
@@ -44,6 +45,8 @@ This is used to enable an accelerator. Depending on the target architecture,
 kvm, xen, or tcg can be available. By default, tcg is used. If there is more
 than one accelerator specified, the next one is used if the previous one fails
 to initialize.
+@item kernel_irqchip=on|off
+Enables in-kernel irqchip support for the chosen accelerator when available.
 @end table
 ETEXI
 
-- 
1.7.3.4




[Qemu-devel] [PATCH v2 01/16] msi: Generalize msix_supported to msi_supported

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Rename msix_supported to msi_supported and control MSI and MSI-X
activation this way. That was likely the original intention for this
flag, but MSI support came after MSI-X.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 hw/msi.c  |8 
 hw/msi.h  |2 ++
 hw/msix.c |9 -
 hw/msix.h |2 --
 hw/pc.c   |4 ++--
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index f214fcf..5d6ceb6 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -36,6 +36,9 @@
 
 #define PCI_MSI_VECTORS_MAX 32
 
+/* Flag for interrupt controller to declare MSI/MSI-X support */
+bool msi_supported;
+
 /* If we get rid of cap allocator, we won't need this. */
 static inline uint8_t msi_cap_sizeof(uint16_t flags)
 {
@@ -116,6 +119,11 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
     uint16_t flags;
     uint8_t cap_size;
     int config_offset;
+
+    if (!msi_supported) {
+        return -ENOTSUP;
+    }
+
     MSI_DEV_PRINTF(dev,
                    "init offset: 0x%"PRIx8" vector: %"PRId8
                    " 64bit %d mask %d\n",
diff --git a/hw/msi.h b/hw/msi.h
index 5766018..3040bb0 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -24,6 +24,8 @@
 #include "qemu-common.h"
 #include "pci.h"
 
+extern bool msi_supported;
+
 bool msi_enabled(const PCIDevice *dev);
 int msi_init(struct PCIDevice *dev, uint8_t offset,
              unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
diff --git a/hw/msix.c b/hw/msix.c
index b15bafc..8850fbd 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -12,6 +12,7 @@
  */
 
 #include "hw.h"
+#include "msi.h"
 #include "msix.h"
 #include "pci.h"
 #include "range.h"
@@ -32,9 +33,6 @@
 #define MSIX_MAX_ENTRIES 32
 
 
-/* Flag for interrupt controller to declare MSI-X support */
-int msix_supported;
-
 /* Add MSI-X capability to the config space for the device. */
 /* Given a bar and its size, add MSI-X table on top of it
  * and fill MSI-X capability in the config space.
@@ -212,10 +210,11 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
               unsigned bar_nr, unsigned bar_size)
 {
     int ret;
+
     /* Nothing to do if MSI is not supported by interrupt controller */
-    if (!msix_supported)
+    if (!msi_supported) {
         return -ENOTSUP;
-
+    }
     if (nentries > MSIX_MAX_ENTRIES)
         return -EINVAL;
 
diff --git a/hw/msix.h b/hw/msix.h
index 7e04336..5aba22b 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -29,6 +29,4 @@ void msix_notify(PCIDevice *dev, unsigned vector);
 
 void msix_reset(PCIDevice *dev);
 
-extern int msix_supported;
-
 #endif
diff --git a/hw/pc.c b/hw/pc.c
index 9328ee5..5225d5b 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -36,7 +36,7 @@
 #include "elf.h"
 #include "multiboot.h"
 #include "mc146818rtc.h"
-#include "msix.h"
+#include "msi.h"
 #include "sysbus.h"
 #include "sysemu.h"
 #include "blockdev.h"
@@ -896,7 +896,7 @@ static DeviceState *apic_init(void *env, uint8_t apic_id)
         apic_mapped = 1;
     }
 
-    msix_supported = 1;
+    msi_supported = true;
 
     return dev;
 }
-- 
1.7.3.4




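With this patch, msi_init() and msix_init() both return -ENOTSUP until the interrupt controller sets msi_supported (pc.c does so in apic_init()). One plausible way for a device model to react is to fall back to legacy INTx. The sketch below shows that pattern with a local stand-in flag and a hypothetical fake_msi_init() in place of the real QEMU functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Local stand-in for the global flag; in QEMU the interrupt controller
 * code (pc.c's apic_init()) sets msi_supported = true. */
static bool msi_supported;

/* Hypothetical stand-in for msi_init()/msix_init(): both refuse with
 * -ENOTSUP while the flag is clear. */
static int fake_msi_init(void)
{
    if (!msi_supported) {
        return -ENOTSUP;
    }
    return 0;
}

enum irq_mode { IRQ_MODE_INTX, IRQ_MODE_MSI };

/* Any failure, including -ENOTSUP, falls back to legacy INTx. */
static enum irq_mode device_pick_irq_mode(void)
{
    return fake_msi_init() == 0 ? IRQ_MODE_MSI : IRQ_MODE_INTX;
}
```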
[Qemu-devel] [PATCH v2 04/16] apic: Factor out core for KVM reuse

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

The KVM in-kernel APIC model will reuse parts of the user space model,
namely the vmstate, reset handling, IRQ coalescing tracker, some init
steps and the base and tpr set/get routines. For the latter, we also
prepare set callbacks as KVM will override those.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Makefile.target|2 +-
 hw/apic.c  |  260 +++-
 hw/apic_common.c   |  215 +++
 hw/apic_internal.h |  108 ++
 trace-events   |2 +-
 5 files changed, 339 insertions(+), 248 deletions(-)
 create mode 100644 hw/apic_common.c
 create mode 100644 hw/apic_internal.h

diff --git a/Makefile.target b/Makefile.target
index 3a9e95d..7bb6b13 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -226,7 +226,7 @@ obj-$(CONFIG_IVSHMEM) += ivshmem.o
 # Hardware support
 obj-i386-y += vga.o
 obj-i386-y += mc146818rtc.o pc.o
-obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o
+obj-i386-y += cirrus_vga.o sga.o apic_common.o apic.o ioapic.o piix_pci.o
 obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
diff --git a/hw/apic.c b/hw/apic.c
index 2644a82..27b18d6 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -16,53 +16,13 @@
  * You should have received a copy of the GNU Lesser General Public
  * License along with this library; if not, see <http://www.gnu.org/licenses/>
  */
-#include "hw.h"
+#include "apic_internal.h"
 #include "apic.h"
 #include "ioapic.h"
-#include "qemu-timer.h"
 #include "host-utils.h"
-#include "sysbus.h"
 #include "trace.h"
 #include "pc.h"
 
-/* APIC Local Vector Table */
-#define APIC_LVT_TIMER   0
-#define APIC_LVT_THERMAL 1
-#define APIC_LVT_PERFORM 2
-#define APIC_LVT_LINT0   3
-#define APIC_LVT_LINT1   4
-#define APIC_LVT_ERROR   5
-#define APIC_LVT_NB      6
-
-/* APIC delivery modes */
-#define APIC_DM_FIXED  0
-#define APIC_DM_LOWPRI 1
-#define APIC_DM_SMI    2
-#define APIC_DM_NMI    4
-#define APIC_DM_INIT   5
-#define APIC_DM_SIPI   6
-#define APIC_DM_EXTINT 7
-
-/* APIC destination mode */
-#define APIC_DESTMODE_FLAT     0xf
-#define APIC_DESTMODE_CLUSTER  1
-
-#define APIC_TRIGGER_EDGE  0
-#define APIC_TRIGGER_LEVEL 1
-
-#define APIC_LVT_TIMER_PERIODIC (1<<17)
-#define APIC_LVT_MASKED         (1<<16)
-#define APIC_LVT_LEVEL_TRIGGER  (1<<15)
-#define APIC_LVT_REMOTE_IRR     (1<<14)
-#define APIC_INPUT_POLARITY     (1<<13)
-#define APIC_SEND_PENDING       (1<<12)
-
-#define ESR_ILLEGAL_ADDRESS (1 << 7)
-
-#define APIC_SV_DIRECTED_IO (1<<12)
-#define APIC_SV_ENABLE      (1<<8)
-
-#define MAX_APICS 255
 #define MAX_APIC_WORDS 8
 
 /* Intel APIC constants: from include/asm/msidef.h */
@@ -75,40 +35,7 @@
 #define MSI_ADDR_DEST_ID_SHIFT 12
 #define MSI_ADDR_DEST_ID_MASK   0x000
 
-#define MSI_ADDR_SIZE   0x10
-
-typedef struct APICState APICState;
-
-struct APICState {
-    SysBusDevice busdev;
-    MemoryRegion io_memory;
-    void *cpu_env;
-    uint32_t apicbase;
-    uint8_t id;
-    uint8_t arb_id;
-    uint8_t tpr;
-    uint32_t spurious_vec;
-    uint8_t log_dest;
-    uint8_t dest_mode;
-    uint32_t isr[8];  /* in service register */
-    uint32_t tmr[8];  /* trigger mode register */
-    uint32_t irr[8]; /* interrupt request register */
-    uint32_t lvt[APIC_LVT_NB];
-    uint32_t esr; /* error register */
-    uint32_t icr[2];
-
-    uint32_t divide_conf;
-    int count_shift;
-    uint32_t initial_count;
-    int64_t initial_count_load_time, next_time;
-    uint32_t idx;
-    QEMUTimer *timer;
-    int sipi_vector;
-    int wait_for_sipi;
-};
-
 static APICState *local_apics[MAX_APICS + 1];
-static int apic_irq_delivered;
 
 static void apic_set_irq(APICState *s, int vector_num, int trigger_mode);
 static void apic_update_irq(APICState *s);
@@ -293,14 +220,8 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
     apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
 }
 
-void cpu_set_apic_base(DeviceState *d, uint64_t val)
+static void apic_set_base(APICState *s, uint64_t val)
 {
-    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
-
-    trace_cpu_set_apic_base(val);
-
-    if (!s)
-        return;
     s->apicbase = (val & 0xf000) |
         (s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE));
     /* if disabled, cannot be enabled again */
@@ -311,32 +232,12 @@ void cpu_set_apic_base(DeviceState *d, uint64_t val)
     }
 }
 
-uint64_t cpu_get_apic_base(DeviceState *d)
-{
-    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
-
-    trace_cpu_get_apic_base(s ? (uint64_t)s->apicbase: 0);
-
-    return s ? s->apicbase : 0;
-}
-
-void cpu_set_apic_tpr(DeviceState *d, uint8_t val)
+static void apic_set_tpr(APICState *s, uint8_t val)
 {

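The commit message says the set routines become callbacks so that the KVM model can override them. The following is a minimal C sketch of that dispatch pattern, assuming a shared state struct with a per-model set_tpr hook. All names are hypothetical, and the "kernel sync" counter only stands in for whatever the accelerated model would actually do. The TPR encoding (val & 0x0f) << 4 is assumed to mirror QEMU's user space model.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared APIC state with one per-model hook. */
struct apic_common {
    uint8_t tpr;
    int kernel_syncs;  /* only the accelerated sketch touches this */
    void (*set_tpr)(struct apic_common *s, uint8_t val);
};

/* Common entry point: callers never need to know which model is active. */
static void common_set_tpr(struct apic_common *s, uint8_t val)
{
    if (s && s->set_tpr) {
        s->set_tpr(s, val);
    }
}

/* User space model: store the priority class (the real model would then
 * re-evaluate pending interrupts). */
static void emulated_set_tpr(struct apic_common *s, uint8_t val)
{
    s->tpr = (val & 0x0f) << 4;
}

/* Accelerated model: same bookkeeping plus a placeholder for pushing the
 * new value to the kernel, counted here so the dispatch is observable. */
static void accel_set_tpr(struct apic_common *s, uint8_t val)
{
    s->tpr = (val & 0x0f) << 4;
    s->kernel_syncs++;
}
```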
[Qemu-devel] [PATCH v2 08/16] ioapic: Reject non-dword accesses to IOWIN register

2011-12-04 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Aligns the model with the spec.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 hw/ioapic.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 56b1612..eb75766 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -208,6 +208,9 @@ ioapic_mem_read(void *opaque, target_phys_addr_t addr, unsigned int size)
         val = s->ioregsel;
         break;
     case IOAPIC_IOWIN:
+        if (size != 4) {
+            break;
+        }
         switch (s->ioregsel) {
         case IOAPIC_REG_ID:
             val = s->id << IOAPIC_ID_SHIFT;
@@ -247,6 +250,9 @@ ioapic_mem_write(void *opaque, target_phys_addr_t addr, uint64_t val,
         s->ioregsel = val;
         break;
     case IOAPIC_IOWIN:
+        if (size != 4) {
+            break;
+        }
         DPRINTF("write: %08x = %08x\n", s->ioregsel, val);
         switch (s->ioregsel) {
         case IOAPIC_REG_ID:
-- 
1.7.3.4




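The effect of the IOWIN check can be shown with a trimmed-down read handler shaped like ioapic_mem_read(). IOREGSEL stays readable at any width, while IOWIN only returns data for 4-byte accesses and reads as 0 otherwise. The struct and function are hypothetical. The register offsets and the ID shift are the constants from hw/ioapic.c.

```c
#include <stdint.h>

#define IOAPIC_IOREGSEL 0x00
#define IOAPIC_IOWIN    0x10
#define IOAPIC_REG_ID   0x00
#define IOAPIC_ID_SHIFT 24

/* Hypothetical minimal register file. */
struct ioapic_regs {
    uint8_t id;
    uint8_t ioregsel;
};

/* IOREGSEL may be read with any size; IOWIN only yields data for dword
 * (4-byte) accesses and otherwise reads as 0, as in the patch. */
static uint32_t ioapic_read(const struct ioapic_regs *s, uint64_t addr,
                            unsigned size)
{
    uint32_t val = 0;

    switch (addr & 0xff) {
    case IOAPIC_IOREGSEL:
        val = s->ioregsel;
        break;
    case IOAPIC_IOWIN:
        if (size != 4) {
            break;
        }
        if (s->ioregsel == IOAPIC_REG_ID) {
            val = (uint32_t)s->id << IOAPIC_ID_SHIFT;
        }
        break;
    }
    return val;
}
```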
Re: [Qemu-devel] [Bug 899143] [NEW] Raw img not recognized by Windows

2011-12-04 Thread Vincent Autefage
Ok thanks a lot :)

Vincent Autefage
On 03/12/2011 19:45, Stefan Hajnoczi wrote:
> On Fri, Dec 2, 2011 at 2:45 PM, Vincent Autefage
> <899...@bugs.launchpad.net> wrote:
>> $ qemu-img create -f raw root.img 100GB
>> $ mkntfs -F root.img
>> $ qemu -name W -sdl -m 2048 -enable-kvm -localtime -k fr -hda root.img
>> -cdrom windows7.iso -boot d -net nic,macaddr=a0:00:00:00:00:01 -net
>> user,vlan=0
> QEMU does recognize the raw image.  You can check this by running
> 'info block' at the QEMU monitor (Ctrl-Alt-2) and you'll see ide-hd0
> is the raw image file you specified.  Press Ctrl-Alt-1 to get back to
> the VM display.
>
> The problem is that the Windows installer does not like the disk image
> you have prepared.  A normal harddisk has a master boot record but you
> created a raw image without a master boot record.  The Windows
> installer is being picky/careful and not displaying this non-standard
> disk you created.
>
> Skip the mkntfs step and the installer works fine.  There's no need to
> create the file system because the installer will do it for you.
>
> Stefan


-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/899143

Title:
  Raw img not recognized by Windows

Status in QEMU:
  New

Bug description:
  Hi,

  The installation process of Windows (XP/Vista/7) doesn’t seem to recognize a raw img generated by qemu-img.
  The installer does not see any hard drive...

  The problem exists only with a raw img but not with a vmdk for
  instance.

  Thanks

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/899143/+subscriptions



Re: [Qemu-devel] [Bug 899140] Re: Problem with Linux Kernel Traffic Control

2011-12-04 Thread Vincent Autefage
The result without TC is about 120 Mbit/s.
I checked the bandwidth with a lot of programs (not only with Iperf) and
the result is the same.

However, if I use the same raw image and the same TC configuration with
version 0.14.0 of QEMU or with some real physical hosts, the result
with TC is about 19.2 Mbit/s, which is the desired result...

Vincent


On 03/12/2011 19:48, Stefan Hajnoczi wrote:
> On Fri, Dec 2, 2011 at 2:42 PM, Vincent Autefage
> <899...@bugs.launchpad.net> wrote:
>> root@A# tc qdisc add dev eth0 root tbf rate 20mbit burst 20480 latency
>> 50ms
>>
>> root@B# ifconfig eth0 192.168.0.2
>>
>> Then if we check with Iperf, the real rate will be about 100kbit/s:
> What is the iperf result without tc?  It's worth checking what rate
> the unlimited interface saturates at before applying tc.  Perhaps this
> setup is just performing very poorly and it has nothing to do with tc.
>
> Stefan


-- 
https://bugs.launchpad.net/bugs/899140

Title:
  Problem with Linux Kernel Traffic Control

Status in QEMU:
  New

Bug description:
  Hi,

  The latest major versions of QEMU (0.14.1, 0.15 and 1.0) have a serious
  problem when running on a Linux distribution which itself runs a Traffic
  Control (TC) instance.
  Indeed, when TC is configured with a Token Bucket Filter (TBF) with a
  particular rate, the effective rate is much slower than the desired one.

  For instance, let's consider the following configuration:

  # tc qdisc add dev eth0 root tbf rate 20mbit burst 20k latency 50ms

  The effective rate will be about 100kbit/s! (verified with iperf)
  I've encountered this problem on versions 0.14.1, 0.15 and 1.0 but not with
  0.14.0...
  With 0.14.0, we get a rate of 19.2 Mbit/s, which is quite normal.

  I've run the experiment on several hosts:

  - Debian 32bit core i7, 4GB RAM
  - Debian 64bit core i7, 8GB RAM
  - 3 different high-performance servers: Ubuntu 64 bits, 48 AMD Opteron,
    128GB of RAM

  The problem is always the same... The problem is also seen with
  Class Based Queuing (CBQ) in TC.

  Thanks

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/899140/+subscriptions



[Qemu-devel] [Bug 899961] [NEW] qemu/kvm locks up when run 32bit userspace with 64bit kernel

2011-12-04 Thread Michael Tokarev
Public bug reported:

Applies to both qemu and qemu-kvm 1.0, but only when the kernel is 64bit and
userspace is 32bit, on x86.  Did not happen with previously released
versions, such as 0.15.  Not all guests trigger this issue - so far,
only a (32bit) Windows 7 guest shows it, but it does so quite reliably:
on first boot of an old guest with new qemu (or qemu-kvm), Windows finds a
new CPU and suggests rebooting - hit Reboot and in a few seconds it
will be locked up (including the monitor), with 100% CPU usage.
Killable with -9.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/899961

Title:
  qemu/kvm locks up when run 32bit userspace with 64bit kernel

Status in QEMU:
  New

Bug description:
  Applies to both qemu and qemu-kvm 1.0, but only when the kernel is 64bit
  and userspace is 32bit, on x86.  Did not happen with previously released
  versions, such as 0.15.  Not all guests trigger this issue - so far,
  only a (32bit) Windows 7 guest shows it, but it does so quite reliably:
  on first boot of an old guest with new qemu (or qemu-kvm), Windows finds
  a new CPU and suggests rebooting - hit Reboot and in a few seconds
  it will be locked up (including the monitor), with 100% CPU usage.
  Killable with -9.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/899961/+subscriptions



[Qemu-devel] linux-user: interrupting syscalls

2011-12-04 Thread Peter Maydell
Disclaimer: I'm writing this email because I had a neat idea about how
to solve a problem which Alex Graf discovered, but I don't have the
time to actually implement it :-)

Consider the following guest code, to be run under linux-user mode:

---begin---
#include <stdio.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

int pipefd[2];

void usr1_handler(int s)
{
    char x = 'x';
    write(pipefd[1], &x, 1);
}

int main(void)
{
    struct sigaction sa; char x; ssize_t r;
    if (pipe(pipefd) != 0) {
        perror("pipe"); return 1;
    }
    sa.sa_handler = usr1_handler;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, 0) != 0) {
        perror("sigaction"); return 1;
    }
    printf("read()ing pipe...\n");
    r = read(pipefd[0], &x, 1);
    printf("read returned %d\n", (int)r);
    return 0;
}
---endit---

When run natively, this program will block until you send it a
SIGUSR1; the signal handler will write to the pipe and cause the read
to complete. Run in linux-user mode, we deadlock, because qemu does
not run the guest signal handler when in the middle of emulating a
system call -- it merely queues it to be run when the syscall
finishes. For cases like this where the event that causes the syscall
to complete is actually triggered by the guest signal handler, this
doesn't work.  (There is a real-world instance of this problem in the
Boehm garbage collector, where a signal handler posts to a semaphore
which is being waited on by the mainline code.)

It's not sufficient to simply force all syscalls to be non-restartable
(and then to take the signal when the syscall returns EINTR), because
of the following race condition:
 * qemu enters do_syscall on behalf of main thread
 * do_syscall is about to call the underlying syscall, when...
 * the signal arrives (and we queue it)
 * do_syscall then calls the host syscall, which will block. Oops.

To fix this I think we need to have linux-user's signal handler
wrapper do a siglongjmp if a signal arrives while we're inside
do_syscall(). This allows us to properly interrupt whether we'd
got to the point of making the host syscall or not.

The tricky bit here is in the details; specifically it's painful to
write code that can cope with being siglongjmp()ed out of at any
point. You need to be careful not to call anything that might not
like being aborted (no malloc, for instance). This might need some
support like an equivalent of critical section macros to prevent
the siglongjmp in some places, and/or cleanup routines to be called
in the event of the jump occurring to release resources.
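
Below is a minimal single-threaded sketch of the siglongjmp() idea. It uses a host read() in place of an emulated guest syscall. Every name here (`guarded_read()`, `host_signal_handler()`, the `simulate_race` flag) is hypothetical rather than linux-user code, and the sketch deliberately omits the critical-section and cleanup machinery discussed above. It also leaves one small window open: a signal that lands after read() returns but before `in_syscall` is cleared would discard a completed read. A real implementation has to handle that kind of detail.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not linux-user code. */
static sigjmp_buf syscall_env;
static volatile sig_atomic_t in_syscall;     /* inside the guarded region */
static volatile sig_atomic_t signal_pending; /* guest signal to deliver */

/* Host handler wrapper: always queue the signal for the guest. If we are
 * inside the guarded region, also abandon the host syscall (whether or
 * not it has started) by jumping back to the sigsetjmp() point. */
static void host_signal_handler(int sig)
{
    (void)sig;
    signal_pending = 1;
    if (in_syscall) {
        in_syscall = 0;
        siglongjmp(syscall_env, 1);
    }
}

int install_host_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = host_signal_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_RESTART: restarting is our job */
    return sigaction(sig, &sa, NULL);
}

/* Returns bytes read, or -1 with errno == EINTR if a signal arrived at
 * any point after entering the guarded region. simulate_race raises
 * SIGUSR1 just before the host read(), i.e. in the window where a plain
 * "queue it and check later" scheme would block forever. */
ssize_t guarded_read(int fd, void *buf, size_t len, int simulate_race)
{
    ssize_t r;

    assert(!in_syscall);        /* guarded regions do not nest */
    if (sigsetjmp(syscall_env, 1) != 0) {
        /* Arrived via siglongjmp(); the saved signal mask is restored. */
        errno = EINTR;
        return -1;
    }
    in_syscall = 1;
    if (simulate_race) {
        raise(SIGUSR1);
    }
    r = read(fd, buf, len);
    in_syscall = 0;             /* a signal just before this line would
                                   discard a completed read */
    return r;
}
```

The `simulate_race` path reproduces the race from the list above. The signal arrives after we have committed to the syscall but before the host call. Instead of blocking forever, the function returns EINTR, so the guest handler can run.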

Luckily we don't have to write the whole of syscall.c like
that: a lot of syscalls are non-blocking, so we can continue to deal
with them as we do now (queue signal, take it on exit). (Incidentally
any code in the implementation of a 'non-blocking' syscall which
doesn't retry if it gets an EINTR return value is broken.) Linux's
signal(7) manpage has a handy overview of which syscalls have to be
interruptible.

We also need to properly handle restarting syscalls when we've jumped
out of them to run the guest signal handler. For this I think we
should use a structure basically the same as the Linux kernel uses
itself: do_syscall() returns ERESTARTSYS, and the cpu-specific code
then rewinds the PC to before the syscall insn if the signal we're
about to deliver is one that was registered with SA_RESTART. A handful
of syscalls may need a 'restart handler' (where we both wind back PC
and change the syscall number to NR_restartsys so we can invoke a
syscall-specific 'resume this' function.)
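
The restart step could look roughly like the pure function below. It operates on a much-simplified, hypothetical guest CPU state. `HYP_ERESTARTSYS` mirrors the kernel-internal ERESTARTSYS value (512, from the kernel's include/linux/errno.h). The struct and function names are invented for illustration, and the restart-handler/NR_restartsys case is left out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>

/* Mirrors the kernel-internal ERESTARTSYS; never seen by user space. */
#define HYP_ERESTARTSYS 512

/* Hypothetical, much-simplified guest CPU state. */
struct guest_cpu {
    uint64_t pc;        /* points just past the syscall instruction */
    int64_t retval;     /* register that receives the syscall result */
    int64_t orig_reg;   /* that register's value at syscall entry */
};

/* Run before delivering a guest signal whose handler was registered
 * with sa_flags. insn_len is the size of the target's syscall insn. */
void fixup_interrupted_syscall(struct guest_cpu *cpu, int sa_flags,
                               unsigned insn_len)
{
    if (cpu->retval != -HYP_ERESTARTSYS) {
        return;                      /* not interrupted: leave it alone */
    }
    if (sa_flags & SA_RESTART) {
        /* Wind back so the guest re-executes the syscall insn once the
         * handler returns, and restore the register the result clobbered. */
        assert(cpu->pc >= insn_len);
        cpu->pc -= insn_len;
        cpu->retval = cpu->orig_reg;
    } else {
        cpu->retval = -EINTR;        /* guest sees an interrupted call */
    }
}
```

`insn_len` would be 2 for x86 `int $0x80` or `syscall`, and 4 for an ARM `svc`. Restoring `orig_reg` matters because the result register can alias an input. On x86, for example, the same register carries both the syscall number and the result.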

I think to do this properly you'd also want to refactor syscall.c so
that instead of being an enormous switch statement it was
table-driven, so you just looked up the handler function for the
syscall as well as what classification it was (non-blocking vs.
having to handle being interrupted). We could roll in the strace table
too, which might avoid the problem of people adding new syscall
support and forgetting about strace.
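
Here is a sketch of what a table-driven syscall.c entry might hold: the handler, the blocking classification, and a name that strace output could share. The numbering, type names, and helpers are all hypothetical. The two entries simply wrap the host read() and getpid().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

typedef int64_t guest_long;  /* stand-in for the guest ABI's long */

enum syscall_kind {
    SYSCALL_NONBLOCKING,     /* queue signals, take them on exit */
    SYSCALL_INTERRUPTIBLE,   /* must cope with being jumped out of */
};

struct syscall_def {
    const char *name;        /* could also drive strace output */
    enum syscall_kind kind;
    guest_long (*impl)(guest_long a1, guest_long a2, guest_long a3);
};

static guest_long do_getpid(guest_long a1, guest_long a2, guest_long a3)
{
    (void)a1; (void)a2; (void)a3;
    return getpid();
}

static guest_long do_read(guest_long fd, guest_long buf, guest_long len)
{
    ssize_t r = read((int)fd, (void *)(uintptr_t)buf, (size_t)len);
    return r < 0 ? -errno : r;
}

/* Indexed by a hypothetical guest syscall number. */
static const struct syscall_def syscall_table[] = {
    [0] = { "read",   SYSCALL_INTERRUPTIBLE, do_read },
    [1] = { "getpid", SYSCALL_NONBLOCKING,   do_getpid },
};

const struct syscall_def *lookup_syscall(unsigned nr)
{
    if (nr >= sizeof(syscall_table) / sizeof(syscall_table[0]) ||
        !syscall_table[nr].impl) {
        return NULL;
    }
    return &syscall_table[nr];
}

guest_long dispatch_syscall(unsigned nr, guest_long a1, guest_long a2,
                            guest_long a3)
{
    const struct syscall_def *def = lookup_syscall(nr);

    if (!def) {
        return -ENOSYS;
    }
    /* A fuller version would choose its signal strategy from def->kind. */
    return def->impl(a1, a2, a3);
}
```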

So have I missed something that would mean this wouldn't work?

-- PMM



Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-04 Thread Avi Kivity
On 12/04/2011 05:19 PM, Jan Kiszka wrote:
  
  In the sense that kernel-apic is just an accelerated apic.  From the
  guest point of view, there's no difference, and that should be reflected
  in the device model.

 That was my goal as well: The guest should not notice the difference,
 but the admin on the host side should still be able to tell both
 internally fairly different models apart. 

This should be some attribute, not the name.

 Plus the code should be
 clearly split where there are differences and explicitly shared where
 there aren't.

That's a good goal, yes.


  
  If I'm reading an apic register, either from the guest or via a monitor
  debug interface, I shouldn't care whether it's accelerated or not.  The
  guest part already holds, of course.

 Specifically for the debug scenario, I'd prefer the clear
 differentiation by name as there can always remain subtle differences in
 the implementation of kernel vs. user space. Someone debugging the guest
 and/or qemu/kvm should remain aware of this.

Aware, yes, but the name change is too drastic.

-- 
error compiling committee.c: too many arguments to function




[Qemu-devel] [Bug 899961] Re: qemu/kvm locks up when run 32bit userspace with 64bit kernel

2011-12-04 Thread Michael Tokarev
Actually after trying to do lots of experiments and finally a git
bisection, it turned out that the issue only affects qemu-kvm, not
upstream qemu.  Bisection between qemu-kvm 0.15.0 and 1.0 led to this
commit:

commit 145e11e840500e04a4d0a624918bb17596be19e9
Merge: ce967f6 b195043
Author: Avi Kivity a...@redhat.com
Date:   Wed Aug 10 12:06:58 2011 +0300

Merge commit 'b195043003d90ea4027ea01cc7a6c974ac915108' into upstream-merge

* commit 'b195043003d90ea4027ea01cc7a6c974ac915108': (130 commits)
   ...

After which I'm stuck... ;)

** Tags added: lockup qemu-kvm




[Qemu-devel] [PATCH v2 0/6] Memory API mutators

2011-12-04 Thread Avi Kivity
This patchset introduces memory_region_set_enabled() and
memory_region_set_address() to avoid the requirement on memory
routers to track the internal state of the memory API (so they know
whether they need to add or remove a region).  Instead, they can
simply copy the state of the region from the guest-exposed register
to the memory core, via the new mutator functions.

v2:
   - fix minor bug in set_address()
   - add set_alias_offset()
   - two example users

Avi Kivity (6):
  memory: introduce memory_region_set_enabled()
  memory: introduce memory_region_set_address()
  memory: introduce memory_region_set_alias_offset()
  memory: optimize empty transactions due to mutators
  cirrus_vga: adapt to memory mutators API
  piix_pci: adapt smram mapping to use memory mutators

 hw/cirrus_vga.c |   50 +++--
 hw/piix_pci.c   |   20 -
 memory.c|   81 +++---
 memory.h|   39 ++
 4 files changed, 132 insertions(+), 58 deletions(-)

-- 
1.7.7.1




[Qemu-devel] [PATCH v2 4/6] memory: optimize empty transactions due to mutators

2011-12-04 Thread Avi Kivity
The mutating memory APIs can easily cause empty transactions,
where the mutators don't actually change anything, or perhaps
only modify disabled regions.  Detect these conditions and
avoid regenerating the memory topology.

Signed-off-by: Avi Kivity a...@redhat.com
---
 memory.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/memory.c b/memory.c
index 7e842b3..87639ab 100644
--- a/memory.c
+++ b/memory.c
@@ -19,6 +19,7 @@
 #include <assert.h>
 
 unsigned memory_region_transaction_depth = 0;
+static bool memory_region_update_pending = false;
 
 typedef struct AddrRange AddrRange;
 
@@ -757,6 +758,7 @@ static void address_space_update_topology(AddressSpace *as)
 static void memory_region_update_topology(MemoryRegion *mr)
 {
     if (memory_region_transaction_depth) {
+        memory_region_update_pending |= !mr || mr->enabled;
         return;
     }
 
@@ -770,6 +772,8 @@ static void memory_region_update_topology(MemoryRegion *mr)
     if (address_space_io.root) {
         address_space_update_topology(&address_space_io);
     }
+
+    memory_region_update_pending = false;
 }
 
 void memory_region_transaction_begin(void)
@@ -781,7 +785,9 @@ void memory_region_transaction_commit(void)
 {
     assert(memory_region_transaction_depth);
     --memory_region_transaction_depth;
-    memory_region_update_topology(NULL);
+    if (!memory_region_transaction_depth && memory_region_update_pending) {
+        memory_region_update_topology(NULL);
+    }
 }
 
 static void memory_region_destructor_none(MemoryRegion *mr)
-- 
1.7.7.1

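The effect of patch 4/6 can be illustrated with a small toy that only counts how often the expensive topology rebuild would run. All names are hypothetical. `toy_set_enabled()` assumes that setting an unchanged enable state is a no-op, since the body of memory_region_set_enabled() is not shown here. The parent-less shortcut in memory_region_set_alias_offset() is also ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the bookkeeping in patch 4/6; not QEMU's memory.c. */
struct toy_region {
    bool enabled;
    uint64_t alias_offset;
};

static unsigned transaction_depth;
static bool update_pending;
static unsigned rebuilds;   /* counts would-be topology regenerations */

static void update_topology(struct toy_region *mr)
{
    if (transaction_depth) {
        /* Defer; a change confined to a disabled region is invisible. */
        update_pending |= !mr || mr->enabled;
        return;
    }
    if (mr && !mr->enabled) {
        return;
    }
    rebuilds++;             /* stands in for rebuilding the flat views */
    update_pending = false;
}

void transaction_begin(void)
{
    ++transaction_depth;
}

void transaction_commit(void)
{
    assert(transaction_depth);
    --transaction_depth;
    if (!transaction_depth && update_pending) {
        update_topology(NULL);
    }
}

void toy_set_enabled(struct toy_region *mr, bool enabled)
{
    if (mr->enabled == enabled) {
        return;             /* nothing changed */
    }
    mr->enabled = enabled;
    update_topology(NULL);  /* toggling always affects the view */
}

void toy_set_alias_offset(struct toy_region *mr, uint64_t offset)
{
    uint64_t old = mr->alias_offset;

    mr->alias_offset = offset;
    if (offset != old) {
        update_topology(mr);
    }
}
```

A transaction that touches only a disabled region triggers no rebuild. A transaction that enables a bank and then retargets it (the same call sequence the cirrus_vga patch uses) collapses into a single rebuild at commit time.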



[Qemu-devel] [PATCH v2 2/6] memory: introduce memory_region_set_address()

2011-12-04 Thread Avi Kivity
Allow changing the address of a memory region while it is
in the memory hierarchy.

Signed-off-by: Avi Kivity a...@redhat.com
---
 memory.c |   21 +
 memory.h |   11 +++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index d0f90ca..a080d21 100644
--- a/memory.c
+++ b/memory.c
@@ -1324,6 +1324,27 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
     memory_region_update_topology(NULL);
 }
 
+void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr)
+{
+    MemoryRegion *parent = mr->parent;
+    unsigned priority = mr->priority;
+    bool may_overlap = mr->may_overlap;
+
+    if (addr == mr->addr || !parent) {
+        mr->addr = addr;
+        return;
+    }
+
+    memory_region_transaction_begin();
+    memory_region_del_subregion(parent, mr);
+    if (may_overlap) {
+        memory_region_add_subregion_overlap(parent, addr, mr, priority);
+    } else {
+        memory_region_add_subregion(parent, addr, mr);
+    }
+    memory_region_transaction_commit();
+}
+
 void set_system_memory_map(MemoryRegion *mr)
 {
     address_space_memory.root = mr;
diff --git a/memory.h b/memory.h
index c6997c4..db53422 100644
--- a/memory.h
+++ b/memory.h
@@ -518,6 +518,17 @@ void memory_region_del_subregion(MemoryRegion *mr,
  */
 void memory_region_set_enabled(MemoryRegion *mr, bool enabled);
 
+/*
+ * memory_region_set_address: dynamically update the address of a region
+ *
+ * Dynamically updates the address of a region, relative to its parent.
+ * May be used on regions are currently part of a memory hierarchy.
+ *
+ * @mr: the region to be updated
+ * @addr: new address, relative to parent region
+ */
+void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr);
+
 /* Start a transaction; changes will be accumulated and made visible only
  * when the transaction ends.
  */
-- 
1.7.7.1




[Qemu-devel] [PATCH v2 6/6] piix_pci: adapt smram mapping to use memory mutators

2011-12-04 Thread Avi Kivity
Eliminates fake state ->smram_enabled.

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/piix_pci.c |   20 ++--
 1 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/hw/piix_pci.c b/hw/piix_pci.c
index d183443..ac3d898 100644
--- a/hw/piix_pci.c
+++ b/hw/piix_pci.c
@@ -81,7 +81,6 @@ struct PCII440FXState {
     PAMMemoryRegion pam_regions[13];
     MemoryRegion smram_region;
     uint8_t smm_enabled;
-    bool smram_enabled;
     PIIX3State *piix3;
 };
 
@@ -141,6 +140,7 @@ static void i440fx_update_memory_mappings(PCII440FXState *d)
 {
     int i, r;
     uint32_t smram;
+    bool smram_enabled;
 
     memory_region_transaction_begin();
     update_pam(d, 0xf0000, 0x100000, (d->dev.config[I440FX_PAM] >> 4) & 3,
@@ -151,18 +151,8 @@ static void i440fx_update_memory_mappings(PCII440FXState *d)
                    &d->pam_regions[i+1]);
     }
     smram = d->dev.config[I440FX_SMRAM];
-    if ((d->smm_enabled && (smram & 0x08)) || (smram & 0x40)) {
-        if (!d->smram_enabled) {
-            memory_region_del_subregion(d->system_memory, &d->smram_region);
-            d->smram_enabled = true;
-        }
-    } else {
-        if (d->smram_enabled) {
-            memory_region_add_subregion_overlap(d->system_memory, 0xa0000,
-                                                &d->smram_region, 1);
-            d->smram_enabled = false;
-        }
-    }
+    smram_enabled = (d->smm_enabled && (smram & 0x08)) || (smram & 0x40);
+    memory_region_set_enabled(&d->smram_region, !smram_enabled);
     memory_region_transaction_commit();
 }
 
@@ -307,7 +297,9 @@ static int i440fx_initfn(PCIDevice *dev)
     }
     memory_region_init_alias(&f->smram_region, "smram-region",
                              f->pci_address_space, 0xa0000, 0x20000);
-    f->smram_enabled = true;
+    memory_region_add_subregion_overlap(f->system_memory, 0xa0000,
+                                        &f->smram_region, 1);
+    memory_region_set_enabled(&f->smram_region, false);
 
 /* Xen supports additional interrupt routes from the PCI devices to
  * the IOAPIC: the four pins of each PCI device on the bus are also
-- 
1.7.7.1

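The new `smram_enabled` expression in patch 6/6 reads more easily as a pure function. The names D_OPEN (0x40) and G_SMRAME (0x08) are labels based on my reading of the Intel 440FX SMRAM register description. They are not identifiers from piix_pci.c. Note the inversion: when SMRAM is visible, the `smram_region` alias is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SMRAM_D_OPEN   0x40  /* assumed label for bit 6 */
#define SMRAM_G_SMRAME 0x08  /* assumed label for bit 3 */

/* True when SMRAM (rather than the PCI/VGA window) should be visible. */
static bool smram_ram_visible(bool smm_enabled, uint8_t smram)
{
    return (smm_enabled && (smram & SMRAM_G_SMRAME)) ||
           (smram & SMRAM_D_OPEN);
}

/* Value passed to memory_region_set_enabled(&d->smram_region, ...). */
static bool smram_alias_enabled(bool smm_enabled, uint8_t smram)
{
    return !smram_ram_visible(smm_enabled, smram);
}
```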



[Qemu-devel] [PATCH v2 5/6] cirrus_vga: adapt to memory mutators API

2011-12-04 Thread Avi Kivity
Simplify the code by avoiding dynamic creation and destruction of
memory regions.

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/cirrus_vga.c |   50 +-
 1 files changed, 17 insertions(+), 33 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index c7e365b..9f7fea1 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -205,7 +205,7 @@ typedef void (*cirrus_fill_t)(struct CirrusVGAState *s,
     bool linear_vram;  /* vga.vram mapped over cirrus_linear_io */
     MemoryRegion low_mem_container; /* container for 0xa0000-0xc0000 */
     MemoryRegion low_mem;           /* always mapped, overridden by: */
-    MemoryRegion *cirrus_bank[2];   /*   aliases at 0xa0000-0xb0000  */
+    MemoryRegion cirrus_bank[2];    /*   aliases at 0xa0000-0xb0000  */
     uint32_t cirrus_addr_mask;
     uint32_t linear_mmio_mask;
     uint8_t cirrus_shadow_gr0;
@@ -2363,40 +2363,16 @@ static void cirrus_linear_bitblt_write(void *opaque,
     },
 };
 
-static void unmap_bank(CirrusVGAState *s, unsigned bank)
-{
-    if (s->cirrus_bank[bank]) {
-        memory_region_del_subregion(&s->low_mem_container,
-                                    s->cirrus_bank[bank]);
-        memory_region_destroy(s->cirrus_bank[bank]);
-        g_free(s->cirrus_bank[bank]);
-        s->cirrus_bank[bank] = NULL;
-    }
-}
-
 static void map_linear_vram_bank(CirrusVGAState *s, unsigned bank)
 {
-    MemoryRegion *mr;
-    static const char *names[] = { "vga.bank0", "vga.bank1" };
-
-    if (!(s->cirrus_srcptr != s->cirrus_srcptr_end)
+    MemoryRegion *mr = &s->cirrus_bank[bank];
+    bool enabled = !(s->cirrus_srcptr != s->cirrus_srcptr_end)
         && !((s->vga.sr[0x07] & 0x01) == 0)
         && !((s->vga.gr[0x0B] & 0x14) == 0x14)
-        && !(s->vga.gr[0x0B] & 0x02)) {
-
-        mr = g_malloc(sizeof(*mr));
-        memory_region_init_alias(mr, names[bank], &s->vga.vram,
-                                 s->cirrus_bank_base[bank], 0x8000);
-        memory_region_add_subregion_overlap(
-            &s->low_mem_container,
-            0x8000 * bank,
-            mr,
-            1);
-        unmap_bank(s, bank);
-        s->cirrus_bank[bank] = mr;
-    } else {
-        unmap_bank(s, bank);
-    }
+        && !(s->vga.gr[0x0B] & 0x02);
+
+    memory_region_set_enabled(mr, enabled);
+    memory_region_set_alias_offset(mr, s->cirrus_bank_base[bank]);
 }
 
 static void map_linear_vram(CirrusVGAState *s)
@@ -2415,8 +2391,8 @@ static void unmap_linear_vram(CirrusVGAState *s)
         s->linear_vram = false;
         memory_region_del_subregion(&s->pci_bar, &s->vga.vram);
     }
-    unmap_bank(s, 0);
-    unmap_bank(s, 1);
+    memory_region_set_enabled(&s->cirrus_bank[0], false);
+    memory_region_set_enabled(&s->cirrus_bank[1], false);
 }
 
 /* Compute the memory access functions */
@@ -2856,6 +2832,14 @@ static void cirrus_init_common(CirrusVGAState * s, int device_id, int is_pci,
     memory_region_init_io(&s->low_mem, &cirrus_vga_mem_ops, s,
                           "cirrus-low-memory", 0x20000);
     memory_region_add_subregion(&s->low_mem_container, 0, &s->low_mem);
+    for (i = 0; i < 2; ++i) {
+        static const char *names[] = { "vga.bank0", "vga.bank1" };
+        MemoryRegion *bank = &s->cirrus_bank[i];
+        memory_region_init_alias(bank, names[i], &s->vga.vram, 0, 0x8000);
+        memory_region_set_enabled(bank, false);
+        memory_region_add_subregion_overlap(&s->low_mem_container, i * 0x8000,
+                                            bank, 1);
+    }
     memory_region_add_subregion_overlap(system_memory,
                                         isa_mem_base + 0x000a0000,
                                         &s->low_mem_container,
-- 
1.7.7.1




[Qemu-devel] [Bug 899961] Re: qemu/kvm locks up when run 32bit userspace with 64bit kernel

2011-12-04 Thread Michael Tokarev
And some more info.  Debugging with gdb shows this:

(gdb) info threads
  Id   Target Id         Frame
  2    Thread 0xf6d4eb70 (LWP 28697) "qemu-system-x86" 0xf7711425 in __kernel_vsyscall ()
* 1    Thread 0xf6f50700 (LWP 28694) "qemu-system-x86" 0xf7711425 in __kernel_vsyscall ()
(gdb) bt
#0  0xf7711425 in __kernel_vsyscall ()
#1  0xf76d620a in __pthread_cond_wait (cond=0x840fa60, mutex=0x89e82f0)
at pthread_cond_wait.c:153
#2  0x080e8bb5 in qemu_cond_wait (cond=0x840fa60, mutex=0x89e82f0)
at /build/kvm/git/qemu-thread-posix.c:113
#3  0x08050c2e in run_on_cpu (env=0x9466460, 
func=0x8083ad0 <do_kvm_cpu_synchronize_state>, data=0x9466460)
at /build/kvm/git/cpus.c:715
#4  0x08083b63 in kvm_cpu_synchronize_state (env=0x9466460)
at /build/kvm/git/kvm-all.c:927
#5  0x0804faaa in cpu_synchronize_state (env=0x9466460)
at /build/kvm/git/kvm.h:173
#6  0x0804fc3a in cpu_synchronize_all_states () at /build/kvm/git/cpus.c:94
#7  0x080647ec in main_loop () at /build/kvm/git/vl.c:1421
#8  0x0806974d in main (argc=17, argv=0xff996e04, envp=0xff996e4c)
at /build/kvm/git/vl.c:3395
(gdb) frame 2
#2  0x080e8bb5 in qemu_cond_wait (cond=0x840fa60, mutex=0x89e82f0)
at /build/kvm/git/qemu-thread-posix.c:113
113         err = pthread_cond_wait(&cond->cond, &mutex->lock);
(gdb) 
(gdb) thread 2
[Switching to thread 2 (Thread 0xf6d4eb70 (LWP 28697))]
#0  0xf7711425 in __kernel_vsyscall ()
(gdb) bt
#0  0xf7711425 in __kernel_vsyscall ()
#1  0xf727ac89 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#2  0x08084004 in kvm_vcpu_ioctl (env=0x9466460, type=44672)
at /build/kvm/git/kvm-all.c:1090
#3  0x08083cd8 in kvm_cpu_exec (env=0x9466460) at /build/kvm/git/kvm-all.c:976
#4  0x08050f44 in qemu_kvm_cpu_thread_fn (arg=0x9466460)
at /build/kvm/git/cpus.c:806
#5  0xf76d1c39 in start_thread (arg=0xf6d4eb70) at pthread_create.c:304
#6  0xf728296e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
Backtrace stopped: Not enough registers or memory available to unwind further

which is not entirely interesting, but:

when exiting gdb (I attached it to a running process), the whole thing
unfreezes and continues its work as usual, as if no lockup had ever occurred --
i.e., it is enough to attach gdb to a locked-up process and quit gdb to
unfreeze it.  Also, when running under gdb, the lockup does
not occur - I can reboot the guest at will any number of times, it all goes fine.
Once gdb is detached, reboot immediately results in a lockup again -
which - again - can be cured by attaching and detaching gdb to the
process.

And one more correction for the original report.  When locked up, it
does NOT use 100% CPU - CPU is 100% _idle_.
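
The two backtraces show a request/acknowledge handshake. Thread 1 waits in qemu_cond_wait() inside run_on_cpu() for the vCPU thread to run do_kvm_cpu_synchronize_state(). Thread 2 is still in the vcpu ioctl: type 44672 is 0xAE80, which as far as I can tell is KVM_RUN. Below is a minimal pthread sketch of that pattern. Every name is hypothetical, and it is not QEMU's cpus.c. The condition-variable signal stands in for whatever mechanism actually forces the vCPU out of KVM_RUN. The caller only returns once the vCPU thread comes back around to check for queued work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vCPU work slot; not QEMU's data structures. */
struct vcpu {
    pthread_mutex_t lock;
    pthread_cond_t work_queued;  /* stands in for kicking the vCPU */
    pthread_cond_t work_done;
    void (*func)(void *);
    void *data;
    bool done;
    bool stop;
};

/* Caller side (thread 1 above): queue work, wake the vCPU, then sleep
 * until it reports completion. */
void run_on_vcpu(struct vcpu *v, void (*func)(void *), void *data)
{
    pthread_mutex_lock(&v->lock);
    v->func = func;
    v->data = data;
    v->done = false;
    pthread_cond_signal(&v->work_queued);
    while (!v->done) {
        pthread_cond_wait(&v->work_done, &v->lock);
    }
    pthread_mutex_unlock(&v->lock);
}

/* vCPU side (thread 2): checks for queued work between "runs". */
void *vcpu_thread(void *arg)
{
    struct vcpu *v = arg;

    pthread_mutex_lock(&v->lock);
    while (!v->stop) {
        if (v->func) {
            v->func(v->data);
            v->func = NULL;
            v->done = true;
            pthread_cond_broadcast(&v->work_done);
        } else {
            pthread_cond_wait(&v->work_queued, &v->lock);
        }
    }
    pthread_mutex_unlock(&v->lock);
    return NULL;
}

/* Hypothetical stand-in for do_kvm_cpu_synchronize_state(). */
void sync_state_demo(void *opaque)
{
    ++*(int *)opaque;
}
```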




[Qemu-devel] [PATCH 1/3] QEMU kvm: Syncing linux headers to 3.2.0-rc1

2011-12-04 Thread Raghavendra K T
 Update the kvm kernel headers to the 3.2.0-rc1 post using
scripts/update-linux-headers.sh script.

Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index fb3fddc..08fe69e 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -149,6 +149,12 @@ struct kvm_regs {
 #define KVM_SREGS_E_UPDATE_DBSR        (1 << 3)
 
 /*
+ * Book3S special bits to indicate contents in the struct by maintaining
+ * backwards compatibility with older structs. If adding a new field,
+ * please make sure to add a flag for that new field */
+#define KVM_SREGS_S_HIOR   (1 << 0)
+
+/*
  * In KVM_SET_SREGS, reserved/pad fields must be left untouched from a
  * previous KVM_GET_REGS.
  *
@@ -170,9 +176,11 @@ struct kvm_sregs {
} ppc64;
struct {
__u32 sr[16];
-   __u64 ibat[8];
-   __u64 dbat[8];
+   __u64 ibat[8]; 
+   __u64 dbat[8]; 
} ppc32;
+   __u64 flags; /* KVM_SREGS_S_ */
+   __u64 hior;
} s;
struct {
union {
@@ -292,41 +300,4 @@ struct kvm_allocate_rma {
__u64 rma_size;
 };
 
-struct kvm_book3e_206_tlb_entry {
-   __u32 mas8;
-   __u32 mas1;
-   __u64 mas2;
-   __u64 mas7_3;
-};
-
-struct kvm_book3e_206_tlb_params {
-   /*
-* For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
-*
-* - The number of ways of TLB0 must be a power of two between 2 and
-*   16.
-* - TLB1 must be fully associative.
-* - The size of TLB0 must be a multiple of the number of ways, and
-*   the number of sets must be a power of two.
-* - The size of TLB1 may not exceed 64 entries.
-* - TLB0 supports 4 KiB pages.
-* - The page sizes supported by TLB1 are as indicated by
-*   TLB1CFG (if MMUCFG[MAVN] = 0) or TLB1PS (if MMUCFG[MAVN] = 1)
-*   as returned by KVM_GET_SREGS.
-* - TLB2 and TLB3 are reserved, and their entries in tlb_sizes[]
-*   and tlb_ways[] must be zero.
-*
-* tlb_ways[n] = tlb_sizes[n] means the array is fully associative.
-*
-* KVM will adjust TLBnCFG based on the sizes configured here,
-* though arrays greater than 2048 entries will have TLBnCFG[NENTRY]
-* set to zero.
-*/
-   __u32 tlb_sizes[4];
-   __u32 tlb_ways[4];
-   __u32 reserved[8];
-};
-
-#define KVM_ONE_REG_PPC_HIOR   KVM_ONE_REG_PPC | 0x100
-
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/linux-headers/asm-x86/hyperv.h b/linux-headers/asm-x86/hyperv.h
index 5df477a..b80420b 100644
--- a/linux-headers/asm-x86/hyperv.h
+++ b/linux-headers/asm-x86/hyperv.h
@@ -189,5 +189,6 @@
 #define HV_STATUS_INVALID_HYPERCALL_CODE   2
 #define HV_STATUS_INVALID_HYPERCALL_INPUT  3
 #define HV_STATUS_INVALID_ALIGNMENT        4
+#define HV_STATUS_INSUFFICIENT_BUFFERS 19
 
 #endif
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index a8761d3..07bd557 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -371,6 +371,7 @@ struct kvm_s390_psw {
 #define KVM_S390_INT_VIRTIO        0x2603u
 #define KVM_S390_INT_SERVICE   0x2401u
 #define KVM_S390_INT_EMERGENCY 0x1201u
+#define KVM_S390_INT_EXTERNAL_CALL 0x1202u
 
 struct kvm_s390_interrupt {
__u32 type;
@@ -556,8 +557,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_MAX_VCPUS 66   /* returns max vcpus per vm */
 #define KVM_CAP_PPC_HIOR 67
 #define KVM_CAP_PPC_PAPR 68
-#define KVM_CAP_SW_TLB 69
-#define KVM_CAP_ONE_REG 70
+#define KVM_CAP_S390_GMAP 71
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -637,49 +637,6 @@ struct kvm_clock_data {
__u32 pad[9];
 };
 
-#define KVM_MMU_FSL_BOOKE_NOHV 0
-#define KVM_MMU_FSL_BOOKE_HV   1
-
-struct kvm_config_tlb {
-   __u64 params;
-   __u64 array;
-   __u32 mmu_type;
-   __u32 array_len;
-};
-
-struct kvm_dirty_tlb {
-   __u64 bitmap;
-   __u32 num_dirty;
-};
-
-/* Available with KVM_CAP_ONE_REG */
-
-#define KVM_ONE_REG_GENERIC    0x0000000000000000ULL
-
-/*
- * Architecture specific registers are to be defined in arch headers and
- * ORed with the arch identifier.
- */
-#define KVM_ONE_REG_PPC        0x1000000000000000ULL
-#define KVM_ONE_REG_X86        0x2000000000000000ULL
-#define KVM_ONE_REG_IA64       0x3000000000000000ULL
-#define KVM_ONE_REG_ARM        0x4000000000000000ULL
-#define KVM_ONE_REG_S390       0x5000000000000000ULL
-
-struct kvm_one_reg {
-   __u64 id;
-   union {
-   __u8 reg8;
-   

[Qemu-devel] [PATCH 0/3] QEMU kvm: Adding KICK_VCPU capability to i386 kvm

2011-12-04 Thread Raghavendra K T
From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

The three-patch series following this extends the KVM hypervisor
and Linux guests running on it to support pv-ticket spinlocks.

PV ticket spinlocks help to solve the Lock Holder Preemption problem discussed in
http://www.amd64.org/fileadmin/user_upload/pub/LHP-commented_slides.pdf.

When a spinlock is contended, a guest vcpu relinquishes the cpu by halt().
Correspondingly, one hypercall is introduced in the KVM hypervisor that allows
a vcpu to kick the halted vcpu to continue with execution.
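
For readers unfamiliar with ticket spinlocks, a plain (non-paravirtualized) one fits in a few lines of C11. The comments mark where a PV version, as the paragraph above describes it, would halt a waiting vCPU and where the unlocker would kick it. The spin threshold and all names are hypothetical. The real guest-side code is in the kernel thread linked in this cover letter, not in QEMU.

```c
#include <assert.h>
#include <stdatomic.h>

struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently allowed to hold the lock */
};

#define SPIN_THRESHOLD 1024u  /* hypothetical spin budget before halting */

void ticket_acquire(struct ticket_lock *l)
{
    unsigned me = atomic_fetch_add_explicit(&l->next, 1,
                                            memory_order_relaxed);
    unsigned spins = 0;

    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me) {
        if (++spins == SPIN_THRESHOLD) {
            /* PV version: record "this vCPU waits for ticket me" and
             * halt(); the holder's kick wakes it. This plain version
             * just keeps spinning. */
            spins = 0;
        }
    }
}

void ticket_release(struct ticket_lock *l)
{
    unsigned next = atomic_load_explicit(&l->owner,
                                         memory_order_relaxed) + 1;

    atomic_store_explicit(&l->owner, next, memory_order_release);
    /* PV version: if a halted vCPU waits for ticket `next`, issue the
     * kick hypercall for it. */
}
```

Tickets are served strictly in order. That is why preempting the vCPU that holds (or is next in line for) the lock stalls every waiter behind it, the Lock Holder Preemption problem referenced above.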

The series will:
- Update qemu with latest linux header files (to 3.2.0-rc1).
- Enable KICK_VCPU capability in kvm/i386.

Raghavendra K T (3):
  Sync the linux headers to 3.2.0-rc1
  Sync the linux headers to patched linux kernel with KICK_VCPU capability.
  Add KICK_VCPU support in i386 target

---
The corresponding kernel patch is available in the thread
https://lkml.org/lkml/2011/11/30/62




[Qemu-devel] [PATCH 3/3] QEMU kvm/i386 : Adding KICK_VCPU capability support in i386 target.

2011-12-04 Thread Raghavendra K T
 Extend the KVM Hypervisor to enable KICK_VCPU feature that allows
a vcpu to kick the halted vcpu to continue with execution in PV ticket
spinlock.

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5bfc21f..69bce21 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -97,6 +97,7 @@ struct kvm_para_features {
 { KVM_CAP_NOP_IO_DELAY, KVM_FEATURE_NOP_IO_DELAY },
 { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP },
 { KVM_CAP_ASYNC_PF, KVM_FEATURE_ASYNC_PF },
+{ KVM_CAP_KICK_VCPU, KVM_FEATURE_KICK_VCPU },
 { -1, -1 }
 };
 
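The table in patch 3/3 pairs each KVM capability with a paravirt feature bit. The sketch below shows one way such a table can be folded into a feature bitmap for the guest. In place of the real kvm_check_extension() call, it assumes a list of capabilities the host reports. The function names are hypothetical, and only the entry this series adds is included, with values taken from the header patch (2/3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the header sync in patch 2/3 of this series. */
#define KVM_CAP_KICK_VCPU     72
#define KVM_FEATURE_KICK_VCPU 6

struct kvm_para_features {
    int cap;
    int feature;
};

/* Toy table: only the entry this series adds, plus the terminator. */
static const struct kvm_para_features para_features[] = {
    { KVM_CAP_KICK_VCPU, KVM_FEATURE_KICK_VCPU },
    { -1, -1 }
};

/* Stand-in for asking the host kernel about a capability. */
static int host_has_cap(const int *caps, size_t ncaps, int cap)
{
    for (size_t i = 0; i < ncaps; i++) {
        if (caps[i] == cap) {
            return 1;
        }
    }
    return 0;
}

/* Collect the feature bits for every capability the host supports. */
uint32_t para_feature_bits(const int *caps, size_t ncaps)
{
    uint32_t bits = 0;

    for (size_t i = 0; para_features[i].cap != -1; i++) {
        if (host_has_cap(caps, ncaps, para_features[i].cap)) {
            bits |= UINT32_C(1) << para_features[i].feature;
        }
    }
    return bits;
}
```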




[Qemu-devel] [PATCH 2/3] QEMU kvm: Syncing linux headers to support KICK_VCPU capability

2011-12-04 Thread Raghavendra K T
Update the kernel header that adds a hypercall to support pv-ticketlocks.

Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index f2ac46a..03d3a36 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -16,12 +16,14 @@
 #define KVM_FEATURE_CLOCKSOURCE        0
 #define KVM_FEATURE_NOP_IO_DELAY   1
 #define KVM_FEATURE_MMU_OP 2
+
 /* This indicates that the new set of kvmclock msrs
  * are available. The use of 0x11 and 0x12 is deprecated
  */
 #define KVM_FEATURE_CLOCKSOURCE2        3
 #define KVM_FEATURE_ASYNC_PF   4
 #define KVM_FEATURE_STEAL_TIME 5
+#define KVM_FEATURE_KICK_VCPU  6
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 07bd557..47ab6ff 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_HIOR 67
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
+#define KVM_CAP_KICK_VCPU 72
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/linux-headers/linux/kvm_para.h b/linux-headers/linux/kvm_para.h
index b315e27..e4a0e3e 100644
--- a/linux-headers/linux/kvm_para.h
+++ b/linux-headers/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP  2
 #define KVM_HC_FEATURES            3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE  4
+#define KVM_HC_KICK_CPU            5
 
 /*
  * hypercalls use architecture specific




[Qemu-devel] [PATCH v2 3/6] memory: introduce memory_region_set_alias_offset()

2011-12-04 Thread Avi Kivity
Add an API to update an alias offset of an active alias.  This can be
used to simplify implementation of dynamic memory banks.

Signed-off-by: Avi Kivity a...@redhat.com
---
 memory.c |   14 ++
 memory.h |   13 -
 2 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/memory.c b/memory.c
index a080d21..7e842b3 100644
--- a/memory.c
+++ b/memory.c
@@ -1345,6 +1345,20 @@ void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr)
     memory_region_transaction_commit();
 }
 
+void memory_region_set_alias_offset(MemoryRegion *mr, target_phys_addr_t offset)
+{
+    target_phys_addr_t old_offset = mr->alias_offset;
+
+    assert(mr->alias);
+    mr->alias_offset = offset;
+
+    if (offset == old_offset || !mr->parent) {
+        return;
+    }
+
+    memory_region_update_topology(mr);
+}
+
 void set_system_memory_map(MemoryRegion *mr)
 {
     address_space_memory.root = mr;
diff --git a/memory.h b/memory.h
index db53422..2022de7 100644
--- a/memory.h
+++ b/memory.h
@@ -527,7 +527,18 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled);
  * @mr: the region to be updated
  * @addr: new address, relative to parent region
  */
-void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr);
+void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t offset);
+
+/*
+ * memory_region_set_alias_offset: dynamically update a memory alias's offset
+ *
+ * Dynamically updates the offset into the target region that an alias points
+ * to, as if the fourth argument to memory_region_init_alias() has changed.
+ *
+ * @mr: the #MemoryRegion to be updated; should be an alias.
+ * @offset: the new offset into the target memory region
+ */
+void memory_region_set_alias_offset(MemoryRegion *mr, target_phys_addr_t addr);
 
 /* Start a transaction; changes will be accumulated and made visible only
  * when the transaction ends.
-- 
1.7.7.1
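To illustrate how an alias whose offset can be changed might simplify a dynamic memory bank, here is a minimal, self-contained sketch. It does not use QEMU's API: `Target`, `Alias`, `alias_set_offset()` and `alias_read()` are hypothetical stand-ins that only mimic the control flow of memory_region_set_alias_offset() above (store the new offset, and skip the topology rebuild when nothing changed or the alias is not mapped).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the region an alias points into. */
typedef struct {
    const uint8_t *ram;  /* backing storage of the target region */
    size_t size;         /* size of the target region in bytes */
} Target;

/* Hypothetical stand-in for an alias MemoryRegion. */
typedef struct {
    const Target *target;      /* like mr->alias */
    size_t alias_offset;       /* like mr->alias_offset */
    size_t window;             /* size of the alias itself */
    bool mapped;               /* like mr->parent != NULL */
    unsigned topology_updates; /* counts simulated topology rebuilds */
} Alias;

/* Same shape as the patch: record the new offset, then return early
 * if it did not change or the alias is not mapped anywhere. */
static void alias_set_offset(Alias *a, size_t offset)
{
    size_t old_offset = a->alias_offset;

    assert(a->target != NULL);
    assert(offset + a->window <= a->target->size); /* sketch-only check */
    a->alias_offset = offset;

    if (offset == old_offset || !a->mapped) {
        return;
    }
    a->topology_updates++; /* stands in for memory_region_update_topology() */
}

/* A guest access at 'addr' inside the alias lands at alias_offset + addr
 * in the target region. */
static uint8_t alias_read(const Alias *a, size_t addr)
{
    assert(addr < a->window);
    return a->target->ram[a->alias_offset + addr];
}
```

With a 64-byte target split into four 16-byte banks, calling alias_set_offset(&a, 32) switches the window to the third bank without removing and re-adding the alias.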




[Qemu-devel] [PATCH v2 1/6] memory: introduce memory_region_set_enabled()

2011-12-04 Thread Avi Kivity
This allows users to disable a memory region without removing
it from the hierarchy, simplifying the implementation of
memory routers.
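The idea can be pictured with a toy region tree. In the sketch below, `Region`, `render()` and `set_enabled()` are hypothetical and far simpler than QEMU's MemoryRegion/FlatView code; they only show the two behaviours this patch adds: rendering skips a disabled region together with all of its subregions, and toggling the flag leaves the region attached to its parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical, heavily simplified region node. */
typedef struct Region {
    const char *name;
    bool enabled;
    struct Region *children[MAX_CHILDREN];
    size_t nchildren;
} Region;

/* Append the names of visible regions to out[], starting at index n.
 * As with the check added to render_memory_region(), a disabled region
 * contributes nothing, and neither do its subregions. */
static size_t render(const Region *r, const char **out, size_t n)
{
    assert(r->nchildren <= MAX_CHILDREN);
    if (!r->enabled) {
        return n;
    }
    out[n++] = r->name;
    for (size_t i = 0; i < r->nchildren; i++) {
        n = render(r->children[i], out, n);
    }
    return n;
}

/* Toggle a region in place; it stays linked into its parent.
 * Returns true when the flat view would need to be rebuilt. */
static bool set_enabled(Region *r, bool enabled)
{
    if (r->enabled == enabled) {
        return false;
    }
    r->enabled = enabled;
    return true;
}
```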

Signed-off-by: Avi Kivity a...@redhat.com
---
 memory.c |   40 +---
 memory.h |   17 +
 2 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/memory.c b/memory.c
index adfdf14..d0f90ca 100644
--- a/memory.c
+++ b/memory.c
@@ -528,6 +528,10 @@ static void render_memory_region(FlatView *view,
     FlatRange fr;
     AddrRange tmp;
 
+    if (!mr->enabled) {
+        return;
+    }
+
     int128_addto(&base, int128_make64(mr->addr));
     readonly |= mr->readonly;
 
@@ -750,12 +754,16 @@ static void address_space_update_topology(AddressSpace *as)
     address_space_update_ioeventfds(as);
 }
 
-static void memory_region_update_topology(void)
+static void memory_region_update_topology(MemoryRegion *mr)
 {
     if (memory_region_transaction_depth) {
         return;
     }
 
+    if (mr && !mr->enabled) {
+        return;
+    }
+
     if (address_space_memory.root) {
         address_space_update_topology(&address_space_memory);
     }
@@ -773,7 +781,7 @@ void memory_region_transaction_commit(void)
 {
     assert(memory_region_transaction_depth);
     --memory_region_transaction_depth;
-    memory_region_update_topology();
+    memory_region_update_topology(NULL);
 }
 
 static void memory_region_destructor_none(MemoryRegion *mr)
@@ -813,6 +821,7 @@ void memory_region_init(MemoryRegion *mr,
     }
     mr->addr = 0;
     mr->offset = 0;
+    mr->enabled = true;
     mr->terminates = false;
     mr->readable = true;
     mr->readonly = false;
@@ -1058,7 +1067,7 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
     uint8_t mask = 1 << client;
 
     mr->dirty_log_mask = (mr->dirty_log_mask & ~mask) | (log * mask);
-    memory_region_update_topology();
+    memory_region_update_topology(mr);
 }
 
 bool memory_region_get_dirty(MemoryRegion *mr, target_phys_addr_t addr,
@@ -1090,7 +1099,7 @@ void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
 {
     if (mr->readonly != readonly) {
         mr->readonly = readonly;
-        memory_region_update_topology();
+        memory_region_update_topology(mr);
     }
 }
 
@@ -1098,7 +1107,7 @@ void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable)
 {
     if (mr->readable != readable) {
         mr->readable = readable;
-        memory_region_update_topology();
+        memory_region_update_topology(mr);
     }
 }
 
@@ -1203,7 +1212,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
     memmove(&mr->ioeventfds[i+1], &mr->ioeventfds[i],
             sizeof(*mr->ioeventfds) * (mr->ioeventfd_nb-1 - i));
     mr->ioeventfds[i] = mrfd;
-    memory_region_update_topology();
+    memory_region_update_topology(mr);
 }
 
 void memory_region_del_eventfd(MemoryRegion *mr,
@@ -1233,7 +1242,7 @@ void memory_region_del_eventfd(MemoryRegion *mr,
     --mr->ioeventfd_nb;
     mr->ioeventfds = g_realloc(mr->ioeventfds,
                                   sizeof(*mr->ioeventfds)*mr->ioeventfd_nb + 1);
-    memory_region_update_topology();
+    memory_region_update_topology(mr);
 }
 
 static void memory_region_add_subregion_common(MemoryRegion *mr,
@@ -1274,7 +1283,7 @@ static void memory_region_add_subregion_common(MemoryRegion *mr,
     }
     QTAILQ_INSERT_TAIL(&mr->subregions, subregion, subregions_link);
 done:
-    memory_region_update_topology();
+    memory_region_update_topology(mr);
 }
 
 
@@ -1303,19 +1312,28 @@ void memory_region_del_subregion(MemoryRegion *mr,
     assert(subregion->parent == mr);
     subregion->parent = NULL;
     QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
-    memory_region_update_topology();
+    memory_region_update_topology(mr);
+}
+
+void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
+{
+    if (enabled == mr->enabled) {
+        return;
+    }
+    mr->enabled = enabled;
+    memory_region_update_topology(NULL);
 }
 
 void set_system_memory_map(MemoryRegion *mr)
 {
     address_space_memory.root = mr;
-    memory_region_update_topology();
+    memory_region_update_topology(NULL);
 }
 
 void set_system_io_map(MemoryRegion *mr)
 {
     address_space_io.root = mr;
-    memory_region_update_topology();
+    memory_region_update_topology(NULL);
 }
 
 typedef struct MemoryRegionList MemoryRegionList;
diff --git a/memory.h b/memory.h
index 53bf261..c6997c4 100644
--- a/memory.h
+++ b/memory.h
@@ -123,6 +123,7 @@ struct MemoryRegion {
     bool terminates;
     bool readable;
     bool readonly; /* For RAM regions */
+    bool enabled;
     MemoryRegion *alias;
     target_phys_addr_t alias_offset;
     unsigned priority;
@@ -501,6 +502,22 @@ void memory_region_add_subregion_overlap(MemoryRegion *mr,
 void memory_region_del_subregion(MemoryRegion *mr,
                                  MemoryRegion *subregion);
 
+
+/*
+ * memory_region_set_enabled: dynamically enable or disable a region
+ *
+ * Enables or disables a 

Re: [Qemu-devel] [PATCH v2 00/18] qom: dynamic properties and composition tree (v2)

2011-12-04 Thread Anthony Liguori

On 12/03/2011 03:34 PM, Anthony Liguori wrote:

On 12/03/2011 08:24 AM, Paolo Bonzini wrote:

On 12/03/2011 03:40 AM, Anthony Liguori wrote:

That is still true. The next step, inheritance, will pull the properties
into a base class. That base class can be used elsewhere outside of the
device model.

But this is already a 20 patch series. If you want all of that in one
series, it's going to be 100 patches that are not terribly easy to
review at once.


Without a design document and a roadmap, however, it's impossible to try to
understand how the pieces will fit together. 100 patches may require some time to
digest, but 20 patches require a crystal ball to figure out what's ahead.


You can see a bit further by looking at:

https://github.com/aliguori/qemu/commits/qom-next

That fills out the composition tree pretty well for the pc. The next step is
aggressive refactoring such that the qdev objects reflect the composition. IOW,
we should create the rtc from within the piix3 initialization function.


I've begun the work of introducing proper inheritance.  There's a lot going on 
but the basic idea is:


1) introduce QOM base type (Object), make qdev inherit from it

2) create a dynamic typeinfo based DeviceInfo, make device class point to deviceinfo

3) model qdev hierarchy in QOM

4) starting from the bottom of the hierarchy, remove DeviceInfo subclass and push that functionality into QOM classes

5) once (4) is complete, remove DeviceInfo

6) refactor any use of multiple child busses into separate devices with one bus

7) refactor busstate as an interface

8) refactor device model to make more aggressive use of composition

9) refactor life cycle events into virtual methods

The tree I've posted is on step (4).
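Steps 1 and 3 rest on the usual C idiom for single inheritance: a derived struct embeds its parent as its first member, so a pointer to the derived object can also be used as a pointer to the base. A minimal sketch that assumes nothing about QOM's actual API: `Object`, `Device`, `RtcDevice`, `OBJECT()` and `rtc_cast()` here are hypothetical illustrations, not the types or macros in the qom-next tree.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical base type, in the spirit of step 1's "Object". */
typedef struct Object {
    const char *type_name;
} Object;

/* Hypothetical device base: the parent is embedded first. */
typedef struct Device {
    Object parent_obj;
    int realized;
} Device;

/* Hypothetical leaf type deriving from Device. */
typedef struct RtcDevice {
    Device parent_obj;
    int base_year;
} RtcDevice;

/* Upcast: valid because a pointer to a struct, suitably converted,
 * points to its first member (and that member's first member). */
#define OBJECT(p) ((Object *)(p))

/* Checked downcast: verify the dynamic type before converting back. */
static RtcDevice *rtc_cast(Object *obj)
{
    assert(strcmp(obj->type_name, "rtc") == 0);
    return (RtcDevice *)obj;
}
```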

Regards,

Anthony Liguori



Re: [Qemu-devel] sub-page-sized mmio regions and address passed to read/write fns

2011-12-04 Thread Peter Maydell
On 4 December 2011 12:17, Avi Kivity a...@redhat.com wrote:
 On 12/02/2011 04:49 PM, Peter Maydell wrote:
 However what I found is that the addresses passed to the read/write
 functions aren't what I would expect. For instance if the board
 maps the container at address 0x1e00, then a read from 0x1e000100
 goes to the functions given by a9_gic_cpu_ops, as it should. However,
 the offset parameter that the read function is passed is not 0x0
 (offset from the start of the a9mp-gic-cpu region) but 0x100 (offset
 from the start of the page, I think).

 Is this expected behaviour? I certainly wasn't expecting it...

 A while ago this was the behaviour across the board.  Then 8da3ff1809747
 changed addresses to be relative, but apparently missed the subpage case.

Having looked a bit more closely at the code I think this is what
the comment at the top of cpu_register_physical_memory_log() is
referring to:

# Both start_addr and region_offset are rounded down to a page boundary
# before calculating this offset.  This should not be a problem unless
# the low bits of start_addr and region_offset differ.

In the case of a subregion at a non-page-aligned address, start_addr
is not page aligned, but region_offset is (in the usual case) zero,
so the low bits differ.

 I looked through the code that's getting called for reads, and
 it looks to me like exec.c:subpage_readlen() is causing this.
 We look up the subpage_t based on the address within the page,
 but we don't then adjust the address we pass to io_mem_read
 (except by region_offset, which I take from the comment at the
 top of cpu_register_physical_memory_log() to be for something
 else.)

 I think you can use subpage_t's region_offset array for this (adding
 into it, of course, so the original value remains).

Yes. I think the correction has to be calculated and applied in
cpu_register_physical_memory_log() -- for a region which starts
at a non-page-aligned address and extends over more than a page
the correcting offset needs to be applied for the whole region,
not just the first partial page.
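The arithmetic behind that correction can be checked in isolation. The sketch below assumes 4 KiB pages; `offset_rounded()`, `offset_expected()` and `subpage_correction()` are hypothetical helpers, not exec.c functions. The first models the offset the device is handed when both start_addr and region_offset are rounded down to a page, the second the offset it expects, and the third the constant difference that, as suggested above, could be added into the subpage region_offset. Because the correction depends only on the low bits, the same value works for every page the region covers.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB target pages */
#define SKETCH_PAGE_MASK (~(uint64_t)(SKETCH_PAGE_SIZE - 1))

/* Offset the device is handed today: both start_addr and region_offset
 * are rounded down to a page boundary before the subtraction. */
static uint64_t offset_rounded(uint64_t addr, uint64_t start_addr,
                               uint64_t region_offset)
{
    return addr - (start_addr & SKETCH_PAGE_MASK)
                + (region_offset & SKETCH_PAGE_MASK);
}

/* Offset the device expects: relative to the start of its own region. */
static uint64_t offset_expected(uint64_t addr, uint64_t start_addr,
                                uint64_t region_offset)
{
    return addr - start_addr + region_offset;
}

/* Amount to add to the rounded result. Unsigned wrap-around is
 * intended: the sum is taken modulo 2^64, which yields the right value. */
static uint64_t subpage_correction(uint64_t start_addr, uint64_t region_offset)
{
    return (region_offset & ~SKETCH_PAGE_MASK)
         - (start_addr & ~SKETCH_PAGE_MASK);
}
```

For a region starting at 0x1e000100, a read at the region's first byte is currently handed offset 0x100; adding the correction brings it back to 0x0, and the same correction still holds one page further in.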

-- PMM



Re: [Qemu-devel] [PATCH v2 3/6] memory: introduce memory_region_set_alias_offset()

2011-12-04 Thread Blue Swirl
On Sun, Dec 4, 2011 at 18:09, Avi Kivity a...@redhat.com wrote:
 Add an API to update an alias offset of an active alias.  This can be
 used to simplify implementation of dynamic memory banks.

 Signed-off-by: Avi Kivity a...@redhat.com
 ---
  memory.c |   14 ++
  memory.h |   13 -
  2 files changed, 26 insertions(+), 1 deletions(-)

 diff --git a/memory.c b/memory.c
 index a080d21..7e842b3 100644
 --- a/memory.c
 +++ b/memory.c
 @@ -1345,6 +1345,20 @@ void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr)
     memory_region_transaction_commit();
  }

 +void memory_region_set_alias_offset(MemoryRegion *mr, target_phys_addr_t offset)
 +{
 +    target_phys_addr_t old_offset = mr->alias_offset;
 +
 +    assert(mr->alias);
 +    mr->alias_offset = offset;
 +
 +    if (offset == old_offset || !mr->parent) {
 +        return;
 +    }
 +
 +    memory_region_update_topology(mr);
 +}
 +
  void set_system_memory_map(MemoryRegion *mr)
  {
     address_space_memory.root = mr;
 diff --git a/memory.h b/memory.h
 index db53422..2022de7 100644
 --- a/memory.h
 +++ b/memory.h
 @@ -527,7 +527,18 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled);
  * @mr: the region to be updated
  * @addr: new address, relative to parent region
  */
 -void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr);
 +void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t offset);

This isn't the function you are looking for, but still 'addr' is
changed to 'offset'.

 +
 +/*
 + * memory_region_set_alias_offset: dynamically update a memory alias's offset
 + *
 + * Dynamically updates the offset into the target region that an alias points
 + * to, as if the fourth argument to memory_region_init_alias() has changed.
 + *
 + * @mr: the #MemoryRegion to be updated; should be an alias.
 + * @offset: the new offset into the target memory region
 + */
 +void memory_region_set_alias_offset(MemoryRegion *mr, target_phys_addr_t addr);

Here 'addr' doesn't match the description above.



Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-04 Thread Jan Kiszka
On 2011-12-04 22:31, Blue Swirl wrote:
 On Sun, Dec 4, 2011 at 16:35, Avi Kivity a...@redhat.com wrote:
 On 12/04/2011 05:19 PM, Jan Kiszka wrote:

 In the sense that kernel-apic is just an accelerated apic.  From the
 guest point of view, there's no difference, and that should be reflected
 in the device model.

 That was my goal as well: The guest should not notice the difference,
 but the admin on the host side should still be able to tell both
 internally fairly different models apart.

 This should be some attribute, not the name.

 Plus the code should be
 clearly split where there are differences and explicitly shared where
 there aren't.

 That's a good goal, yes.
 
 I'd prefer a unified device built from a single source file if
 possible. This conflicts with the build-once model though.

Right, another reason to not do this.

 


 If I'm reading an apic register, either from the guest or via a monitor
 debug interface, I shouldn't care whether it's accelerated or not.  The
 guest part already holds, of course.

 Specifically for the debug scenario, I'd prefer the clear
 differentiation by name as there can always remain subtle differences in
 the implementation of kernel vs. user space. Someone debugging the guest
 and/or qemu/kvm should remain aware of this.

 Aware, yes, but the name change is too drastic.
 
 It should be also possible to migrate from non-KVM device to KVM
 version, different names would prevent that for ever.

It is (theoretically) possible with these patches as the vmstate names
are the same. KVM to TCG migration does not work right now, so I was
only able to test in-kernel - user space irqchip model migrations.

Jan





Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state

2011-12-04 Thread Cam Macdonell
On Sun, Dec 4, 2011 at 3:20 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote:
 Based on a git bisect, this patch breaks msi-x interrupt delivery in
 the ivshmem device.

 I think the following should fix it. Compiled-only -
 could you pls check? If yes let's apply to the stable branch.

Thanks for the patch Michael.

It addresses the need for msix_write_config() to be called, but the
addition of the msix_reset() is causing a reset of the vectors after
they've been initialized in pci_ivshmem_init().  So, interrupts still
aren't delivered with this patch applied as it is.

In particular, a reset occurs after pci_ivshmem_init runs, so the
msix_entry_used array is reset to 0s, which causes the interrupt
delivery to fail.

If I comment out the msix_reset(), then interrupts are delivered.
Could the reset be caused by a bug in the guest driver? Or do I need
to reconfigure MSI-X after a reset? I'm unclear as to the proper
behaviour after a reset.
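The failure mode described above can be reproduced with a toy model, which also shows one possible answer to the last question: marking the vectors the device relies on as used again after MSI-X state is reset. This is only a sketch under stated assumptions, not a confirmed fix: `MsixModel` and the `model_*` functions are hypothetical, and they assume (as reported above) that reset zeroes the per-vector use counts and that a notification for an unused vector is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NVECTORS 4

/* Hypothetical model of the per-vector "in use" counts (msix_entry_used). */
typedef struct {
    unsigned entry_used[MODEL_NVECTORS];
} MsixModel;

static void model_vector_use(MsixModel *m, unsigned v)
{
    assert(v < MODEL_NVECTORS);
    m->entry_used[v]++;
}

/* As reported above: delivery is dropped for vectors not marked in use. */
static bool model_notify(const MsixModel *m, unsigned v)
{
    return v < MODEL_NVECTORS && m->entry_used[v] > 0;
}

/* As reported above: reset zeroes the use counts. */
static void model_msix_reset(MsixModel *m)
{
    memset(m->entry_used, 0, sizeof(m->entry_used));
}

/* One possible device reset handler: reset MSI-X state, then mark the
 * vectors the device relies on as used again, as its init path did. */
static void model_device_reset(MsixModel *m, unsigned nvectors)
{
    assert(nvectors <= MODEL_NVECTORS);
    model_msix_reset(m);
    for (unsigned v = 0; v < nvectors; v++) {
        model_vector_use(m, v);
    }
}
```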

Thanks,
Cam


 --

 ivshmem: add missing msix calls

 ivshmem used msix but didn't call it on either reset or
 config write paths. This used to partially work since
 guests don't use all of msi-x configuration fields,
 and reset is rarely used, but the patch 'msix: track function masked
 in pci device state' broke that. Fix by adding appropriate calls.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com

 --

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..3680c0f 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -505,6 +505,7 @@ static void ivshmem_reset(DeviceState *d)
     IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d);

     s->intrstatus = 0;
 +    msix_reset(&s->dev);
     return;
  }

 @@ -610,6 +611,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
     return 0;
  }

 +static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
 +                                uint32_t val, int len)
 +{
 +    pci_default_write_config(pci_dev, address, val, len);
 +    msix_write_config(pci_dev, address, val, len);
 +}
 +
  static int pci_ivshmem_init(PCIDevice *dev)
  {
     IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 @@ -734,6 +742,8 @@ static int pci_ivshmem_init(PCIDevice *dev)

     }

 +    s->dev.config_write = ivshmem_write_config;
 +
     return 0;
  }





Re: [Qemu-devel] [PATCH v5] block:add-cow file format

2011-12-04 Thread Dong Xu Wang
Ping...

2011/11/28 Dong Xu Wang wdon...@linux.vnet.ibm.com

 Any comment?
 Thanks.


 2011/11/15 Dong Xu Wang wdon...@linux.vnet.ibm.com

 From: Dong Xu Wang wdon...@linux.vnet.ibm.com

 Provide a new file format: add-cow. The usage can be found in add-cow.txt of this patch.

 Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
 ---
  Makefile.objs  |1 +
  block.c|2 +-
  block.h|1 +
  block/add-cow.c|  417
 
  block_int.h|1 +
  docs/specs/add-cow.txt |   57 +++
  6 files changed, 478 insertions(+), 1 deletions(-)
  create mode 100644 block/add-cow.c
  create mode 100644 docs/specs/add-cow.txt

 diff --git a/Makefile.objs b/Makefile.objs
 index d7a6539..ad99243 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -31,6 +31,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o

  block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 +block-nested-y += add-cow.o
  block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
  block-nested-y += qed-check.o
  block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 diff --git a/block.c b/block.c
 index 86910b0..a2be27b 100644
 --- a/block.c
 +++ b/block.c
 @@ -106,7 +106,7 @@ int is_windows_drive(const char *filename)
  #endif

  /* check if the path starts with protocol: */
 -static int path_has_protocol(const char *path)
 +int path_has_protocol(const char *path)
  {
  #ifdef _WIN32
 if (is_windows_drive(path) ||
 diff --git a/block.h b/block.h
 index 051a25d..836284f 100644
 --- a/block.h
 +++ b/block.h
 @@ -276,6 +276,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn);

  char *get_human_readable_size(char *buf, int buf_size, int64_t size);
  int path_is_absolute(const char *path);
 +int path_has_protocol(const char *path);
  void path_combine(char *dest, int dest_size,
   const char *base_path,
   const char *filename);
 diff --git a/block/add-cow.c b/block/add-cow.c
 new file mode 100644
 index 000..54d30a9
 --- /dev/null
 +++ b/block/add-cow.c
 @@ -0,0 +1,417 @@
 +#include "qemu-common.h"
 +#include "block_int.h"
 +#include "module.h"
 +
 +#define ADD_COW_MAGIC   (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
 +                        ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
 +                        ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
 +                        ((uint64_t)'W' << 8) | 0xFF)
 +#define ADD_COW_VERSION 1
 +#define ADD_COW_FILE_LEN    1024
 +
 +typedef struct AddCowHeader {
 +    uint64_t        magic;
 +    uint32_t        version;
 +    char            backing_file[ADD_COW_FILE_LEN];
 +    char            image_file[ADD_COW_FILE_LEN];
 +    uint64_t        size;
 +} QEMU_PACKED AddCowHeader;
 +
 +typedef struct BDRVAddCowState {
 +    char                image_file[ADD_COW_FILE_LEN];
 +    BlockDriverState    *image_hd;
 +    uint8_t             *bitmap;
 +    uint64_t            bitmap_size;
 +    CoMutex             lock;
 +} BDRVAddCowState;
 +
 +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
 +{
 +    const AddCowHeader *header = (const void *)buf;
 +
 +    if (be64_to_cpu(header->magic) == ADD_COW_MAGIC &&
 +        be32_to_cpu(header->version) == ADD_COW_VERSION) {
 +        return 100;
 +    } else {
 +        return 0;
 +    }
 +}
 +
 +static int add_cow_open(BlockDriverState *bs, int flags)
 +{
 +    AddCowHeader        header;
 +    int64_t             size;
 +    char                image_filename[ADD_COW_FILE_LEN];
 +    int                 image_flags;
 +    BlockDriver         *image_drv = NULL;
 +    int                 ret;
 +    BDRVAddCowState     *state = (BDRVAddCowState *)(bs->opaque);
 +
 +    ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
 +    if (ret != sizeof(header)) {
 +        goto fail;
 +    }
 +
 +    if (be64_to_cpu(header.magic) != ADD_COW_MAGIC ||
 +        be32_to_cpu(header.version) != ADD_COW_VERSION) {
 +        ret = -EINVAL;
 +        goto fail;
 +    }
 +
 +    size = be64_to_cpu(header.size);
 +    bs->total_sectors = size / BDRV_SECTOR_SIZE;
 +
 +    QEMU_BUILD_BUG_ON(sizeof(state->image_file) != sizeof(header.image_file));
 +    pstrcpy(bs->backing_file, sizeof(bs->backing_file),
 +            header.backing_file);
 +    pstrcpy(state->image_file, sizeof(state->image_file),
 +            header.image_file);
 +
 +    state->bitmap_size = ((bs->total_sectors + 7) >> 3);
 +    state->bitmap = g_malloc0(state->bitmap_size);
 +
 +    ret = bdrv_pread(bs->file, sizeof(header), state->bitmap,
 +            state->bitmap_size);
 +    if (ret != state->bitmap_size) {
 +        goto fail;
 +    }
 +   /* If there is a image_file, must be together with backing_file */
 +    if (state->image_file[0] != '\0') {
 +        state->image_hd = 
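Two pieces of the add_cow_open() logic above are easy to check on their own: the ADD_COW_MAGIC constant and the bitmap sizing of one bit per sector, rounded up to whole bytes. In the sketch below, `bitmap_bytes()` mirrors the `(total_sectors + 7) >> 3` expression, while `sector_is_allocated()` is a hypothetical helper; the bit order it uses (bit `sector % 8` of byte `sector / 8`) is an assumption, since the patch excerpt does not show how the bitmap is indexed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same construction as ADD_COW_MAGIC in the patch: the characters
 * "ADD_COW" followed by 0xFF, packed big-endian into 64 bits. */
#define ADD_COW_MAGIC (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
                       ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
                       ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
                       ((uint64_t)'W' << 8) | 0xFF)

/* One bit per sector, rounded up to whole bytes, as in add_cow_open(). */
static uint64_t bitmap_bytes(uint64_t total_sectors)
{
    return (total_sectors + 7) >> 3;
}

/* Hypothetical helper; the bit order (bit sector % 8 of byte sector / 8)
 * is an assumption, not taken from the patch. */
static bool sector_is_allocated(const uint8_t *bitmap, uint64_t bitmap_size,
                                uint64_t sector)
{
    assert(sector < bitmap_size * 8);
    return (bitmap[sector >> 3] >> (sector & 7)) & 1;
}
```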

Re: [Qemu-devel] windows guest virtio serial and balloon driver test issues

2011-12-04 Thread Cao,Bing Bu

On 11/29/2011 08:36 PM, Vadim Rozenfeld wrote:

On Tue, 2011-11-29 at 08:58 +0800, Cao,Bing Bu wrote:

Hi,

  Rozenfeld, thanks, got it!

  And do you know whether there are suitable test tools (such as
IOmeter) to test virtio driver performance?

IoMeter is good. But you also might be interested in
SQLIOSim, database hammer, and diskio (part of WLK) + xperf.


On 11/25/2011 02:42 PM, Vadim Rozenfeld wrote:

On Fri, 2011-11-25 at 09:59 +0800, Cao,Bing Bu wrote:

Hi,all


Thanks, Frenkel. The balloon test application must be run as
admin.



But I found two problems (questions) this week when testing the Windows
guest drivers:


* If only the virtio serial driver is installed, the virtio serial test
  app cannot enumerate/find the virtio serial device, but after the
  virtio balloon driver is installed, the app finds the virtio serial
  device correctly. Is this because balloon and serial both use the
  same GUID?

Correct. This test application is a very simplified one. We published it
mostly as an example, not as a real test application. It doesn't
enumerate all virtio serial instances; it just finds the
first one and uses it.


* When inflating/deflating the balloon with the qemu monitor balloon
  command, the total physical memory did not decrease/increase
  correspondingly, as seen from Resource Monitor; only the available
  memory size decreased/increased. But when I tested a Linux guest,
  the total physical memory of the guest OS did change.

  Is this a problem? If not, is it confusing to users?
  Is it related to Windows internal memory management?


Total physical memory on Windows will always be the same,
because we don't hot-plug/unplug physical memory.
The balloon driver works with non-paged pool memory instead.
So every time you inflate or deflate the balloon in your system,
you should see Available memory change, while Physical
will always stay the same.

Best,
Vadim.



On 11/21/2011 06:33 PM, Arkady Frenkel wrote:

On 11/21/2011 10:39 AM, Cao,Bing Bu wrote:

Hi,

  Recently, I have been testing the Windows guest drivers on Win7 and
WinXP (32-bit) with the latest Windows guest driver development source,
downloaded from
http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/src/


virtio-blk:
  It seems OK on both Win7 and WinXP; the r/w performance is better
than with an IDE disk.



virtio-serial:
  I tried to test the virtio serial driver using the test application in
the project.

  WinXP:
  Write: OK
  Read: Error: Read File Failed.

  Win7:
  The test application returns an error: it cannot find the vioserial device.
  But I debugged the code and checked that the GetDevicePath() return
value is not NULL, and is the same as the value when testing on WinXP.
  Why is CreateFile() in init() not called? (:


virtio-balloon:

  QEMU monitor: device_add virtio-balloon-pci

  On the guest, a new device "PCI standard RAM controller" was added.
  Device Manager reports "No driver installed for this device", but
installing the balloon.sys driver failed.
  It said the driver is up to date. Confused. (:

  How can I install and test the balloon driver on Windows?



The kvm-guest-drivers-windows.git on kernel.org is not available; is
there any mirror git repository?
Is there any mailing list or bugzilla for the Windows guest drivers?

Any help will be appreciated.



You need to run the serial test app as admin.

To install the balloon driver, you have to go through an additional option:
click "Browse my computer for driver software", then choose the
"Let me pick from the list of device drivers on my computer" option.

Arkady


Best regards
Cao,Bing Bu








Thank you, Vadim. (:

Is there anything on the TO-DO list, or anything that needs further
optimization, in the current Windows guest drivers?
How could I contribute to Windows guest driver development (testing
patches, signing off patches, bug fixes, etc.)?


--
Best Regards,
Cao,Bing Bu




Re: [Qemu-devel] [KVM][Kemari]: Build error fix

2011-12-04 Thread OHMURA Kei
On 2011/12/02 21:51, Pradeep Kumar wrote:
 It fixes build failure.
 
 I hit this error after a successful migration and sync.
 
 (qemu) qemu-system-x86_64: fill buffer failed, Interrupted system call
 
 qemu-system-x86_64: recv header failed
 
 qemu-system-x86_64: recv ack failed
 
 qemu_transaction_begin failed

Did you use the master branch?
It is not the latest version. The next branch is the latest and has this error fixed.

git://kemari.git.sourceforge.net/gitroot/kemari/kemari next

Thanks,
Kei

 
 Is anyone working on this now?
 
From 827c04da6574be80d8352acd7c40b0b4524af5f4 Mon Sep 17 00:00:00 2001
 Date: Fri, 2 Dec 2011 18:11:40 +0530
 Subject: [PATCH]  [Qemu][Kemari]: Build Failure
Signed-off-by: pradeep psuri...@linux.vnet.ibm.com
modified:   ft_trans_file.c
 
 ---
  ft_trans_file.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/ft_trans_file.c b/ft_trans_file.c
 index 4e33034..dc36757 100644
 --- a/ft_trans_file.c
 +++ b/ft_trans_file.c
 @@ -174,7 +174,7 @@ static int ft_trans_send_header(QEMUFileFtTrans *s,
  static int ft_trans_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int size)
  {
      QEMUFileFtTrans *s = opaque;
 -    ssize_t ret;
 +    ssize_t ret = 0;
  
      trace_ft_trans_put_buffer(size, pos);




Re: [Qemu-devel] [BUG] [Seabios] PCI 64bit BARs on Win2008 - unable to start the device. (ACPI lacks the _DSM method)

2011-12-04 Thread Alexey Korolev

Hi Michael,

Thank you for the good advice, you are right. When I added a new range
above 4GB in _CRS, the problem went away.
  QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite,
   0x,  // Address Space Granularity
   0x1,// Address Range Minimum
   0x3,// Address Range Maximum
   0x,  // Address Translation Offset
   0x4,// Address Length
   ,, , AddressRangeMemory, TypeStatic)

The only big problem with this range: as soon as I have more than 3GB
of RAM, Windows boots into a BSOD. The problem is related to memory
range intersection.
Unfortunately it is not possible to predict how much RAM the
virtual machine will have, so it's difficult to specify a particular
region.

Do you have any ideas what can be done to solve this problem?

Regards,
Alexey


On Thu, Dec 01, 2011 at 06:49:54PM +1300, Alexey Korolev wrote:

Isaku san,

I've just added you to discussion.
There are some issues with PCI 64bit support in Windows. Windows
fails to assign the resource if it doesn't fit in the first 4GB window.

I really don't know why it happens.
One possibility is that it is related to the lack of a _DSM method in ACPI.

Another guess is that it could be related to the fact that 440FX only
supports a 32bit PCI bus interface, and Windows may limit the PCI address
range to the first 4GB for PCI devices under this bridge.
I remember you were working on Q35 chipset simulation; I wonder if
it is working and whether it would be possible to try it?

Thanks,
Alexey

Maybe the range above 4G needs to be declared in the _CRS
resource?





Re: [Qemu-devel] [BUG] [Seabios] PCI 64bit BARs on Win2008 - unable to start the device. (ACPI lacks the _DSM method)

2011-12-04 Thread Alexey Korolev

Hi Gerd,

We have a very early prototype of a data acquisition device with a quite
large MMIO buffer. It is an emulated device.

We are running the 0.15 release.
0.15 doesn't work correctly with 64bit BARs, so I've already added some
hacks to SeaBIOS to let the OS choose the memory region.

That is why you see "bar 1, addr 0" in the SeaBIOS log.
Sorry I didn't mention all this initially. I just want to make
64bit PCI BARs work properly. Linux guests work correctly (except
early versions, which I have not investigated yet). At the moment I have
some issues with Windows, which relies on ACPI _CRS.


Thanks,
Alexey



   Hi,


PCI: map device bus 0, bfd 0x28
   bar 0, addr febe, size 1 [mem]
   bar 1, addr 0, size 2000 [mem]

Somehow seabios didn't recognise the bar correctly it seems (both 512
and 256 MB cases look the same).  For the 256 MB case seabios should
have mapped the bar @ 0xe000.

... and it should also have figured it is prefetchable memory.  Was pci
config space messed up somehow?  What does 'lspci -v' say once you've
booted the machine with linux?

What qemu version you are running?
What kind of device is this?
Emulated?  Code somewhere?
Or a real device passed through to the guest?

cheers,
   Gerd
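For reference, the properties Gerd mentions (64-bit vs 32-bit, prefetchable, size) come from the low bits of the BAR and from the value read back after the firmware writes all 1s to it. The following is a minimal decoding sketch following the PCI specification's memory-BAR layout; `BarInfo` and `decode_mem_bar()` are hypothetical names, not SeaBIOS functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low-dword bits of a memory BAR (PCI Local Bus Specification). */
#define BAR_SPACE_IO       0x1u /* bit 0: set for I/O BARs */
#define BAR_MEM_TYPE_MASK  0x6u /* bits 2:1: memory type */
#define BAR_MEM_TYPE_64    0x4u /* 10b: 64-bit BAR spanning two dwords */
#define BAR_MEM_PREFETCH   0x8u /* bit 3: prefetchable */
#define BAR_MEM_ADDR_MASK  (~UINT64_C(0xF))

typedef struct {
    bool is_64bit;
    bool prefetchable;
    uint64_t size;
} BarInfo;

/* Decode a memory BAR from the values read back after writing all 1s:
 * 'lo' is the BAR dword itself, 'hi' the following dword (used only
 * when the type bits say the BAR is 64-bit). */
static BarInfo decode_mem_bar(uint32_t lo, uint32_t hi)
{
    BarInfo info;
    uint64_t readback;

    assert((lo & BAR_SPACE_IO) == 0); /* memory BARs only */
    info.is_64bit = (lo & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
    info.prefetchable = (lo & BAR_MEM_PREFETCH) != 0;

    if (info.is_64bit) {
        readback = ((uint64_t)hi << 32) | lo;
    } else {
        /* No upper dword: treat those bits as all 1s so the size
         * computation below stays within 32 bits. */
        readback = UINT64_C(0xFFFFFFFF00000000) | lo;
    }
    /* Implemented address bits read back as 1 and the low size bits as 0,
     * so the two's complement of the masked value is the BAR size. */
    info.size = ~(readback & BAR_MEM_ADDR_MASK) + 1;
    return info;
}
```

For example, a 256 MB 64-bit prefetchable BAR reads back 0xF000000C in the low dword and 0xFFFFFFFF in the high dword, which decodes to a size of 0x10000000 with both flags set.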





Re: [Qemu-devel] [PATCH v4 2/3] Extract code to nbd_setup function to be used for many purposes

2011-12-04 Thread Chunyan Liu
2011/12/3 Paolo Bonzini pbonz...@redhat.com

 On 12/02/2011 04:27 PM, Chunyan Liu wrote:

 @@ -42,6 +42,18 @@ static int verbose;
  static char *device;
  static char *srcpath;
  static char *sockpath;
 +static int is_sockpath_option;
 +static int sigterm_fd[2];
 +static off_t dev_offset;
 +static uint32_t nbdflags;
 +static bool disconnect;
 +static const char *bindto = "0.0.0.0";
 +static int port = NBD_DEFAULT_PORT;
 +static int li;
 +static int flags = BDRV_O_RDWR;
 +static int partition = -1;
 +static int shared = 1;
 +static int persistent;


 A lot of statics... li seems unused.


I used these statics simply because most of them are global parameters
obtained from command-line options that will be used later. Otherwise,
the nbd_setup() function would have to take many parameters.

Ah, li could be defined in main(). After the options are parsed, later
code can use port.
   case 'p':
        li = strtol(optarg, &end, 0);
        if (*end) {
            errx(EXIT_FAILURE, "Invalid port `%s'", optarg);
        }
        if (li < 1 || li > 65535) {
            errx(EXIT_FAILURE, "Port out of range `%s'", optarg);
        }
        port = (uint16_t)li;
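If li moves into main() as suggested, the parsing could also live in a small helper so no file-scope static is needed. A sketch: `parse_port()` is a hypothetical helper that keeps the checks of the 'p' case above but returns false instead of calling errx(), leaving the error message to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: same checks as the 'p' case above, with the
 * temporary long kept local. Returns false on bad input instead of
 * exiting, so the caller can report the error (e.g. with errx()). */
static bool parse_port(const char *arg, uint16_t *port)
{
    char *end;
    long li;

    assert(arg != NULL && port != NULL);
    li = strtol(arg, &end, 0);
    if (*end != '\0') {
        return false; /* trailing garbage: "Invalid port" */
    }
    if (li < 1 || li > 65535) {
        return false; /* "Port out of range" */
    }
    *port = (uint16_t)li;
    return true;
}
```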


 I took patch 1/3 into my tree (git://github.com/bonzini/qemu.git
 branch nbd-server). I'll post it together with my patches next week.

 Paolo




Re: [Qemu-devel] Improve QEMU performance with LLVM codegen and other techniques

2011-12-04 Thread 陳韋任
   We ask TCG to disassemble the guest binary where the trace begins
  _again_ to get a set of TCG blocks, then send them to the LLVM translator.
 
 So you have two TCG backends? One to generate real host code and one that 
 goes into your LLVM generator?

  Ah, I should say we ask the QEMU frontend to disassemble the guest binary
into TCG again.

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj



Re: [Qemu-devel] [BUG] [Seabios] PCI 64bit BARs on Win2008 - unable to start the device. (ACPI lacks the _DSM method)

2011-12-04 Thread Michael S. Tsirkin
On Mon, Dec 05, 2011 at 05:20:32PM +1300, Alexey Korolev wrote:
 Hi Michael,
 
 Thank you for good advice, you are right.  When I added new range
 above 4GB in _CRS the problem has gone.
   QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed,
 NonCacheable, ReadWrite,
0x,  // Address Space Granularity
0x1,// Address Range Minimum
0x3,// Address Range Maximum
0x,  // Address Translation Offset
0x4,// Address Length
,, , AddressRangeMemory, TypeStatic)
 
 The only big problem with this range - as soon as I have more than
 3GB of RAM, windows will boot in BSOD. The problem relates to memory
 range intersection.
 Unfortunately it is not possible to predict how many GB of RAM the
 virtual machine could have - so it's difficult to specify a
 particular region.
 Do you have any ideas what can be done to solve this problem?
 
 Regards,
 Alexey

Two possible ideas:
1. Pass the value in from qemu
2. Get a range toward the upper end of the memory, around 140
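As a rough sketch of idea 1, assuming QEMU passes the firmware the amount of RAM mapped above 4 GiB: the 64-bit window could start at the first suitably aligned address past the end of that RAM, so the _CRS range never overlaps guest memory. `pci64_window_base()` and `pci64_window_max()` are hypothetical helpers; the 1 GiB alignment in the tests is an arbitrary choice, and how the RAM split around 4 GiB is computed is left out. The second helper just encodes the QWordMemory relation Maximum = Minimum + Length - 1.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)
#define FOUR_GIB (4 * GIB)

/* First address at or above the end of high RAM that is aligned to
 * 'align' (a power of two): a candidate "Address Range Minimum". */
static uint64_t pci64_window_base(uint64_t ram_above_4g, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uint64_t ram_end = FOUR_GIB + ram_above_4g;
    return (ram_end + align - 1) & ~(align - 1);
}

/* "Address Range Maximum" for a window of 'len' bytes starting at 'base'. */
static uint64_t pci64_window_max(uint64_t base, uint64_t len)
{
    assert(len != 0);
    return base + len - 1;
}
```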

 On Thu, Dec 01, 2011 at 06:49:54PM +1300, Alexey Korolev wrote:
 Isaku san,
 
 I've just added you to discussion.
 There are some issues with PCI 64bit support in Windows. Windows
 fails to assign the resource if it doesn't fit in first 4GB window.
 
 I really don't know why it happens.
 One of the possibilities is related to lack of _DSM method in ACPI.
 
 Another guess could be related to the fact that 440FX only supports
 32bit PCI bus interface and windows may limit PCI address range to
 first 4GB for PCI devices under this bridge.
 I remember you were working on Q35 chipset simulation, I wonder if
 it is working and would it be possible to try?
 
 Thanks,
 Alexey
 Maybe the range above 4G needs to be declared in the _CRS
 resource?