Re: [Qemu-devel] [RFC v2 0/6] qtest unit test framework
On 12/01/2011 08:43 PM, Anthony Liguori wrote:
This series is still pretty rough, but I wanted to get an idea of what people thought about it before polishing it. The general idea is outlined in the first test.

The main advantage of this type of test framework compared to something like kvm-unit-test is that you don't need a build environment for what you're trying to test.

Luckily, with QEMU CPU emulation and a few images, it can be set up once and stay there forever. The advantage of kvm-unit-test is that the code actually runs, so we can test IRQ injection, in-kernel I/O and MMIO, dirty bit tracking and more, all together.

Since your tests also link against the host environment, it potentially makes tests much simpler to write (as you aren't reinventing an OS). I think this makes this style of test more appropriate for something like QEMU.

Anthony Liguori (6):
  qtest: add test framework
  qtest: add support for target-i386 -M pc
  Add core python test framework
  Add uart test case
  Add RTC test case
  Add C version of rtc-test

 Makefile        |    4 +
 Makefile.objs   |    2 +
 hw/pc.c         |    7 +-
 hw/pc_piix.c    |    9 +-
 qemu-options.hx |    8 ++
 qtest.c         |  357 +++
 qtest.h         |   37 ++
 qtest.py        |   69 +++
 rtc-test.c      |  201 +++
 rtc-test.py     |  105
 serial-test.py  |   24
 vl.c            |    8 ++
 12 files changed, 827 insertions(+), 4 deletions(-)
 create mode 100644 qtest.c
 create mode 100644 qtest.h
 create mode 100644 qtest.py
 create mode 100644 rtc-test.c
 create mode 100644 rtc-test.py
 create mode 100644 serial-test.py
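To make the "tests link against the host environment" point concrete, here is a minimal sketch. It is not the qtest API: the test body talks to the device only through a hypothetical IOOps accessor structure, and an in-process fake CMOS stands in for the emulated RTC (index port 0x70, data port 0x71, as on the PC). A qtest-like harness would forward the same accessors to a running QEMU instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessor interface (not qtest's real API): a harness would
 * forward these calls to the emulated machine; here a fake implements them. */
typedef struct IOOps {
    void (*outb)(void *opaque, uint16_t port, uint8_t val);
    uint8_t (*inb)(void *opaque, uint16_t port);
    void *opaque;
} IOOps;

/* Fake MC146818-style CMOS: index register at 0x70, data at 0x71. */
typedef struct FakeCmos {
    uint8_t index;
    uint8_t regs[128];
} FakeCmos;

static void fake_outb(void *opaque, uint16_t port, uint8_t val)
{
    FakeCmos *c = opaque;
    if (port == 0x70) {
        c->index = val & 0x7f;      /* bit 7 is the NMI-disable bit */
    } else if (port == 0x71) {
        c->regs[c->index] = val;
    }
}

static uint8_t fake_inb(void *opaque, uint16_t port)
{
    FakeCmos *c = opaque;
    return port == 0x71 ? c->regs[c->index] : 0xff;
}

/* Test helpers only see IOOps, so they build and run as ordinary host code. */
static uint8_t cmos_read(const IOOps *io, uint8_t reg)
{
    io->outb(io->opaque, 0x70, reg);
    return io->inb(io->opaque, 0x71);
}

static void cmos_write(const IOOps *io, uint8_t reg, uint8_t val)
{
    io->outb(io->opaque, 0x70, reg);
    io->outb(io->opaque, 0x71, val);
}
```

Keeping the device access behind function pointers is what lets the same test logic run against a fake or a real emulator without a guest build environment.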
Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state
On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote:
Based on a git bisect, this patch breaks msi-x interrupt delivery in the ivshmem device.

On Mon, Nov 21, 2011 at 9:57 AM, Michael S. Tsirkin m...@redhat.com wrote:
Only go over the table when function is masked. This is not really important for qemu.git but helps fix a bug in qemu-kvm.git.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/msix.c |   21 ++---
 hw/pci.h  |    2 ++
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index b15bafc..63b41b9 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -79,6 +79,7 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
     /* Make flags bit writable. */
     pdev->wmask[config_offset + MSIX_CONTROL_OFFSET] |= MSIX_ENABLE_MASK |
                                                         MSIX_MASKALL_MASK;
+    pdev->msix_function_masked = true;
     return 0;

iiuc, this masks the msix by default.

Yes, because msi-x is disabled by default, that's in the pci spec.

 }

@@ -117,16 +118,11 @@ static void msix_clr_pending(PCIDevice *dev, int vector)
     *msix_pending_byte(dev, vector) &= ~msix_pending_mask(vector);
 }

-static int msix_function_masked(PCIDevice *dev)
-{
-    return dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] & MSIX_MASKALL_MASK;
-}
-
 static int msix_is_masked(PCIDevice *dev, int vector)
 {
     unsigned offset =
         vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
-    return msix_function_masked(dev) ||
+    return dev->msix_function_masked ||
            dev->msix_table_page[offset] & PCI_MSIX_ENTRY_CTRL_MASKBIT;
 }

@@ -138,24 +134,34 @@ static void msix_handle_mask_update(PCIDevice *dev, int vector)
     }
 }

+static void msix_update_function_masked(PCIDevice *dev)
+{
+    dev->msix_function_masked = !msix_enabled(dev) ||
+        (dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] & MSIX_MASKALL_MASK);
+}
+
 /* Handle MSI-X capability config write. */
 void msix_write_config(PCIDevice *dev, uint32_t addr,
                        uint32_t val, int len)
 {
     unsigned enable_pos = dev->msix_cap + MSIX_CONTROL_OFFSET;
     int vector;
+    bool was_masked;

     if (!range_covers_byte(addr, len, enable_pos)) {
         return;
     }

+    was_masked = dev->msix_function_masked;
+    msix_update_function_masked(dev);
+
     if (!msix_enabled(dev)) {
         return;
     }

     pci_device_deassert_intx(dev);

-    if (msix_function_masked(dev)) {
+    if (dev->msix_function_masked == was_masked) {
         return;
     }

So I believe my bug is due to the fact that the new logic included in this patch requires msix_write_config() to be called to unmask the vectors.

Not exactly, to enable msi-x really.

Virtio-pci calls msix_write_config(), but ivshmem does not (nor does PCIe, so I'm not sure if it's also affected).

At this point PCIe is a stub.

I haven't been able to fix the bug yet, but I wanted to make sure I was looking in the correct place. Any help or further explanation of this patch would be greatly appreciated.

Sincerely,
Cam

So I think you just need to call msix_write_config, otherwise msix is not getting enabled.

BTW, looking at the ivshmem code, this bit looks wrong:

    pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY;

I think the spec says IO/MEMORY must be disabled at init time, since BARs are not yet set to anything reasonable.

@@ -300,6 +306,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
     msix_free_irq_entries(dev);
     qemu_get_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
     qemu_get_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
+    msix_update_function_masked(dev);
 }

 /* Does device support MSI-X? */
diff --git a/hw/pci.h b/hw/pci.h
index 4b2e785..625e717 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -178,6 +178,8 @@ struct PCIDevice {
     unsigned *msix_entry_used;
     /* Region including the MSI-X table */
     uint32_t msix_bar_size;
+    /* MSIX function mask set or MSIX disabled */
+    bool msix_function_masked;
     /* Version id needed for VMState */
     int32_t version_id;
--
1.7.5.53.gc233e
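The failure mode discussed above can be reproduced in a few lines. This is a toy model, not hw/msix.c: ToyDev and its functions are hypothetical, and the two mask constants are chosen to mirror the high byte of the MSI-X Message Control register (Enable and Function Mask). The cached flag starts out "masked" and is only refreshed by the config-write hook, so a device whose config writes bypass that hook (as ivshmem's did) never sees MSI-X become enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the high byte of the MSI-X Message Control register. */
#define MSIX_ENABLE_MASK  0x80   /* MSI-X Enable */
#define MSIX_MASKALL_MASK 0x40   /* Function Mask */

typedef struct ToyDev {
    uint8_t control;        /* guest-visible control byte */
    bool function_masked;   /* cached, like msix_function_masked */
    int notified;           /* counts "unmask -> deliver pending" events */
} ToyDev;

static void update_function_masked(ToyDev *d)
{
    d->function_masked = !(d->control & MSIX_ENABLE_MASK) ||
                         (d->control & MSIX_MASKALL_MASK);
}

static void toy_init(ToyDev *d)
{
    d->control = 0;              /* MSI-X disabled after reset */
    d->function_masked = true;   /* so the cache starts out masked */
    d->notified = 0;
}

/* Models msix_write_config(): refresh the cache, act only on a change. */
static void toy_write_config(ToyDev *d, uint8_t control)
{
    bool was_masked = d->function_masked;
    d->control = control;
    update_function_masked(d);
    if (d->function_masked == was_masked) {
        return;
    }
    if (!d->function_masked) {
        d->notified++;           /* stand-in for re-checking pending vectors */
    }
}

/* What a device effectively did when it never called the MSI-X hook. */
static void toy_write_config_no_hook(ToyDev *d, uint8_t control)
{
    d->control = control;
}

static bool toy_can_deliver(const ToyDev *d)
{
    return !d->function_masked;
}
```

With the hook, enabling MSI-X clears the cached mask; without it, the control byte says "enabled" but delivery stays blocked, which matches the ivshmem symptom.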
Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state
On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote:
Based on a git bisect, this patch breaks msi-x interrupt delivery in the ivshmem device.

I think the following should fix it. Compile-tested only; could you please check? If it works, let's apply it to the stable branch.

--

ivshmem: add missing msix calls

ivshmem used msix but didn't call it on either the reset or the config write path. This used to partially work since guests don't use all of the msi-x configuration fields, and reset is rarely used, but the patch 'msix: track function masked in pci device state' broke that. Fix by adding the appropriate calls.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

--
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 242fbea..3680c0f 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -505,6 +505,7 @@ static void ivshmem_reset(DeviceState *d)
     IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d);

     s->intrstatus = 0;
+    msix_reset(&s->dev);
     return;
 }

@@ -610,6 +611,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
     return 0;
 }

+static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
+                                 uint32_t val, int len)
+{
+    pci_default_write_config(pci_dev, address, val, len);
+    msix_write_config(pci_dev, address, val, len);
+}
+
 static int pci_ivshmem_init(PCIDevice *dev)
 {
     IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
@@ -734,6 +742,8 @@ static int pci_ivshmem_init(PCIDevice *dev)

     }

+    s->dev.config_write = ivshmem_write_config;
+
     return 0;
 }
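The fix follows a simple chaining pattern: the device installs its own config_write hook that runs the generic update first and then the MSI-X update. The sketch below models that dispatch with hypothetical toy types (ToyPCIDevice, toy_bus_config_write); it only counts calls, standing in for the real config-space and mask-state updates.

```c
#include <assert.h>
#include <stdint.h>

typedef struct ToyPCIDevice ToyPCIDevice;
typedef void (*ConfigWriteFn)(ToyPCIDevice *d, uint32_t addr,
                              uint32_t val, int len);

struct ToyPCIDevice {
    ConfigWriteFn config_write;   /* NULL means "use the default" */
    int default_calls;
    int msix_calls;
};

static void toy_default_write_config(ToyPCIDevice *d, uint32_t addr,
                                     uint32_t val, int len)
{
    (void)addr; (void)val; (void)len;
    d->default_calls++;           /* real code updates the config bytes */
}

static void toy_msix_write_config(ToyPCIDevice *d, uint32_t addr,
                                  uint32_t val, int len)
{
    (void)addr; (void)val; (void)len;
    d->msix_calls++;              /* real code refreshes MSI-X mask state */
}

/* Same shape as ivshmem_write_config(): generic update first, then MSI-X,
 * so the MSI-X code sees the freshly written control byte. */
static void toy_ivshmem_write_config(ToyPCIDevice *d, uint32_t addr,
                                     uint32_t val, int len)
{
    toy_default_write_config(d, addr, val, len);
    toy_msix_write_config(d, addr, val, len);
}

/* Bus-side dispatch: use the device hook if set, else the default. */
static void toy_bus_config_write(ToyPCIDevice *d, uint32_t addr,
                                 uint32_t val, int len)
{
    if (d->config_write) {
        d->config_write(d, addr, val, len);
    } else {
        toy_default_write_config(d, addr, val, len);
    }
}
```

The ordering matters: calling the MSI-X update before the generic write would evaluate the old control byte.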
Re: [Qemu-devel] [PATCH 0/4] KVM: Dirty logging optimization using rmap
On 12/03/2011 06:37 AM, Takuya Yoshikawa wrote:
Avi Kivity a...@redhat.com wrote:
That's true. But some applications do require low latency, and the current code can spend a lot of time with the mmu spinlock held.

The total amount of work actually increases slightly, from O(N) to O(N log N), but since the tree is so wide, the overhead is small.

Controlling the latency can be achieved by making user space limit the number of dirty pages to scan, without hacking the core mmu code. The fact that we cannot transfer so many pages over the network at once suggests this is reasonable.

That is true. Write protecting everything at once means that there is a large window between sampling the dirty log and transferring the page. Any writes within that window cause a re-transfer, even when they should not.

With the rmap write protection method in KVM, the only thing we need is a new GET_DIRTY_LOG API which takes the [gfn_start, gfn_end] range to scan, or optionally max_write_protections.

Right. I remember that someone suggested splitting the slot at KVM Forum. Same effect with less effort.

QEMU can also avoid unwanted page faults by using this API wisely. E.g. you can use this for the "Interactivity improvements" TODO on the KVM wiki, I think. Furthermore, QEMU may be able to use multiple threads for the memory copy task. Each thread has its own range of memory to copy, and does GET_DIRTY_LOG independently. This will make it easy to add further optimizations in QEMU. In summary, my impression is that the main cause of the current latency problem is not the write protection of KVM but the strategy of trying to process the whole large slot in one go. What do you think?

I agree. Maybe O(1) write protection has a place, but it is secondary to fine-grained dirty logging, and if we implement it, it should be after your idea, and further measurements.

-- 
error compiling committee.c: too many arguments to function
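The range-limited harvesting idea can be sketched in user-space terms. Everything here is hypothetical: get_dirty_log_range() stands in for the proposed GET_DIRTY_LOG variant (using a half-open [gfn_start, gfn_end) range for convenience), and clearing bits stands in for write-protecting just that range again. The point is that each call touches a bounded number of frames, so per-call work stays small and the window between harvesting a range and copying it stays short.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_PAGES 64u   /* toy slot: 64 guest frames, one bit each */

typedef struct ToySlot {
    uint64_t dirty;      /* bit g set => gfn g written since last harvest */
} ToySlot;

/* Hypothetical range-limited GET_DIRTY_LOG over [gfn_start, gfn_end):
 * return the dirty bits in the range and clear only those bits. */
static uint64_t get_dirty_log_range(ToySlot *s, unsigned gfn_start,
                                    unsigned gfn_end)
{
    uint64_t mask = 0;
    for (unsigned g = gfn_start; g < gfn_end; g++) {
        mask |= UINT64_C(1) << g;
    }
    uint64_t bits = s->dirty & mask;
    s->dirty &= ~mask;
    return bits;
}

static unsigned popcount64(uint64_t v)
{
    unsigned n = 0;
    while (v) {
        v &= v - 1;      /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* One copy pass split into chunks of at most 'chunk' frames. */
static unsigned harvest_in_chunks(ToySlot *s, unsigned chunk)
{
    unsigned found = 0;
    for (unsigned start = 0; start < SLOT_PAGES; start += chunk) {
        unsigned end = start + chunk < SLOT_PAGES ? start + chunk : SLOT_PAGES;
        found += popcount64(get_dirty_log_range(s, start, end));
        /* ...copy the pages found in [start, end) before moving on... */
    }
    return found;
}
```

Independent ranges also make the multi-threaded copy suggested above straightforward: each thread would own a disjoint [start, end) and harvest it on its own.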
Re: [Qemu-devel] [RFC][PATCH 01/16] msi: Generalize msix_supported to msi_supported
On Sat, Dec 03, 2011 at 12:17:26PM +0100, Jan Kiszka wrote:
From: Jan Kiszka jan.kis...@siemens.com

Rename msix_supported to msi_supported and control MSI and MSI-X activation this way. That was likely the original intention for this flag, but MSI support came after MSI-X.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Acked-by: Michael S. Tsirkin m...@redhat.com

This patch should go into qemu.git, right?

---
 hw/msi.c  |    8
 hw/msi.h  |    2 ++
 hw/msix.c |    9 -
 hw/msix.h |    2 --
 hw/pc.c   |    4 ++--
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index f214fcf..5d6ceb6 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -36,6 +36,9 @@

 #define PCI_MSI_VECTORS_MAX 32

+/* Flag for interrupt controller to declare MSI/MSI-X support */
+bool msi_supported;
+
 /* If we get rid of cap allocator, we won't need this. */
 static inline uint8_t msi_cap_sizeof(uint16_t flags)
 {
@@ -116,6 +119,11 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
     uint16_t flags;
     uint8_t cap_size;
     int config_offset;
+
+    if (!msi_supported) {
+        return -ENOTSUP;
+    }
+
     MSI_DEV_PRINTF(dev,
                    "init offset: 0x%"PRIx8" vector: %"PRId8
                    " 64bit %d mask %d\n",
diff --git a/hw/msi.h b/hw/msi.h
index 5766018..3040bb0 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -24,6 +24,8 @@
 #include "qemu-common.h"
 #include "pci.h"

+extern bool msi_supported;
+
 bool msi_enabled(const PCIDevice *dev);
 int msi_init(struct PCIDevice *dev, uint8_t offset,
              unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
diff --git a/hw/msix.c b/hw/msix.c
index b15bafc..8850fbd 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -12,6 +12,7 @@
  */

 #include "hw.h"
+#include "msi.h"
 #include "msix.h"
 #include "pci.h"
 #include "range.h"
@@ -32,9 +33,6 @@

 #define MSIX_MAX_ENTRIES 32

-/* Flag for interrupt controller to declare MSI-X support */
-int msix_supported;
-
 /* Add MSI-X capability to the config space for the device. */
 /* Given a bar and its size, add MSI-X table on top of it
  * and fill MSI-X capability in the config space.
@@ -212,10 +210,11 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
               unsigned bar_nr, unsigned bar_size)
 {
     int ret;
+
     /* Nothing to do if MSI is not supported by interrupt controller */
-    if (!msix_supported)
+    if (!msi_supported) {
         return -ENOTSUP;
-
+    }
     if (nentries > MSIX_MAX_ENTRIES)
         return -EINVAL;

diff --git a/hw/msix.h b/hw/msix.h
index 7e04336..5aba22b 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -29,6 +29,4 @@ void msix_notify(PCIDevice *dev, unsigned vector);

 void msix_reset(PCIDevice *dev);

-extern int msix_supported;
-
 #endif
diff --git a/hw/pc.c b/hw/pc.c
index 9328ee5..5225d5b 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -36,7 +36,7 @@
 #include "elf.h"
 #include "multiboot.h"
 #include "mc146818rtc.h"
-#include "msix.h"
+#include "msi.h"
 #include "sysbus.h"
 #include "sysemu.h"
 #include "blockdev.h"
@@ -896,7 +896,7 @@ static DeviceState *apic_init(void *env, uint8_t apic_id)
         apic_mapped = 1;
     }

-    msix_supported = 1;
+    msi_supported = true;

     return dev;
 }
--
1.7.3.4
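The patch turns one global flag into the single switch for both MSI and MSI-X. A toy version of the pattern, with hypothetical toy_ names standing in for msi_init(), msix_init() and the APIC init path: the interrupt controller model sets the flag, both init functions refuse with -ENOTSUP until it is set, and a device treats that as "fall back to INTx" rather than as a fatal error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Set by the interrupt controller model (apic_init() in the patch). */
static bool toy_msi_supported;

static int toy_msi_init(void)
{
    if (!toy_msi_supported) {
        return -ENOTSUP;
    }
    return 0;   /* would add the MSI capability here */
}

static int toy_msix_init(void)
{
    if (!toy_msi_supported) {
        return -ENOTSUP;
    }
    return 0;   /* would add the MSI-X capability and table here */
}

static void toy_apic_init(void)
{
    toy_msi_supported = true;
}

/* Device side: -ENOTSUP means "use INTx", not "fail to initialize". */
static bool toy_device_uses_msi(void)
{
    return toy_msi_init() == 0;
}
```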
Re: [Qemu-devel] [RFC][PATCH 01/16] msi: Generalize msix_supported to msi_supported
On 2011-12-04 11:42, Michael S. Tsirkin wrote:
On Sat, Dec 03, 2011 at 12:17:26PM +0100, Jan Kiszka wrote:
From: Jan Kiszka jan.kis...@siemens.com

Rename msix_supported to msi_supported and control MSI and MSI-X activation this way. That was likely the original intention for this flag, but MSI support came after MSI-X.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Acked-by: Michael S. Tsirkin m...@redhat.com

This patch should go into qemu.git, right?

Right. It was just that this series depends on it. Feel free to pick it up earlier.

Jan
Re: [Qemu-devel] [RFC][PATCH 02/16] kvm: Move kvmclock into hw/kvm folder
On 12/04/2011 12:33 AM, Jan Kiszka wrote:
Do we have a convention that every include in <> is considered a system header? Should probably be documented then (and code should be converted gradually).

It's documented in The C Programming Language, by K&R.
Re: [Qemu-devel] [RFC][PATCH 02/16] kvm: Move kvmclock into hw/kvm folder
On 2011-12-04 11:43, Avi Kivity wrote:
On 12/04/2011 12:33 AM, Jan Kiszka wrote:
Do we have a convention that every include in <> is considered a system header? Should probably be documented then (and code should be converted gradually).

It's documented in The C Programming Language, by K&R.

It's just a convention, nothing more. If you consider certain parts of QEMU's API as system headers (e.g. the parts that may one day make up our modular API), it makes some sense to use <> for them. Right now this happens for some parts of the hw API, but inconsistently.

Jan
Re: [Qemu-devel] Improve QEMU performance with LLVM codegen and other techniques
On 04.12.2011, at 07:14, 陳韋任 wrote:
3. Then a trace composed of TCG blocks is sent to an LLVM translator. The translator generates the host binary for the trace into an LLVM code cache, and patch the

I don't fully understand this part. Do you disassemble the x86 blob that TCG emitted?

We ask TCG to disassemble the guest binary where the trace begins _again_ to get a set of TCG blocks, then send them to the LLVM translator.

So you have two TCG backends? One to generate real host code and one that goes into your LLVM generator?

the moment (make the situation simpler), I think we still don't have to check the blocks' hflags and segment descriptors in the trace to see if they match.

Yeah. You only need to be sync'ed with the invalidation then. And make sure you patch the TB atomically, so you don't have a separate thread accidentally run half your code and half the old code.

Sync'ed with the invalidation means tb_flush, cpu_unlink and tb_phys_invalidate?

Yup :)

Alex
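The "patch the TB atomically" advice can be illustrated with C11 atomics. This is only an analogy: real TB patching rewrites a jump inside generated machine code, which this sketch does not attempt. Here the hypothetical tb_entry slot holds a function pointer to the current translation, and switching from the TCG version to the LLVM trace is a single release store, so a concurrent reader observes either the old entry or the new one, never a mix.

```c
#include <assert.h>
#include <stdatomic.h>

typedef int (*TranslatedCode)(void);

/* Stand-ins for two translations; they return a version tag so the
 * switch is observable. */
static int tcg_version(void)  { return 1; }
static int llvm_version(void) { return 2; }

/* Hypothetical entry slot that executing threads read before jumping. */
static _Atomic(TranslatedCode) tb_entry = tcg_version;

static int run_block(void)
{
    /* Acquire pairs with the release store in install_trace(). */
    TranslatedCode fn = atomic_load_explicit(&tb_entry, memory_order_acquire);
    return fn();
}

/* Publish the new translation with one atomic store. */
static void install_trace(TranslatedCode fn)
{
    atomic_store_explicit(&tb_entry, fn, memory_order_release);
}
```

The release/acquire pair also guarantees that anything written before publishing (for example, the finished LLVM code) is visible to a thread that loads the new pointer.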
Re: [Qemu-devel] sub-page-sized mmio regions and address passed to read/write fns
On 12/02/2011 04:49 PM, Peter Maydell wrote:
Hi; I was working on a refactoring of the ARM 11MPCore/A9MP private peripherals and encountered something odd. Rather than having a single large mmio region, I tried splitting it into several regions, like this:

    memory_region_init(&s->container, "a9mp-priv-container", 0x2000);
    memory_region_init_io(&s->scu_iomem, &a9_scu_ops, s, "a9mp-scu", 0x100);
    memory_region_init_io(&s->gic_cpu_iomem, &a9_gic_cpu_ops, s, "a9mp-gic-cpu", 0x100);
    memory_region_init_io(&s->ptimer_iomem, &a9_ptimer_ops, s, "a9mp-ptimer", 0x100);
    memory_region_add_subregion(&s->container, 0, &s->scu_iomem);
    memory_region_add_subregion(&s->container, 0x100, &s->gic_cpu_iomem);
    memory_region_add_subregion(&s->container, 0x600, &s->ptimer_iomem);
    memory_region_add_subregion(&s->container, 0x1000, &s->gic.iomem);
    sysbus_init_mmio_region(dev, &s->container);

Good practice IMO, will become more important when we introduce a Register class.

However, what I found is that the addresses passed to the read/write functions aren't what I would expect. For instance, if the board maps the container at address 0x1e000000, then a read from 0x1e000100 goes to the functions given by a9_gic_cpu_ops, as it should. However, the offset parameter that the read function is passed is not 0x0 (offset from the start of the a9mp-gic-cpu region) but 0x100 (offset from the start of the page, I think). Is this expected behaviour? I certainly wasn't expecting it...

A while ago this was the behaviour across the board. Then 8da3ff1809747 changed addresses to be relative, but apparently missed the subpage case.

I looked through the code that's getting called for reads, and it looks to me like exec.c:subpage_readlen() is causing this. We look up the subpage_t based on the address within the page, but we don't then adjust the address we pass to io_mem_read (except by region_offset, which I take from the comment at the top of cpu_register_physical_memory_log() to be for something else).

I think you can use subpage_t's region_offset array for this (adding into it, of course, so the original value remains).
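The expected behaviour is easy to state in code: when several small regions share one page, the dispatcher should pass each handler the offset relative to that region's start, not to the page. The sketch below is not exec.c; SubRegion and subpage_read() are hypothetical, and the explicit subtraction is the effect that folding the region start into region_offset would achieve. The layout mirrors the one in the report above (SCU at 0x0, GIC CPU interface at 0x100, private timer at 0x600).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*ReadFn)(void *opaque, uint64_t offset);

/* One mapped slice of a page: [start, start + size) within the page. */
typedef struct SubRegion {
    uint64_t start;
    uint64_t size;
    ReadFn read;
    void *opaque;
} SubRegion;

/* Test handler: just reports the offset it was given. */
static uint64_t echo_offset(void *opaque, uint64_t offset)
{
    (void)opaque;
    return offset;
}

/* Dispatch a read at 'addr_in_page', passing the region-relative offset. */
static uint64_t subpage_read(const SubRegion *regs, int n,
                             uint64_t addr_in_page)
{
    for (int i = 0; i < n; i++) {
        const SubRegion *r = &regs[i];
        if (addr_in_page >= r->start && addr_in_page - r->start < r->size) {
            return r->read(r->opaque, addr_in_page - r->start);
        }
    }
    return ~UINT64_C(0);   /* unassigned: read as all-ones */
}
```

With this dispatch, a read at page offset 0x100 reaches the GIC CPU handler as offset 0x0, which is what Peter expected.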
Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state
On 2011-12-04 11:20, Michael S. Tsirkin wrote:
On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote:
Based on a git bisect, this patch breaks msi-x interrupt delivery in the ivshmem device.

I think the following should fix it. Compile-tested only; could you please check? If it works, let's apply it to the stable branch.

--

ivshmem: add missing msix calls

ivshmem used msix but didn't call it on either the reset or the config write path. This used to partially work since guests don't use all of the msi-x configuration fields, and reset is rarely used, but the patch 'msix: track function masked in pci device state' broke that. Fix by adding the appropriate calls.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

--
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 242fbea..3680c0f 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -505,6 +505,7 @@ static void ivshmem_reset(DeviceState *d)
     IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d);

     s->intrstatus = 0;
+    msix_reset(&s->dev);
     return;
 }

@@ -610,6 +611,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
     return 0;
 }

+static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
+                                 uint32_t val, int len)
+{
+    pci_default_write_config(pci_dev, address, val, len);
+    msix_write_config(pci_dev, address, val, len);
+}
+
 static int pci_ivshmem_init(PCIDevice *dev)
 {
     IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
@@ -734,6 +742,8 @@ static int pci_ivshmem_init(PCIDevice *dev)

     }

+    s->dev.config_write = ivshmem_write_config;
+
     return 0;
 }

But please fix this for real and merge [1][2] (with dependent patches) into master. The above is just boilerplate code from the device's point of view.

Jan

[1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/80240
[2] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/80244
Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state
On Sun, Dec 04, 2011 at 01:35:03PM +0100, Jan Kiszka wrote:
On 2011-12-04 11:20, Michael S. Tsirkin wrote:
On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote:
Based on a git bisect, this patch breaks msi-x interrupt delivery in the ivshmem device.

I think the following should fix it. Compile-tested only; could you please check? If it works, let's apply it to the stable branch.

--

ivshmem: add missing msix calls

ivshmem used msix but didn't call it on either the reset or the config write path. This used to partially work since guests don't use all of the msi-x configuration fields, and reset is rarely used, but the patch 'msix: track function masked in pci device state' broke that. Fix by adding the appropriate calls.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

--
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 242fbea..3680c0f 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -505,6 +505,7 @@ static void ivshmem_reset(DeviceState *d)
     IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d);

     s->intrstatus = 0;
+    msix_reset(&s->dev);
     return;
 }

@@ -610,6 +611,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
     return 0;
 }

+static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
+                                 uint32_t val, int len)
+{
+    pci_default_write_config(pci_dev, address, val, len);
+    msix_write_config(pci_dev, address, val, len);
+}
+
 static int pci_ivshmem_init(PCIDevice *dev)
 {
     IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
@@ -734,6 +742,8 @@ static int pci_ivshmem_init(PCIDevice *dev)

     }

+    s->dev.config_write = ivshmem_write_config;
+
     return 0;
 }

But please fix this for real and merge [1][2] (with dependent patches) into master. The above is just boilerplate code from the device's point of view.

Jan

[1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/80240
[2] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/80244

Yes, I agree we should make it easier for devices. What annoyed me was the need to put msix in save/load. And that is because of the need to do this in a specific order. I hope to switch to an unordered format, and then this will become straightforward.

-- 
MST
Re: [Qemu-devel] [RFC][PATCH 01/16] msi: Generalize msix_supported to msi_supported
On 12/03/2011 01:17 PM, Jan Kiszka wrote:
From: Jan Kiszka jan.kis...@siemens.com

Rename msix_supported to msi_supported and control MSI and MSI-X activation this way. That was likely the original intention for this flag, but MSI support came after MSI-X.

'and' is a dangerous word in a changelog entry.

+
+    if (!msi_supported) {
+        return -ENOTSUP;
+    }
+

This changes behaviour. qemu 1.0 -M pc-1.0 and qemu-1.1 -M pc-1.0 will be different after this, no?
Re: [Qemu-devel] [RFC][PATCH 01/16] msi: Generalize msix_supported to msi_supported
On 2011-12-04 14:12, Avi Kivity wrote:
On 12/03/2011 01:17 PM, Jan Kiszka wrote:
From: Jan Kiszka jan.kis...@siemens.com

Rename msix_supported to msi_supported and control MSI and MSI-X activation this way. That was likely the original intention for this flag, but MSI support came after MSI-X.

'and' is a dangerous word in a changelog entry.

This patch hardly qualifies for two IMHO.

+
+    if (!msi_supported) {
+        return -ENOTSUP;
+    }
+

This changes behaviour. qemu 1.0 -M pc-1.0 and qemu-1.1 -M pc-1.0 will be different after this, no?

Only isapc had msix_supported = 0, and I doubt we got there (msi_init) for that machine. Or am I missing something?

Jan
Re: [Qemu-devel] [RFC][PATCH 10/16] memory: Introduce memory_region_init_reservation
On 12/03/2011 01:17 PM, Jan Kiszka wrote:
From: Jan Kiszka jan.kis...@siemens.com

Introduce a memory region type that can reserve I/O space. Such regions are useful for modeling I/O that is only handled outside of QEMU, i.e. in the context of an accelerator like KVM. Any access to such a region from QEMU is a bug and will be reported as such.

This is guest triggerable (DMA into the region), so abort() is too drastic.

+void memory_region_init_reservation(MemoryRegion *mr,
+                                    const char *name,
+                                    uint64_t size)
+{
+    memory_region_init(mr, name, size);
+    mr->ops = &reservation_ops;
+    mr->opaque = mr;
+    mr->terminates = true;
+    mr->backend_registered = false;
+}

Just calling memory_region_init_io() is simpler, no?
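Because a guest can reach a reserved region (for example via DMA), the handlers probably should report the access and carry on rather than abort(). A minimal sketch of such handlers follows; the signatures are only loosely modeled on the read/write callbacks of that era's memory API, and ToyRegion and the exact return value for reads (all-ones of the access size, as for unassigned I/O) are assumptions for illustration. In QEMU these would be placed in an ops structure and registered through memory_region_init_io(), as suggested above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct ToyRegion {
    const char *name;
    unsigned bad_accesses;
} ToyRegion;

/* Report the unexpected read and return all-ones of the access size
 * instead of calling abort(), since the guest can trigger this path. */
static uint64_t reservation_read(void *opaque, uint64_t addr, unsigned size)
{
    ToyRegion *r = opaque;
    r->bad_accesses++;
    fprintf(stderr, "%s: unexpected read at 0x%llx (size %u)\n",
            r->name, (unsigned long long)addr, size);
    return size >= 8 ? ~UINT64_C(0) : (UINT64_C(1) << (size * 8)) - 1;
}

/* Report and drop the unexpected write. */
static void reservation_write(void *opaque, uint64_t addr, uint64_t val,
                              unsigned size)
{
    ToyRegion *r = opaque;
    r->bad_accesses++;
    fprintf(stderr, "%s: unexpected write 0x%llx at 0x%llx (size %u)\n",
            r->name, (unsigned long long)val, (unsigned long long)addr, size);
}
```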
[Qemu-devel] [PATCH 2/6] msi: Guard msi_reset with msi_present
From: Jan Kiszka jan.kis...@siemens.com

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/msi.c |    4
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index 541e4e1..612b168 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -183,6 +183,10 @@ void msi_reset(PCIDevice *dev)
     uint16_t flags;
     bool msi64bit;

+    if (!msi_present(dev)) {
+        return;
+    }
+
     flags = pci_get_word(dev->config + msi_flags_off(dev));
     flags &= ~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
     msi64bit = flags & PCI_MSI_FLAGS_64BIT;
--
1.7.3.4
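What msi_reset() does to the Message Control word, and why the guard matters, can be shown with a toy config space. The flag values are the standard PCI ones (as in Linux's pci_regs.h); ToyMsiDev, and the assumption that msi_cap == 0 means "no MSI capability", are made up for this sketch. Reset clears the guest-writable Enable and Multiple Message Enable fields while keeping read-only capability bits such as 64-bit support. In this model, without the guard the flags offset would be 0 + 2, which is the Device ID field, and the reset would rewrite it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_MSI_FLAGS          2       /* Message Control offset in the cap */
#define PCI_MSI_FLAGS_ENABLE   0x0001  /* MSI enable */
#define PCI_MSI_FLAGS_QMASK    0x000e  /* Multiple Message Capable */
#define PCI_MSI_FLAGS_QSIZE    0x0070  /* Multiple Message Enable */
#define PCI_MSI_FLAGS_64BIT    0x0080  /* 64-bit address capable */

typedef struct ToyMsiDev {
    uint8_t config[256];
    uint8_t msi_cap;        /* toy assumption: 0 means "no MSI capability" */
} ToyMsiDev;

static uint16_t get_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));   /* little-endian, as in PCI */
}

static void set_word(uint8_t *p, uint16_t v)
{
    p[0] = v & 0xff;
    p[1] = v >> 8;
}

static bool toy_msi_present(const ToyMsiDev *d)
{
    return d->msi_cap != 0;
}

static void toy_msi_reset(ToyMsiDev *d)
{
    if (!toy_msi_present(d)) {
        return;   /* otherwise offset 0 + 2 (the Device ID) would be rewritten */
    }
    uint8_t *flags_p = d->config + d->msi_cap + PCI_MSI_FLAGS;
    uint16_t flags = get_word(flags_p);
    flags &= ~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
    set_word(flags_p, flags);
}
```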
[Qemu-devel] [PATCH 0/6] msi: Small refactoring
Collection of patches to improve MSI[X] usability in device models, clean up some minor bits, and help kvm irqchip introduction.

CC: Alexander Graf ag...@suse.de
CC: Gerd Hoffmann kra...@redhat.com
CC: Isaku Yamahata yamah...@valinux.co.jp

Jan Kiszka (6):
  msi: Guard msi/msix_write_config with msi_present
  msi: Guard msi_reset with msi_present
  msi: Use msi/msix_present more consistently
  msi: Invoke msi/msix_reset from PCI core
  msi: Invoke msi/msix_write_config from PCI core
  msi: Generalize msix_supported to msi_supported

 hw/ide/ich.c            |    8
 hw/intel-hda.c          |   12
 hw/ioh3420.c            |    3 +--
 hw/msi.c                |   19 ---
 hw/msi.h                |    2 ++
 hw/msix.c               |   24 +---
 hw/msix.h               |    2 --
 hw/pc.c                 |    4 ++--
 hw/pci.c                |    8
 hw/pci_bridge.c         |    4
 hw/virtio-pci.c         |    3 ---
 hw/xio3130_downstream.c |    3 +--
 hw/xio3130_upstream.c   |    2 --
 13 files changed, 47 insertions(+), 47 deletions(-)

--
1.7.3.4
[Qemu-devel] [PATCH 6/6] msi: Generalize msix_supported to msi_supported
From: Jan Kiszka jan.kis...@siemens.com

Rename msix_supported to msi_supported and control MSI and MSI-X activation this way. That was likely the original intention for this flag, but MSI support came after MSI-X.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/msi.c  |    8
 hw/msi.h  |    2 ++
 hw/msix.c |    9 -
 hw/msix.h |    2 --
 hw/pc.c   |    4 ++--
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index c4e8a6e..5233204 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -36,6 +36,9 @@

 #define PCI_MSI_VECTORS_MAX 32

+/* Flag for interrupt controller to declare MSI/MSI-X support */
+bool msi_supported;
+
 /* If we get rid of cap allocator, we won't need this. */
 static inline uint8_t msi_cap_sizeof(uint16_t flags)
 {
@@ -116,6 +119,11 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
     uint16_t flags;
     uint8_t cap_size;
     int config_offset;
+
+    if (!msi_supported) {
+        return -ENOTSUP;
+    }
+
     MSI_DEV_PRINTF(dev,
                    "init offset: 0x%"PRIx8" vector: %"PRId8
                    " 64bit %d mask %d\n",
diff --git a/hw/msi.h b/hw/msi.h
index 5766018..3040bb0 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -24,6 +24,8 @@
 #include "qemu-common.h"
 #include "pci.h"

+extern bool msi_supported;
+
 bool msi_enabled(const PCIDevice *dev);
 int msi_init(struct PCIDevice *dev, uint8_t offset,
              unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
diff --git a/hw/msix.c b/hw/msix.c
index 876793a..4897c58 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -12,6 +12,7 @@
  */

 #include "hw.h"
+#include "msi.h"
 #include "msix.h"
 #include "pci.h"
 #include "range.h"
@@ -32,9 +33,6 @@

 #define MSIX_MAX_ENTRIES 32

-/* Flag for interrupt controller to declare MSI-X support */
-int msix_supported;
-
 /* Add MSI-X capability to the config space for the device. */
 /* Given a bar and its size, add MSI-X table on top of it
  * and fill MSI-X capability in the config space.
@@ -235,10 +233,11 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
               unsigned bar_nr, unsigned bar_size)
 {
     int ret;
+
     /* Nothing to do if MSI is not supported by interrupt controller */
-    if (!msix_supported)
+    if (!msi_supported) {
         return -ENOTSUP;
-
+    }
     if (nentries > MSIX_MAX_ENTRIES)
         return -EINVAL;

diff --git a/hw/msix.h b/hw/msix.h
index 7e04336..5aba22b 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -29,6 +29,4 @@ void msix_notify(PCIDevice *dev, unsigned vector);

 void msix_reset(PCIDevice *dev);

-extern int msix_supported;
-
 #endif
diff --git a/hw/pc.c b/hw/pc.c
index 33778fe..7e40031 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -36,7 +36,7 @@
 #include "elf.h"
 #include "multiboot.h"
 #include "mc146818rtc.h"
-#include "msix.h"
+#include "msi.h"
 #include "sysbus.h"
 #include "sysemu.h"
 #include "blockdev.h"
@@ -896,7 +896,7 @@ static DeviceState *apic_init(void *env, uint8_t apic_id)
         apic_mapped = 1;
     }

-    msix_supported = 1;
+    msi_supported = true;

     return dev;
 }
--
1.7.3.4
[Qemu-devel] [PATCH 1/6] msi: Guard msi/msix_write_config with msi_present
From: Jan Kiszka jan.kis...@siemens.com

Terminate msi/msix_write_config early if support is not enabled. This makes it possible to remove checks at the call site if MSI is optional.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/msi.c  |    3 ++-
 hw/msix.c |    2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index f214fcf..541e4e1 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -264,7 +264,8 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
     unsigned int vector;
     uint32_t pending;

-    if (!ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) {
+    if (!msi_present(dev) ||
+        !ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) {
         return;
     }

diff --git a/hw/msix.c b/hw/msix.c
index 149eed2..32fd9b2 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -156,7 +156,7 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
     int vector;
     bool was_masked;

-    if (!range_covers_byte(addr, len, enable_pos)) {
+    if (!msix_present(dev) || !range_covers_byte(addr, len, enable_pos)) {
         return;
     }

--
1.7.3.4
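The early-exit test in both write_config functions combines a "capability present" check with a byte-range check. The helpers below are re-implemented here under the semantics their names suggest (a write of len bytes at addr covers a byte, or overlaps another range); they are written to match QEMU's range.h as best I recall, but treat them as an assumption rather than a copy. msix_write_is_relevant() is a hypothetical wrapper showing the guard from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Last byte of [offset, offset + len). Assumes len > 0. */
static uint64_t range_get_last(uint64_t offset, uint64_t len)
{
    return offset + len - 1;
}

/* Does the access [offset, offset + len) touch 'byte'? */
static bool range_covers_byte(uint64_t offset, uint64_t len, uint64_t byte)
{
    return offset <= byte && byte <= range_get_last(offset, len);
}

/* Do [first1, first1 + len1) and [first2, first2 + len2) intersect? */
static bool ranges_overlap(uint64_t first1, uint64_t len1,
                           uint64_t first2, uint64_t len2)
{
    uint64_t last1 = range_get_last(first1, len1);
    uint64_t last2 = range_get_last(first2, len2);
    return !(last2 < first1 || last1 < first2);
}

/* Guard pattern from the patch: bail out if the capability is absent
 * or the write does not touch the MSI-X control byte. */
static bool msix_write_is_relevant(bool present, uint32_t addr, int len,
                                   uint32_t enable_pos)
{
    return present && range_covers_byte(addr, (uint64_t)len, enable_pos);
}
```

Putting the presence test first means the range arithmetic is never evaluated against a capability offset that was never set up.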
[Qemu-devel] [PATCH 5/6] msi: Invoke msi/msix_write_config from PCI core
From: Jan Kiszka jan.kis...@siemens.com Also this functions is better invoked by the core than by each and every device. This allows to drop the config_write callbacks from ich and intel-hda. CC: Alexander Graf ag...@suse.de CC: Gerd Hoffmann kra...@redhat.com CC: Isaku Yamahata yamah...@valinux.co.jp Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/ide/ich.c|8 hw/intel-hda.c | 12 hw/ioh3420.c|1 - hw/msi.c|2 +- hw/pci.c|3 +++ hw/virtio-pci.c |2 -- hw/xio3130_downstream.c |1 - hw/xio3130_upstream.c |1 - 8 files changed, 4 insertions(+), 26 deletions(-) diff --git a/hw/ide/ich.c b/hw/ide/ich.c index 3f7510f..a470c01 100644 --- a/hw/ide/ich.c +++ b/hw/ide/ich.c @@ -139,13 +139,6 @@ static int pci_ich9_uninit(PCIDevice *dev) return 0; } -static void pci_ich9_write_config(PCIDevice *pci, uint32_t addr, - uint32_t val, int len) -{ -pci_default_write_config(pci, addr, val, len); -msi_write_config(pci, addr, val, len); -} - static PCIDeviceInfo ich_ahci_info[] = { { .qdev.name= ich9-ahci, @@ -154,7 +147,6 @@ static PCIDeviceInfo ich_ahci_info[] = { .qdev.vmsd= vmstate_ahci, .init = pci_ich9_ahci_init, .exit = pci_ich9_uninit, -.config_write = pci_ich9_write_config, .vendor_id= PCI_VENDOR_ID_INTEL, .device_id= PCI_DEVICE_ID_INTEL_82801IR, .revision = 0x02, diff --git a/hw/intel-hda.c b/hw/intel-hda.c index 10769e0..995d895 100644 --- a/hw/intel-hda.c +++ b/hw/intel-hda.c @@ -1158,17 +1158,6 @@ static int intel_hda_exit(PCIDevice *pci) return 0; } -static void intel_hda_write_config(PCIDevice *pci, uint32_t addr, - uint32_t val, int len) -{ -IntelHDAState *d = DO_UPCAST(IntelHDAState, pci, pci); - -pci_default_write_config(pci, addr, val, len); -if (d-msi) { -msi_write_config(pci, addr, val, len); -} -} - static int intel_hda_post_load(void *opaque, int version) { IntelHDAState* d = opaque; @@ -1252,7 +1241,6 @@ static PCIDeviceInfo intel_hda_info = { .qdev.reset = intel_hda_reset, .init = intel_hda_init, .exit = intel_hda_exit, -.config_write = 
intel_hda_write_config, .vendor_id= PCI_VENDOR_ID_INTEL, .device_id= 0x2668, .revision = 1, diff --git a/hw/ioh3420.c b/hw/ioh3420.c index fc2fb3b..886ede8 100644 --- a/hw/ioh3420.c +++ b/hw/ioh3420.c @@ -71,7 +71,6 @@ static void ioh3420_write_config(PCIDevice *d, pci_get_long(d-config + d-exp.aer_cap + PCI_ERR_ROOT_COMMAND); pci_bridge_write_config(d, address, val, len); -msi_write_config(d, address, val, len); ioh3420_aer_vector_update(d); pcie_cap_slot_write_config(d, address, val, len); pcie_aer_write_config(d, address, val, len); diff --git a/hw/msi.c b/hw/msi.c index 137dba0..c4e8a6e 100644 --- a/hw/msi.c +++ b/hw/msi.c @@ -256,7 +256,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector) stl_le_phys(address, data); } -/* call this function after updating configs by pci_default_write_config(). */ +/* Normally called by pci_default_write_config(). */ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len) { uint16_t flags = pci_get_word(dev-config + msi_flags_off(dev)); diff --git a/hw/pci.c b/hw/pci.c index 5d5829d..8c814cd 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1056,6 +1056,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l) if (range_covers_byte(addr, l, PCI_COMMAND)) pci_update_irq_disabled(d, was_irq_disabled); + +msi_write_config(d, addr, val, l); +msix_write_config(d, addr, val, l); } /***/ diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 16a5b08..d21a7ee 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -492,8 +492,6 @@ static void virtio_write_config(PCIDevice *pci_dev, uint32_t address, virtio_set_status(proxy-vdev, proxy-vdev-status ~VIRTIO_CONFIG_S_DRIVER_OK); } - -msix_write_config(pci_dev, address, val, len); } static unsigned virtio_pci_get_features(void *opaque) diff --git a/hw/xio3130_downstream.c b/hw/xio3130_downstream.c index 464eefa..8e9117d 100644 --- a/hw/xio3130_downstream.c +++ b/hw/xio3130_downstream.c @@ -41,7 +41,6 @@ static void 
xio3130_downstream_write_config(PCIDevice *d, uint32_t address, pci_bridge_write_config(d, address, val, len); pcie_cap_flr_write_config(d, address, val, len); pcie_cap_slot_write_config(d, address, val, len); -msi_write_config(d, address, val, len); pcie_aer_write_config(d, address, val, len); } diff --git a/hw/xio3130_upstream.c b/hw/xio3130_upstream.c index 0d8d254..707401e 100644 ---
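The patch above moves the msi_write_config() call (next to the msix_write_config() call) into pci_default_write_config(). The sketch below shows that call structure in a self-contained form. All type and function names are hypothetical simplifications, not QEMU's API. The core handler updates config space and then lets every capability see the write, so a device hook that chains to the core no longer needs its own MSI calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified PCI device; QEMU's PCIDevice has
 * many more fields. The counters record capability-hook calls. */
typedef struct Dev {
    uint8_t config[256];
    int msi_writes;
    int msix_writes;
} Dev;

/* Stand-ins for msi_write_config()/msix_write_config(). The real
 * functions return early when the capability is not present. */
static void cap_msi_write(Dev *d, uint32_t addr, uint32_t val, int len)
{
    (void)addr; (void)val; (void)len;
    d->msi_writes++;
}

static void cap_msix_write(Dev *d, uint32_t addr, uint32_t val, int len)
{
    (void)addr; (void)val; (void)len;
    d->msix_writes++;
}

/* Core handler: store the bytes (little endian), then notify every
 * capability, so individual devices cannot forget to do it. */
static void default_write_config(Dev *d, uint32_t addr, uint32_t val, int len)
{
    for (int i = 0; i < len && (size_t)addr + (size_t)i < sizeof d->config; i++) {
        d->config[addr + i] = (uint8_t)(val >> (8 * i));
    }
    cap_msi_write(d, addr, val, len);
    cap_msix_write(d, addr, val, len);
}

/* A device hook now only adds device-specific side effects. */
static void my_dev_write_config(Dev *d, uint32_t addr, uint32_t val, int len)
{
    default_write_config(d, addr, val, len);
    /* ... device-specific handling would go here ... */
}
```

The trade-off is the one Michael raises later in the thread: msix_write_config() effectively becomes an internal interface that devices should no longer call themselves.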
Re: [Qemu-devel] [RFC][PATCH 11/16] kvm: Introduce core services for in-kernel irqchip support
On 12/03/2011 01:17 PM, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Add the basic infrastructure to activate in-kernel irqchip support, inject interrupts into these models, and maintain IRQ routes. Routing is optional and depends on the host arch supporting KVM_CAP_IRQ_ROUTING. When it's not available on x86, we loose the HPET (Avi: lose) as we can't route GSI0 to IOAPIC pin 2. In-kernel irqchip support will eventually be controlled by the machine property 'kernel_irqchip', but this is not yet wired up. -- error compiling committee.c: too many arguments to function
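To make "route GSI0 to IOAPIC pin 2" concrete, here is a rough sketch of a routing table for the 16 ISA GSIs. It is loosely modeled on a typical x86 setup and is not code from this series. The enum values, struct, and function names are hypothetical. The sketch assumes that GSI 2 is the PIC cascade and gets no PIC route, and that with the IRQ0 override active GSI 0 is steered to IOAPIC pin 2 and GSI 2 gets no IOAPIC route.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route targets; KVM's own constants differ. */
enum Chip { PIC_MASTER, PIC_SLAVE, IOAPIC };

typedef struct Route {
    unsigned gsi;
    enum Chip chip;
    unsigned pin;
} Route;

static void add_route(Route *tbl, size_t cap, size_t *n,
                      unsigned gsi, enum Chip chip, unsigned pin)
{
    if (*n < cap) {
        tbl[*n] = (Route){ gsi, chip, pin };
        (*n)++;
    }
}

/* Build routes for GSIs 0..15 and return the number of entries. */
static size_t build_isa_routes(Route *tbl, size_t cap, int irq0_override)
{
    size_t n = 0;

    for (unsigned gsi = 0; gsi < 16; gsi++) {
        if (gsi != 2) {   /* GSI 2 is the PIC cascade input */
            add_route(tbl, cap, &n, gsi,
                      gsi < 8 ? PIC_MASTER : PIC_SLAVE, gsi % 8);
        }
        if (gsi == 0 && irq0_override) {
            add_route(tbl, cap, &n, 0, IOAPIC, 2);   /* IRQ0 override */
        } else if (gsi != 2 || !irq0_override) {
            add_route(tbl, cap, &n, gsi, IOAPIC, gsi);
        }
    }
    return n;
}
```

Without routing support, the GSI 0 to pin 2 entry cannot be expressed, which is why the HPET is lost in that case.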
[Qemu-devel] [PATCH 3/6] msi: Use msi/msix_present more consistently
From: Jan Kiszka jan.kis...@siemens.com

Replace some open-coded msi/msix_present checks and drop redundant msix_supported tests (present implies supported).

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/msi.c  |    2 +-
 hw/msix.c |   13 ++++++++-----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index 612b168..137dba0 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -167,7 +167,7 @@ void msi_uninit(struct PCIDevice *dev)
     uint16_t flags;
     uint8_t cap_size;
 
-    if (!(dev->cap_present & QEMU_PCI_CAP_MSI)) {
+    if (!msi_present(dev)) {
         return;
     }
     flags = pci_get_word(dev->config + msi_flags_off(dev));
diff --git a/hw/msix.c b/hw/msix.c
index 32fd9b2..876793a 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -283,8 +283,9 @@ static void msix_free_irq_entries(PCIDevice *dev)
 /* Clean up resources for the device. */
 int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
 {
-    if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
+    if (!msix_present(dev)) {
         return 0;
+    }
     pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
     dev->msix_cap = 0;
     msix_free_irq_entries(dev);
@@ -303,7 +304,7 @@ void msix_save(PCIDevice *dev, QEMUFile *f)
 {
     unsigned n = dev->msix_entries_nr;
 
-    if (!(dev->cap_present & QEMU_PCI_CAP_MSIX)) {
+    if (!msix_present(dev)) {
         return;
     }
 
@@ -316,7 +317,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
 {
     unsigned n = dev->msix_entries_nr;
 
-    if (!(dev->cap_present & QEMU_PCI_CAP_MSIX)) {
+    if (!msix_present(dev)) {
         return;
     }
 
@@ -368,8 +369,9 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 
 void msix_reset(PCIDevice *dev)
 {
-    if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
+    if (!msix_present(dev)) {
         return;
+    }
     msix_free_irq_entries(dev);
     dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &=
         ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET];
@@ -408,7 +410,8 @@ void msix_vector_unuse(PCIDevice *dev, unsigned vector)
 
 void msix_unuse_all_vectors(PCIDevice *dev)
 {
-    if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
+    if (!msix_present(dev)) {
         return;
+    }
     msix_free_irq_entries(dev);
 }
-- 
1.7.3.4
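The pattern this patch applies can be shown in isolation: one named predicate per capability replaces the open-coded bit test, and callers use it as an early-return guard. The bit values, struct, and function names below are hypothetical; QEMU defines its own QEMU_PCI_CAP_* flags and msi_present()/msix_present() helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits. */
enum { CAP_MSI = 1 << 0, CAP_MSIX = 1 << 1 };

typedef struct Dev {
    uint32_t cap_present;
} Dev;

/* Named predicates instead of scattered (cap_present & FLAG) tests. */
static inline bool dev_msi_present(const Dev *d)
{
    return (d->cap_present & CAP_MSI) != 0;
}

static inline bool dev_msix_present(const Dev *d)
{
    return (d->cap_present & CAP_MSIX) != 0;
}

/* Typical early-return guard, in the style of msix_reset(). */
static int dev_msix_reset(Dev *d)
{
    if (!dev_msix_present(d)) {
        return 0;   /* capability absent: nothing to reset */
    }
    /* ... free vectors, clear the control register ... */
    return 1;
}
```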
[Qemu-devel] [PATCH 4/6] msi: Invoke msi/msix_reset from PCI core
From: Jan Kiszka jan.kis...@siemens.com There is no point in pushing this burden to the devices, they may rather forget to call them (like intel-hda and ahci ATM). Instead, reset functions are now called from pci_device_reset and pci_bridge_reset. They do nothing if the MSI/MSI-X is not in use. CC: Alexander Graf ag...@suse.de CC: Gerd Hoffmann kra...@redhat.com CC: Isaku Yamahata yamah...@valinux.co.jp Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/ioh3420.c|2 +- hw/pci.c|5 + hw/pci_bridge.c |4 hw/virtio-pci.c |1 - hw/xio3130_downstream.c |2 +- hw/xio3130_upstream.c |1 - 6 files changed, 11 insertions(+), 4 deletions(-) diff --git a/hw/ioh3420.c b/hw/ioh3420.c index a6bfbb9..fc2fb3b 100644 --- a/hw/ioh3420.c +++ b/hw/ioh3420.c @@ -81,7 +81,7 @@ static void ioh3420_write_config(PCIDevice *d, static void ioh3420_reset(DeviceState *qdev) { PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev); -msi_reset(d); + ioh3420_aer_vector_update(d); pcie_cap_root_reset(d); pcie_cap_deverr_reset(d); diff --git a/hw/pci.c b/hw/pci.c index 399227f..5d5829d 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -31,6 +31,8 @@ #include loader.h #include range.h #include qmp-commands.h +#include msi.h +#include msix.h //#define DEBUG_PCI #ifdef DEBUG_PCI @@ -191,6 +193,9 @@ void pci_device_reset(PCIDevice *dev) } } pci_update_mappings(dev); + +msi_reset(dev); +msix_reset(dev); } /* diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c index 650d165..6799978 100644 --- a/hw/pci_bridge.c +++ b/hw/pci_bridge.c @@ -32,6 +32,8 @@ #include pci_bridge.h #include pci_internals.h #include range.h +#include msi.h +#include msix.h /* PCI bridge subsystem vendor ID helper functions */ #define PCI_SSVID_SIZEOF8 @@ -296,6 +298,8 @@ void pci_bridge_reset(DeviceState *qdev) { PCIDevice *dev = DO_UPCAST(PCIDevice, qdev, qdev); pci_bridge_reset_reg(dev); +msi_reset(dev); +msix_reset(dev); } /* default qdev initialization function for PCI-to-PCI bridge */ diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 
64c6a94..16a5b08 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -271,7 +271,6 @@ static void virtio_pci_reset(DeviceState *d) VirtIOPCIProxy *proxy = container_of(d, VirtIOPCIProxy, pci_dev.qdev); virtio_pci_stop_ioeventfd(proxy); virtio_reset(proxy-vdev); -msix_reset(proxy-pci_dev); proxy-flags = ~VIRTIO_PCI_FLAG_BUS_MASTER_BUG; } diff --git a/hw/xio3130_downstream.c b/hw/xio3130_downstream.c index d3c387d..464eefa 100644 --- a/hw/xio3130_downstream.c +++ b/hw/xio3130_downstream.c @@ -48,7 +48,7 @@ static void xio3130_downstream_write_config(PCIDevice *d, uint32_t address, static void xio3130_downstream_reset(DeviceState *qdev) { PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev); -msi_reset(d); + pcie_cap_deverr_reset(d); pcie_cap_slot_reset(d); pcie_cap_ari_reset(d); diff --git a/hw/xio3130_upstream.c b/hw/xio3130_upstream.c index 8283695..0d8d254 100644 --- a/hw/xio3130_upstream.c +++ b/hw/xio3130_upstream.c @@ -47,7 +47,6 @@ static void xio3130_upstream_write_config(PCIDevice *d, uint32_t address, static void xio3130_upstream_reset(DeviceState *qdev) { PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev); -msi_reset(d); pci_bridge_reset(qdev); pcie_cap_deverr_reset(d); } -- 1.7.3.4
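The reset change follows the same idea as the config-write change. The core reset path runs the capability resets itself, and those resets do nothing when the capability is absent. Here is a minimal sketch of that flow under assumed, simplified types. None of these names are QEMU's, and the single "enable" bit stands in for the real control registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits and control-register layout. */
enum { CAP_MSI = 1 << 0, CAP_MSIX = 1 << 1 };
enum { CTRL_ENABLE = 1 << 0 };

typedef struct Dev {
    uint32_t cap_present;
    uint16_t msi_flags;     /* simplified MSI control word */
    uint16_t msix_control;  /* simplified MSI-X control word */
} Dev;

/* Like msi_reset()/msix_reset(): a no-op when the capability is absent. */
static void msi_reset_sketch(Dev *d)
{
    if (!(d->cap_present & CAP_MSI)) {
        return;
    }
    d->msi_flags &= (uint16_t)~CTRL_ENABLE;
}

static void msix_reset_sketch(Dev *d)
{
    if (!(d->cap_present & CAP_MSIX)) {
        return;
    }
    d->msix_control &= (uint16_t)~CTRL_ENABLE;
}

/* Core reset: run the device's own hook (if any), then the capability
 * resets, so devices such as intel-hda or ahci cannot forget them. */
static void core_device_reset(Dev *d, void (*dev_reset)(Dev *))
{
    if (dev_reset != NULL) {
        dev_reset(d);
    }
    msi_reset_sketch(d);
    msix_reset_sketch(d);
}
```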
Re: [Qemu-devel] [RFC][PATCH 10/16] memory: Introduce memory_region_init_reservation
On 2011-12-04 14:20, Avi Kivity wrote: On 12/03/2011 01:17 PM, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Introduce a memory region type that can reserve I/O space. Such regions are useful for modeling I/O that is only handled outside of QEMU, i.e. in the context of an accelerator like KVM. Any access to such a region from QEMU is a bug and will be reported as such. This is guest triggerable (DMA into the region), so abort() is too drastic. Mmh, true. Will turn it into a print-once warning. +void memory_region_init_reservation(MemoryRegion *mr, + const char *name, + uint64_t size) +{ + memory_region_init(mr, name, size); + mr->ops = &reservation_ops; + mr->opaque = mr; + mr->terminates = true; + mr->backend_registered = false; +} Just calling memory_region_init_io() is simpler, no? Yep. Thanks, Jan
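The agreed behaviour, "warn once instead of abort()", could look roughly like the sketch below. It does not use QEMU's MemoryRegion or MemoryRegionOps. The struct and callbacks are hypothetical stand-ins with the same read/write shape. Returning all ones on a read is an assumption of this sketch, not something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a reserved memory region. */
typedef struct Reservation {
    const char *name;
    bool warned;        /* latch so the warning is printed only once */
    unsigned accesses;  /* number of unexpected accesses seen */
} Reservation;

static void reservation_complain(Reservation *r, const char *kind,
                                 uint64_t addr)
{
    r->accesses++;
    if (!r->warned) {
        r->warned = true;
        fprintf(stderr, "warning: unexpected %s of reserved region '%s' "
                "at offset 0x%llx\n", kind, r->name,
                (unsigned long long)addr);
    }
}

/* Reads return all ones (an assumption of this sketch). */
static uint64_t reservation_read(void *opaque, uint64_t addr, unsigned size)
{
    reservation_complain(opaque, "read", addr);
    return size >= 8 ? UINT64_MAX : (UINT64_C(1) << (8 * size)) - 1;
}

static void reservation_write(void *opaque, uint64_t addr, uint64_t val,
                              unsigned size)
{
    (void)val;
    (void)size;
    reservation_complain(opaque, "write", addr);
}
```

A guest that DMAs into the region repeatedly produces a single log line instead of terminating QEMU.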
Re: [Qemu-devel] [RFC][PATCH 13/16] kvm: x86: Add user space part for in-kernel APIC
On 12/03/2011 01:17 PM, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com This introduces the alternative APIC model 'kvm-apic' which makes use of KVM's in-kernel device model. MSI is not yet supported, so we disable this when the in-kernel model is in use. -dev = qdev_create(NULL, "apic"); +if (kvm_enabled() && kvm_irqchip_in_kernel()) { + dev = qdev_create(NULL, "kvm-apic"); +} else { + dev = qdev_create(NULL, "apic"); +} Is there anything that makes those two devices incompatible?
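The selection logic in the quoted hunk reduces to a small pure function, which makes the two cases easy to see. This is a sketch: the Accel struct is a hypothetical stand-in for what QEMU queries via kvm_enabled() and kvm_irqchip_in_kernel(), and only the two type names come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical accelerator state. */
typedef struct Accel {
    bool kvm_enabled;
    bool irqchip_in_kernel;
} Accel;

/* Use the in-kernel model only when KVM is active and the irqchip
 * lives in the kernel; otherwise fall back to the user space model. */
static const char *apic_type_name(const Accel *a)
{
    return (a->kvm_enabled && a->irqchip_in_kernel) ? "kvm-apic" : "apic";
}
```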
Re: [Qemu-devel] [RFC][PATCH 01/16] msi: Generalize msix_supported to msi_supported
On 12/04/2011 03:16 PM, Jan Kiszka wrote: On 2011-12-04 14:12, Avi Kivity wrote: On 12/03/2011 01:17 PM, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Rename msix_supported to msi_supported and control MSI and MSI-X activation this way. That was likely the original intention for this flag, but MSI support came after MSI-X. 'and' is a dangerous word in a changelog entry. This patch hardly qualifies for two IMHO. If we don't have to change it, no. + + if (!msi_supported) { + return -ENOTSUP; + } + This changes behaviour. qemu 1.0 -M pc-1.0 and qemu 1.1 -M pc-1.0 will be different after this, no? Only isapc had msix_supported = 0, and I doubt we got there (msi_init) for that machine. Or am I missing something? Ah, I thought it was a user-settable property, but it isn't.
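The quoted hunk gates capability setup on a single global flag. Here is a self-contained sketch of that gate. The init helper and its argument check are hypothetical. The power-of-two limit reflects the MSI capability's encoding of 1, 2, 4, 8, 16 or 32 vectors.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Global switch mirroring the renamed flag; a machine without MSI
 * support (isapc, per the discussion) would clear it. */
static bool msi_supported = true;

/* Hypothetical init helper: refuse up front instead of registering a
 * capability the platform cannot service. */
static int msi_init_sketch(unsigned nr_vectors)
{
    if (!msi_supported) {
        return -ENOTSUP;
    }
    /* MSI allows 1, 2, 4, 8, 16 or 32 vectors. */
    if (nr_vectors == 0 || nr_vectors > 32 ||
        (nr_vectors & (nr_vectors - 1)) != 0) {
        return -EINVAL;
    }
    return 0;
}
```

Because the flag is not user-settable, Avi's migration concern only arises for machines that clear it before devices are initialized.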
Re: [Qemu-devel] [RFC][PATCH 11/16] kvm: Introduce core services for in-kernel irqchip support
On 2011-12-04 14:23, Avi Kivity wrote: On 12/03/2011 01:17 PM, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Add the basic infrastructure to activate in-kernel irqchip support, inject interrupts into these models, and maintain IRQ routes. Routing is optional and depends on the host arch supporting KVM_CAP_IRQ_ROUTING. When it's not available on x86, we loose the HPET (Avi: lose) /me is still looking for a semantic proofreader plugin... as we can't route GSI0 to IOAPIC pin 2. In-kernel irqchip support will eventually be controlled by the machine property 'kernel_irqchip', but this is not yet wired up.
Re: [Qemu-devel] [RFC][PATCH 11/16] kvm: Introduce core services for in-kernel irqchip support
On 12/04/2011 03:27 PM, Jan Kiszka wrote: On 2011-12-04 14:23, Avi Kivity wrote: On 12/03/2011 01:17 PM, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Add the basic infrastructure to activate in-kernel irqchip support, inject interrupts into these models, and maintain IRQ routes. Routing is optional and depends on the host arch supporting KVM_CAP_IRQ_ROUTING. When it's not available on x86, we loose the HPET (Avi: lose) /me is still looking for a semantic proofreader plugin... Well, I have to comment on something. If you don't want spelling corrections, leave some trailing whitespace.
Re: [Qemu-devel] [RFC][PATCH 11/16] kvm: Introduce core services for in-kernel irqchip support
On 2011-12-04 14:28, Avi Kivity wrote: On 12/04/2011 03:27 PM, Jan Kiszka wrote: On 2011-12-04 14:23, Avi Kivity wrote: On 12/03/2011 01:17 PM, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Add the basic infrastructure to activate in-kernel irqchip support, inject interrupts into these models, and maintain IRQ routes. Routing is optional and depends on the host arch supporting KVM_CAP_IRQ_ROUTING. When it's not available on x86, we loose the HPET (Avi: lose) /me is still looking for a semantic proofreader plugin... Well, I have to comment on something. If you don't want spelling corrections, leave some trailing whitespace. I could create a messpatch.pl... Jan
Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 12/03/2011 01:17 PM, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Introduce the alternative 'kvm-i8259' device model that exploits KVM in-kernel acceleration. The PIIX3 initialization code is furthermore extended by KVM specific IRQ route setup. Moreover, GSI injection differs in KVM mode from the user space model. As we can dispatch ISA-range IRQs to both IOAPIC and PIC inside the kernel, we do not need to inject them separately. This is reflected by a KVM-specific GSI handler. + +qemu_irq *kvm_i8259_init(void) +{ + ISADevice *dev; + + dev = isa_create("kvm-i8259"); Same issue. Is this a different device, or a different implementation of the same device? We're forcing migration from 1.0 to 1.1 to disable in-kernel irqchip on the target. For qemu itself, that's no issue. But for qemu-kvm, it will result in loss of performance, or hacks to alias the two back together.
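The commit message's point about GSI injection can be sketched as two handlers. The user space model must feed an ISA-range GSI to both the PIC and the IOAPIC. The KVM variant issues one injection and relies on the kernel's routing table to fan it out. The state struct and handler names are hypothetical, and the counter only stands in for a single call into the kernel (for example a KVM_IRQ_LINE-style ioctl).

```c
#include <assert.h>

#define ISA_NUM_IRQS    16
#define IOAPIC_NUM_PINS 24

/* Hypothetical model: levels seen by each user-space chip, plus a
 * counter standing in for one injection call into the kernel. */
typedef struct GSIState {
    int pic_level[ISA_NUM_IRQS];
    int ioapic_level[IOAPIC_NUM_PINS];
    unsigned kernel_injections;
} GSIState;

/* User-space models: ISA-range GSIs must be delivered to both chips. */
static void gsi_handler_user(GSIState *s, int n, int level)
{
    if (n >= 0 && n < ISA_NUM_IRQS) {
        s->pic_level[n] = level;
    }
    if (n >= 0 && n < IOAPIC_NUM_PINS) {
        s->ioapic_level[n] = level;
    }
}

/* In-kernel models: a single injection per GSI; the kernel's routing
 * table dispatches it to PIC and IOAPIC as configured. */
static void gsi_handler_kvm(GSIState *s, int n, int level)
{
    (void)n;
    (void)level;
    s->kernel_injections++;
}
```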
Re: [Qemu-devel] [RFC][PATCH 11/16] kvm: Introduce core services for in-kernel irqchip support
On 12/04/2011 03:30 PM, Jan Kiszka wrote: Well, I have to comment on something. If you don't want spelling corrections, leave some trailing whitespace. I could create a messpatch.pl... Ah, and with a --reverse flag we could go through the motions of patch review without requiring a repost.
Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 2011-12-04 14:31, Avi Kivity wrote: On 12/03/2011 01:17 PM, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Introduce the alternative 'kvm-i8259' device model that exploits KVM in-kernel acceleration. The PIIX3 initialization code is furthermore extended by KVM specific IRQ route setup. Moreover, GSI injection differs in KVM mode from the user space model. As we can dispatch ISA-range IRQs to both IOAPIC and PIC inside the kernel, we do not need to inject them separately. This is reflected by a KVM-specific GSI handler. + +qemu_irq *kvm_i8259_init(void) +{ + ISADevice *dev; + + dev = isa_create("kvm-i8259"); Same issue. Is this a different device, or a different implementation of the same device? They are theoretically the same from the guest's perspective (therefore you can migrate between machines that differ in this). We're forcing migration from 1.0 to 1.1 to disable in-kernel irqchip on the target. For qemu itself, that's no issue. But for qemu-kvm, it will result in loss of performance, or hacks to alias the two back together. Why should this happen with qemu-kvm? The vmstates are compatible, so you can migrate from old qemu-kvm in-kernel devices to the new kvm-* ones (once they are feature-equivalent). Not sure how many hacks this may require in qemu-kvm, but I don't think it should make the situation worse for that tree. Jan
Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 12/04/2011 03:42 PM, Jan Kiszka wrote: On 2011-12-04 14:31, Avi Kivity wrote: On 12/03/2011 01:17 PM, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Introduce the alternative 'kvm-i8259' device model that exploits KVM in-kernel acceleration. The PIIX3 initialization code is furthermore extended by KVM specific IRQ route setup. Moreover, GSI injection differs in KVM mode from the user space model. As we can dispatch ISA-range IRQs to both IOAPIC and PIC inside the kernel, we do not need to inject them separately. This is reflected by a KVM-specific GSI handler. + +qemu_irq *kvm_i8259_init(void) +{ + ISADevice *dev; + + dev = isa_create("kvm-i8259"); Same issue. Is this a different device, or a different implementation of the same device? They are theoretically the same from the guest's perspective (therefore you can migrate between machines that differ in this). But the name becomes part of the save/restore ABI, so you can't. We're forcing migration from 1.0 to 1.1 to disable in-kernel irqchip on the target. For qemu itself, that's no issue. But for qemu-kvm, it will result in loss of performance, or hacks to alias the two back together. Why should this happen with qemu-kvm? The vmstates are compatible, so you can migrate from old qemu-kvm in-kernel devices to the new kvm-* ones (once they are feature-equivalent). Not sure how many hacks this may require in qemu-kvm, but I don't think it should make the situation worse for that tree. They aren't compatible due to the name clash. The hack won't be large (add an alias for the name), but just one hack is enough to keep the tree alive for a long while. Better not to add it in the first place.
Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 2011-12-04 14:49, Avi Kivity wrote: On 12/04/2011 03:42 PM, Jan Kiszka wrote: On 2011-12-04 14:31, Avi Kivity wrote: On 12/03/2011 01:17 PM, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Introduce the alternative 'kvm-i8259' device model that exploits KVM in-kernel acceleration. The PIIX3 initialization code is furthermore extended by KVM specific IRQ route setup. Moreover, GSI injection differs in KVM mode from the user space model. As we can dispatch ISA-range IRQs to both IOAPIC and PIC inside the kernel, we do not need to inject them separately. This is reflected by a KVM-specific GSI handler. + +qemu_irq *kvm_i8259_init(void) +{ + ISADevice *dev; + + dev = isa_create("kvm-i8259"); Same issue. Is this a different device, or a different implementation of the same device? They are theoretically the same from the guest's perspective (therefore you can migrate between machines that differ in this). But the name becomes part of the save/restore ABI, so you can't. Nope, the vmstate names are identical. That would ruin migration otherwise. It's just the output of info qtree & co. that changes. Jan
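Jan's point is that the two models differ only in the qdev type name, while the saved-state name stays the same. A simplified compatibility check illustrates why that matters. This sketch rests on an assumed model. QEMU's real migration code matches sections by id string, instance id and version, and the device names here (such as "isa-i8259") and version numbers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical device description: the qdev type name a user sees in
 * "info qtree" versus the name/version identifying its saved state. */
typedef struct DeviceModel {
    const char *qdev_name;
    const char *vmstate_name;
    int vmstate_version;
} DeviceModel;

/* In this simplified model, incoming state is matched by vmstate name
 * and version only; the qdev type name plays no role. */
static bool can_load_state(const DeviceModel *src, const DeviceModel *dst)
{
    return strcmp(src->vmstate_name, dst->vmstate_name) == 0 &&
           src->vmstate_version <= dst->vmstate_version;
}
```

Under this model, Avi's remaining objection is about naming and taste rather than about the migration format.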
Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 12/04/2011 03:51 PM, Jan Kiszka wrote: But the name becomes part of the save/restore ABI, so you can't. Nope, the vmstate names are identical. That would ruin migration otherwise. It's just the output of info qtree & co. that changes. Oh, okay. I still think it's wrong, but now it's just a matter of taste, and I can live with it.
Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 2011-12-04 15:04, Avi Kivity wrote: On 12/04/2011 03:51 PM, Jan Kiszka wrote: But the name becomes part of the save/restore ABI, so you can't. Nope, the vmstate names are identical. That would ruin migration otherwise. It's just the output of info qtree & co. that changes. Oh, okay. I still think it's wrong, but now it's just a matter of taste, and I can live with it. Wrong in what sense? I think the way of merging kvm support into the user space models in qemu-kvm is not particularly beautiful. But that's my taste, and therefore I modeled the upstream proposal differently. :) Jan
Re: [Qemu-devel] [PATCH 4/6] msi: Invoke msi/msix_reset from PCI core
On Sun, Dec 04, 2011 at 02:22:12PM +0100, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com There is no point in pushing this burden to the devices, they may rather forget to call them (like intel-hda and ahci ATM). Instead, reset functions are now called from pci_device_reset and pci_bridge_reset. They do nothing if the MSI/MSI-X is not in use. CC: Alexander Graf ag...@suse.de CC: Gerd Hoffmann kra...@redhat.com CC: Isaku Yamahata yamah...@valinux.co.jp Signed-off-by: Jan Kiszka jan.kis...@siemens.com What makes me unhappy with this proposal is that msix_write_config, for example, in fact becomes an internal interface. So devices should be calling some functions like msix_init from msix.h, but not others like msix_write_config. It used to be simple: devices should call msix_. Now, how are devices to figure it out? E.g. the comment near msix_write_config says: /* Handle MSI-X capability config write. */ This puts it at level 11 on Rusty's misuse scale: Read the documentation and you will get it wrong. So I tried writing a wrapper, something like pci_capability.h, that would hide the detail and handle all capabilities seamlessly. Where I got stuck was migration, though: the format is ordered, so we can't just move the fields around. So I decided to wait until we switch to an unordered format; then it'll become easy. Thoughts?
--- hw/ioh3420.c|2 +- hw/pci.c|5 + hw/pci_bridge.c |4 hw/virtio-pci.c |1 - hw/xio3130_downstream.c |2 +- hw/xio3130_upstream.c |1 - 6 files changed, 11 insertions(+), 4 deletions(-) diff --git a/hw/ioh3420.c b/hw/ioh3420.c index a6bfbb9..fc2fb3b 100644 --- a/hw/ioh3420.c +++ b/hw/ioh3420.c @@ -81,7 +81,7 @@ static void ioh3420_write_config(PCIDevice *d, static void ioh3420_reset(DeviceState *qdev) { PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev); -msi_reset(d); + ioh3420_aer_vector_update(d); pcie_cap_root_reset(d); pcie_cap_deverr_reset(d); diff --git a/hw/pci.c b/hw/pci.c index 399227f..5d5829d 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -31,6 +31,8 @@ #include loader.h #include range.h #include qmp-commands.h +#include msi.h +#include msix.h //#define DEBUG_PCI #ifdef DEBUG_PCI @@ -191,6 +193,9 @@ void pci_device_reset(PCIDevice *dev) } } pci_update_mappings(dev); + +msi_reset(dev); +msix_reset(dev); } /* diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c index 650d165..6799978 100644 --- a/hw/pci_bridge.c +++ b/hw/pci_bridge.c @@ -32,6 +32,8 @@ #include pci_bridge.h #include pci_internals.h #include range.h +#include msi.h +#include msix.h /* PCI bridge subsystem vendor ID helper functions */ #define PCI_SSVID_SIZEOF8 @@ -296,6 +298,8 @@ void pci_bridge_reset(DeviceState *qdev) { PCIDevice *dev = DO_UPCAST(PCIDevice, qdev, qdev); pci_bridge_reset_reg(dev); +msi_reset(dev); +msix_reset(dev); } /* default qdev initialization function for PCI-to-PCI bridge */ diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 64c6a94..16a5b08 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -271,7 +271,6 @@ static void virtio_pci_reset(DeviceState *d) VirtIOPCIProxy *proxy = container_of(d, VirtIOPCIProxy, pci_dev.qdev); virtio_pci_stop_ioeventfd(proxy); virtio_reset(proxy-vdev); -msix_reset(proxy-pci_dev); proxy-flags = ~VIRTIO_PCI_FLAG_BUS_MASTER_BUG; } diff --git a/hw/xio3130_downstream.c b/hw/xio3130_downstream.c index d3c387d..464eefa 100644 --- a/hw/xio3130_downstream.c +++ 
b/hw/xio3130_downstream.c @@ -48,7 +48,7 @@ static void xio3130_downstream_write_config(PCIDevice *d, uint32_t address, static void xio3130_downstream_reset(DeviceState *qdev) { PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev); -msi_reset(d); + pcie_cap_deverr_reset(d); pcie_cap_slot_reset(d); pcie_cap_ari_reset(d); diff --git a/hw/xio3130_upstream.c b/hw/xio3130_upstream.c index 8283695..0d8d254 100644 --- a/hw/xio3130_upstream.c +++ b/hw/xio3130_upstream.c @@ -47,7 +47,6 @@ static void xio3130_upstream_write_config(PCIDevice *d, uint32_t address, static void xio3130_upstream_reset(DeviceState *qdev) { PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev); -msi_reset(d); pci_bridge_reset(qdev); pcie_cap_deverr_reset(d); } -- 1.7.3.4
Re: [Qemu-devel] [PATCH 4/6] msi: Invoke msi/msix_reset from PCI core
On 2011-12-04 15:24, Michael S. Tsirkin wrote: On Sun, Dec 04, 2011 at 02:22:12PM +0100, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com There is no point in pushing this burden to the devices, they may rather forget to call them (like intel-hda and ahci ATM). Instead, reset functions are now called from pci_device_reset and pci_bridge_reset. They do nothing if the MSI/MSI-X is not in use. CC: Alexander Graf ag...@suse.de CC: Gerd Hoffmann kra...@redhat.com CC: Isaku Yamahata yamah...@valinux.co.jp Signed-off-by: Jan Kiszka jan.kis...@siemens.com What makes me unhappy with this proposal is that msix_write_config, for example, in fact becomes an internal interface. So devices should be calling some functions like msix_init from msix.h, but not others like msix_write_config. It used to be simple: devices should call msix_. Now, how are devices to figure it out? E.g. the comment near msix_write_config says: /* Handle MSI-X capability config write. */ That should be aligned to msi_write_config's comment. My goal is to reduce the number of calls devices have to make in order to use MSI. We have quite a few correct examples by now, so it should not be too hard to figure out what to do to use standard MSI[X] services. Maybe a PCI skeleton device model would help further. Or up-to-date documentation, though that may be even harder. ;) This puts it at level 11 on Rusty's misuse scale: Read the documentation and you will get it wrong. So I tried writing a wrapper, something like pci_capability.h, that would hide the detail and handle all capabilities seamlessly. Where I got stuck was migration, though: the format is ordered, so we can't just move the fields around. So I decided to wait until we switch to an unordered format; then it'll become easy. Thoughts? MSI-X save/restore is, well, unfortunate. Just like the whole PCI layer in this regard. But I don't think that should block this particular step as it frees device models from an unneeded burden.
Jan
[Qemu-devel] [PATCH v2 00/16] uq/master: Introduce basic irqchip support
This is v2, addressing the feedback comments provided so far, namely:
 - dropped #include conversions
 - do not abort() on reserved memory region accesses but only warn once
 - use memory_region_init_io in memory_region_init_reservation

Patch 1 of this series has meanwhile been posted for direct upstream inclusion, see http://thread.gmane.org/gmane.comp.emulators.qemu/127308. I'm keeping it to ease testing, but it should likely not go via uq/master. The same may apply to other patches too, e.g. 10.

--- original series description ---

Some weeks back I posted my MSI rework for qemu-kvm that should eventually help to integrate those bits into upstream. After that I wondered what a rewritten in-kernel irqchip model could look like and how it could make use of this. But then I realized that there is actually no technical need to roll out a first version of kvm irqchips that already supports MSI. As the MSI work will likely take a few more iterations, I decided to rush forward with basic kvm irqchip support for QEMU upstream. Here we go.

My idea was always to create proper alternatives to the existing user space device models while keeping the vmstates 100% compatible. I think I succeeded in this; tests have worked fine so far. The kvm and the user space models now have a common core where they share logic, and specific code modules where they differ. Also, I moved all kvm devices into hw/kvm.

The in-kernel irqchip support can be controlled via a machine property (-machine ...,kernel_irqchip=on), in contrast to qemu-kvm's dedicated command line switch. This series keeps the support off by default because we still lack the MSI bits, as I explained. Also, in-kernel PIT and TPR patching/VAPIC (for Windows guests) are not yet implemented.
The merge story would look similar to what we did before with the clean-room reimplementation of kvm for QEMU: merge into upstream, merge back into qemu-kvm while disabling the new bits for now, then gradually switch over to the new services, specifically once they are feature-equivalent. Of course, I will support these steps as usual.

So, feedback and review welcome!

Jan Kiszka (16):
  msi: Generalize msix_supported to msi_supported
  kvm: Move kvmclock into hw/kvm folder
  apic: Stop timer on reset
  apic: Factor out core for KVM reuse
  apic: Open-code timer save/restore
  i8259: Factor out core for KVM reuse
  ioapic: Convert to memory API
  ioapic: Reject non-dword accesses to IOWIN register
  ioapic: Factor out core for KVM reuse
  memory: Introduce memory_region_init_reservation
  kvm: Introduce core services for in-kernel irqchip support
  kvm: x86: Establish IRQ0 override control
  kvm: x86: Add user space part for in-kernel APIC
  kvm: x86: Add user space part for in-kernel i8259
  kvm: x86: Add user space part for in-kernel IOAPIC
  kvm: Arm in-kernel irqchip support

 Makefile.objs                  |    2 +-
 Makefile.target                |    6 +-
 configure                      |    1 +
 hw/apic.c                      |  288
 hw/apic_common.c               |  262
 hw/apic_internal.h             |  111 +++
 hw/i8259.c                     |   78 +---
 hw/i8259_common.c              |  103 ++
 hw/i8259_internal.h            |   67 +
 hw/ioapic.c                    |  136 +++
 hw/ioapic_common.c             |   89
 hw/ioapic_internal.h           |   94 +
 hw/kvm/apic.c                  |  147
 hw/{kvmclock.c => kvm/clock.c} |    4 +-
 hw/{kvmclock.h => kvm/clock.h} |    0
 hw/kvm/i8259.c                 |  154 +
 hw/kvm/ioapic.c                |  120 +
 hw/msi.c                       |    8 +
 hw/msi.h                       |    2 +
 hw/msix.c                      |    9 +-
 hw/msix.h                      |    2 -
 hw/pc.c                        |   20 ++-
 hw/pc.h                        |    1 +
 hw/pc_piix.c                   |   67 +-
 kvm-all.c                      |  154 +
 kvm-stub.c                     |    5 +
 kvm.h                          |   13 ++
 memory.c                       |   36 +
 memory.h                       |   16 +++
 qemu-config.c                  |    4 +
 qemu-options.hx                |    5 +-
 sysemu.h                       |    1 -
 target-i386/kvm.c              |   19 +++
 trace-events                   |    2 +-
 vl.c                           |    1 -
 35 files changed, 1547 insertions(+), 480 deletions(-)
 create mode 100644 hw/apic_common.c
 create mode 100644 hw/apic_internal.h
 create mode 100644 hw/i8259_common.c
 create mode 100644 hw/i8259_internal.h
 create mode 100644 hw/ioapic_common.c
 create mode 100644 hw/ioapic_internal.h
 create mode 100644 hw/kvm/apic.c
 rename hw/{kvmclock.c => kvm/clock.c} (98%)
 rename hw/{kvmclock.h => kvm/clock.h} (100%)
 create mode 100644 hw/kvm/i8259.c
 create mode
[Qemu-devel] [PATCH v2 03/16] apic: Stop timer on reset
From: Jan Kiszka jan.kis...@siemens.com

All LVTs are masked on reset, so the timer becomes ineffective. Letting it tick nevertheless is harmless, but it will at least create a spurious trace event.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/apic.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 8289eef..2644a82 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -528,6 +528,8 @@ void apic_init_reset(DeviceState *d)
     s->initial_count_load_time = 0;
     s->next_time = 0;
     s->wait_for_sipi = 1;
+
+    qemu_del_timer(s->timer);
 }
 
 static void apic_startup(APICState *s, int vector_num)
-- 
1.7.3.4
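The reasoning in this patch can be sketched in a few lines: once every LVT entry carries the mask bit, the timer can no longer deliver an interrupt, so reset cancels it. The timer and APIC structs are hypothetical reductions. The LVT count of 6 is an assumption for this sketch. Bit 16 is the LVT mask bit defined by the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LVT_COUNT  6           /* assumed number of LVT entries */
#define LVT_MASKED (1u << 16)  /* LVT mask bit (Intel SDM) */

/* Hypothetical timer and APIC state, reduced to what reset touches. */
typedef struct Timer {
    bool armed;
    int64_t expire_ns;
} Timer;

typedef struct Apic {
    uint32_t lvt[LVT_COUNT];
    Timer timer;
} Apic;

static void timer_cancel(Timer *t)
{
    t->armed = false;
}

static void apic_reset_sketch(Apic *s)
{
    for (int i = 0; i < LVT_COUNT; i++) {
        s->lvt[i] = LVT_MASKED;   /* every local interrupt masked */
    }
    /* With all LVTs masked the timer cannot deliver anything, so cancel
     * it rather than let it fire and log a spurious event. */
    timer_cancel(&s->timer);
}
```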
[Qemu-devel] [PATCH v2 07/16] ioapic: Convert to memory API
From: Jan Kiszka jan.kis...@siemens.com This maintains the old imprecise access size handling. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/ioapic.c | 28 +++- 1 files changed, 11 insertions(+), 17 deletions(-) diff --git a/hw/ioapic.c b/hw/ioapic.c index 61991d7..56b1612 100644 --- a/hw/ioapic.c +++ b/hw/ioapic.c @@ -86,6 +86,7 @@ typedef struct IOAPICState IOAPICState; struct IOAPICState { SysBusDevice busdev; +MemoryRegion io_memory; uint8_t id; uint8_t ioregsel; uint32_t irr; @@ -195,7 +196,8 @@ void ioapic_eoi_broadcast(int vector) } } -static uint32_t ioapic_mem_readl(void *opaque, target_phys_addr_t addr) +static uint64_t +ioapic_mem_read(void *opaque, target_phys_addr_t addr, unsigned int size) { IOAPICState *s = opaque; int index; @@ -234,7 +236,8 @@ static uint32_t ioapic_mem_readl(void *opaque, target_phys_addr_t addr) } static void -ioapic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val) +ioapic_mem_write(void *opaque, target_phys_addr_t addr, uint64_t val, + unsigned int size) { IOAPICState *s = opaque; int index; @@ -309,32 +312,23 @@ static void ioapic_reset(DeviceState *d) } } -static CPUReadMemoryFunc * const ioapic_mem_read[3] = { -ioapic_mem_readl, -ioapic_mem_readl, -ioapic_mem_readl, -}; - -static CPUWriteMemoryFunc * const ioapic_mem_write[3] = { -ioapic_mem_writel, -ioapic_mem_writel, -ioapic_mem_writel, +static const MemoryRegionOps ioapic_io_ops = { +.read = ioapic_mem_read, +.write = ioapic_mem_write, +.endianness = DEVICE_NATIVE_ENDIAN, }; static int ioapic_init1(SysBusDevice *dev) { IOAPICState *s = FROM_SYSBUS(IOAPICState, dev); -int io_memory; static int ioapic_no; if (ioapic_no = MAX_IOAPICS) { return -1; } -io_memory = cpu_register_io_memory(ioapic_mem_read, - ioapic_mem_write, s, - DEVICE_NATIVE_ENDIAN); -sysbus_init_mmio(dev, 0x1000, io_memory); +memory_region_init_io(s-io_memory, ioapic_io_ops, s, ioapic, 0x1000); +sysbus_init_mmio_region(dev, s-io_memory); qdev_init_gpio_in(dev-qdev, ioapic_set_irq, 
IOAPIC_NUM_PINS); -- 1.7.3.4
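The conversion replaces the three per-width callback tables with a single read and write callback that receive the access size. Ignoring that size keeps the old imprecise behaviour. The sketch below uses a hypothetical ops struct with that shape and a four-register IOAPIC stand-in. IOREGSEL at offset 0x00 and IOWIN at 0x10 follow the IOAPIC's indirect register interface. Everything else is simplified.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ops table: one read and one write callback that take the
 * access size, instead of three per-width function pointers. */
typedef struct RegionOps {
    uint64_t (*read)(void *opaque, uint64_t addr, unsigned size);
    void (*write)(void *opaque, uint64_t addr, uint64_t val, unsigned size);
} RegionOps;

#define IOREGSEL 0x00   /* register select window */
#define IOWIN    0x10   /* data window */
#define NUM_REGS 4      /* tiny register file for the sketch */

typedef struct IoApic {
    uint8_t ioregsel;
    uint32_t regs[NUM_REGS];
} IoApic;

static uint64_t ioapic_read(void *opaque, uint64_t addr, unsigned size)
{
    IoApic *s = opaque;

    (void)size;   /* imprecise: all access widths are handled alike */
    switch (addr & 0xff) {
    case IOREGSEL:
        return s->ioregsel;
    case IOWIN:
        return s->ioregsel < NUM_REGS ? s->regs[s->ioregsel] : 0;
    default:
        return 0;
    }
}

static void ioapic_write(void *opaque, uint64_t addr, uint64_t val,
                         unsigned size)
{
    IoApic *s = opaque;

    (void)size;
    switch (addr & 0xff) {
    case IOREGSEL:
        s->ioregsel = (uint8_t)val;
        break;
    case IOWIN:
        if (s->ioregsel < NUM_REGS) {
            s->regs[s->ioregsel] = (uint32_t)val;
        }
        break;
    default:
        break;
    }
}

static const RegionOps ioapic_ops = {
    .read = ioapic_read,
    .write = ioapic_write,
};
```

Because the size now reaches the callback, a follow-up such as "Reject non-dword accesses to IOWIN register" (patch 08/16) only needs a check on size inside the IOWIN case.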
Re: [Qemu-devel] [PATCH 4/6] msi: Invoke msi/msix_reset from PCI core
On Sun, Dec 04, 2011 at 03:35:38PM +0100, Jan Kiszka wrote: On 2011-12-04 15:24, Michael S. Tsirkin wrote: On Sun, Dec 04, 2011 at 02:22:12PM +0100, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com There is no point in pushing this burden to the devices, they may rather forget to call them (like intel-hda and ahci ATM). Instead, reset functions are now called from pci_device_reset and pci_bridge_reset. They do nothing if the MSI/MSI-X is not in use. CC: Alexander Graf ag...@suse.de CC: Gerd Hoffmann kra...@redhat.com CC: Isaku Yamahata yamah...@valinux.co.jp Signed-off-by: Jan Kiszka jan.kis...@siemens.com What makes me unhappy with this proposal is that msix_write_config, for example, in fact becomes an internal interface. So devices should be calling some functions like msix_init from msix.h, but not others like msix_write_config. It used to be simple: devices should call msix_. Now, how are devices to figure it out? E.g. the comment near msix_write_config says: /* Handle MSI-X capability config write. */ That should be aligned to msi_write_config's comment. My goal is to reduce the number of calls devices have to make in order to use MSI. We have quite a few correct examples by now, so it should not be too hard to figure out what to do to use standard MSI[X] services. Maybe a PCI skeleton device model would help further. Or up-to-date documentation, though that may be even harder. ;) Maybe it's time to move code into hw/pci/? Then we could have private interfaces without kludges like pci_internals.h ... -- MST
[Qemu-devel] [PATCH v2 13/16] kvm: x86: Add user space part for in-kernel APIC
From: Jan Kiszka <jan.kis...@siemens.com>

This introduces the alternative APIC model 'kvm-apic' which makes use of
KVM's in-kernel device model. MSI is not yet supported, so we disable
this when the in-kernel model is in use.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Makefile.target   |    2 +-
 hw/kvm/apic.c     |  147 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pc.c           |   15 ++++--
 kvm.h             |    3 +
 target-i386/kvm.c |    8 +++
 5 files changed, 169 insertions(+), 6 deletions(-)
 create mode 100644 hw/kvm/apic.c

diff --git a/Makefile.target b/Makefile.target
index 4cd3c0e..66b42d5 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -231,7 +231,7 @@ obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
-obj-i386-$(CONFIG_KVM) += kvm/clock.o
+obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
diff --git a/hw/kvm/apic.c b/hw/kvm/apic.c
new file mode 100644
index 0000000..be6f401
--- /dev/null
+++ b/hw/kvm/apic.c
@@ -0,0 +1,147 @@
+/*
+ * KVM in-kernel APIC support
+ *
+ * Copyright (c) 2011 Siemens AG
+ *
+ * Authors:
+ *  Jan Kiszka <jan.kis...@siemens.com>
+ *
+ * This work is licensed under the terms of the GNU GPL version 2.
+ * See the COPYING file in the top-level directory.
+ */
+#include "hw/apic_internal.h"
+#include "kvm.h"
+
+static inline void kvm_apic_set_reg(struct kvm_lapic_state *kapic,
+                                    int reg_id, uint32_t val)
+{
+    *((uint32_t *)(kapic->regs + (reg_id << 4))) = val;
+}
+
+static inline uint32_t kvm_apic_get_reg(struct kvm_lapic_state *kapic,
+                                        int reg_id)
+{
+    return *((uint32_t *)(kapic->regs + (reg_id << 4)));
+}
+
+int kvm_put_apic(CPUState *env)
+{
+    APICState *s = DO_UPCAST(APICState, busdev.qdev, env->apic_state);
+    struct kvm_lapic_state kapic;
+    int i;
+
+    if (s && kvm_enabled() && kvm_irqchip_in_kernel()) {
+        memset(&kapic, 0, sizeof(kapic));
+        kvm_apic_set_reg(&kapic, 0x2, s->id << 24);
+        kvm_apic_set_reg(&kapic, 0x8, s->tpr);
+        kvm_apic_set_reg(&kapic, 0xd, s->log_dest << 24);
+        kvm_apic_set_reg(&kapic, 0xe, s->dest_mode << 28 | 0x0fffffff);
+        kvm_apic_set_reg(&kapic, 0xf, s->spurious_vec);
+        for (i = 0; i < 8; i++) {
+            kvm_apic_set_reg(&kapic, 0x10 + i, s->isr[i]);
+            kvm_apic_set_reg(&kapic, 0x18 + i, s->tmr[i]);
+            kvm_apic_set_reg(&kapic, 0x20 + i, s->irr[i]);
+        }
+        kvm_apic_set_reg(&kapic, 0x28, s->esr);
+        kvm_apic_set_reg(&kapic, 0x30, s->icr[0]);
+        kvm_apic_set_reg(&kapic, 0x31, s->icr[1]);
+        for (i = 0; i < APIC_LVT_NB; i++) {
+            kvm_apic_set_reg(&kapic, 0x32 + i, s->lvt[i]);
+        }
+        kvm_apic_set_reg(&kapic, 0x38, s->initial_count);
+        kvm_apic_set_reg(&kapic, 0x3e, s->divide_conf);
+
+        return kvm_vcpu_ioctl(env, KVM_SET_LAPIC, &kapic);
+    }
+
+    return 0;
+}
+
+int kvm_get_apic(CPUState *env)
+{
+    APICState *s = DO_UPCAST(APICState, busdev.qdev, env->apic_state);
+    struct kvm_lapic_state kapic;
+    int ret, i, v;
+
+    if (s && kvm_enabled() && kvm_irqchip_in_kernel()) {
+        ret = kvm_vcpu_ioctl(env, KVM_GET_LAPIC, &kapic);
+        if (ret < 0) {
+            return ret;
+        }
+
+        s->id = kvm_apic_get_reg(&kapic, 0x2) >> 24;
+        s->tpr = kvm_apic_get_reg(&kapic, 0x8);
+        s->arb_id = kvm_apic_get_reg(&kapic, 0x9);
+        s->log_dest = kvm_apic_get_reg(&kapic, 0xd) >> 24;
+        s->dest_mode = kvm_apic_get_reg(&kapic, 0xe) >> 28;
+        s->spurious_vec = kvm_apic_get_reg(&kapic, 0xf);
+        for (i = 0; i < 8; i++) {
+            s->isr[i] = kvm_apic_get_reg(&kapic, 0x10 + i);
+            s->tmr[i] = kvm_apic_get_reg(&kapic, 0x18 + i);
+            s->irr[i] = kvm_apic_get_reg(&kapic, 0x20 + i);
+        }
+        s->esr = kvm_apic_get_reg(&kapic, 0x28);
+        s->icr[0] = kvm_apic_get_reg(&kapic, 0x30);
+        s->icr[1] = kvm_apic_get_reg(&kapic, 0x31);
+        for (i = 0; i < APIC_LVT_NB; i++) {
+            s->lvt[i] = kvm_apic_get_reg(&kapic, 0x32 + i);
+        }
+        s->initial_count = kvm_apic_get_reg(&kapic, 0x38);
+        s->divide_conf = kvm_apic_get_reg(&kapic, 0x3e);
+
+        v = (s->divide_conf & 3) | ((s->divide_conf >> 1) & 4);
+        s->count_shift = (v + 1) & 7;
+
+        s->initial_count_load_time = qemu_get_clock_ns(vm_clock);
+        apic_next_timer(s, s->initial_count_load_time);
+    }
+    return 0;
+}
+
+static void kvm_apic_set_base(APICState *s, uint64_t val)
+{
+    s->apicbase = val;
+}
+
+static void kvm_apic_set_tpr(APICState *s, uint8_t val)
+{
+    s->tpr = (val & 0x0f) << 4;
+}
+
+static int kvm_apic_init(SysBusDevice *dev)
+{
+    APICState *s = FROM_SYSBUS(APICState, dev);
+
+    memory_region_init_reservation(&s->io_memory, "kvm-apic-msi",
+                                   MSI_SPACE_SIZE);
+
+    if
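kvm_put_apic() and kvm_get_apic() address the in-kernel register file as `kapic->regs + (reg_id << 4)`. kvm_get_apic() also rebuilds `count_shift` from the divide configuration register. The sketch below isolates those two computations so they can be checked by hand. The helper names are hypothetical; only the expressions are taken from the patch. The offset helper assumes the usual xAPIC layout, in which every register occupies a 16-byte slot within a 1 KiB page, so register 0x3e sits at byte offset 0x3e0.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as in kvm_get_apic(): bits 0-1 and bit 3 of the divide
 * configuration form a 3-bit value v, and the timer advances once every
 * 2^((v + 1) & 7) clocks, so v == 7 means "divide by 1". */
static int apic_divide_conf_to_shift(uint32_t divide_conf)
{
    int v = (divide_conf & 3) | ((divide_conf >> 1) & 4);

    return (v + 1) & 7;
}

/* Byte offset of register 'reg_id' inside kvm_lapic_state.regs, as computed
 * by kvm_apic_set_reg()/kvm_apic_get_reg(): one 16-byte slot per register. */
static unsigned apic_reg_offset(int reg_id)
{
    assert(reg_id >= 0 && reg_id < 64);   /* 64 slots * 16 bytes = 1 KiB */
    return (unsigned)reg_id << 4;
}
```

For example, a divide configuration of 0xb decodes to shift 0 (divide by 1), and 0xa decodes to shift 7 (divide by 128).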
[Qemu-devel] [PATCH v2 14/16] kvm: x86: Add user space part for in-kernel i8259
From: Jan Kiszka <jan.kis...@siemens.com>

Introduce the alternative 'kvm-i8259' device model that exploits KVM
in-kernel acceleration.

The PIIX3 initialization code is furthermore extended by KVM specific IRQ
route setup. Moreover, GSI injection differs in KVM mode from the user
space model. As we can dispatch ISA-range IRQs to both IOAPIC and PIC
inside the kernel, we do not need to inject them separately. This is
reflected by a KVM-specific GSI handler.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Makefile.target |    2 +-
 hw/kvm/i8259.c  |  154 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pc.h         |    1 +
 hw/pc_piix.c    |   50 +++++++++++++++--
 4 files changed, 202 insertions(+), 5 deletions(-)
 create mode 100644 hw/kvm/i8259.c

diff --git a/Makefile.target b/Makefile.target
index 66b42d5..850b80f 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -231,7 +231,7 @@ obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
-obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o
+obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
diff --git a/hw/kvm/i8259.c b/hw/kvm/i8259.c
new file mode 100644
index 0000000..f3994cb
--- /dev/null
+++ b/hw/kvm/i8259.c
@@ -0,0 +1,154 @@
+/*
+ * KVM in-kernel PIC (i8259) support
+ *
+ * Copyright (c) 2011 Siemens AG
+ *
+ * Authors:
+ *  Jan Kiszka <jan.kis...@siemens.com>
+ *
+ * This work is licensed under the terms of the GNU GPL version 2.
+ * See the COPYING file in the top-level directory.
+ */
+#include "hw/i8259_internal.h"
+#include "hw/apic_internal.h"
+#include "kvm.h"
+
+static void kvm_pic_get(PicState *s)
+{
+    struct kvm_irqchip chip;
+    struct kvm_pic_state *kpic;
+    int ret;
+
+    chip.chip_id = s->master ? KVM_IRQCHIP_PIC_MASTER : KVM_IRQCHIP_PIC_SLAVE;
+    ret = kvm_vm_ioctl(kvm_state, KVM_GET_IRQCHIP, &chip);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_GET_IRQCHIP failed: %s\n", strerror(ret));
+        abort();
+    }
+
+    kpic = &chip.chip.pic;
+
+    s->last_irr = kpic->last_irr;
+    s->irr = kpic->irr;
+    s->imr = kpic->imr;
+    s->isr = kpic->isr;
+    s->priority_add = kpic->priority_add;
+    s->irq_base = kpic->irq_base;
+    s->read_reg_select = kpic->read_reg_select;
+    s->poll = kpic->poll;
+    s->special_mask = kpic->special_mask;
+    s->init_state = kpic->init_state;
+    s->auto_eoi = kpic->auto_eoi;
+    s->rotate_on_auto_eoi = kpic->rotate_on_auto_eoi;
+    s->special_fully_nested_mode = kpic->special_fully_nested_mode;
+    s->init4 = kpic->init4;
+    s->elcr = kpic->elcr;
+    s->elcr_mask = kpic->elcr_mask;
+}
+
+static void kvm_pic_put(PicState *s)
+{
+    struct kvm_irqchip chip;
+    struct kvm_pic_state *kpic;
+    int ret;
+
+    chip.chip_id = s->master ? KVM_IRQCHIP_PIC_MASTER : KVM_IRQCHIP_PIC_SLAVE;
+
+    kpic = &chip.chip.pic;
+
+    kpic->last_irr = s->last_irr;
+    kpic->irr = s->irr;
+    kpic->imr = s->imr;
+    kpic->isr = s->isr;
+    kpic->priority_add = s->priority_add;
+    kpic->irq_base = s->irq_base;
+    kpic->read_reg_select = s->read_reg_select;
+    kpic->poll = s->poll;
+    kpic->special_mask = s->special_mask;
+    kpic->init_state = s->init_state;
+    kpic->auto_eoi = s->auto_eoi;
+    kpic->rotate_on_auto_eoi = s->rotate_on_auto_eoi;
+    kpic->special_fully_nested_mode = s->special_fully_nested_mode;
+    kpic->init4 = s->init4;
+    kpic->elcr = s->elcr;
+    kpic->elcr_mask = s->elcr_mask;
+
+    ret = kvm_vm_ioctl(kvm_state, KVM_SET_IRQCHIP, &chip);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_GET_IRQCHIP failed: %s\n", strerror(ret));
+        abort();
+    }
+}
+
+static void kvm_pic_reset(DeviceState *dev)
+{
+    PicState *s = container_of(dev, PicState, dev.qdev);
+
+    pic_reset_internal(s);
+    s->elcr = 0;
+
+    kvm_pic_put(s);
+}
+
+static void kvm_pic_set_irq(void *opaque, int irq, int level)
+{
+    int delivered;
+
+    delivered = kvm_irqchip_set_irq(kvm_state, irq, level);
+    apic_set_irq_delivered(delivered);
+}
+
+static int kvm_pic_init(ISADevice *dev)
+{
+    PicState *s = DO_UPCAST(PicState, dev, dev);
+
+    memory_region_init_reservation(&s->base_io, "kvm-pic", 2);
+    memory_region_init_reservation(&s->elcr_io, "kvm-elcr", 1);
+
+    pic_init_common(s);
+
+    s->pre_save = kvm_pic_get;
+    s->post_load = kvm_pic_put;
+
+    return 0;
+}
+
+qemu_irq *kvm_i8259_init(void)
+{
+    ISADevice *dev;
+
+    dev = isa_create("kvm-i8259");
+    qdev_prop_set_uint32(&dev->qdev, "iobase", 0x20);
+    qdev_prop_set_uint32(&dev->qdev, "elcr_addr", 0x4d0);
+    qdev_prop_set_bit(&dev->qdev, "master", true);
+    qdev_init_nofail(&dev->qdev);
+
+    dev = isa_create("kvm-i8259");
+    qdev_prop_set_uint32(&dev->qdev, "iobase", 0xa0);
+    qdev_prop_set_uint32(&dev->qdev, "elcr_addr", 0x4d1);
+    qdev_init_nofail(&dev->qdev);
+
+    return qemu_allocate_irqs(kvm_pic_set_irq, NULL, ISA_NUM_IRQS);
+}
+
+static ISADeviceInfo kvm_i8259_info = {
Re: [Qemu-devel] [PATCH 4/6] msi: Invoke msi/msix_reset from PCI core
On 2011-12-04 15:48, Michael S. Tsirkin wrote:
> On Sun, Dec 04, 2011 at 03:35:38PM +0100, Jan Kiszka wrote:
>> On 2011-12-04 15:24, Michael S. Tsirkin wrote:
>>> On Sun, Dec 04, 2011 at 02:22:12PM +0100, Jan Kiszka wrote:
>>>> From: Jan Kiszka <jan.kis...@siemens.com>
>>>>
>>>> There is no point in pushing this burden to the devices, they may rather
>>>> forget to call them (like intel-hda and ahci ATM). Instead, reset
>>>> functions are now called from pci_device_reset and pci_bridge_reset.
>>>> They do nothing if the MSI/MSI-X is not in use.
>>>>
>>>> CC: Alexander Graf <ag...@suse.de>
>>>> CC: Gerd Hoffmann <kra...@redhat.com>
>>>> CC: Isaku Yamahata <yamah...@valinux.co.jp>
>>>> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
>>>
>>> What makes me unhappy with this proposal is that msix_write_config,
>>> for example, becomes in fact an internal interface. So devices should
>>> be calling some functions like msix_init from msix.h, but not others
>>> like msix_write_config. It used to be simple: devices should call
>>> msix_. Now, how are devices to figure it out? E.g. the comment near
>>> msix_write_config says:
>>> /* Handle MSI-X capability config write. */
>>
>> That should be aligned to msi_write_config's comment.
>>
>> My goal is to reduce the number of calls devices have to do in order to
>> use MSI. We have quite a few correct examples by now, so it should not
>> be too hard to figure out what to do to use standard MSI[X] services.
>> Maybe a PCI skeleton device model would help further. Or up-to-date
>> documentation, though that may be even harder. ;)
>
> Maybe it's time to move code into hw/pci/ ?
> Then we could have private interfaces without
> kludges like pci_internals.h ...

Sounds reasonable.

Jan
[Qemu-devel] [PATCH v2 10/16] memory: Introduce memory_region_init_reservation
From: Jan Kiszka <jan.kis...@siemens.com>

Introduce a memory region type that can reserve I/O space. Such regions
are useful for modeling I/O that is only handled outside of QEMU, i.e.
in the context of an accelerator like KVM.

Any access to such a region from QEMU is a bug, but could theoretically
be triggered by guest code (DMA to reserved region). So we only warn
about such events once and then ignore them.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 memory.c |   36 ++++++++++++++++++++++++++++++++++++
 memory.h |   16 ++++++++++++++++
 2 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index dc5e35d..6d55cf6 100644
--- a/memory.c
+++ b/memory.c
@@ -1003,6 +1003,42 @@ void memory_region_init_rom_device(MemoryRegion *mr,
     mr->backend_registered = true;
 }
 
+static uint64_t invalid_read(void *opaque, target_phys_addr_t addr,
+                             unsigned size)
+{
+    MemoryRegion *mr = opaque;
+
+    if (!mr->warning_printed) {
+        fprintf(stderr, "Invalid read from memory region %s\n", mr->name);
+        mr->warning_printed = true;
+    }
+    return -1U;
+}
+
+static void invalid_write(void *opaque, target_phys_addr_t addr, uint64_t data,
+                          unsigned size)
+{
+    MemoryRegion *mr = opaque;
+
+    if (!mr->warning_printed) {
+        fprintf(stderr, "Invalid write to memory region %s\n", mr->name);
+        mr->warning_printed = true;
+    }
+}
+
+static const MemoryRegionOps reservation_ops = {
+    .read = invalid_read,
+    .write = invalid_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+void memory_region_init_reservation(MemoryRegion *mr,
+                                    const char *name,
+                                    uint64_t size)
+{
+    memory_region_init_io(mr, &reservation_ops, mr, name, size);
+}
+
 void memory_region_destroy(MemoryRegion *mr)
 {
     assert(QTAILQ_EMPTY(&mr->subregions));
diff --git a/memory.h b/memory.h
index d5b47da..b479350 100644
--- a/memory.h
+++ b/memory.h
@@ -115,6 +115,7 @@ struct MemoryRegion {
     bool terminates;
     bool readable;
     bool readonly; /* For RAM regions */
+    bool warning_printed; /* For reservations */
     MemoryRegion *alias;
     target_phys_addr_t alias_offset;
     unsigned priority;
@@ -242,6 +243,21 @@ void memory_region_init_rom_device(MemoryRegion *mr,
                                    uint64_t size);
 
 /**
+ * memory_region_init_reservation: Initialize a memory region that reserves
+ *                                 I/O space.
+ *
+ * A reservation region primariy serves debugging purposes.  It claims I/O
+ * space that is not supposed to be handled by QEMU itself.  Any access via
+ * the memory API will cause an abort().
+ *
+ * @mr: the #MemoryRegion to be initialized
+ * @name: used for debugging; not visible to the user or ABI
+ * @size: size of the region.
+ */
+void memory_region_init_reservation(MemoryRegion *mr,
+                                    const char *name,
+                                    uint64_t size);
+/**
  * memory_region_destroy: Destroy a memory region and relaim all resources.
  *
  * @mr: the region to be destroyed.  May not currently be a subregion
-- 
1.7.3.4
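For illustration only, here is a trimmed-down model of the warn-once behaviour of invalid_read() and invalid_write(). It uses a hypothetical ToyRegion type in place of MemoryRegion, and the `warnings` counter exists only so the behaviour can be checked. The v2 handlers print one warning per region and then continue silently, whereas the new header comment still says any access "will cause an abort()". The sketch also makes explicit that `return -1U;` yields 0x00000000ffffffff in a uint64_t, assuming a 32-bit unsigned int, rather than 64 set bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, trimmed-down region: only the fields the handlers touch. */
typedef struct ToyRegion {
    const char *name;
    bool warning_printed;
    unsigned warnings;      /* illustration only: number of warnings emitted */
} ToyRegion;

static void warn_once(ToyRegion *mr, const char *what)
{
    assert(mr);
    if (!mr->warning_printed) {
        fprintf(stderr, "Invalid %s memory region %s\n", what, mr->name);
        mr->warning_printed = true;
        mr->warnings++;
    }
}

static uint64_t toy_invalid_read(void *opaque, uint64_t addr, unsigned size)
{
    (void)addr;
    (void)size;
    warn_once(opaque, "read from");
    /* -1U is an unsigned int; widened to 64 bits it is 0x00000000ffffffff
     * (assuming a 32-bit unsigned int). */
    return -1U;
}

static void toy_invalid_write(void *opaque, uint64_t addr, uint64_t data,
                              unsigned size)
{
    (void)addr;
    (void)data;
    (void)size;
    warn_once(opaque, "write to");
}
```

Reads and writes share a single `warning_printed` flag, so after the first bad access of either kind, the region stays quiet.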
[Qemu-devel] [PATCH v2 12/16] kvm: x86: Establish IRQ0 override control
From: Jan Kiszka <jan.kis...@siemens.com>

KVM is forced to disable the IRQ0 override when we run with in-kernel
irqchip but without IRQ routing support of the kernel. Set the fwcfg
value correspondingly. This aligns us with qemu-kvm.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 hw/pc.c    |    3 ++-
 kvm-all.c  |    5 +++++
 kvm-stub.c |    5 +++++
 kvm.h      |    2 ++
 sysemu.h   |    1 -
 vl.c       |    1 -
 6 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 5225d5b..715cc63 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -39,6 +39,7 @@
 #include "msi.h"
 #include "sysbus.h"
 #include "sysemu.h"
+#include "kvm.h"
 #include "blockdev.h"
 #include "ui/qemu-spice.h"
 #include "memory.h"
@@ -609,7 +610,7 @@ static void *bochs_bios_init(void)
     fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
     fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables,
                      acpi_tables_len);
-    fw_cfg_add_bytes(fw_cfg, FW_CFG_IRQ0_OVERRIDE, &irq0override, 1);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, kvm_allows_irq0_override());
 
     smbios_table = smbios_get_table(&smbios_len);
     if (smbios_table)
diff --git a/kvm-all.c b/kvm-all.c
index a85e14f..665455c 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1260,6 +1260,11 @@ int kvm_has_gsi_routing(void)
     return kvm_check_extension(kvm_state, KVM_CAP_IRQ_ROUTING);
 }
 
+int kvm_allows_irq0_override(void)
+{
+    return !kvm_enabled() || !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
     if (!kvm_has_sync_mmu()) {
diff --git a/kvm-stub.c b/kvm-stub.c
index 06064b9..6c2b06b 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -78,6 +78,11 @@ int kvm_has_many_ioeventfds(void)
     return 0;
 }
 
+int kvm_allows_irq0_override(void)
+{
+    return 1;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 }
diff --git a/kvm.h b/kvm.h
index 0d6c453..a3c87af 100644
--- a/kvm.h
+++ b/kvm.h
@@ -53,6 +53,8 @@ int kvm_has_xcrs(void);
 int kvm_has_many_ioeventfds(void);
 int kvm_has_gsi_routing(void);
 
+int kvm_allows_irq0_override(void);
+
 #ifdef NEED_CPU_H
 int kvm_init_vcpu(CPUState *env);
 
diff --git a/sysemu.h b/sysemu.h
index 22cd720..3bd896e 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -102,7 +102,6 @@ extern int vga_interface_type;
 extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
-extern uint8_t irq0override;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
diff --git a/vl.c b/vl.c
index fcce25f..22d02b9 100644
--- a/vl.c
+++ b/vl.c
@@ -218,7 +218,6 @@ int no_reboot = 0;
 int no_shutdown = 0;
 int cursor_hide = 1;
 int graphic_rotate = 0;
-uint8_t irq0override = 1;
 const char *watchdog;
 QEMUOptionRom option_rom[MAX_OPTION_ROMS];
 int nb_option_roms;
-- 
1.7.3.4
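kvm_allows_irq0_override() is a three-input predicate. The sketch below restates it with the runtime queries kvm_enabled(), kvm_irqchip_in_kernel() and kvm_has_gsi_routing() turned into hypothetical boolean parameters, and adds a small truth-table self-check. It shows that the override is withheld only in the one case the commit message describes: in-kernel irqchip without kernel IRQ routing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same expression as kvm_allows_irq0_override(); the three runtime queries
 * are passed in as (hypothetical) parameters. */
static bool allows_irq0_override(bool kvm_on, bool irqchip_in_kernel,
                                 bool has_gsi_routing)
{
    return !kvm_on || !irqchip_in_kernel || has_gsi_routing;
}

/* Value that would be stored in FW_CFG_IRQ0_OVERRIDE (1 = override allowed). */
static uint32_t irq0_override_fw_cfg_value(bool kvm_on, bool irqchip_in_kernel,
                                           bool has_gsi_routing)
{
    return allows_irq0_override(kvm_on, irqchip_in_kernel, has_gsi_routing)
           ? 1 : 0;
}

/* Exhaustive check over all 8 input combinations: only "KVM on, in-kernel
 * irqchip, no GSI routing" disables the override. */
static void check_irq0_override_truth_table(void)
{
    for (int i = 0; i < 8; i++) {
        bool kvm_on = i & 1, in_kernel = i & 2, routing = i & 4;
        bool expected = !(kvm_on && in_kernel && !routing);

        assert(allows_irq0_override(kvm_on, in_kernel, routing) == expected);
    }
}
```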
[Qemu-devel] [PATCH v2 02/16] kvm: Move kvmclock into hw/kvm folder
From: Jan Kiszka <jan.kis...@siemens.com>

More KVM-specific devices will come, so let's start with moving the
kvmclock into a dedicated folder.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Makefile.target                |    4 ++--
 configure                      |    1 +
 hw/{kvmclock.c => kvm/clock.c} |    4 ++--
 hw/{kvmclock.h => kvm/clock.h} |    0
 hw/pc_piix.c                   |    2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)
 rename hw/{kvmclock.c => kvm/clock.c} (98%)
 rename hw/{kvmclock.h => kvm/clock.h} (100%)

diff --git a/Makefile.target b/Makefile.target
index 1e90df7..3a9e95d 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -231,7 +231,7 @@ obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
-obj-i386-$(CONFIG_KVM) += kvmclock.o
+obj-i386-$(CONFIG_KVM) += kvm/clock.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
@@ -421,7 +421,7 @@ qmp-commands-old.h: $(SRC_PATH)/qmp-commands.hx
 
 clean:
 	rm -f *.o *.a *~ $(PROGS) nwfpe/*.o fpu/*.o
-	rm -f *.d */*.d tcg/*.o ide/*.o 9pfs/*.o
+	rm -f *.d */*.d tcg/*.o ide/*.o 9pfs/*.o kvm/*.o
 	rm -f hmp-commands.h qmp-commands-old.h gdbstub-xml.c
 ifdef CONFIG_TRACE_SYSTEMTAP
 	rm -f *.stp
diff --git a/configure b/configure
index 4f87e0a..d768e44 100755
--- a/configure
+++ b/configure
@@ -3220,6 +3220,7 @@ mkdir -p $target_dir/fpu
 mkdir -p $target_dir/tcg
 mkdir -p $target_dir/ide
 mkdir -p $target_dir/9pfs
+mkdir -p $target_dir/kvm
 if test "$target" = "arm-linux-user" -o "$target" = "armeb-linux-user" -o "$target" = "arm-bsd-user" -o "$target" = "armeb-bsd-user" ; then
   mkdir -p $target_dir/nwfpe
 fi
diff --git a/hw/kvmclock.c b/hw/kvm/clock.c
similarity index 98%
rename from hw/kvmclock.c
rename to hw/kvm/clock.c
index 5388bc4..5983271 100644
--- a/hw/kvmclock.c
+++ b/hw/kvm/clock.c
@@ -13,9 +13,9 @@
 
 #include "qemu-common.h"
 #include "sysemu.h"
-#include "sysbus.h"
 #include "kvm.h"
-#include "kvmclock.h"
+#include "hw/sysbus.h"
+#include "hw/kvm/clock.h"
 
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
diff --git a/hw/kvmclock.h b/hw/kvm/clock.h
similarity index 100%
rename from hw/kvmclock.h
rename to hw/kvm/clock.h
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index c89042f..22997b0 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -34,7 +34,7 @@
 #include "boards.h"
 #include "ide.h"
 #include "kvm.h"
-#include "kvmclock.h"
+#include "kvm/clock.h"
 #include "sysemu.h"
 #include "sysbus.h"
 #include "arch_init.h"
-- 
1.7.3.4
Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 12/04/2011 04:06 PM, Jan Kiszka wrote:
> On 2011-12-04 15:04, Avi Kivity wrote:
>> On 12/04/2011 03:51 PM, Jan Kiszka wrote:
>>>> But the name becomes part of the save/restore ABI, so you can't.
>>>
>>> Nope, the vmstate names are identical. That would ruin migration
>>> otherwise. It's just the output of info qtree & co. that changes.
>>
>> Oh, okay. I still think it's wrong, but now it's just a matter of
>> taste, and I can live with it.
>
> Wrong in what sense?

In the sense that kernel-apic is just an accelerated apic. From the
guest point of view, there's no difference, and that should be
reflected in the device model.

If I'm reading an apic register, either from the guest or via a
monitor debug interface, I shouldn't care whether it's accelerated or
not.

> The guest part already holds, of course.
>
> I think the way of merging kvm support into the user space models in
> qemu-kvm is not particularly beautiful. But that's my taste, and
> therefore I modeled the upstream proposal differently. :)

Oh, qemu-kvm was not meant to be an example of engineering elegance,
just minimal changes.

-- 
error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH v2 11/16] kvm: Introduce core services for in-kernel irqchip support
From: Jan Kiszka <jan.kis...@siemens.com>

Add the basic infrastructure to activate in-kernel irqchip support, inject
interrupts into these models, and maintain IRQ routes.

Routing is optional and depends on the host arch supporting
KVM_CAP_IRQ_ROUTING. When it's not available on x86, we lose the HPET as
we can't route GSI0 to IOAPIC pin 2.

In-kernel irqchip support will later be controlled by the machine
property 'kernel_irqchip', but this is not yet wired up.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 kvm-all.c         |  149 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 kvm.h             |    8 +++
 target-i386/kvm.c |   11 ++++
 3 files changed, 168 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index e7faf5c..a85e14f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -76,6 +76,13 @@ struct KVMState
     int pit_in_kernel;
     int xsave, xcrs;
     int many_ioeventfds;
+    int irqchip_inject_ioctl;
+#ifdef KVM_CAP_IRQ_ROUTING
+    struct kvm_irq_routing *irq_routes;
+    int nr_allocated_irq_routes;
+    uint32_t *used_gsi_bitmap;
+    unsigned int max_gsi;
+#endif
 };
 
 KVMState *kvm_state;
@@ -692,6 +699,138 @@ static void kvm_handle_interrupt(CPUState *env, int mask)
     }
 }
 
+int kvm_irqchip_set_irq(KVMState *s, int irq, int level)
+{
+    struct kvm_irq_level event;
+    int ret;
+
+    assert(s->irqchip_in_kernel);
+
+    event.level = level;
+    event.irq = irq;
+    ret = kvm_vm_ioctl(s, s->irqchip_inject_ioctl, &event);
+    if (ret < 0) {
+        perror("kvm_set_irqchip_line");
+        abort();
+    }
+
+    return (s->irqchip_inject_ioctl == KVM_IRQ_LINE) ? 1 : event.status;
+}
+
+#ifdef KVM_CAP_IRQ_ROUTING
+static void set_gsi(KVMState *s, unsigned int gsi)
+{
+    assert(gsi < s->max_gsi);
+
+    s->used_gsi_bitmap[gsi / 32] |= 1U << (gsi % 32);
+}
+
+static void kvm_init_irq_routing(KVMState *s)
+{
+    int gsi_count;
+
+    gsi_count = kvm_check_extension(s, KVM_CAP_IRQ_ROUTING);
+    if (gsi_count > 0) {
+        unsigned int gsi_bits, i;
+
+        /* Round up so we can search ints using ffs */
+        gsi_bits = (gsi_count + 31) / 32;
+        s->used_gsi_bitmap = g_malloc0(gsi_bits / 8);
+        s->max_gsi = gsi_bits;
+
+        /* Mark any over-allocated bits as already in use */
+        for (i = gsi_count; i < gsi_bits; i++) {
+            set_gsi(s, i);
+        }
+    }
+
+    s->irq_routes = g_malloc0(sizeof(*s->irq_routes));
+    s->nr_allocated_irq_routes = 0;
+
+    kvm_arch_init_irq_routing(s);
+}
+
+static void kvm_add_routing_entry(KVMState *s,
+                                  struct kvm_irq_routing_entry *entry)
+{
+    struct kvm_irq_routing_entry *new;
+    int n, size;
+
+    if (s->irq_routes->nr == s->nr_allocated_irq_routes) {
+        n = s->nr_allocated_irq_routes * 2;
+        if (n < 64) {
+            n = 64;
+        }
+        size = sizeof(struct kvm_irq_routing);
+        size += n * sizeof(*new);
+        s->irq_routes = g_realloc(s->irq_routes, size);
+        s->nr_allocated_irq_routes = n;
+    }
+    n = s->irq_routes->nr++;
+    new = &s->irq_routes->entries[n];
+    memset(new, 0, sizeof(*new));
+    new->gsi = entry->gsi;
+    new->type = entry->type;
+    new->flags = entry->flags;
+    new->u = entry->u;
+
+    set_gsi(s, entry->gsi);
+}
+
+void kvm_irqchip_add_route(KVMState *s, int irq, int irqchip, int pin)
+{
+    struct kvm_irq_routing_entry e;
+
+    e.gsi = irq;
+    e.type = KVM_IRQ_ROUTING_IRQCHIP;
+    e.flags = 0;
+    e.u.irqchip.irqchip = irqchip;
+    e.u.irqchip.pin = pin;
+    kvm_add_routing_entry(s, &e);
+}
+
+int kvm_irqchip_commit_routes(KVMState *s)
+{
+    s->irq_routes->flags = 0;
+    return kvm_vm_ioctl(s, KVM_SET_GSI_ROUTING, s->irq_routes);
+}
+
+#else /* !KVM_CAP_IRQ_ROUTING */
+
+static void kvm_init_irq_routing(KVMState *s)
+{
+}
+#endif /* !KVM_CAP_IRQ_ROUTING */
+
+static int kvm_irqchip_create(KVMState *s)
+{
+    QemuOptsList *list = qemu_find_opts("machine");
+    int ret;
+
+    if (QTAILQ_EMPTY(&list->head) ||
+        !qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
+                           "kernel_irqchip", false) ||
+        !kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
+        return 0;
+    }
+
+    ret = kvm_vm_ioctl(s, KVM_CREATE_IRQCHIP);
+    if (ret < 0) {
+        fprintf(stderr, "Create kernel irqchip failed\n");
+        return ret;
+    }
+
+    s->irqchip_inject_ioctl = KVM_IRQ_LINE;
+    if (kvm_check_extension(s, KVM_CAP_IRQ_INJECT_STATUS)) {
+        s->irqchip_inject_ioctl = KVM_IRQ_LINE_STATUS;
+    }
+    s->irqchip_in_kernel = 1;
+
+    kvm_init_irq_routing(s);
+
+    return 0;
+}
+
 int kvm_init(void)
 {
     static const char upgrade_note[] =
@@ -786,6 +925,11 @@ int kvm_init(void)
         goto err;
     }
 
+    ret = kvm_irqchip_create(s);
+    if (ret < 0) {
+        goto err;
+    }
+
     kvm_state = s;
     cpu_register_phys_memory_client(&kvm_cpu_phys_memory_client);
 
@@ -1111,6 +1255,11 @@ int kvm_has_many_ioeventfds(void)
     return kvm_state->many_ioeventfds;
 }
 
+int kvm_has_gsi_routing(void)
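According to its comment, the GSI bitmap in kvm_init_irq_routing() is rounded up so that it can be searched a whole int at a time. Below is a standalone sketch of that bookkeeping, under the assumption that the bit count is rounded up to a multiple of 32. GsiBitmap and its helpers are hypothetical names, and calloc stands in for g_malloc0. As we read the posted v2 code, `(gsi_count + 31) / 32` produces a word count that is then also used as the bit count (and divided by 8 for the allocation), so that rounding step may deserve a second look.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical GSI bitmap modelled on set_gsi()/kvm_init_irq_routing(). */
typedef struct GsiBitmap {
    uint32_t *words;
    unsigned int max_gsi;   /* number of bits; always a multiple of 32 */
} GsiBitmap;

static void gsi_mark(GsiBitmap *b, unsigned int gsi)
{
    assert(gsi < b->max_gsi);
    b->words[gsi / 32] |= 1U << (gsi % 32);
}

static int gsi_is_used(const GsiBitmap *b, unsigned int gsi)
{
    assert(gsi < b->max_gsi);
    return (b->words[gsi / 32] >> (gsi % 32)) & 1;
}

/* Returns 0 on success, -1 on allocation failure. */
static int gsi_bitmap_init(GsiBitmap *b, unsigned int gsi_count)
{
    unsigned int gsi_bits = (gsi_count + 31) / 32 * 32;  /* bits, rounded up */
    unsigned int i;

    b->words = calloc(gsi_bits / 32, sizeof(uint32_t));  /* gsi_bits / 8 bytes */
    if (!b->words) {
        return -1;
    }
    b->max_gsi = gsi_bits;

    /* Padding bits past the real GSI count are marked as already in use. */
    for (i = gsi_count; i < gsi_bits; i++) {
        gsi_mark(b, i);
    }
    return 0;
}
```

With 24 GSIs, the bitmap is one 32-bit word whose top 8 bits are pre-marked as used.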
[Qemu-devel] [PATCH v2 15/16] kvm: x86: Add user space part for in-kernel IOAPIC
From: Jan Kiszka <jan.kis...@siemens.com>

This introduces the KVM-accelerated IOAPIC model 'kvm-ioapic' and
extends the IRQ routing setup by the 0->2 redirection when needed.

The kvm-ioapic model has a property that allows defining its GSI base
for injecting interrupts into the kernel model. This will allow
disentangling PIC and IOAPIC pins for chipsets that support more
sophisticated IRQ routes than the PIIX3. So far the base is kept at 0,
i.e. PIC and IOAPIC share pins 0..15.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Makefile.target      |    2 +-
 hw/ioapic_internal.h |    1 +
 hw/kvm/ioapic.c      |  120 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pc_piix.c         |   15 ++++++-
 4 files changed, 136 insertions(+), 2 deletions(-)
 create mode 100644 hw/kvm/ioapic.c

diff --git a/Makefile.target b/Makefile.target
index 850b80f..2f3407b 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -231,7 +231,7 @@ obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
-obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o
+obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
diff --git a/hw/ioapic_internal.h b/hw/ioapic_internal.h
index bda3608..7d5f735 100644
--- a/hw/ioapic_internal.h
+++ b/hw/ioapic_internal.h
@@ -83,6 +83,7 @@ struct IOAPICState {
 
     void (*pre_save)(IOAPICState *s);
     void (*post_load)(IOAPICState *s);
+    uint32_t kvm_gsi_base;
 };
 
 extern const VMStateDescription vmstate_ioapic;
diff --git a/hw/kvm/ioapic.c b/hw/kvm/ioapic.c
new file mode 100644
index 0000000..1040e29
--- /dev/null
+++ b/hw/kvm/ioapic.c
@@ -0,0 +1,120 @@
+/*
+ * KVM in-kernel IOPIC support
+ *
+ * Copyright (c) 2011 Siemens AG
+ *
+ * Authors:
+ *  Jan Kiszka <jan.kis...@siemens.com>
+ *
+ * This work is licensed under the terms of the GNU GPL version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "hw/pc.h"
+#include "hw/ioapic_internal.h"
+#include "hw/apic_internal.h"
+#include "kvm.h"
+
+static void kvm_ioapic_get(IOAPICState *s)
+{
+    struct kvm_irqchip chip;
+    struct kvm_ioapic_state *kioapic;
+    int ret, i;
+
+    chip.chip_id = KVM_IRQCHIP_IOAPIC;
+    ret = kvm_vm_ioctl(kvm_state, KVM_GET_IRQCHIP, &chip);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_GET_IRQCHIP failed: %s\n", strerror(ret));
+        abort();
+    }
+
+    kioapic = &chip.chip.ioapic;
+
+    s->id = kioapic->id;
+    s->ioregsel = kioapic->ioregsel;
+    s->irr = kioapic->irr;
+    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+        s->ioredtbl[i] = kioapic->redirtbl[i].bits;
+    }
+}
+
+static void kvm_ioapic_put(IOAPICState *s)
+{
+    struct kvm_irqchip chip;
+    struct kvm_ioapic_state *kioapic;
+    int ret, i;
+
+    chip.chip_id = KVM_IRQCHIP_IOAPIC;
+    kioapic = &chip.chip.ioapic;
+
+    kioapic->id = s->id;
+    kioapic->ioregsel = s->ioregsel;
+    kioapic->base_address = s->busdev.mmio[0].addr;
+    kioapic->irr = s->irr;
+    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+        kioapic->redirtbl[i].bits = s->ioredtbl[i];
+    }
+
+    ret = kvm_vm_ioctl(kvm_state, KVM_SET_IRQCHIP, &chip);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_GET_IRQCHIP failed: %s\n", strerror(ret));
+        abort();
+    }
+}
+
+static void kvm_ioapic_reset(DeviceState *d)
+{
+    IOAPICState *s = DO_UPCAST(IOAPICState, busdev.qdev, d);
+
+    ioapic_reset_internal(s);
+
+    kvm_ioapic_put(s);
+}
+
+static void kvm_ioapic_set_irq(void *opaque, int irq, int level)
+{
+    IOAPICState *s = opaque;
+    int delivered;
+
+    delivered = kvm_irqchip_set_irq(kvm_state, s->kvm_gsi_base + irq, level);
+    apic_set_irq_delivered(delivered);
+}
+
+static int kvm_ioapic_init(SysBusDevice *dev)
+{
+    IOAPICState *s = FROM_SYSBUS(IOAPICState, dev);
+
+    memory_region_init_reservation(&s->io_memory, "kvm-ioapic", 0x1000);
+
+    if (ioapic_init_common(s) < 0) {
+        memory_region_destroy(&s->io_memory);
+        return -1;
+    }
+
+    s->pre_save = kvm_ioapic_get;
+    s->post_load = kvm_ioapic_put;
+
+    qdev_init_gpio_in(&dev->qdev, kvm_ioapic_set_irq, IOAPIC_NUM_PINS);
+
+    return 0;
+}
+
+static SysBusDeviceInfo kvm_ioapic_info = {
+    .init = kvm_ioapic_init,
+    .qdev.name = "kvm-ioapic",
+    .qdev.size = sizeof(IOAPICState),
+    .qdev.vmsd = &vmstate_ioapic,
+    .qdev.reset = kvm_ioapic_reset,
+    .qdev.no_user = 1,
+    .qdev.props = (Property[]) {
+        DEFINE_PROP_UINT32("gsi_base", IOAPICState, kvm_gsi_base, 0),
+        DEFINE_PROP_END_OF_LIST(),
+    }
+};
+
+static void kvm_ioapic_register_devices(void)
+{
+    sysbus_register_withprop(&kvm_ioapic_info);
+}
+
+device_init(kvm_ioapic_register_devices)
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 351b032..624aecd 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -68,6 +68,15 @@ static void kvm_piix3_setup_irq_routing(bool pci_enabled)
     for (i = 8;
[Qemu-devel] [PATCH v2 05/16] apic: Open-code timer save/restore
From: Jan Kiszka jan.kis...@siemens.com To enable migration between accelerated and non-accelerated APIC models, we will need to handle the timer saving and restoring specially and can no longer rely on the automatics of VMSTATE_TIMER. Specifically, accelerated model will not start any QEMUTimer. This patch therefore factors out the generic bits into apic_next_timer and introduces a post-load callback that can be implemented differently by both models. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/apic.c | 30 -- hw/apic_common.c | 51 +-- hw/apic_internal.h |3 +++ 3 files changed, 64 insertions(+), 20 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index 27b18d6..9b83c0c 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -516,25 +516,9 @@ static uint32_t apic_get_current_count(APICState *s) static void apic_timer_update(APICState *s, int64_t current_time) { -int64_t next_time, d; - -if (!(s-lvt[APIC_LVT_TIMER] APIC_LVT_MASKED)) { -d = (current_time - s-initial_count_load_time) -s-count_shift; -if (s-lvt[APIC_LVT_TIMER] APIC_LVT_TIMER_PERIODIC) { -if (!s-initial_count) -goto no_timer; -d = ((d / ((uint64_t)s-initial_count + 1)) + 1) * ((uint64_t)s-initial_count + 1); -} else { -if (d = s-initial_count) -goto no_timer; -d = (uint64_t)s-initial_count + 1; -} -next_time = s-initial_count_load_time + (d s-count_shift); -qemu_mod_timer(s-timer, next_time); -s-next_time = next_time; +if (apic_next_timer(s, current_time)) { +qemu_mod_timer(s-timer, s-next_time); } else { -no_timer: qemu_del_timer(s-timer); } } @@ -756,6 +740,15 @@ static const MemoryRegionOps apic_io_ops = { .endianness = DEVICE_NATIVE_ENDIAN, }; +static void apic_post_load(APICState *s) +{ +if (s-timer_expiry != -1) { +qemu_mod_timer(s-timer, s-timer_expiry); +} else { +qemu_del_timer(s-timer); +} +} + static int apic_init(SysBusDevice *dev) { APICState *s = FROM_SYSBUS(APICState, dev); @@ -772,6 +765,7 @@ static int apic_init(SysBusDevice *dev) s-timer = qemu_new_timer_ns(vm_clock, apic_timer, s); 
     s->set_base = apic_set_base;
     s->set_tpr = apic_set_tpr;
+    s->post_load = apic_post_load;
     local_apics[s->idx] = s;
     return 0;
 }
diff --git a/hw/apic_common.c b/hw/apic_common.c
index 7d30356..84a3a27 100644
--- a/hw/apic_common.c
+++ b/hw/apic_common.c
@@ -80,6 +80,39 @@ int apic_get_irq_delivered(void)
     return apic_irq_delivered;
 }
 
+bool apic_next_timer(APICState *s, int64_t current_time)
+{
+    int64_t d;
+
+    /* We need to store the timer state separately to support APIC
+     * implementations that maintain a non-QEMU timer, e.g. inside the
+     * host kernel. This open-coded state allows us to migrate between
+     * both models. */
+    s->timer_expiry = -1;
+
+    if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_MASKED) {
+        return false;
+    }
+
+    d = (current_time - s->initial_count_load_time) >> s->count_shift;
+
+    if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_TIMER_PERIODIC) {
+        if (!s->initial_count) {
+            return false;
+        }
+        d = ((d / ((uint64_t)s->initial_count + 1)) + 1) *
+            ((uint64_t)s->initial_count + 1);
+    } else {
+        if (d >= s->initial_count) {
+            return false;
+        }
+        d = (uint64_t)s->initial_count + 1;
+    }
+    s->next_time = s->initial_count_load_time + (d << s->count_shift);
+    s->timer_expiry = s->next_time;
+    return true;
+}
+
 void apic_init_reset(DeviceState *d)
 {
     APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
@@ -107,7 +140,10 @@ void apic_init_reset(DeviceState *d)
     s->next_time = 0;
     s->wait_for_sipi = 1;
 
-    qemu_del_timer(s->timer);
+    if (s->timer) {
+        qemu_del_timer(s->timer);
+    }
+    s->timer_expiry = -1;
 }
 
 void apic_reset(DeviceState *d)
@@ -172,12 +208,23 @@ static int apic_load_old(QEMUFile *f, void *opaque, int version_id)
     return 0;
 }
 
+static int apic_dispatch_post_load(void *opaque, int version_id)
+{
+    APICState *s = opaque;
+
+    if (s->post_load) {
+        s->post_load(s);
+    }
+    return 0;
+}
+
 const VMStateDescription vmstate_apic = {
     .name = "apic",
     .version_id = 3,
     .minimum_version_id = 3,
     .minimum_version_id_old = 1,
     .load_state_old = apic_load_old,
+    .post_load = apic_dispatch_post_load,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(apicbase, APICState),
         VMSTATE_UINT8(id, APICState),
@@ -197,7 +244,7 @@ const VMStateDescription vmstate_apic = {
         VMSTATE_UINT32(initial_count, APICState),
         VMSTATE_INT64(initial_count_load_time, APICState),
         VMSTATE_INT64(next_time, APICState),
-        VMSTATE_TIMER(timer, APICState),
+        VMSTATE_INT64(timer_expiry,
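The expiry computation in apic_next_timer() is plain integer arithmetic, so it can be checked in isolation. The following is a standalone sketch, not the patch itself: `TimerSketch` and `sketch_next_timer()` are hypothetical names, the two booleans stand in for the APIC_LVT_MASKED and APIC_LVT_TIMER_PERIODIC bits of the LVT timer entry, and -1 plays the role of the patch's `false` return (and of `timer_expiry = -1`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the APICState fields used by apic_next_timer(). */
typedef struct {
    bool masked;                     /* APIC_LVT_MASKED set in the LVT timer */
    bool periodic;                   /* APIC_LVT_TIMER_PERIODIC set */
    int64_t initial_count_load_time; /* when initial_count was written */
    int count_shift;                 /* log2 of the timer divide setting */
    uint32_t initial_count;
} TimerSketch;

/* Returns the absolute expiry time, or -1 if no timer should be armed. */
static int64_t sketch_next_timer(const TimerSketch *t, int64_t current_time)
{
    uint64_t period = (uint64_t)t->initial_count + 1;
    int64_t d;

    if (t->masked) {
        return -1;
    }
    /* Elapsed time expressed in APIC timer ticks. */
    d = (current_time - t->initial_count_load_time) >> t->count_shift;

    if (t->periodic) {
        if (!t->initial_count) {
            return -1;
        }
        /* Round up to the end of the period we are currently in. */
        d = (int64_t)(((uint64_t)d / period + 1) * period);
    } else {
        if (d >= (int64_t)t->initial_count) {
            return -1;               /* one-shot timer already fired */
        }
        d = (int64_t)period;
    }
    return t->initial_count_load_time + (d << t->count_shift);
}
```

For example, with a divide shift of 1 and an initial count of 9, one period lasts 10 APIC ticks, i.e. 20 time units, so a periodic timer loaded at t=1000 next fires at 1020, and at 1040 once 1020 has passed.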
Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 2011-12-04 16:12, Avi Kivity wrote:
> On 12/04/2011 04:06 PM, Jan Kiszka wrote:
>> On 2011-12-04 15:04, Avi Kivity wrote:
>>> On 12/04/2011 03:51 PM, Jan Kiszka wrote:
>>>>> But the name becomes part of the save/restore ABI, so you can't.
>>>>
>>>> Nope, the vmstate names are identical. That would ruin migration
>>>> otherwise. It's just the output of "info qtree" & co. that changes.
>>>
>>> Oh, okay. I still think it's wrong, but now it's just a matter of
>>> taste, and I can live with it.
>>
>> Wrong in what sense?
>
> In the sense that kernel-apic is just an accelerated apic. From the
> guest point of view, there's no difference, and that should be
> reflected in the device model.

That was my goal as well: The guest should not notice the difference,
but the admin on the host side should still be able to tell both
internally fairly different models apart. Plus the code should be
clearly split where there are differences and explicitly shared where
there aren't.

> If I'm reading an apic register, either from the guest or via a
> monitor debug interface, I shouldn't care whether it's accelerated or
> not.

The guest part already holds, of course. Specifically for the debug
scenario, I'd prefer the clear differentiation by name as there can
always remain subtle differences in the implementation of kernel vs.
user space. Someone debugging the guest and/or qemu/kvm should remain
aware of this.

Jan
[Qemu-devel] [PATCH v2 06/16] i8259: Factor out core for KVM reuse
From: Jan Kiszka jan.kis...@siemens.com Analogously to the APIC, we will reuse some parts of the user space i8259 model for KVM. In this case it is the PicState, vmstate description, a reset core and some init bits. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Makefile.objs |2 +- hw/i8259.c | 78 +- hw/i8259_common.c | 103 +++ hw/i8259_internal.h | 67 + 4 files changed, 174 insertions(+), 76 deletions(-) create mode 100644 hw/i8259_common.c create mode 100644 hw/i8259_internal.h diff --git a/Makefile.objs b/Makefile.objs index 01587c8..5372eec 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -220,7 +220,7 @@ hw-obj-$(CONFIG_APPLESMC) += applesmc.o hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o hw-obj-$(CONFIG_SMARTCARD_NSS) += ccid-card-emulated.o hw-obj-$(CONFIG_USB_REDIR) += usb-redir.o -hw-obj-$(CONFIG_I8259) += i8259.o +hw-obj-$(CONFIG_I8259) += i8259_common.o i8259.o # PPC devices hw-obj-$(CONFIG_PREP_PCI) += prep_pci.o diff --git a/hw/i8259.c b/hw/i8259.c index ab519de..e8a6a9a 100644 --- a/hw/i8259.c +++ b/hw/i8259.c @@ -26,6 +26,7 @@ #include isa.h #include monitor.h #include qemu-timer.h +#include i8259_internal.h /* debug PIC */ //#define DEBUG_PIC @@ -40,33 +41,6 @@ //#define DEBUG_IRQ_LATENCY //#define DEBUG_IRQ_COUNT -struct PicState { -ISADevice dev; -uint8_t last_irr; /* edge detection */ -uint8_t irr; /* interrupt request register */ -uint8_t imr; /* interrupt mask register */ -uint8_t isr; /* interrupt service register */ -uint8_t priority_add; /* highest irq priority */ -uint8_t irq_base; -uint8_t read_reg_select; -uint8_t poll; -uint8_t special_mask; -uint8_t init_state; -uint8_t auto_eoi; -uint8_t rotate_on_auto_eoi; -uint8_t special_fully_nested_mode; -uint8_t init4; /* true if 4 byte init */ -uint8_t single_mode; /* true if slave pic is not initialized */ -uint8_t elcr; /* PIIX edge/trigger selection*/ -uint8_t elcr_mask; -qemu_irq int_out[1]; -uint32_t master; /* reflects /SP input pin */ -uint32_t iobase; -uint32_t 
elcr_addr; -MemoryRegion base_io; -MemoryRegion elcr_io; -}; - #if defined(DEBUG_PIC) || defined(DEBUG_IRQ_COUNT) static int irq_level[16]; #endif @@ -248,22 +222,7 @@ int pic_read_irq(PicState *s) static void pic_init_reset(PicState *s) { -s-last_irr = 0; -s-irr = 0; -s-imr = 0; -s-isr = 0; -s-priority_add = 0; -s-irq_base = 0; -s-read_reg_select = 0; -s-poll = 0; -s-special_mask = 0; -s-init_state = 0; -s-auto_eoi = 0; -s-rotate_on_auto_eoi = 0; -s-special_fully_nested_mode = 0; -s-init4 = 0; -s-single_mode = 0; -/* Note: ELCR is not reset */ +pic_reset_internal(s); pic_update_irq(s); } @@ -418,32 +377,6 @@ static uint64_t elcr_ioport_read(void *opaque, target_phys_addr_t addr, return s-elcr; } -static const VMStateDescription vmstate_pic = { -.name = i8259, -.version_id = 1, -.minimum_version_id = 1, -.minimum_version_id_old = 1, -.fields = (VMStateField[]) { -VMSTATE_UINT8(last_irr, PicState), -VMSTATE_UINT8(irr, PicState), -VMSTATE_UINT8(imr, PicState), -VMSTATE_UINT8(isr, PicState), -VMSTATE_UINT8(priority_add, PicState), -VMSTATE_UINT8(irq_base, PicState), -VMSTATE_UINT8(read_reg_select, PicState), -VMSTATE_UINT8(poll, PicState), -VMSTATE_UINT8(special_mask, PicState), -VMSTATE_UINT8(init_state, PicState), -VMSTATE_UINT8(auto_eoi, PicState), -VMSTATE_UINT8(rotate_on_auto_eoi, PicState), -VMSTATE_UINT8(special_fully_nested_mode, PicState), -VMSTATE_UINT8(init4, PicState), -VMSTATE_UINT8(single_mode, PicState), -VMSTATE_UINT8(elcr, PicState), -VMSTATE_END_OF_LIST() -} -}; - static const MemoryRegionOps pic_base_ioport_ops = { .read = pic_ioport_read, .write = pic_ioport_write, @@ -469,16 +402,11 @@ static int pic_initfn(ISADevice *dev) memory_region_init_io(s-base_io, pic_base_ioport_ops, s, pic, 2); memory_region_init_io(s-elcr_io, pic_elcr_ioport_ops, s, elcr, 1); -isa_register_ioport(NULL, s-base_io, s-iobase); -if (s-elcr_addr != -1) { -isa_register_ioport(NULL, s-elcr_io, s-elcr_addr); -} +pic_init_common(s); qdev_init_gpio_out(dev-qdev, s-int_out, 
ARRAY_SIZE(s-int_out)); qdev_init_gpio_in(dev-qdev, pic_set_irq, 8); -qdev_set_legacy_instance_id(dev-qdev, s-iobase, 1); - return 0; } diff --git a/hw/i8259_common.c b/hw/i8259_common.c new file mode 100644 index 000..9d2fbc3 --- /dev/null +++ b/hw/i8259_common.c @@ -0,0 +1,103 @@ +/* + * QEMU 8259 - common bits of emulated and KVM kernel model + * + * Copyright (c) 2003-2004 Fabrice Bellard + * Copyright (c)
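The point of the split is that one reset core serves both PIC models, while each model adds its own follow-up. The body of pic_reset_internal() is not visible in this truncated patch, so the sketch below is modelled only on the field list that was removed from pic_init_reset(); `PicCoreSketch`, `pic_core_reset_sketch()`, `pic_model_reset_sketch()` and the `update_irq` hook are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of PicState; field names follow the struct moved
 * out of hw/i8259.c in the diff. */
typedef struct PicCoreSketch PicCoreSketch;
struct PicCoreSketch {
    uint8_t last_irr, irr, imr, isr, priority_add, irq_base;
    uint8_t read_reg_select, poll, special_mask, init_state;
    uint8_t auto_eoi, rotate_on_auto_eoi, special_fully_nested_mode;
    uint8_t init4, single_mode;
    uint8_t elcr;      /* PIIX edge/level selection: survives reset */
    uint8_t elcr_mask;
};

/* Shared core: clears the registers the old pic_init_reset() cleared,
 * so both the emulated and a KVM-backed model could call it. */
static void pic_core_reset_sketch(PicCoreSketch *s)
{
    s->last_irr = 0;
    s->irr = 0;
    s->imr = 0;
    s->isr = 0;
    s->priority_add = 0;
    s->irq_base = 0;
    s->read_reg_select = 0;
    s->poll = 0;
    s->special_mask = 0;
    s->init_state = 0;
    s->auto_eoi = 0;
    s->rotate_on_auto_eoi = 0;
    s->special_fully_nested_mode = 0;
    s->init4 = 0;
    s->single_mode = 0;
    /* Note: ELCR is not reset */
}

/* Model-specific wrapper: the emulated PIC re-evaluates its output line
 * afterwards (pic_update_irq() in the diff); another model could do
 * something else here. update_irq is a hypothetical hook. */
static void pic_model_reset_sketch(PicCoreSketch *s,
                                   void (*update_irq)(PicCoreSketch *s))
{
    pic_core_reset_sketch(s);
    if (update_irq != NULL) {
        update_irq(s);
    }
}
```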
[Qemu-devel] [PATCH v2 09/16] ioapic: Factor out core for KVM reuse
From: Jan Kiszka jan.kis...@siemens.com KVM will share the IOAPICState, the vmstate, the reset logic and certain init parts with the user space model. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Makefile.target |2 +- hw/ioapic.c | 108 - hw/ioapic_common.c | 89 + hw/ioapic_internal.h | 93 +++ 4 files changed, 192 insertions(+), 100 deletions(-) create mode 100644 hw/ioapic_common.c create mode 100644 hw/ioapic_internal.h diff --git a/Makefile.target b/Makefile.target index 7bb6b13..4cd3c0e 100644 --- a/Makefile.target +++ b/Makefile.target @@ -226,7 +226,7 @@ obj-$(CONFIG_IVSHMEM) += ivshmem.o # Hardware support obj-i386-y += vga.o obj-i386-y += mc146818rtc.o pc.o -obj-i386-y += cirrus_vga.o sga.o apic_common.o apic.o ioapic.o piix_pci.o +obj-i386-y += cirrus_vga.o sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o obj-i386-y += vmport.o obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o obj-i386-y += debugcon.o multiboot.o diff --git a/hw/ioapic.c b/hw/ioapic.c index eb75766..8876d5d 100644 --- a/hw/ioapic.c +++ b/hw/ioapic.c @@ -24,9 +24,7 @@ #include pc.h #include apic.h #include ioapic.h -#include qemu-timer.h -#include host-utils.h -#include sysbus.h +#include ioapic_internal.h //#define DEBUG_IOAPIC @@ -37,62 +35,6 @@ #define DPRINTF(fmt, ...) 
#endif -#define MAX_IOAPICS 1 - -#define IOAPIC_VERSION 0x11 - -#define IOAPIC_LVT_DEST_SHIFT 56 -#define IOAPIC_LVT_MASKED_SHIFT 16 -#define IOAPIC_LVT_TRIGGER_MODE_SHIFT 15 -#define IOAPIC_LVT_REMOTE_IRR_SHIFT 14 -#define IOAPIC_LVT_POLARITY_SHIFT 13 -#define IOAPIC_LVT_DELIV_STATUS_SHIFT 12 -#define IOAPIC_LVT_DEST_MODE_SHIFT 11 -#define IOAPIC_LVT_DELIV_MODE_SHIFT 8 - -#define IOAPIC_LVT_MASKED (1 IOAPIC_LVT_MASKED_SHIFT) -#define IOAPIC_LVT_REMOTE_IRR (1 IOAPIC_LVT_REMOTE_IRR_SHIFT) - -#define IOAPIC_TRIGGER_EDGE 0 -#define IOAPIC_TRIGGER_LEVEL1 - -/*io{apic,sapic} delivery mode*/ -#define IOAPIC_DM_FIXED 0x0 -#define IOAPIC_DM_LOWEST_PRIORITY 0x1 -#define IOAPIC_DM_PMI 0x2 -#define IOAPIC_DM_NMI 0x4 -#define IOAPIC_DM_INIT 0x5 -#define IOAPIC_DM_SIPI 0x6 -#define IOAPIC_DM_EXTINT0x7 -#define IOAPIC_DM_MASK 0x7 - -#define IOAPIC_VECTOR_MASK 0xff - -#define IOAPIC_IOREGSEL 0x00 -#define IOAPIC_IOWIN0x10 - -#define IOAPIC_REG_ID 0x00 -#define IOAPIC_REG_VER 0x01 -#define IOAPIC_REG_ARB 0x02 -#define IOAPIC_REG_REDTBL_BASE 0x10 -#define IOAPIC_ID 0x00 - -#define IOAPIC_ID_SHIFT 24 -#define IOAPIC_ID_MASK 0xf - -#define IOAPIC_VER_ENTRIES_SHIFT16 - -typedef struct IOAPICState IOAPICState; - -struct IOAPICState { -SysBusDevice busdev; -MemoryRegion io_memory; -uint8_t id; -uint8_t ioregsel; -uint32_t irr; -uint64_t ioredtbl[IOAPIC_NUM_PINS]; -}; - static IOAPICState *ioapics[MAX_IOAPICS]; static void ioapic_service(IOAPICState *s) @@ -278,44 +220,11 @@ ioapic_mem_write(void *opaque, target_phys_addr_t addr, uint64_t val, } } -static int ioapic_post_load(void *opaque, int version_id) -{ -IOAPICState *s = opaque; - -if (version_id == 1) { -/* set sane value */ -s-irr = 0; -} -return 0; -} - -static const VMStateDescription vmstate_ioapic = { -.name = ioapic, -.version_id = 3, -.post_load = ioapic_post_load, -.minimum_version_id = 1, -.minimum_version_id_old = 1, -.fields = (VMStateField[]) { -VMSTATE_UINT8(id, IOAPICState), -VMSTATE_UINT8(ioregsel, IOAPICState), 
-VMSTATE_UNUSED_V(2, 8), /* to account for qemu-kvm's v2 format */ -VMSTATE_UINT32_V(irr, IOAPICState, 2), -VMSTATE_UINT64_ARRAY(ioredtbl, IOAPICState, IOAPIC_NUM_PINS), -VMSTATE_END_OF_LIST() -} -}; - static void ioapic_reset(DeviceState *d) { IOAPICState *s = DO_UPCAST(IOAPICState, busdev.qdev, d); -int i; -s-id = 0; -s-ioregsel = 0; -s-irr = 0; -for (i = 0; i IOAPIC_NUM_PINS; i++) { -s-ioredtbl[i] = 1 IOAPIC_LVT_MASKED_SHIFT; -} +ioapic_reset_internal(s); } static const MemoryRegionOps ioapic_io_ops = { @@ -327,18 +236,19 @@ static const MemoryRegionOps ioapic_io_ops = { static int ioapic_init1(SysBusDevice *dev) { IOAPICState *s = FROM_SYSBUS(IOAPICState, dev); -static int ioapic_no; +int ioapic_no; -if (ioapic_no = MAX_IOAPICS) { +memory_region_init_io(s-io_memory, ioapic_io_ops, s, ioapic, 0x1000); + +ioapic_no = ioapic_init_common(s); +if
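The shift constants moved into ioapic_internal.h describe the layout of each 64-bit redirection table entry, and the removed reset code initialises every entry to `1 << IOAPIC_LVT_MASKED_SHIFT`, i.e. all pins masked. A small decoder makes the layout concrete. This is an illustrative sketch only: `RedirEntrySketch` and `decode_redir_entry_sketch()` are hypothetical; the shift and mask values are copied from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the IOAPIC constants in the diff. */
#define IOAPIC_LVT_DEST_SHIFT          56
#define IOAPIC_LVT_MASKED_SHIFT        16
#define IOAPIC_LVT_TRIGGER_MODE_SHIFT  15
#define IOAPIC_LVT_REMOTE_IRR_SHIFT    14
#define IOAPIC_LVT_POLARITY_SHIFT      13
#define IOAPIC_LVT_DEST_MODE_SHIFT     11
#define IOAPIC_LVT_DELIV_MODE_SHIFT    8
#define IOAPIC_DM_LOWEST_PRIORITY      0x1
#define IOAPIC_DM_MASK                 0x7
#define IOAPIC_VECTOR_MASK             0xff
#define IOAPIC_TRIGGER_LEVEL           1

/* Hypothetical decoded view of one redirection table entry. */
typedef struct {
    uint8_t dest;          /* bits 63..56 */
    uint8_t vector;        /* bits 7..0 */
    uint8_t delivery_mode; /* bits 10..8, one of IOAPIC_DM_* */
    int dest_mode;         /* bit 11: 0 = physical, 1 = logical */
    int polarity;          /* bit 13: 1 = active low */
    int remote_irr;        /* bit 14 */
    int level_triggered;   /* bit 15 */
    int masked;            /* bit 16 */
} RedirEntrySketch;

static RedirEntrySketch decode_redir_entry_sketch(uint64_t e)
{
    RedirEntrySketch r;

    r.dest = (uint8_t)(e >> IOAPIC_LVT_DEST_SHIFT);
    r.vector = (uint8_t)(e & IOAPIC_VECTOR_MASK);
    r.delivery_mode = (uint8_t)((e >> IOAPIC_LVT_DELIV_MODE_SHIFT) &
                                IOAPIC_DM_MASK);
    r.dest_mode = (int)((e >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1);
    r.polarity = (int)((e >> IOAPIC_LVT_POLARITY_SHIFT) & 1);
    r.remote_irr = (int)((e >> IOAPIC_LVT_REMOTE_IRR_SHIFT) & 1);
    r.level_triggered = ((e >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1) ==
                        IOAPIC_TRIGGER_LEVEL;
    r.masked = (int)((e >> IOAPIC_LVT_MASKED_SHIFT) & 1);
    return r;
}
```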
[Qemu-devel] [PATCH v2 16/16] kvm: Arm in-kernel irqchip support
From: Jan Kiszka <jan.kis...@siemens.com>

Make the basic in-kernel irqchip support selectable via -machine
...,kernel_irqchip=on. Leave it off by default until it can fully
replace user space models.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 qemu-config.c   |    4 ++++
 qemu-options.hx |    5 ++++-
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/qemu-config.c b/qemu-config.c
index 90b6b3e..fc25115 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -483,6 +483,10 @@ static QemuOptsList qemu_machine_opts = {
             .name = "accel",
             .type = QEMU_OPT_STRING,
             .help = "accelerator list",
+        }, {
+            .name = "kernel_irqchip",
+            .type = QEMU_OPT_BOOL,
+            .help = "use KVM in-kernel irqchip",
         },
         { /* End of list */ }
     },
diff --git a/qemu-options.hx b/qemu-options.hx
index 5d2a776..e10186b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -31,7 +31,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "-machine [type=]name[,prop[=value][,...]]\n"
     "                selects emulated machine (-machine ? for list)\n"
     "                property accel=accel1[:accel2[:...]] selects accelerator\n"
-    "                supported accelerators are kvm, xen, tcg (default: tcg)\n",
+    "                supported accelerators are kvm, xen, tcg (default: tcg)\n"
+    "                kernel_irqchip=on|off controls accelerated irqchip support\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
@@ -44,6 +45,8 @@ This is used to enable an accelerator. Depending on the target architecture,
 kvm, xen, or tcg can be available. By default, tcg is used. If there is more
 than one accelerator specified, the next one is used if the previous one fails
 to initialize.
+@item kernel_irqchip=on|off
+Enables in-kernel irqchip support for the chosen accelerator when available.
 @end table
 ETEXI
 
-- 
1.7.3.4
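To make the intended semantics of the new property concrete (absent means off, explicit `on`/`off` toggles it), here is a toy scanner over a `-machine` option string. It is explicitly not QEMU's QemuOpts parser, which handles QEMU_OPT_BOOL itself; `toy_kernel_irqchip()` is a hypothetical helper that only accepts the literal values `on` and `off` shown in the help text.

```c
#include <assert.h>
#include <string.h>

/* Toy scan of a -machine option string such as
 * "pc,accel=kvm,kernel_irqchip=on".
 * Returns 1 for on, 0 for off or absent (the default in this patch),
 * and -1 for any other value. NOT QEMU's QemuOpts implementation. */
static int toy_kernel_irqchip(const char *opts)
{
    static const char key[] = "kernel_irqchip=";
    const size_t keylen = sizeof(key) - 1;
    const char *p = opts;
    int result = 0;

    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len >= keylen && strncmp(p, key, keylen) == 0) {
            const char *val = p + keylen;
            size_t vlen = len - keylen;

            if (vlen == 2 && strncmp(val, "on", 2) == 0) {
                result = 1;
            } else if (vlen == 3 && strncmp(val, "off", 3) == 0) {
                result = 0;
            } else {
                return -1;
            }
        }
        p = end ? end + 1 : NULL;   /* advance to the next property */
    }
    return result;
}
```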
[Qemu-devel] [PATCH v2 01/16] msi: Generalize msix_supported to msi_supported
From: Jan Kiszka <jan.kis...@siemens.com>

Rename msix_supported to msi_supported and control MSI and MSI-X
activation this way. That was likely the original intention for this
flag, but MSI support came after MSI-X.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 hw/msi.c  |    8 ++++++++
 hw/msi.h  |    2 ++
 hw/msix.c |    9 ++++-----
 hw/msix.h |    2 --
 hw/pc.c   |    4 ++--
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index f214fcf..5d6ceb6 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -36,6 +36,9 @@
 
 #define PCI_MSI_VECTORS_MAX 32
 
+/* Flag for interrupt controller to declare MSI/MSI-X support */
+bool msi_supported;
+
 /* If we get rid of cap allocator, we won't need this. */
 static inline uint8_t msi_cap_sizeof(uint16_t flags)
 {
@@ -116,6 +119,11 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
     uint16_t flags;
     uint8_t cap_size;
     int config_offset;
+
+    if (!msi_supported) {
+        return -ENOTSUP;
+    }
+
     MSI_DEV_PRINTF(dev,
                    "init offset: 0x%"PRIx8" vector: %"PRId8
                    " 64bit %d mask %d\n",
diff --git a/hw/msi.h b/hw/msi.h
index 5766018..3040bb0 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -24,6 +24,8 @@
 #include "qemu-common.h"
 #include "pci.h"
 
+extern bool msi_supported;
+
 bool msi_enabled(const PCIDevice *dev);
 int msi_init(struct PCIDevice *dev, uint8_t offset,
              unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
diff --git a/hw/msix.c b/hw/msix.c
index b15bafc..8850fbd 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -12,6 +12,7 @@
  */
 
 #include "hw.h"
+#include "msi.h"
 #include "msix.h"
 #include "pci.h"
 #include "range.h"
@@ -32,9 +33,6 @@
 
 #define MSIX_MAX_ENTRIES 32
 
-/* Flag for interrupt controller to declare MSI-X support */
-int msix_supported;
-
 /* Add MSI-X capability to the config space for the device. */
 /* Given a bar and its size, add MSI-X table on top of it
  * and fill MSI-X capability in the config space.
@@ -212,10 +210,11 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
               unsigned bar_nr, unsigned bar_size)
 {
     int ret;
+
     /* Nothing to do if MSI is not supported by interrupt controller */
-    if (!msix_supported)
+    if (!msi_supported) {
         return -ENOTSUP;
-
+    }
     if (nentries > MSIX_MAX_ENTRIES)
         return -EINVAL;
 
diff --git a/hw/msix.h b/hw/msix.h
index 7e04336..5aba22b 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -29,6 +29,4 @@ void msix_notify(PCIDevice *dev, unsigned vector);
 
 void msix_reset(PCIDevice *dev);
 
-extern int msix_supported;
-
 #endif
diff --git a/hw/pc.c b/hw/pc.c
index 9328ee5..5225d5b 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -36,7 +36,7 @@
 #include "elf.h"
 #include "multiboot.h"
 #include "mc146818rtc.h"
-#include "msix.h"
+#include "msi.h"
 #include "sysbus.h"
 #include "sysemu.h"
 #include "blockdev.h"
@@ -896,7 +896,7 @@ static DeviceState *apic_init(void *env, uint8_t apic_id)
         apic_mapped = 1;
     }
 
-    msix_supported = 1;
+    msi_supported = true;
 
     return dev;
 }
-- 
1.7.3.4
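The pattern behind the rename is a single capability flag, owned by whoever provides the interrupt controller, that gates both MSI and MSI-X setup in devices. A minimal sketch of that gating, assuming nothing beyond what the diff shows; all `*_sketch` names are hypothetical, and the entry limit mirrors MSIX_MAX_ENTRIES.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MSIX_MAX_ENTRIES_SKETCH 32   /* mirrors MSIX_MAX_ENTRIES */

/* One global flag, set by the board once its irqchip can deliver MSIs. */
static bool msi_supported_sketch;

/* Device-side init helpers: both refuse to set up unless supported. */
static int msi_init_sketch(void)
{
    if (!msi_supported_sketch) {
        return -ENOTSUP;
    }
    return 0;
}

static int msix_init_sketch(unsigned nentries)
{
    if (!msi_supported_sketch) {
        return -ENOTSUP;
    }
    if (nentries > MSIX_MAX_ENTRIES_SKETCH) {
        return -EINVAL;
    }
    return 0;
}

/* What pc.c's apic_init() does in the patch: declare support. */
static void irqchip_declare_msi_sketch(void)
{
    msi_supported_sketch = true;
}
```

Devices that get -ENOTSUP are expected to fall back to legacy INTx interrupts, which is why a single flag at board level is enough.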
[Qemu-devel] [PATCH v2 04/16] apic: Factor out core for KVM reuse
From: Jan Kiszka jan.kis...@siemens.com The KVM in-kernel APIC model will reuse parts of the user space model, namely the vmstate, reset handling, IRQ coalescing tracker, some init steps and the base and tpr set/get routines. For the latter, we also prepare set callbacks as KVM will override those. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Makefile.target|2 +- hw/apic.c | 260 +++- hw/apic_common.c | 215 +++ hw/apic_internal.h | 108 ++ trace-events |2 +- 5 files changed, 339 insertions(+), 248 deletions(-) create mode 100644 hw/apic_common.c create mode 100644 hw/apic_internal.h diff --git a/Makefile.target b/Makefile.target index 3a9e95d..7bb6b13 100644 --- a/Makefile.target +++ b/Makefile.target @@ -226,7 +226,7 @@ obj-$(CONFIG_IVSHMEM) += ivshmem.o # Hardware support obj-i386-y += vga.o obj-i386-y += mc146818rtc.o pc.o -obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o +obj-i386-y += cirrus_vga.o sga.o apic_common.o apic.o ioapic.o piix_pci.o obj-i386-y += vmport.o obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o obj-i386-y += debugcon.o multiboot.o diff --git a/hw/apic.c b/hw/apic.c index 2644a82..27b18d6 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -16,53 +16,13 @@ * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, see http://www.gnu.org/licenses/ */ -#include hw.h +#include apic_internal.h #include apic.h #include ioapic.h -#include qemu-timer.h #include host-utils.h -#include sysbus.h #include trace.h #include pc.h -/* APIC Local Vector Table */ -#define APIC_LVT_TIMER 0 -#define APIC_LVT_THERMAL 1 -#define APIC_LVT_PERFORM 2 -#define APIC_LVT_LINT0 3 -#define APIC_LVT_LINT1 4 -#define APIC_LVT_ERROR 5 -#define APIC_LVT_NB 6 - -/* APIC delivery modes */ -#define APIC_DM_FIXED 0 -#define APIC_DM_LOWPRI 1 -#define APIC_DM_SMI2 -#define APIC_DM_NMI4 -#define APIC_DM_INIT 5 -#define APIC_DM_SIPI 6 -#define APIC_DM_EXTINT 7 - -/* APIC destination mode */ -#define 
APIC_DESTMODE_FLAT 0xf -#define APIC_DESTMODE_CLUSTER 1 - -#define APIC_TRIGGER_EDGE 0 -#define APIC_TRIGGER_LEVEL 1 - -#defineAPIC_LVT_TIMER_PERIODIC (117) -#defineAPIC_LVT_MASKED (116) -#defineAPIC_LVT_LEVEL_TRIGGER (115) -#defineAPIC_LVT_REMOTE_IRR (114) -#defineAPIC_INPUT_POLARITY (113) -#defineAPIC_SEND_PENDING (112) - -#define ESR_ILLEGAL_ADDRESS (1 7) - -#define APIC_SV_DIRECTED_IO (112) -#define APIC_SV_ENABLE (18) - -#define MAX_APICS 255 #define MAX_APIC_WORDS 8 /* Intel APIC constants: from include/asm/msidef.h */ @@ -75,40 +35,7 @@ #define MSI_ADDR_DEST_ID_SHIFT 12 #defineMSI_ADDR_DEST_ID_MASK 0x000 -#define MSI_ADDR_SIZE 0x10 - -typedef struct APICState APICState; - -struct APICState { -SysBusDevice busdev; -MemoryRegion io_memory; -void *cpu_env; -uint32_t apicbase; -uint8_t id; -uint8_t arb_id; -uint8_t tpr; -uint32_t spurious_vec; -uint8_t log_dest; -uint8_t dest_mode; -uint32_t isr[8]; /* in service register */ -uint32_t tmr[8]; /* trigger mode register */ -uint32_t irr[8]; /* interrupt request register */ -uint32_t lvt[APIC_LVT_NB]; -uint32_t esr; /* error register */ -uint32_t icr[2]; - -uint32_t divide_conf; -int count_shift; -uint32_t initial_count; -int64_t initial_count_load_time, next_time; -uint32_t idx; -QEMUTimer *timer; -int sipi_vector; -int wait_for_sipi; -}; - static APICState *local_apics[MAX_APICS + 1]; -static int apic_irq_delivered; static void apic_set_irq(APICState *s, int vector_num, int trigger_mode); static void apic_update_irq(APICState *s); @@ -293,14 +220,8 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode); } -void cpu_set_apic_base(DeviceState *d, uint64_t val) +static void apic_set_base(APICState *s, uint64_t val) { -APICState *s = DO_UPCAST(APICState, busdev.qdev, d); - -trace_cpu_set_apic_base(val); - -if (!s) -return; s-apicbase = (val 0xf000) | (s-apicbase (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE)); /* 
if disabled, cannot be enabled again */ @@ -311,32 +232,12 @@ void cpu_set_apic_base(DeviceState *d, uint64_t val) } } -uint64_t cpu_get_apic_base(DeviceState *d) -{ -APICState *s = DO_UPCAST(APICState, busdev.qdev, d); - -trace_cpu_get_apic_base(s ? (uint64_t)s-apicbase: 0); - -return s ? s-apicbase : 0; -} - -void cpu_set_apic_tpr(DeviceState *d, uint8_t val) +static void apic_set_tpr(APICState *s, uint8_t val) {
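The commit message says the base and TPR setters become per-model callbacks because KVM will override them; the hunk at the top of this section shows the user-space model installing `s->set_base = apic_set_base; s->set_tpr = apic_set_tpr;`. The sketch below illustrates that dispatch shape under simplifying assumptions: `ApicSketch`, the `common_*` entry points and both sets of hooks are hypothetical, and the setters skip the real register semantics (BSP/enable bits, priority re-evaluation).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down APIC state with per-model hooks. */
typedef struct ApicSketch ApicSketch;
struct ApicSketch {
    uint64_t apicbase;
    uint8_t tpr;
    int sync_pending;            /* only used by the accelerated hook */
    void (*set_base)(ApicSketch *s, uint64_t val);
    void (*set_tpr)(ApicSketch *s, uint8_t val);
};

/* Common entry points, shared by every model. A NULL state means the
 * CPU has no APIC, as in the old cpu_set_apic_base(). */
static void common_set_apic_base(ApicSketch *s, uint64_t val)
{
    if (s != NULL) {
        s->set_base(s, val);
    }
}

static uint64_t common_get_apic_base(const ApicSketch *s)
{
    return s ? s->apicbase : 0;
}

static void common_set_apic_tpr(ApicSketch *s, uint8_t val)
{
    if (s != NULL) {
        s->set_tpr(s, val);
    }
}

/* User-space model hooks (simplified register semantics). */
static void emulated_set_base(ApicSketch *s, uint64_t val)
{
    s->apicbase = val;
}

static void emulated_set_tpr(ApicSketch *s, uint8_t val)
{
    s->tpr = val;
}

/* Hypothetical accelerated hook: same update, plus a marker that the
 * state must be synchronised with an external (e.g. in-kernel) model. */
static void accel_set_tpr(ApicSketch *s, uint8_t val)
{
    s->tpr = val;
    s->sync_pending = 1;
}
```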
[Qemu-devel] [PATCH v2 08/16] ioapic: Reject non-dword accesses to IOWIN register
From: Jan Kiszka <jan.kis...@siemens.com>

Aligns the model with the spec.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 hw/ioapic.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 56b1612..eb75766 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -208,6 +208,9 @@ ioapic_mem_read(void *opaque, target_phys_addr_t addr, unsigned int size)
         val = s->ioregsel;
         break;
     case IOAPIC_IOWIN:
+        if (size != 4) {
+            break;
+        }
         switch (s->ioregsel) {
         case IOAPIC_REG_ID:
             val = s->id << IOAPIC_ID_SHIFT;
@@ -247,6 +250,9 @@ ioapic_mem_write(void *opaque, target_phys_addr_t addr, uint64_t val,
         s->ioregsel = val;
         break;
     case IOAPIC_IOWIN:
+        if (size != 4) {
+            break;
+        }
         DPRINTF("write: %08x = %08x\n", s->ioregsel, val);
         switch (s->ioregsel) {
         case IOAPIC_REG_ID:
-- 
1.7.3.4
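The IOAPIC is programmed indirectly: the guest writes a register index to IOREGSEL, then reads or writes the selected register through IOWIN, and with this patch only 32-bit IOWIN accesses have an effect. The sketch below models just the ID and version registers. It is a simplified illustration, not hw/ioapic.c: `IoapicSketch` and both accessors are hypothetical, the constants are taken from the diffs in this thread, and the pin count of 24 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define IOAPIC_IOREGSEL          0x00
#define IOAPIC_IOWIN             0x10
#define IOAPIC_REG_ID            0x00
#define IOAPIC_REG_VER           0x01
#define IOAPIC_ID_SHIFT          24
#define IOAPIC_ID_MASK           0xf
#define IOAPIC_VERSION           0x11
#define IOAPIC_VER_ENTRIES_SHIFT 16
#define IOAPIC_NUM_PINS_SKETCH   24   /* assumption for this sketch */

typedef struct {
    uint8_t id;
    uint8_t ioregsel;
} IoapicSketch;

static uint32_t ioapic_read_sketch(IoapicSketch *s, uint64_t addr,
                                   unsigned size)
{
    uint32_t val = 0;

    switch (addr & 0xff) {
    case IOAPIC_IOREGSEL:
        val = s->ioregsel;
        break;
    case IOAPIC_IOWIN:
        if (size != 4) {          /* non-dword access: read as 0 here */
            break;
        }
        switch (s->ioregsel) {
        case IOAPIC_REG_ID:
            val = (uint32_t)s->id << IOAPIC_ID_SHIFT;
            break;
        case IOAPIC_REG_VER:
            val = IOAPIC_VERSION |
                  ((IOAPIC_NUM_PINS_SKETCH - 1) << IOAPIC_VER_ENTRIES_SHIFT);
            break;
        }
        break;
    }
    return val;
}

static void ioapic_write_sketch(IoapicSketch *s, uint64_t addr,
                                uint32_t val, unsigned size)
{
    switch (addr & 0xff) {
    case IOAPIC_IOREGSEL:
        s->ioregsel = (uint8_t)val;
        break;
    case IOAPIC_IOWIN:
        if (size != 4) {          /* ignore non-dword writes */
            break;
        }
        if (s->ioregsel == IOAPIC_REG_ID) {
            s->id = (val >> IOAPIC_ID_SHIFT) & IOAPIC_ID_MASK;
        }
        break;
    }
}
```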
Re: [Qemu-devel] [Bug 899143] [NEW] Raw img not recognized by Windows
Ok, thanks a lot :)

Vincent Autefage

On 03/12/2011 19:45, Stefan Hajnoczi wrote:
> On Fri, Dec 2, 2011 at 2:45 PM, Vincent Autefage
> <899...@bugs.launchpad.net> wrote:
>> $ qemu-img create -f raw root.img 100GB
>> $ mkntfs -F root.img
>> $ qemu -name W -sdl -m 2048 -enable-kvm -localtime -k fr -hda root.img -cdrom windows7.iso -boot d -net nic,macaddr=a0:00:00:00:00:01 -net user,vlan=0
>
> QEMU does recognize the raw image. You can check this by running
> 'info block' at the QEMU monitor (Ctrl-Alt-2) and you'll see ide-hd0
> is the raw image file you specified. Press Ctrl-Alt-1 to get back to
> the VM display.
>
> The problem is that the Windows installer does not like the disk image
> you have prepared. A normal harddisk has a master boot record but you
> created a raw image without a master boot record. The Windows
> installer is being picky/careful and not displaying this non-standard
> disk you created.
>
> Skip the mkntfs step and the installer works fine. There's no need to
> create the file system because the installer will do it for you.
>
> Stefan
>
> --
> You received this bug notification because you are a member of qemu-
> devel-ml, which is subscribed to QEMU.
> https://bugs.launchpad.net/bugs/899143
>
> Title:
>   Raw img not recognized by Windows
>
> Status in QEMU:
>   New
>
> Bug description:
>   Hi,
>
>   The installation process of Windows (XP/Vista/7) doesn’t seem to
>   recognize a raw img generated by qemu-img. The installer does not
>   see any hard drive...
>
>   The problem exists only with a raw img but not with a vmdk for
>   instance.
>
>   Thanks
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/899143/+subscriptions
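Stefan's diagnosis can be checked on the image itself by looking at its first sector: a fresh raw image from qemu-img reads back as zeros, while running `mkntfs -F` on the whole file leaves an NTFS volume boot sector at offset 0 instead of a partition table. The sketch below is a rough diagnostic with hypothetical names (`classify_sector0`, `classify_image`); it relies only on the NTFS OEM ID ("NTFS" plus four spaces at byte 3) and the 0x55 0xAA signature at bytes 510-511, and deliberately does not try to validate a real MBR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef enum {
    SECTOR0_BLANK,     /* all zeros, e.g. a fresh raw image */
    SECTOR0_NTFS_VBR,  /* bare NTFS boot sector, no partition table */
    SECTOR0_OTHER      /* anything else (possibly an MBR; not checked) */
} Sector0Kind;

static Sector0Kind classify_sector0(const uint8_t sector[512])
{
    static const uint8_t zeros[512];

    if (memcmp(sector, zeros, sizeof(zeros)) == 0) {
        return SECTOR0_BLANK;
    }
    /* NTFS boot sectors carry the OEM ID "NTFS    " at byte 3 and the
     * 0x55 0xAA signature at bytes 510-511. */
    if (memcmp(sector + 3, "NTFS    ", 8) == 0 &&
        sector[510] == 0x55 && sector[511] == 0xAA) {
        return SECTOR0_NTFS_VBR;
    }
    return SECTOR0_OTHER;
}

/* Reads the first sector of a raw image file; returns -1 on I/O error. */
static int classify_image(const char *path, Sector0Kind *kind)
{
    uint8_t sector[512];
    FILE *f = fopen(path, "rb");
    size_t n;

    if (f == NULL) {
        return -1;
    }
    n = fread(sector, 1, sizeof(sector), f);
    fclose(f);
    if (n != sizeof(sector)) {
        return -1;
    }
    *kind = classify_sector0(sector);
    return 0;
}
```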
Re: [Qemu-devel] [Bug 899140] Re: Problem with Linux Kernel Traffic Control
The result without TC is about 120 Mbit/s. I checked the bandwidth with
many programs (not only iperf) and the result is the same.

However, if I use the same raw image and the same TC configuration with
QEMU 0.14.0 or on real physical hosts, the result with TC is about
19.2 Mbit/s, which is the desired result...

Vincent

On 03/12/2011 19:48, Stefan Hajnoczi wrote:
> On Fri, Dec 2, 2011 at 2:42 PM, Vincent Autefage
> <899...@bugs.launchpad.net> wrote:
>> root@A# tc qdisc add dev eth0 root tbf rate 20mbit burst 20480 latency 50ms
>>
>> root@B# ifconfig eth0 192.168.0.2
>>
>> Then if we check with iperf, the real rate will be about 100kbit/s:
>
> What is the iperf result without tc?
>
> It's worth checking what rate the unlimited interface saturates at
> before applying tc. Perhaps this setup is just performing very poorly
> and it has nothing to do with tc.
>
> Stefan
>
> --
> You received this bug notification because you are a member of qemu-
> devel-ml, which is subscribed to QEMU.
> https://bugs.launchpad.net/bugs/899140
>
> Title:
>   Problem with Linux Kernel Traffic Control
>
> Status in QEMU:
>   New
>
> Bug description:
>   Hi,
>
>   The latest major versions of QEMU (0.14.1, 0.15 and 1.0) have a
>   significant problem when running on a Linux distribution which itself
>   runs a Traffic Control (TC) instance. When TC is configured with a
>   Token Bucket Filter (TBF) at a particular rate, the effective rate is
>   much lower than the desired one.
>
>   For instance, let's consider the following configuration:
>
>   # tc qdisc add dev eth0 root tbf rate 20mbit burst 20k latency 50ms
>
>   The effective rate will be about 100kbit/s! (verified with iperf)
>
>   I've encountered this problem on versions 0.14.1, 0.15 and 1.0 but
>   not with 0.14.0... With 0.14.0, we get a rate of 19.2 Mbit/s, which
>   is quite normal.
>
>   I've run the experiment on several hosts:
>
>   - Debian 32-bit, Core i7, 4GB RAM
>   - Debian 64-bit, Core i7, 8GB RAM
>   - 3 different high-performance servers: Ubuntu 64-bit, 48 AMD
>     Opteron, 128GB of RAM
>
>   The problem is always the same... The problem is also seen with
>   Class Based Queuing (CBQ) in TC.
>
>   Thanks
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/899140/+subscriptions
[Qemu-devel] [Bug 899961] [NEW] qemu/kvm locks up when run 32bit userspace with 64bit kernel
Public bug reported:

Applies to both qemu and qemu-kvm 1.0, but only when the kernel is
64-bit and userspace is 32-bit, on x86. It did not happen with previous
released versions, such as 0.15.

Not all guests trigger this issue; so far, only a (32-bit) Windows 7
guest shows it, but it does so quite reliably: on the first boot of an
old guest with the new qemu (or qemu-kvm), Windows finds a new CPU and
suggests rebooting. Hit Reboot and within a few seconds it locks up
(including the monitor), with 100% CPU usage. It is killable with -9.

** Affects: qemu
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/899961

Title:
  qemu/kvm locks up when run 32bit userspace with 64bit kernel

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/899961/+subscriptions
[Qemu-devel] linux-user: interrupting syscalls
Disclaimer: I'm writing this email because I had a neat idea about how
to solve a problem which Alex Graf discovered, but I don't have the time
to actually implement it :-)

Consider the following guest code, to be run under linux-user mode:

---begin---
#include <stdio.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

int pipefd[2];

void usr1_handler(int s)
{
    char x = 'x';
    write(pipefd[1], &x, 1);
}

int main(void)
{
    struct sigaction sa;
    char x;
    ssize_t r;

    if (pipe(pipefd) != 0) {
        perror("pipe");
        return 1;
    }

    sa.sa_handler = usr1_handler;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, 0) != 0) {
        perror("sigaction");
        return 1;
    }

    printf("read()ing pipe...\n");
    r = read(pipefd[0], &x, 1);
    printf("read returned %zd\n", r);
    return 0;
}
---endit---

When run natively, this program will block until you send it a SIGUSR1;
the signal handler will write to the pipe and cause the read to
complete. Run in linux-user mode, we deadlock, because qemu does not run
the guest signal handler when in the middle of emulating a system call
-- it merely queues it to be run when the syscall finishes. For cases
like this where the event that causes the syscall to complete is
actually triggered by the guest signal handler, this doesn't work.
(There is a real-world instance of this problem in the Boehm garbage
collector, where a signal handler posts to a semaphore which is being
waited on by the mainline code.)

It's not sufficient to simply force all syscalls to be non-restartable
(and then to take the signal when the syscall returns EINTR), because of
the following race condition:

 * qemu enters do_syscall on behalf of main thread
 * do_syscall is about to call the underlying syscall, when...
 * the signal arrives (and we queue it)
 * do_syscall then calls the host syscall, which will block. Oops.

To fix this I think we need to have linux-user's signal handler wrapper
do a siglongjmp if a signal arrives while we're inside do_syscall().
This allows us to properly interrupt whether we'd got to the point of
making the host syscall or not. The tricky bit here is in the details;
specifically, it's painful to write code that can cope with being
siglongjmp()ed out of at any point. You need to be careful not to call
anything that might not like being aborted (no malloc, for instance).
This might need some support like an equivalent of critical-section
macros to prevent the siglongjmp in some places, and/or cleanup routines
to be called in the event of the jump occurring to release resources.

Luckily we don't have to write the whole of syscall.c like that: a lot
of syscalls are non-blocking, so we can continue to deal with them as we
do now (queue signal, take it on exit). (Incidentally, any code in the
implementation of a 'non-blocking' syscall which doesn't retry if it
gets an EINTR return value is broken.) Linux's signal(7) manpage has a
handy overview of which syscalls have to be interruptible.

We also need to properly handle restarting syscalls when we've jumped
out of them to run the guest signal handler. For this I think we should
use a structure basically the same as the Linux kernel uses itself:
do_syscall() returns ERESTARTSYS, and the cpu-specific code then rewinds
the PC to before the syscall insn if the signal we're about to deliver
is one that was registered with SA_RESTART. A handful of syscalls may
need a 'restart handler' (where we both wind back PC and change the
syscall number to NR_restartsys so we can invoke a syscall-specific
'resume this' function).

I think to do this properly you'd also want to refactor syscall.c so
that instead of being an enormous switch statement it was table-driven,
so you just looked up the handler function for the syscall as well as
what classification it was (non-blocking vs. having to handle being
interrupted). We could roll in the strace table too, which might avoid
the problem of people adding new syscall support and forgetting about
strace.
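The core of the proposal, jumping out of a blocking host syscall when a signal arrives and reporting "restart needed" instead, can be sketched outside QEMU. This is a minimal illustration under stated assumptions, not linux-user code: `SKETCH_ERESTARTSYS`, `host_signal_handler()` and `sketch_guest_read()` are hypothetical, and only a single read() is wrapped. The flag is set before the pending check, which closes the race from the bullet list above: a signal arriving before the flag is set is caught by the check, one arriving after it triggers the siglongjmp.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical code, loosely modelled on the kernel's ERESTARTSYS. */
#define SKETCH_ERESTARTSYS 512

static sigjmp_buf syscall_env;
static volatile sig_atomic_t in_blocking_syscall;
static volatile sig_atomic_t pending_guest_signal;

/* Host handler: queue the guest signal (as linux-user does today) and,
 * if we are inside a blocking host syscall, abandon it. */
static void host_signal_handler(int sig)
{
    pending_guest_signal = sig;
    if (in_blocking_syscall) {
        in_blocking_syscall = 0;
        siglongjmp(syscall_env, 1);
    }
}

/* Blocking read on behalf of the guest. Returns bytes read, -errno, or
 * -SKETCH_ERESTARTSYS if a guest signal must be delivered first. */
static long sketch_guest_read(int fd, void *buf, size_t len)
{
    ssize_t ret;

    if (sigsetjmp(syscall_env, 1) != 0) {
        return -SKETCH_ERESTARTSYS;    /* we jumped out of read() */
    }
    in_blocking_syscall = 1;
    if (pending_guest_signal) {        /* arrived before the flag was set */
        in_blocking_syscall = 0;
        return -SKETCH_ERESTARTSYS;
    }
    ret = read(fd, buf, len);
    /* Known gap: a signal between read() returning and the next line
     * would discard a completed read; real code must close this window. */
    in_blocking_syscall = 0;
    return ret < 0 ? -errno : (long)ret;
}
```

On -SKETCH_ERESTARTSYS the cpu loop would deliver the guest signal and, per the scheme above, rewind the guest PC to re-issue the syscall if the handler was registered with SA_RESTART.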
So have I missed something that would mean this wouldn't work? -- PMM
Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 12/04/2011 05:19 PM, Jan Kiszka wrote:
>> In the sense that kernel-apic is just an accelerated apic. From the
>> guest point of view, there's no difference, and that should be
>> reflected in the device model.
>
> That was my goal as well: The guest should not notice the difference,
> but the admin on the host side should still be able to tell both
> internally fairly different models apart.

This should be some attribute, not the name.

> Plus the code should be clearly split where there are differences and
> explicitly shared where there aren't.

That's a good goal, yes.

>> If I'm reading an apic register, either from the guest or via a
>> monitor debug interface, I shouldn't care whether it's accelerated or
>> not.
>
> The guest part already holds, of course. Specifically for the debug
> scenario, I'd prefer the clear differentiation by name as there can
> always remain subtle differences in the implementation of kernel vs.
> user space. Someone debugging the guest and/or qemu/kvm should remain
> aware of this.

Aware, yes, but the name change is too drastic.

-- 
error compiling committee.c: too many arguments to function
[Qemu-devel] [Bug 899961] Re: qemu/kvm locks up when run 32bit userspace with 64bit kernel
Actually, after lots of experiments and finally a git bisection, it
turned out that the issue only affects qemu-kvm, not upstream qemu.

Bisection between qemu-kvm 0.15.0 and 1.0 led to this commit:

commit 145e11e840500e04a4d0a624918bb17596be19e9
Merge: ce967f6 b195043
Author: Avi Kivity <a...@redhat.com>
Date:   Wed Aug 10 12:06:58 2011 +0300

    Merge commit 'b195043003d90ea4027ea01cc7a6c974ac915108' into upstream-merge

    * commit 'b195043003d90ea4027ea01cc7a6c974ac915108': (130 commits)
      ...

After which I'm stuck... ;)

** Tags added: lockup qemu-kvm

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/899961

Title:
  qemu/kvm locks up when run 32bit userspace with 64bit kernel

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/899961/+subscriptions
[Qemu-devel] [PATCH v2 0/6] Memory API mutators
This patchset introduces memory_region_set_enabled() and
memory_region_set_address() so that memory routers no longer need to track
the internal state of the memory API (to know whether they need to add or
remove a region).  Instead, they can simply copy the state of the region
from the guest-exposed register to the memory core, via the new mutator
functions.

v2:
  - fix minor bug in set_address()
  - add set_alias_offset()
  - two example users

Avi Kivity (6):
  memory: introduce memory_region_set_enabled()
  memory: introduce memory_region_set_address()
  memory: introduce memory_region_set_alias_offset()
  memory: optimize empty transactions due to mutators
  cirrus_vga: adapt to memory mutators API
  piix_pci: adapt smram mapping to use memory mutators

 hw/cirrus_vga.c |   50 +++--
 hw/piix_pci.c   |   20 -
 memory.c        |   81 +++---
 memory.h        |   39 ++
 4 files changed, 132 insertions(+), 58 deletions(-)

-- 
1.7.7.1
[Qemu-devel] [PATCH v2 4/6] memory: optimize empty transactions due to mutators
The mutating memory APIs can easily cause empty transactions, where the
mutators don't actually change anything, or perhaps only modify disabled
regions.  Detect these conditions and avoid regenerating the memory
topology.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 memory.c |    8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/memory.c b/memory.c
index 7e842b3..87639ab 100644
--- a/memory.c
+++ b/memory.c
@@ -19,6 +19,7 @@
 #include <assert.h>
 
 unsigned memory_region_transaction_depth = 0;
+static bool memory_region_update_pending = false;
 
 typedef struct AddrRange AddrRange;
 
@@ -757,6 +758,7 @@ static void address_space_update_topology(AddressSpace *as)
 static void memory_region_update_topology(MemoryRegion *mr)
 {
     if (memory_region_transaction_depth) {
+        memory_region_update_pending |= !mr || mr->enabled;
         return;
     }
 
@@ -770,6 +772,8 @@ static void memory_region_update_topology(MemoryRegion *mr)
     if (address_space_io.root) {
         address_space_update_topology(&address_space_io);
     }
+
+    memory_region_update_pending = false;
 }
 
 void memory_region_transaction_begin(void)
@@ -781,7 +785,9 @@ void memory_region_transaction_commit(void)
 {
     assert(memory_region_transaction_depth);
     --memory_region_transaction_depth;
-    memory_region_update_topology(NULL);
+    if (!memory_region_transaction_depth && memory_region_update_pending) {
+        memory_region_update_topology(NULL);
+    }
 }
 
 static void memory_region_destructor_none(MemoryRegion *mr)
-- 
1.7.7.1
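To make the deferred-update logic concrete, here is a minimal standalone sketch that mirrors this patch (plus the `mr && !mr->enabled` early return added in patch 1/6). `ToyRegion` and the rebuild counter are hypothetical stand-ins, not QEMU types: a transaction that only touches disabled regions ends without any rebuild, while several visible changes inside one transaction collapse into a single rebuild.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MemoryRegion: only the field the check needs. */
typedef struct ToyRegion {
    bool enabled;
} ToyRegion;

static unsigned transaction_depth;
static bool update_pending;
static unsigned topology_rebuilds;   /* counts the expensive step */

static void update_topology(ToyRegion *mr)
{
    if (transaction_depth) {
        /* Inside a transaction: only remember work that can be visible. */
        update_pending |= !mr || mr->enabled;
        return;
    }
    if (mr && !mr->enabled) {
        return;                      /* a disabled region cannot change the view */
    }
    topology_rebuilds++;             /* stand-in for regenerating the flat view */
    update_pending = false;
}

static void transaction_begin(void)
{
    ++transaction_depth;
}

static void transaction_commit(void)
{
    assert(transaction_depth);
    --transaction_depth;
    if (!transaction_depth && update_pending) {
        update_topology(NULL);       /* one rebuild for the whole transaction */
    }
}
```

Nesting works the same way: only the outermost commit, with depth back at zero, can trigger the rebuild.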
[Qemu-devel] [PATCH v2 2/6] memory: introduce memory_region_set_address()
Allow changing the address of a memory region while it is
in the memory hierarchy.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 memory.c |   21 +
 memory.h |   11 +++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index d0f90ca..a080d21 100644
--- a/memory.c
+++ b/memory.c
@@ -1324,6 +1324,27 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
     memory_region_update_topology(NULL);
 }
 
+void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr)
+{
+    MemoryRegion *parent = mr->parent;
+    unsigned priority = mr->priority;
+    bool may_overlap = mr->may_overlap;
+
+    if (addr == mr->addr || !parent) {
+        mr->addr = addr;
+        return;
+    }
+
+    memory_region_transaction_begin();
+    memory_region_del_subregion(parent, mr);
+    if (may_overlap) {
+        memory_region_add_subregion_overlap(parent, addr, mr, priority);
+    } else {
+        memory_region_add_subregion(parent, addr, mr);
+    }
+    memory_region_transaction_commit();
+}
+
 void set_system_memory_map(MemoryRegion *mr)
 {
     address_space_memory.root = mr;
diff --git a/memory.h b/memory.h
index c6997c4..db53422 100644
--- a/memory.h
+++ b/memory.h
@@ -518,6 +518,17 @@ void memory_region_del_subregion(MemoryRegion *mr,
  */
 void memory_region_set_enabled(MemoryRegion *mr, bool enabled);
 
+/*
+ * memory_region_set_address: dynamically update the address of a region
+ *
+ * Dynamically updates the address of a region, relative to its parent.
+ * May be used on regions are currently part of a memory hierarchy.
+ *
+ * @mr: the region to be updated
+ * @addr: new address, relative to parent region
+ */
+void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr);
+
 /* Start a transaction; changes will be accumulated and made visible only
  * when the transaction ends.
  */
-- 
1.7.7.1
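A toy model of how memory_region_set_address() batches its delete and re-add inside one transaction, so the topology is regenerated once rather than twice. All names here are hypothetical, and the priority/may_overlap handling from the real patch is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t hwaddr_t;           /* stand-in for target_phys_addr_t */

typedef struct ToyRegion ToyRegion;
struct ToyRegion {
    ToyRegion *parent;
    hwaddr_t addr;
    bool linked;                     /* present in the parent's subregion list */
};

static unsigned depth, rebuilds;
static bool pending;

static void topology_changed(void)
{
    if (depth) {
        pending = true;              /* defer until the outermost commit */
        return;
    }
    rebuilds++;
}

static void begin(void) { depth++; }

static void commit(void)
{
    assert(depth);
    if (--depth == 0 && pending) {
        pending = false;
        rebuilds++;
    }
}

static void del_subregion(ToyRegion *parent, ToyRegion *mr)
{
    assert(mr->parent == parent && mr->linked);
    mr->linked = false;
    mr->parent = NULL;
    topology_changed();
}

static void add_subregion(ToyRegion *parent, hwaddr_t addr, ToyRegion *mr)
{
    mr->parent = parent;
    mr->addr = addr;
    mr->linked = true;
    topology_changed();
}

/* Same shape as memory_region_set_address() in the patch. */
static void set_address(ToyRegion *mr, hwaddr_t addr)
{
    ToyRegion *parent = mr->parent;

    if (addr == mr->addr || !parent) {
        mr->addr = addr;             /* unmapped (or no-op): just record it */
        return;
    }
    begin();
    del_subregion(parent, mr);
    add_subregion(parent, addr, mr);
    commit();                        /* one rebuild instead of two */
}
```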
[Qemu-devel] [PATCH v2 6/6] piix_pci: adapt smram mapping to use memory mutators
Eliminates fake state ->smram_enabled.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 hw/piix_pci.c |   20 ++--
 1 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/hw/piix_pci.c b/hw/piix_pci.c
index d183443..ac3d898 100644
--- a/hw/piix_pci.c
+++ b/hw/piix_pci.c
@@ -81,7 +81,6 @@ struct PCII440FXState {
     PAMMemoryRegion pam_regions[13];
     MemoryRegion smram_region;
     uint8_t smm_enabled;
-    bool smram_enabled;
     PIIX3State *piix3;
 };
 
@@ -141,6 +140,7 @@ static void i440fx_update_memory_mappings(PCII440FXState *d)
 {
     int i, r;
     uint32_t smram;
+    bool smram_enabled;
 
     memory_region_transaction_begin();
     update_pam(d, 0xf0000, 0x100000, (d->dev.config[I440FX_PAM] >> 4) & 3,
@@ -151,18 +151,8 @@ static void i440fx_update_memory_mappings(PCII440FXState *d)
                    &d->pam_regions[i+1]);
     }
     smram = d->dev.config[I440FX_SMRAM];
-    if ((d->smm_enabled && (smram & 0x08)) || (smram & 0x40)) {
-        if (!d->smram_enabled) {
-            memory_region_del_subregion(d->system_memory, &d->smram_region);
-            d->smram_enabled = true;
-        }
-    } else {
-        if (d->smram_enabled) {
-            memory_region_add_subregion_overlap(d->system_memory, 0xa0000,
-                                                &d->smram_region, 1);
-            d->smram_enabled = false;
-        }
-    }
+    smram_enabled = (d->smm_enabled && (smram & 0x08)) || (smram & 0x40);
+    memory_region_set_enabled(&d->smram_region, !smram_enabled);
     memory_region_transaction_commit();
 }
 
@@ -307,7 +297,9 @@ static int i440fx_initfn(PCIDevice *dev)
     }
     memory_region_init_alias(&f->smram_region, "smram-region",
                              f->pci_address_space, 0xa0000, 0x20000);
-    f->smram_enabled = true;
+    memory_region_add_subregion_overlap(f->system_memory, 0xa0000,
+                                        &f->smram_region, 1);
+    memory_region_set_enabled(&f->smram_region, false);
 
     /* Xen supports additional interrupt routes from the PCI devices to
      * the IOAPIC: the four pins of each PCI device on the bus are also
-- 
1.7.7.1
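With the mutator in place, the i440FX logic reduces to a pure predicate over the SMRAM register. A sketch, assuming the i440FX SMRAM register layout in which 0x08 is G_SMRAME (global SMRAM enable) and 0x40 is D_OPEN; the helper names are hypothetical. Note the inversion: the `smram-region` alias overlays the PCI (VGA) window, so it is enabled exactly when SMRAM is *not* open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SMRAM_G_SMRAME 0x08   /* global SMRAM enable */
#define SMRAM_D_OPEN   0x40   /* SMRAM visible outside SMM */

/* True when RAM should be visible at 0xa0000 (same expression as the patch). */
static bool smram_open(bool smm_enabled, uint8_t smram)
{
    return (smm_enabled && (smram & SMRAM_G_SMRAME)) || (smram & SMRAM_D_OPEN);
}

/* State passed to memory_region_set_enabled() for the PCI overlay alias. */
static bool pci_overlay_enabled(bool smm_enabled, uint8_t smram)
{
    return !smram_open(smm_enabled, smram);
}
```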
[Qemu-devel] [PATCH v2 5/6] cirrus_vga: adapt to memory mutators API
Simplify the code by avoiding dynamic creation and destruction of
memory regions.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 hw/cirrus_vga.c |   50 +-
 1 files changed, 17 insertions(+), 33 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index c7e365b..9f7fea1 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -205,7 +205,7 @@ typedef void (*cirrus_fill_t)(struct CirrusVGAState *s,
     bool linear_vram;  /* vga.vram mapped over cirrus_linear_io */
     MemoryRegion low_mem_container; /* container for 0xa0000-0xc0000 */
     MemoryRegion low_mem;           /* always mapped, overridden by: */
-    MemoryRegion *cirrus_bank[2];   /*   aliases at 0xa0000-0xb0000  */
+    MemoryRegion cirrus_bank[2];    /*   aliases at 0xa0000-0xb0000  */
     uint32_t cirrus_addr_mask;
     uint32_t linear_mmio_mask;
     uint8_t cirrus_shadow_gr0;
@@ -2363,40 +2363,16 @@ static void cirrus_linear_bitblt_write(void *opaque,
     },
 };
 
-static void unmap_bank(CirrusVGAState *s, unsigned bank)
-{
-    if (s->cirrus_bank[bank]) {
-        memory_region_del_subregion(&s->low_mem_container,
-                                    s->cirrus_bank[bank]);
-        memory_region_destroy(s->cirrus_bank[bank]);
-        g_free(s->cirrus_bank[bank]);
-        s->cirrus_bank[bank] = NULL;
-    }
-}
-
 static void map_linear_vram_bank(CirrusVGAState *s, unsigned bank)
 {
-    MemoryRegion *mr;
-    static const char *names[] = { "vga.bank0", "vga.bank1" };
-
-    if (!(s->cirrus_srcptr != s->cirrus_srcptr_end)
+    MemoryRegion *mr = &s->cirrus_bank[bank];
+    bool enabled = !(s->cirrus_srcptr != s->cirrus_srcptr_end)
         && !((s->vga.sr[0x07] & 0x01) == 0)
         && !((s->vga.gr[0x0B] & 0x14) == 0x14)
-        && !(s->vga.gr[0x0B] & 0x02)) {
-
-        mr = g_malloc(sizeof(*mr));
-        memory_region_init_alias(mr, names[bank], &s->vga.vram,
-                                 s->cirrus_bank_base[bank], 0x8000);
-        memory_region_add_subregion_overlap(
-            &s->low_mem_container,
-            0x8000 * bank,
-            mr,
-            1);
-        unmap_bank(s, bank);
-        s->cirrus_bank[bank] = mr;
-    } else {
-        unmap_bank(s, bank);
-    }
+        && !(s->vga.gr[0x0B] & 0x02);
+
+    memory_region_set_enabled(mr, enabled);
+    memory_region_set_alias_offset(mr, s->cirrus_bank_base[bank]);
 }
 
 static void map_linear_vram(CirrusVGAState *s)
@@ -2415,8 +2391,8 @@ static void unmap_linear_vram(CirrusVGAState *s)
         s->linear_vram = false;
         memory_region_del_subregion(&s->pci_bar, &s->vga.vram);
     }
-    unmap_bank(s, 0);
-    unmap_bank(s, 1);
+    memory_region_set_enabled(&s->cirrus_bank[0], false);
+    memory_region_set_enabled(&s->cirrus_bank[1], false);
 }
 
 /* Compute the memory access functions */
@@ -2856,6 +2832,14 @@ static void cirrus_init_common(CirrusVGAState * s, int device_id, int is_pci,
     memory_region_init_io(&s->low_mem, &cirrus_vga_mem_ops, s,
                           "cirrus-low-memory", 0x20000);
     memory_region_add_subregion(&s->low_mem_container, 0, &s->low_mem);
+    for (i = 0; i < 2; ++i) {
+        static const char *names[] = { "vga.bank0", "vga.bank1" };
+        MemoryRegion *bank = &s->cirrus_bank[i];
+        memory_region_init_alias(bank, names[i], &s->vga.vram, 0, 0x8000);
+        memory_region_set_enabled(bank, false);
+        memory_region_add_subregion_overlap(&s->low_mem_container, i * 0x8000,
+                                            bank, 1);
+    }
     memory_region_add_subregion_overlap(system_memory,
                                         isa_mem_base + 0x000a0000,
                                         &s->low_mem_container,
-- 
1.7.7.1
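After this patch, map_linear_vram_bank() boils down to evaluating one condition and pushing the result into two mutators. The condition can be sketched as a standalone predicate; `CirrusRegs` is a hypothetical snapshot of the registers the patch reads, not a QEMU type, and the double negations from the patch are simplified into equivalent comparisons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the Cirrus state the patch consults. */
typedef struct {
    bool bitblt_in_progress;   /* cirrus_srcptr != cirrus_srcptr_end */
    uint8_t sr07;              /* sequencer register 0x07 */
    uint8_t gr0b;              /* graphics controller register 0x0B */
} CirrusRegs;

/* Equivalent to the `enabled` expression in map_linear_vram_bank(). */
static bool bank_alias_enabled(const CirrusRegs *r)
{
    return !r->bitblt_in_progress
        && (r->sr07 & 0x01) != 0
        && (r->gr0b & 0x14) != 0x14
        && !(r->gr0b & 0x02);
}
```

Because the two bank aliases are now created once at init time, the caller only needs this result plus `cirrus_bank_base[bank]` for the new alias offset.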
[Qemu-devel] [Bug 899961] Re: qemu/kvm locks up when run 32bit userspace with 64bit kernel
And some more info.  Debugging with gdb shows this:

(gdb) info threads
  Id   Target Id         Frame
  2    Thread 0xf6d4eb70 (LWP 28697) qemu-system-x86 0xf7711425 in __kernel_vsyscall ()
* 1    Thread 0xf6f50700 (LWP 28694) qemu-system-x86 0xf7711425 in __kernel_vsyscall ()
(gdb) bt
#0  0xf7711425 in __kernel_vsyscall ()
#1  0xf76d620a in __pthread_cond_wait (cond=0x840fa60, mutex=0x89e82f0) at pthread_cond_wait.c:153
#2  0x080e8bb5 in qemu_cond_wait (cond=0x840fa60, mutex=0x89e82f0) at /build/kvm/git/qemu-thread-posix.c:113
#3  0x08050c2e in run_on_cpu (env=0x9466460, func=0x8083ad0 <do_kvm_cpu_synchronize_state>, data=0x9466460) at /build/kvm/git/cpus.c:715
#4  0x08083b63 in kvm_cpu_synchronize_state (env=0x9466460) at /build/kvm/git/kvm-all.c:927
#5  0x0804faaa in cpu_synchronize_state (env=0x9466460) at /build/kvm/git/kvm.h:173
#6  0x0804fc3a in cpu_synchronize_all_states () at /build/kvm/git/cpus.c:94
#7  0x080647ec in main_loop () at /build/kvm/git/vl.c:1421
#8  0x0806974d in main (argc=17, argv=0xff996e04, envp=0xff996e4c) at /build/kvm/git/vl.c:3395
(gdb) frame 2
#2  0x080e8bb5 in qemu_cond_wait (cond=0x840fa60, mutex=0x89e82f0) at /build/kvm/git/qemu-thread-posix.c:113
113         err = pthread_cond_wait(&cond->cond, &mutex->lock);
(gdb)
(gdb) thread 2
[Switching to thread 2 (Thread 0xf6d4eb70 (LWP 28697))]
#0  0xf7711425 in __kernel_vsyscall ()
(gdb) bt
#0  0xf7711425 in __kernel_vsyscall ()
#1  0xf727ac89 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#2  0x08084004 in kvm_vcpu_ioctl (env=0x9466460, type=44672) at /build/kvm/git/kvm-all.c:1090
#3  0x08083cd8 in kvm_cpu_exec (env=0x9466460) at /build/kvm/git/kvm-all.c:976
#4  0x08050f44 in qemu_kvm_cpu_thread_fn (arg=0x9466460) at /build/kvm/git/cpus.c:806
#5  0xf76d1c39 in start_thread (arg=0xf6d4eb70) at pthread_create.c:304
#6  0xf728296e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
Backtrace stopped: Not enough registers or memory available to unwind further

which is not entirely interesting, but: when exiting gdb (I attached it to
a running process), the whole thing unfreezes and continues working as usual, as if no lockup had ever occurred. In other words, it is enough to attach gdb to a locked-up process and quit gdb to unfreeze it. Also, when running under gdb, the lockup does not occur: I can reboot the guest as many times as I like and it all goes fine. Once gdb is detached, a reboot immediately results in a lockup again, which, again, can be cured by attaching gdb to the process and detaching it.

And one more correction to the original report: when locked up, it does NOT use 100% CPU; the CPU is 100% _idle_.
[Qemu-devel] [PATCH 1/3] QEMU kvm: Syncing linux headers to 3.2.0-rc1
Update the kvm kernel headers to the 3.2.0-rc1 post using scripts/update-linux-headers.sh script. Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com --- diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h index fb3fddc..08fe69e 100644 --- a/linux-headers/asm-powerpc/kvm.h +++ b/linux-headers/asm-powerpc/kvm.h @@ -149,6 +149,12 @@ struct kvm_regs { #define KVM_SREGS_E_UPDATE_DBSR(1 3) /* + * Book3S special bits to indicate contents in the struct by maintaining + * backwards compatibility with older structs. If adding a new field, + * please make sure to add a flag for that new field */ +#define KVM_SREGS_S_HIOR (1 0) + +/* * In KVM_SET_SREGS, reserved/pad fields must be left untouched from a * previous KVM_GET_REGS. * @@ -170,9 +176,11 @@ struct kvm_sregs { } ppc64; struct { __u32 sr[16]; - __u64 ibat[8]; - __u64 dbat[8]; + __u64 ibat[8]; + __u64 dbat[8]; } ppc32; + __u64 flags; /* KVM_SREGS_S_ */ + __u64 hior; } s; struct { union { @@ -292,41 +300,4 @@ struct kvm_allocate_rma { __u64 rma_size; }; -struct kvm_book3e_206_tlb_entry { - __u32 mas8; - __u32 mas1; - __u64 mas2; - __u64 mas7_3; -}; - -struct kvm_book3e_206_tlb_params { - /* -* For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV: -* -* - The number of ways of TLB0 must be a power of two between 2 and -* 16. -* - TLB1 must be fully associative. -* - The size of TLB0 must be a multiple of the number of ways, and -* the number of sets must be a power of two. -* - The size of TLB1 may not exceed 64 entries. -* - TLB0 supports 4 KiB pages. -* - The page sizes supported by TLB1 are as indicated by -* TLB1CFG (if MMUCFG[MAVN] = 0) or TLB1PS (if MMUCFG[MAVN] = 1) -* as returned by KVM_GET_SREGS. -* - TLB2 and TLB3 are reserved, and their entries in tlb_sizes[] -* and tlb_ways[] must be zero. -* -* tlb_ways[n] = tlb_sizes[n] means the array is fully associative. 
-* -* KVM will adjust TLBnCFG based on the sizes configured here, -* though arrays greater than 2048 entries will have TLBnCFG[NENTRY] -* set to zero. -*/ - __u32 tlb_sizes[4]; - __u32 tlb_ways[4]; - __u32 reserved[8]; -}; - -#define KVM_ONE_REG_PPC_HIOR KVM_ONE_REG_PPC | 0x100 - #endif /* __LINUX_KVM_POWERPC_H */ diff --git a/linux-headers/asm-x86/hyperv.h b/linux-headers/asm-x86/hyperv.h index 5df477a..b80420b 100644 --- a/linux-headers/asm-x86/hyperv.h +++ b/linux-headers/asm-x86/hyperv.h @@ -189,5 +189,6 @@ #define HV_STATUS_INVALID_HYPERCALL_CODE 2 #define HV_STATUS_INVALID_HYPERCALL_INPUT 3 #define HV_STATUS_INVALID_ALIGNMENT4 +#define HV_STATUS_INSUFFICIENT_BUFFERS 19 #endif diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index a8761d3..07bd557 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -371,6 +371,7 @@ struct kvm_s390_psw { #define KVM_S390_INT_VIRTIO0x2603u #define KVM_S390_INT_SERVICE 0x2401u #define KVM_S390_INT_EMERGENCY 0x1201u +#define KVM_S390_INT_EXTERNAL_CALL 0x1202u struct kvm_s390_interrupt { __u32 type; @@ -556,8 +557,7 @@ struct kvm_ppc_pvinfo { #define KVM_CAP_MAX_VCPUS 66 /* returns max vcpus per vm */ #define KVM_CAP_PPC_HIOR 67 #define KVM_CAP_PPC_PAPR 68 -#define KVM_CAP_SW_TLB 69 -#define KVM_CAP_ONE_REG 70 +#define KVM_CAP_S390_GMAP 71 #ifdef KVM_CAP_IRQ_ROUTING @@ -637,49 +637,6 @@ struct kvm_clock_data { __u32 pad[9]; }; -#define KVM_MMU_FSL_BOOKE_NOHV 0 -#define KVM_MMU_FSL_BOOKE_HV 1 - -struct kvm_config_tlb { - __u64 params; - __u64 array; - __u32 mmu_type; - __u32 array_len; -}; - -struct kvm_dirty_tlb { - __u64 bitmap; - __u32 num_dirty; -}; - -/* Available with KVM_CAP_ONE_REG */ - -#define KVM_ONE_REG_GENERIC0xULL - -/* - * Architecture specific registers are to be defined in arch headers and - * ORed with the arch identifier. 
- */ -#define KVM_ONE_REG_PPC0x1000ULL -#define KVM_ONE_REG_X860x2000ULL -#define KVM_ONE_REG_IA64 0x3000ULL -#define KVM_ONE_REG_ARM0x4000ULL -#define KVM_ONE_REG_S390 0x5000ULL - -struct kvm_one_reg { - __u64 id; - union { - __u8 reg8; -
[Qemu-devel] [PATCH 0/3] QEMU kvm: Adding KICK_VCPU capability to i386 kvm
From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>

The three-patch series following this mail extends the KVM hypervisor and a Linux guest running on it to support pv-ticket spinlocks. PV ticket spinlocks help solve the Lock Holder Preemption problem discussed in http://www.amd64.org/fileadmin/user_upload/pub/LHP-commented_slides.pdf. When a spinlock is contended, a guest vcpu relinquishes the cpu via halt(). Correspondingly, one hypercall is introduced in the KVM hypervisor that allows a vcpu to kick a halted vcpu so that it continues execution.

The series will:
 - Update qemu with the latest linux header files (to 3.2.0-rc1).
 - Enable the KICK_VCPU capability in kvm/i386.

Raghavendra K T (3):
  Sync the linux headers to 3.2.0-rc1
  Sync the linux headers to patched linux kernel with KICK_VCPU capability.
  Add KICK_VCPU support in i386 target
---
The corresponding kernel patch is available in the thread https://lkml.org/lkml/2011/11/30/62
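For readers unfamiliar with the scheme, here is a rough sketch of a ticket spinlock with a spin-then-wait slow path, assuming C11 atomics. `pv_wait()` and `pv_kick()` are hypothetical stand-ins for the guest's halt and the kick hypercall (here they only count or do nothing); the real guest-side code lives in the kernel series linked above and differs in detail.

```c
#include <assert.h>
#include <stdatomic.h>

#define SPIN_THRESHOLD 1024u  /* hypothetical; the real value is a tuning knob */

typedef struct {
    atomic_uint next;    /* next ticket to hand out */
    atomic_uint owner;   /* ticket currently allowed to hold the lock */
} TicketLock;

static unsigned pv_wait_calls;

/* A PV guest would halt() here and be woken by the unlocker's kick. */
static void pv_wait(void)
{
    pv_wait_calls++;
}

/* Would issue the KVM_HC_KICK_CPU hypercall for the vcpu holding the next ticket. */
static void pv_kick(void)
{
}

static void ticket_lock(TicketLock *l)
{
    unsigned me = atomic_fetch_add(&l->next, 1u);

    for (;;) {
        for (unsigned i = 0; i < SPIN_THRESHOLD; i++) {
            if (atomic_load(&l->owner) == me) {
                return;                  /* our turn */
            }
        }
        pv_wait();                       /* spun too long: stop burning the pCPU */
    }
}

static void ticket_unlock(TicketLock *l)
{
    atomic_fetch_add(&l->owner, 1u);     /* hand the lock to the next ticket */
    pv_kick();
}
```

The halt/kick pair matters because a plain ticket lock is strictly FIFO: if the vcpu holding the next ticket is preempted, every later waiter spins uselessly until it runs again.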
[Qemu-devel] [PATCH 3/3] QEMU kvm/i386 : Adding KICK_VCPU capability support in i386 target.
Extend the KVM Hypervisor to enable KICK_VCPU feature that allows
a vcpu to kick the halted vcpu to continue with execution in PV ticket
spinlock.

Signed-off-by: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5bfc21f..69bce21 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -97,6 +97,7 @@ struct kvm_para_features {
     { KVM_CAP_NOP_IO_DELAY, KVM_FEATURE_NOP_IO_DELAY },
     { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP },
     { KVM_CAP_ASYNC_PF, KVM_FEATURE_ASYNC_PF },
+    { KVM_CAP_KICK_VCPU, KVM_FEATURE_KICK_VCPU },
     { -1, -1 }
 };
 
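The table pairs a host capability with a paravirt CPUID feature bit. A sketch of how such a table can be folded into the feature mask advertised to the guest: the `check` callback is a hypothetical stand-in for a KVM_CHECK_EXTENSION query, and two capability numbers are placeholders. Only `KVM_CAP_KICK_VCPU` (72) and the feature bit numbers come from the headers in patch 2/3.

```c
#include <assert.h>
#include <stddef.h>

/* Feature bit numbers as in linux-headers/asm-x86/kvm_para.h. */
#define KVM_FEATURE_NOP_IO_DELAY 1
#define KVM_FEATURE_ASYNC_PF     4
#define KVM_FEATURE_KICK_VCPU    6

/* KVM_CAP_KICK_VCPU per the proposed header; the others are placeholders,
 * not the real KVM capability numbers. */
#define CAP_NOP_IO_DELAY_PLACEHOLDER 1001
#define CAP_ASYNC_PF_PLACEHOLDER     1002
#define KVM_CAP_KICK_VCPU            72

struct kvm_para_features {
    int cap;
    int feature;
};

static const struct kvm_para_features para_features[] = {
    { CAP_NOP_IO_DELAY_PLACEHOLDER, KVM_FEATURE_NOP_IO_DELAY },
    { CAP_ASYNC_PF_PLACEHOLDER,     KVM_FEATURE_ASYNC_PF },
    { KVM_CAP_KICK_VCPU,            KVM_FEATURE_KICK_VCPU },
    { -1, -1 }                      /* sentinel row, as in the patched table */
};

/* Hypothetical stand-in for asking the host kernel about a capability. */
typedef int (*check_extension_fn)(int cap);

/* Set one CPUID feature bit for every capability the host reports. */
static unsigned para_feature_mask(check_extension_fn check)
{
    unsigned features = 0;

    for (size_t i = 0; para_features[i].cap != -1; i++) {
        if (check(para_features[i].cap)) {
            features |= 1u << para_features[i].feature;
        }
    }
    return features;
}

/* Two example hosts for illustration. */
static int host_all(int cap) { (void)cap; return 1; }
static int host_without_kick(int cap) { return cap != KVM_CAP_KICK_VCPU; }
```

On a host kernel without the new capability, the KICK_VCPU bit is simply never advertised, so existing guests keep working.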
[Qemu-devel] [PATCH 2/3] QEMU kvm: Syncing linux headers to support KICK_VCPU capability
Update the kernel header that adds a hypercall to support pv-ticketlocks.

Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index f2ac46a..03d3a36 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -16,12 +16,14 @@
 #define KVM_FEATURE_CLOCKSOURCE         0
 #define KVM_FEATURE_NOP_IO_DELAY        1
 #define KVM_FEATURE_MMU_OP              2
+
 /* This indicates that the new set of kvmclock msrs
  * are available. The use of 0x11 and 0x12 is deprecated
  */
 #define KVM_FEATURE_CLOCKSOURCE2        3
 #define KVM_FEATURE_ASYNC_PF            4
 #define KVM_FEATURE_STEAL_TIME          5
+#define KVM_FEATURE_KICK_VCPU           6
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 07bd557..47ab6ff 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_HIOR 67
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
+#define KVM_CAP_KICK_VCPU 72
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/linux-headers/linux/kvm_para.h b/linux-headers/linux/kvm_para.h
index b315e27..e4a0e3e 100644
--- a/linux-headers/linux/kvm_para.h
+++ b/linux-headers/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP                   2
 #define KVM_HC_FEATURES                 3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE       4
+#define KVM_HC_KICK_CPU                 5
 
 /*
  * hypercalls use architecture specific
[Qemu-devel] [PATCH v2 3/6] memory: introduce memory_region_set_alias_offset()
Add an API to update an alias offset of an active alias.  This can be
used to simplify implementation of dynamic memory banks.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 memory.c |   14 ++
 memory.h |   13 -
 2 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/memory.c b/memory.c
index a080d21..7e842b3 100644
--- a/memory.c
+++ b/memory.c
@@ -1345,6 +1345,20 @@ void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr)
     memory_region_transaction_commit();
 }
 
+void memory_region_set_alias_offset(MemoryRegion *mr, target_phys_addr_t offset)
+{
+    target_phys_addr_t old_offset = mr->alias_offset;
+
+    assert(mr->alias);
+    mr->alias_offset = offset;
+
+    if (offset == old_offset || !mr->parent) {
+        return;
+    }
+
+    memory_region_update_topology(mr);
+}
+
 void set_system_memory_map(MemoryRegion *mr)
 {
     address_space_memory.root = mr;
diff --git a/memory.h b/memory.h
index db53422..2022de7 100644
--- a/memory.h
+++ b/memory.h
@@ -527,7 +527,18 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled);
  * @mr: the region to be updated
  * @addr: new address, relative to parent region
  */
-void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr);
+void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t offset);
+
+/*
+ * memory_region_set_alias_offset: dynamically update a memory alias's offset
+ *
+ * Dynamically updates the offset into the target region that an alias points
+ * to, as if the fourth argument to memory_region_init_alias() has changed.
+ *
+ * @mr: the #MemoryRegion to be updated; should be an alias.
+ * @offset: the new offset into the target memory region
+ */
+void memory_region_set_alias_offset(MemoryRegion *mr, target_phys_addr_t addr);
 
 /* Start a transaction; changes will be accumulated and made visible only
  * when the transaction ends.
-- 
1.7.7.1
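An alias is a window onto another region, so changing its offset retargets the window without moving it. A minimal model with hypothetical types (not QEMU's); the numbers mimic a Cirrus bank placed at 0x8000 with size 0x8000.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hwaddr_t;   /* stand-in for target_phys_addr_t */

/* A window of `size` bytes at `addr` in its parent, forwarding to the
 * target region starting at `alias_offset`. */
typedef struct {
    hwaddr_t addr;
    hwaddr_t size;
    hwaddr_t alias_offset;
} ToyAlias;

/* Offset inside the target region that a parent-relative address hits. */
static hwaddr_t alias_translate(const ToyAlias *a, hwaddr_t parent_addr)
{
    assert(parent_addr >= a->addr && parent_addr - a->addr < a->size);
    return a->alias_offset + (parent_addr - a->addr);
}

/* Like memory_region_set_alias_offset(): retarget the window in place. */
static void alias_set_offset(ToyAlias *a, hwaddr_t offset)
{
    a->alias_offset = offset;
}
```

For example, after retargeting bank 1 to offset 0x40000 of VRAM, an access at 0x8010 in the container lands at VRAM offset 0x40010.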
[Qemu-devel] [PATCH v2 1/6] memory: introduce memory_region_set_enabled()
This allows users to disable a memory region without removing
it from the hierarchy, simplifying the implementation of
memory routers.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 memory.c |   40 +---
 memory.h |   17 +
 2 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/memory.c b/memory.c
index adfdf14..d0f90ca 100644
--- a/memory.c
+++ b/memory.c
@@ -528,6 +528,10 @@ static void render_memory_region(FlatView *view,
     FlatRange fr;
     AddrRange tmp;
 
+    if (!mr->enabled) {
+        return;
+    }
+
     int128_addto(&base, int128_make64(mr->addr));
     readonly |= mr->readonly;
 
@@ -750,12 +754,16 @@ static void address_space_update_topology(AddressSpace *as)
     address_space_update_ioeventfds(as);
 }
 
-static void memory_region_update_topology(void)
+static void memory_region_update_topology(MemoryRegion *mr)
 {
     if (memory_region_transaction_depth) {
         return;
     }
 
+    if (mr && !mr->enabled) {
+        return;
+    }
+
     if (address_space_memory.root) {
         address_space_update_topology(&address_space_memory);
     }
@@ -773,7 +781,7 @@ void memory_region_transaction_commit(void)
 {
     assert(memory_region_transaction_depth);
     --memory_region_transaction_depth;
-    memory_region_update_topology();
+    memory_region_update_topology(NULL);
 }
 
 static void memory_region_destructor_none(MemoryRegion *mr)
@@ -813,6 +821,7 @@ void memory_region_init(MemoryRegion *mr,
     }
     mr->addr = 0;
     mr->offset = 0;
+    mr->enabled = true;
     mr->terminates = false;
     mr->readable = true;
     mr->readonly = false;
@@ -1058,7 +1067,7 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
     uint8_t mask = 1 << client;
 
     mr->dirty_log_mask = (mr->dirty_log_mask & ~mask) | (log * mask);
-    memory_region_update_topology();
+    memory_region_update_topology(mr);
 }
 
 bool memory_region_get_dirty(MemoryRegion *mr, target_phys_addr_t addr,
@@ -1090,7 +1099,7 @@ void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
 {
     if (mr->readonly != readonly) {
         mr->readonly = readonly;
-        memory_region_update_topology();
+        memory_region_update_topology(mr);
     }
 }
 
@@ -1098,7 +1107,7 @@ void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable)
 {
     if (mr->readable != readable) {
         mr->readable = readable;
-        memory_region_update_topology();
+        memory_region_update_topology(mr);
     }
 }
 
@@ -1203,7 +1212,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
     memmove(&mr->ioeventfds[i+1], &mr->ioeventfds[i],
             sizeof(*mr->ioeventfds) * (mr->ioeventfd_nb-1 - i));
     mr->ioeventfds[i] = mrfd;
-    memory_region_update_topology();
+    memory_region_update_topology(mr);
 }
 
 void memory_region_del_eventfd(MemoryRegion *mr,
@@ -1233,7 +1242,7 @@ void memory_region_del_eventfd(MemoryRegion *mr,
     --mr->ioeventfd_nb;
     mr->ioeventfds = g_realloc(mr->ioeventfds,
                                   sizeof(*mr->ioeventfds)*mr->ioeventfd_nb + 1);
-    memory_region_update_topology();
+    memory_region_update_topology(mr);
 }
 
 static void memory_region_add_subregion_common(MemoryRegion *mr,
@@ -1274,7 +1283,7 @@ static void memory_region_add_subregion_common(MemoryRegion *mr,
     }
     QTAILQ_INSERT_TAIL(&mr->subregions, subregion, subregions_link);
 done:
-    memory_region_update_topology();
+    memory_region_update_topology(mr);
 }
 
 
@@ -1303,19 +1312,28 @@ void memory_region_del_subregion(MemoryRegion *mr,
     assert(subregion->parent == mr);
     subregion->parent = NULL;
     QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
-    memory_region_update_topology();
+    memory_region_update_topology(mr);
+}
+
+void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
+{
+    if (enabled == mr->enabled) {
+        return;
+    }
+    mr->enabled = enabled;
+    memory_region_update_topology(NULL);
 }
 
 void set_system_memory_map(MemoryRegion *mr)
 {
     address_space_memory.root = mr;
-    memory_region_update_topology();
+    memory_region_update_topology(NULL);
 }
 
 void set_system_io_map(MemoryRegion *mr)
 {
     address_space_io.root = mr;
-    memory_region_update_topology();
+    memory_region_update_topology(NULL);
 }
 
 typedef struct MemoryRegionList MemoryRegionList;
diff --git a/memory.h b/memory.h
index 53bf261..c6997c4 100644
--- a/memory.h
+++ b/memory.h
@@ -123,6 +123,7 @@ struct MemoryRegion {
     bool terminates;
     bool readable;
     bool readonly; /* For RAM regions */
+    bool enabled;
     MemoryRegion *alias;
     target_phys_addr_t alias_offset;
     unsigned priority;
@@ -501,6 +502,22 @@ void memory_region_add_subregion_overlap(MemoryRegion *mr,
 void memory_region_del_subregion(MemoryRegion *mr,
                                  MemoryRegion *subregion);
 
+
+/*
+ * memory_region_set_enabled: dynamically enable or disable a region
+ *
+ * Enables or disables a
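Disabling a region, as opposed to deleting it, simply makes the renderer skip it, which exposes whatever lies beneath at lower priority. A toy resolver illustrating that effect, with hypothetical types; it is a simplification of what render_memory_region() achieves, not a copy of it. The example mirrors the Cirrus layout, where a bank alias at priority 1 overlays `low_mem`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t hwaddr_t;

/* Subregions of one container; higher priority wins where they overlap. */
typedef struct {
    const char *name;
    hwaddr_t addr, size;
    unsigned priority;
    bool enabled;
} ToySubregion;

/* Returns the region that would service address `a`, or NULL. Disabled
 * regions are skipped, like the new !mr->enabled early return. */
static const ToySubregion *resolve(const ToySubregion *r, size_t n, hwaddr_t a)
{
    const ToySubregion *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!r[i].enabled || a < r[i].addr || a - r[i].addr >= r[i].size) {
            continue;
        }
        if (!best || r[i].priority > best->priority) {
            best = &r[i];
        }
    }
    return best;
}
```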
Re: [Qemu-devel] [PATCH v2 00/18] qom: dynamic properties and composition tree (v2)
On 12/03/2011 03:34 PM, Anthony Liguori wrote:
On 12/03/2011 08:24 AM, Paolo Bonzini wrote:
On 12/03/2011 03:40 AM, Anthony Liguori wrote:
That is still true. The next step, inheritance, will pull the properties into a base class. That base class can be used elsewhere outside of the device model. But this is already a 20 patch series. If you want all of that in one series, it's going to be 100 patches that are not terribly easy to review at once.

Without a design document and a roadmap, however, it's impossible to understand how the pieces will fit together. 100 patches may require some time to digest, but 20 patches require a crystal ball to figure out what's ahead.

You can see a bit further by looking at:

https://github.com/aliguori/qemu/commits/qom-next

That fills out the composition tree pretty well for the pc. The next step is aggressive refactoring such that the qdev objects reflect the composition. IOW, we should create the rtc from within the piix3 initialization function.

I've begun the work of introducing proper inheritance. There's a lot going on but the basic idea is:

1) introduce QOM base type (Object), make qdev inherit from it
2) create a dynamic typeinfo based DeviceInfo, make device class point to deviceinfo
3) model qdev hierarchy in QOM
4) starting from the bottom of the hierarchy, remove DeviceInfo subclass and push that functionality into QOM classes
5) once (4) is complete, remove DeviceInfo
6) refactor any use of multiple child busses into separate devices with one bus
7) refactor busstate as an interface
8) refactor device model to make more aggressive use of composition
9) refactor life cycle events into virtual methods

The tree I've posted is on step (4).

Regards,

Anthony Liguori
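Step 1 of the roadmap (a base Object type that qdev inherits from) is commonly done in C by embedding the parent struct as the first member and walking a chain of type descriptors. A minimal sketch of that idiom; every name here is illustrative and is not the QOM API in the qom-next tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative type descriptor with a single-inheritance parent link. */
typedef struct TypeInfoLite {
    const char *name;
    const struct TypeInfoLite *parent;
} TypeInfoLite;

typedef struct {
    const TypeInfoLite *type;
} Object;

typedef struct {
    Object parent_obj;       /* first member, so a Device* is also an Object* */
    int realized;
} Device;

static const TypeInfoLite object_type = { "object", NULL };
static const TypeInfoLite device_type = { "device", &object_type };
static const TypeInfoLite rtc_type    = { "mc146818rtc", &device_type };

/* Walk the parent chain: is `obj` an instance of `name` or a subtype? */
static int object_is_a(const Object *obj, const char *name)
{
    for (const TypeInfoLite *t = obj->type; t; t = t->parent) {
        if (strcmp(t->name, name) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Because `parent_obj` is the first member, casting between `Device *` and `Object *` is well defined, which is what makes a checked "downcast" helper possible on top of `object_is_a()`.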
Re: [Qemu-devel] sub-page-sized mmio regions and address passed to read/write fns
On 4 December 2011 12:17, Avi Kivity <a...@redhat.com> wrote: On 12/02/2011 04:49 PM, Peter Maydell wrote: However what I found is that the addresses passed to the read/write functions aren't what I would expect. For instance if the board maps the container at address 0x1e000000, then a read from 0x1e000100 goes to the functions given by a9_gic_cpu_ops, as it should. However, the offset parameter that the read function is passed is not 0x0 (offset from the start of the a9mp-gic-cpu region) but 0x100 (offset from the start of the page, I think). Is this expected behaviour? I certainly wasn't expecting it... A while ago this was the behaviour across the board. Then 8da3ff1809747 changed addresses to be relative, but apparently missed the subpage case. Having looked a bit more closely at the code I think this is what the comment at the top of cpu_register_physical_memory_log() is referring to: # Both start_addr and region_offset are rounded down to a page boundary # before calculating this offset. This should not be a problem unless # the low bits of start_addr and region_offset differ. In the case of a subregion at a non-page-aligned address the start_addr is not page aligned, but the region_offset is zero, in the usual case, so we have differing low bits. I looked through the code that's getting called for reads, and it looks to me like exec.c:subpage_readlen() is causing this. We look up the subpage_t based on the address within the page, but we don't then adjust the address we pass to io_mem_read (except by region_offset, which I take from the comment at the top of cpu_register_physical_memory_log() to be for something else.) I think you can use subpage_t's region_offset array for this (adding into it, of course, so the original value remains). Yes.
I think the correction has to be calculated and applied in cpu_register_physical_memory_log() -- for a region which starts at a non-page-aligned address and extends over more than a page the correcting offset needs to be applied for the whole region, not just the first partial page. -- PMM
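The correction described above can be sketched as follows, assuming 4 KiB target pages and a region whose start is not page aligned; the helper names and addresses are hypothetical, not exec.c code. The per-page fixup differs from page to page, which is why it has to be applied across the whole region rather than only on the first partial page.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 0x1000u                          /* assumed 4 KiB pages */
#define TOY_PAGE_MASK (~(uint64_t)(TOY_PAGE_SIZE - 1))

/* What the subpage path effectively hands the device: offset within the page. */
static uint64_t page_relative(uint64_t addr)
{
    return addr & ~TOY_PAGE_MASK;
}

/* Per-page value to add to the page-relative offset. Unsigned wraparound
 * makes the first, partial page (page_base < region_start) come out right. */
static uint64_t offset_fixup(uint64_t page_base, uint64_t region_start)
{
    return page_base - region_start;
}

/* Offset the device expects: relative to the start of its own region. */
static uint64_t device_offset(uint64_t addr, uint64_t region_start)
{
    assert(addr >= region_start);
    return page_relative(addr) + offset_fixup(addr & TOY_PAGE_MASK, region_start);
}
```

For a region starting at 0x10000100, an access at 0x10000104 is page-relative 0x104 but must reach the device as 0x4, while an access on the next page at 0x10001004 is page-relative 0x4 but must reach it as 0xf04.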
Re: [Qemu-devel] [PATCH v2 3/6] memory: introduce memory_region_set_alias_offset()
On Sun, Dec 4, 2011 at 18:09, Avi Kivity a...@redhat.com wrote: Add an API to update an alias offset of an active alias. This can be used to simplify implementation of dynamic memory banks. Signed-off-by: Avi Kivity a...@redhat.com --- memory.c | 14 ++ memory.h | 13 - 2 files changed, 26 insertions(+), 1 deletions(-) diff --git a/memory.c b/memory.c index a080d21..7e842b3 100644 --- a/memory.c +++ b/memory.c @@ -1345,6 +1345,20 @@ void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr) memory_region_transaction_commit(); } +void memory_region_set_alias_offset(MemoryRegion *mr, target_phys_addr_t offset) +{ + target_phys_addr_t old_offset = mr-alias_offset; + + assert(mr-alias); + mr-alias_offset = offset; + + if (offset == old_offset || !mr-parent) { + return; + } + + memory_region_update_topology(mr); +} + void set_system_memory_map(MemoryRegion *mr) { address_space_memory.root = mr; diff --git a/memory.h b/memory.h index db53422..2022de7 100644 --- a/memory.h +++ b/memory.h @@ -527,7 +527,18 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled); * @mr: the region to be updated * @addr: new address, relative to parent region */ -void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr); +void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t offset); This isn't the function you are looking for, but still 'addr' is changed to 'offset'. + +/* + * memory_region_set_alias_offset: dynamically update a memory alias's offset + * + * Dynamically updates the offset into the target region that an alias points + * to, as if the fourth argument to memory_region_init_alias() has changed. + * + * @mr: the #MemoryRegion to be updated; should be an alias. + * @offset: the new offset into the target memory region + */ +void memory_region_set_alias_offset(MemoryRegion *mr, target_phys_addr_t addr); Here 'addr' doesn't match the description above.
Re: [Qemu-devel] [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 2011-12-04 22:31, Blue Swirl wrote: On Sun, Dec 4, 2011 at 16:35, Avi Kivity a...@redhat.com wrote: On 12/04/2011 05:19 PM, Jan Kiszka wrote:

In the sense that kernel-apic is just an accelerated apic. From the guest point of view, there's no difference, and that should be reflected in the device model.

That was my goal as well: the guest should not notice the difference, but the admin on the host side should still be able to tell the two, internally rather different, models apart.

This should be some attribute, not the name. Plus the code should be clearly split where there are differences and explicitly shared where there aren't.

That's a good goal, yes.

I'd prefer a unified device built from a single source file if possible.

This conflicts with the build-once model, though.

Right, another reason not to do this.

If I'm reading an apic register, either from the guest or via a monitor debug interface, I shouldn't care whether it's accelerated or not.

The guest part already holds, of course. Specifically for the debug scenario, I'd prefer the clear differentiation by name, as there can always remain subtle differences between the kernel and user-space implementations. Someone debugging the guest and/or qemu/kvm should remain aware of this.

Aware, yes, but the name change is too drastic. It should also be possible to migrate from the non-KVM device to the KVM version; different names would prevent that forever.

It is (theoretically) possible with these patches, as the vmstate names are the same. KVM to TCG migration does not work right now, so I was only able to test migrations between the in-kernel and user-space irqchip models.

Jan
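Jan's answer rests on the fact that incoming state is matched by the vmstate name carried in the migration stream, not by the device model's name. A hypothetical sketch of that kind of lookup follows; ToyHandler, toy_find_handler() and the "toy-i8259" names are made up for illustration and do not reflect QEMU's savevm implementation or the real i8259 vmstate name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, NOT QEMU's savevm code: the destination matches
 * each incoming state section by its name, not by the device model name. */
typedef struct ToyHandler {
    const char *vmsd_name;   /* name carried in the migration stream */
    const char *model_name;  /* device model that owns the handler */
} ToyHandler;

/* Return the handler registered for `vmsd_name`, or NULL if none matches. */
static const ToyHandler *toy_find_handler(const ToyHandler *tab, size_t n,
                                          const char *vmsd_name)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].vmsd_name, vmsd_name) == 0) {
            return &tab[i];
        }
    }
    return NULL;
}
```

With a shared section name, state saved by one model can be routed to the other; a renamed section simply finds no handler.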
Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state
On Sun, Dec 4, 2011 at 3:20 AM, Michael S. Tsirkin m...@redhat.com wrote: On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote:

Based on a git bisect, this patch breaks msi-x interrupt delivery in the ivshmem device.

I think the following should fix it. Compile-tested only; could you please check? If it works, let's apply it to the stable branch.

Thanks for the patch, Michael. It addresses the need for msix_write_config() to be called, but the added msix_reset() resets the vectors after they've been initialized in pci_ivshmem_init(), so interrupts still aren't delivered with this patch applied as is. In particular, a reset occurs after pci_ivshmem_init() runs, so the msix_entry_used array is reset to 0s, which causes interrupt delivery to fail. If I comment out the msix_reset(), interrupts are delivered.

Could the reset be caused by a bug in the guest driver? Or do I need to reconfigure MSI-X after reset? I'm unclear as to the proper behaviour after a reset.

Thanks,
Cam

--
ivshmem: add missing msix calls

ivshmem used msix but didn't call it on either the reset or the config write paths. This used to partially work since guests don't use all of the msi-x configuration fields, and reset is rarely used, but the patch 'msix: track function masked in pci device state' broke that.

Fix by adding appropriate calls.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

--
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 242fbea..3680c0f 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -505,6 +505,7 @@ static void ivshmem_reset(DeviceState *d)
     IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d);
     s->intrstatus = 0;
+    msix_reset(&s->dev);
     return;
 }

@@ -610,6 +611,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
     return 0;
 }

+static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address,
+                                 uint32_t val, int len)
+{
+    pci_default_write_config(pci_dev, address, val, len);
+    msix_write_config(pci_dev, address, val, len);
+}
+
 static int pci_ivshmem_init(PCIDevice *dev)
 {
     IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
@@ -734,6 +742,8 @@ static int pci_ivshmem_init(PCIDevice *dev)
     }
+    s->dev.config_write = ivshmem_write_config;
+
     return 0;
 }
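The toy model below illustrates, under stated assumptions, the failure Cam bisected: once the function-masked state is cached and initialized to true (MSI-X starts out disabled), a device whose config-space writes never reach the msix code keeps that stale value, so every notification is treated as masked. ToyDev, the toy_* helpers and the bit values are hypothetical stand-ins for the constants in hw/msix.c; the sketch covers only the config_write half of the fix, not the msix_reset() question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MSIX_ENABLE  0x80  /* hypothetical bit layout for this model */
#define TOY_MSIX_MASKALL 0x40

/* Hypothetical toy device, NOT QEMU's PCIDevice. */
typedef struct ToyDev {
    uint8_t msix_control;   /* guest-visible control byte */
    bool function_masked;   /* cached value, as introduced by the patch */
    bool hook_installed;    /* does a config write reach the msix code? */
    unsigned delivered, pending;
} ToyDev;

static void toy_init(ToyDev *d, bool hook)
{
    d->msix_control = 0;
    d->function_masked = true;  /* MSI-X starts disabled, hence masked */
    d->hook_installed = hook;
    d->delivered = d->pending = 0;
}

static void toy_update_function_masked(ToyDev *d)
{
    d->function_masked = !(d->msix_control & TOY_MSIX_ENABLE) ||
                         (d->msix_control & TOY_MSIX_MASKALL);
}

/* Guest config-space write: the cache is refreshed only when the device's
 * config_write path forwards the write to the msix code. */
static void toy_config_write(ToyDev *d, uint8_t val)
{
    d->msix_control = val;
    if (d->hook_installed) {
        toy_update_function_masked(d);
    }
}

static void toy_notify(ToyDev *d)
{
    if (d->function_masked) {
        d->pending++;   /* masked: remember it, don't deliver */
    } else {
        d->delivered++;
    }
}
```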
Re: [Qemu-devel] [PATCH v5] block:add-cow file format
Ping...

2011/11/28 Dong Xu Wang wdon...@linux.vnet.ibm.com
Any comment? Thanks.

2011/11/15 Dong Xu Wang wdon...@linux.vnet.ibm.com
From: Dong Xu Wang wdon...@linux.vnet.ibm.com

Provide a new file format: add-cow. The usage can be found in add-cow.txt of this patch.

Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
---
 Makefile.objs          |    1 +
 block.c                |    2 +-
 block.h                |    1 +
 block/add-cow.c        |  417
 block_int.h            |    1 +
 docs/specs/add-cow.txt |   57 +++
 6 files changed, 478 insertions(+), 1 deletions(-)
 create mode 100644 block/add-cow.c
 create mode 100644 docs/specs/add-cow.txt

diff --git a/Makefile.objs b/Makefile.objs
index d7a6539..ad99243 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -31,6 +31,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-nested-y += add-cow.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
diff --git a/block.c b/block.c
index 86910b0..a2be27b 100644
--- a/block.c
+++ b/block.c
@@ -106,7 +106,7 @@ int is_windows_drive(const char *filename)
 #endif

 /* check if the path starts with protocol: */
-static int path_has_protocol(const char *path)
+int path_has_protocol(const char *path)
 {
 #ifdef _WIN32
     if (is_windows_drive(path) ||
diff --git a/block.h b/block.h
index 051a25d..836284f 100644
--- a/block.h
+++ b/block.h
@@ -276,6 +276,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn);
 char *get_human_readable_size(char *buf, int buf_size, int64_t size);
 int path_is_absolute(const char *path);
+int path_has_protocol(const char *path);
 void path_combine(char *dest, int dest_size,
                   const char *base_path,
                   const char *filename);
diff --git a/block/add-cow.c b/block/add-cow.c
new file mode 100644
index 000..54d30a9
--- /dev/null
+++ b/block/add-cow.c
@@ -0,0 +1,417 @@
+#include "qemu-common.h"
+#include "block_int.h"
+#include "module.h"
+
+#define ADD_COW_MAGIC (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
+                       ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
+                       ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
+                       ((uint64_t)'W' << 8) | 0xFF)
+#define ADD_COW_VERSION 1
+#define ADD_COW_FILE_LEN    1024
+
+typedef struct AddCowHeader {
+    uint64_t        magic;
+    uint32_t        version;
+    char            backing_file[ADD_COW_FILE_LEN];
+    char            image_file[ADD_COW_FILE_LEN];
+    uint64_t        size;
+} QEMU_PACKED AddCowHeader;
+
+typedef struct BDRVAddCowState {
+    char                image_file[ADD_COW_FILE_LEN];
+    BlockDriverState    *image_hd;
+    uint8_t             *bitmap;
+    uint64_t            bitmap_size;
+    CoMutex             lock;
+} BDRVAddCowState;
+
+static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
+{
+    const AddCowHeader *header = (const void *)buf;
+
+    if (be64_to_cpu(header->magic) == ADD_COW_MAGIC &&
+        be32_to_cpu(header->version) == ADD_COW_VERSION) {
+        return 100;
+    } else {
+        return 0;
+    }
+}
+
+static int add_cow_open(BlockDriverState *bs, int flags)
+{
+    AddCowHeader        header;
+    int64_t             size;
+    char                image_filename[ADD_COW_FILE_LEN];
+    int                 image_flags;
+    BlockDriver         *image_drv = NULL;
+    int                 ret;
+    BDRVAddCowState     *state = (BDRVAddCowState *)(bs->opaque);
+
+    ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
+    if (ret != sizeof(header)) {
+        goto fail;
+    }
+
+    if (be64_to_cpu(header.magic) != ADD_COW_MAGIC ||
+        be32_to_cpu(header.version) != ADD_COW_VERSION) {
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    size = be64_to_cpu(header.size);
+    bs->total_sectors = size / BDRV_SECTOR_SIZE;
+
+    QEMU_BUILD_BUG_ON(sizeof(state->image_file) != sizeof(header.image_file));
+    pstrcpy(bs->backing_file, sizeof(bs->backing_file),
+            header.backing_file);
+    pstrcpy(state->image_file, sizeof(state->image_file),
+            header.image_file);
+
+    state->bitmap_size = ((bs->total_sectors + 7) >> 3);
+    state->bitmap = g_malloc0(state->bitmap_size);
+
+    ret = bdrv_pread(bs->file, sizeof(header), state->bitmap,
+            state->bitmap_size);
+    if (ret != state->bitmap_size) {
+        goto fail;
+    }
+    /* If there is a image_file, must be together with backing_file */
+    if (state->image_file[0] != '\0') {
+        state->image_hd =
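Some of the header logic in the add-cow patch can be exercised in isolation. The sketch assumes the big-endian on-disk layout implied by be64_to_cpu(); ADD_COW_MAGIC and the (total_sectors + 7) >> 3 rounding are copied from the patch, while toy_load_be64(), toy_bitmap_size() and toy_sector_allocated() are hypothetical helpers. The bit order inside each bitmap byte is an assumption, because the quoted part of the patch stops before the bitmap is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as ADD_COW_MAGIC in the patch: "ADD_COW" followed by 0xFF. */
#define ADD_COW_MAGIC (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
                       ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
                       ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
                       ((uint64_t)'W' << 8) | 0xFF)

/* Decode a big-endian 64-bit field, as be64_to_cpu() does for the header. */
static uint64_t toy_load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* One bit per sector, rounded up to whole bytes, as in add_cow_open(). */
static uint64_t toy_bitmap_size(uint64_t total_sectors)
{
    return (total_sectors + 7) >> 3;
}

/* Hypothetical helper: is `sector` marked as present in the image file?
 * LSB-first bit order within each byte is an ASSUMPTION. */
static bool toy_sector_allocated(const uint8_t *bitmap, uint64_t sector)
{
    return (bitmap[sector >> 3] >> (sector & 7)) & 1;
}
```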
Re: [Qemu-devel] windows guest virtio serial and balloon driver test issues
On 11/29/2011 08:36 PM, Vadim Rozenfeld wrote: On Tue, 2011-11-29 at 08:58 +0800, Cao,Bing Bu wrote:

Hi Rozenfeld, thanks, got it! And do you know whether there are suitable test tools (such as IOmeter) to measure virtio driver performance?

IoMeter is good. But you might also be interested in SQLIOSim, database hammer, and diskio (part of WLK) + xperf.

On 11/25/2011 02:42 PM, Vadim Rozenfeld wrote: On Fri, 2011-11-25 at 09:59 +0800, Cao,Bing Bu wrote:

Hi all,

Thanks, Frenkel. The balloon test application must be run as admin. But I found two problems (questions) this week when testing the Windows guest drivers:

* If only the virtio serial driver is installed, the virtio serial test app cannot enumerate/find the virtio serial device, but after the virtio balloon driver is installed, the app finds the virtio serial device correctly. Is this because balloon and serial both use the same GUID?

Correct. This test application is a very simplified one. We published it mostly as an example, not as a real test application. It doesn't enumerate all virtio serial instances; it just finds the first one and uses it.

* When I inflate/deflate the balloon using the qemu monitor balloon command, the total physical memory shown in Resource Monitor does not decrease/increase correspondingly; only the available memory decreases/increases. But when I test on a Linux guest, the guest OS's total physical memory does change. Is this a problem? If not, is it confusing to users? Is it related to Windows' internal memory management?

Total physical memory on Windows will always be the same, because we don't hot-plug/unplug physical memory. The balloon driver works with non-paged pool memory instead. So every time you inflate or deflate the balloon, you should see Available memory change, while physical memory always stays the same.

Best,
Vadim.
On 11/21/2011 06:33 PM, Arkady Frenkel wrote: On 11/21/2011 10:39 AM, Cao,Bing Bu wrote:

Hi,

Recently I have been testing the Windows guest drivers on Win7 and WinXP (32-bit) with the latest Windows guest driver development source, downloaded from http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/src/

virtio-blk: It seems OK on both Win7 and WinXP; the r/w performance is better than with an IDE disk.

virtio-serial: I tried to test the virtio serial driver using the test application in the project.
WinXP: Write: OK. Read: Error: Read File Failed.
Win7: The test application returns the error "can not find vioserial device". But I debugged the code and checked that the GetDevicePath() return value is not NULL, and is the same as the value when testing on WinXP. Why is CreateFile() in init() not called? (:

virtio-balloon: In the QEMU monitor: device_add virtio-balloon-pci. On the guest, a new device, "PCI standard RAM controller", is added. Device Manager reports "No driver installed for this device", but installing the balloon.sys driver fails; it says the driver is up to date. Confused. (:

How can I install and test the balloon driver on Windows? The kvm-guest-drivers-windows.git on kernel.org is not available; is there a mirror git repository? Is there a mailing list or bugzilla for the Windows guest drivers? Any help will be appreciated.

You need to run the serial test app as admin. To install the balloon driver you have to go through an additional option when you click "Browse my computer for driver software": choose the "Let me pick from the list of device drivers on my computer" option.

Arkady

Best regards
Cao,Bing Bu

Thank you, Vadim. (: Is there anything TO-DO or needing further optimization in the current Windows guest drivers? How could I contribute to Windows guest driver development (testing patches, signing off patches, bug fixes, etc.)?

--
Best Regards,
Cao,Bing Bu
Re: [Qemu-devel] [KVM][Kemari]: Build error fix
On 2011/12/02 21:51, Pradeep Kumar wrote:

It fixes a build failure. I hit this error after a successful migration and sync.

(qemu) qemu-system-x86_64: fill buffer failed, Interrupted system call
qemu-system-x86_64: recv header failed
qemu-system-x86_64: recv ack failed
qemu_transaction_begin failed

Did you use the master branch? It is not the latest version. The next branch is the latest and has this error fixed:

git://kemari.git.sourceforge.net/gitroot/kemari/kemari next

Thanks,
Kei

Is anyone working on this now?

From 827c04da6574be80d8352acd7c40b0b4524af5f4 Mon Sep 17 00:00:00 2001
Date: Fri, 2 Dec 2011 18:11:40 +0530
Subject: [PATCH] [Qemu][Kemari]: Build Failure

Signed-off-by: pradeep psuri...@linux.vnet.ibm.com

	modified:   ft_trans_file.c
---
 ft_trans_file.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/ft_trans_file.c b/ft_trans_file.c
index 4e33034..dc36757 100644
--- a/ft_trans_file.c
+++ b/ft_trans_file.c
@@ -174,7 +174,7 @@ static int ft_trans_send_header(QEMUFileFtTrans *s,
 static int ft_trans_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int size)
 {
     QEMUFileFtTrans *s = opaque;
-    ssize_t ret;
+    ssize_t ret = 0;

     trace_ft_trans_put_buffer(size, pos);
Re: [Qemu-devel] [BUG] [Seabios] PCI 64bit BARs on Win2008 - unable to start the device. (ACPI lacks the _DSM method)
Hi Michael,

Thank you for the good advice; you are right. When I added a new range above 4GB in _CRS, the problem went away.

QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite,
    0x,   // Address Space Granularity
    0x1,  // Address Range Minimum
    0x3,  // Address Range Maximum
    0x,   // Address Translation Offset
    0x4,  // Address Length
    ,, , AddressRangeMemory, TypeStatic)

The only big problem with this range: as soon as the guest has more than 3GB of RAM, Windows boots into a BSOD. The problem is that the memory ranges intersect. Unfortunately it is not possible to predict how many GB of RAM the virtual machine will have, so it's difficult to specify a particular region. Do you have any ideas what can be done to solve this problem?

Regards,
Alexey

On Thu, Dec 01, 2011 at 06:49:54PM +1300, Alexey Korolev wrote:

Isaku san, I've just added you to the discussion. There are some issues with PCI 64bit support in Windows. Windows fails to assign the resource if it doesn't fit in the first 4GB window. I really don't know why it happens. One possibility is the lack of a _DSM method in ACPI. Another guess is that the 440FX only supports a 32bit PCI bus interface, and Windows may limit the PCI address range to the first 4GB for PCI devices under this bridge. I remember you were working on Q35 chipset simulation; I wonder whether it is working and whether it would be possible to try it?

Thanks,
Alexey

Maybe the range above 4G needs to be declared in the _CRS resource?
Re: [Qemu-devel] [BUG] [Seabios] PCI 64bit BARs on Win2008 - unable to start the device. (ACPI lacks the _DSM method)
Hi Gerd,

We have a very early prototype of a data acquisition device with quite a large MMIO buffer. It is an emulated device. We are running the 0.15 release. 0.15 doesn't handle 64bit BARs correctly, so I've already added some hacks to SeaBIOS to let the OS choose the memory region; that is why you see "bar 1, addr 0" in the SeaBIOS log. Sorry I didn't mention all this initially. I just want to make 64bit PCI BARs work properly. Linux guests work correctly (except early versions, which I haven't investigated yet). At the moment I have some issues with Windows, which relies on the ACPI _CRS.

Thanks,
Alexey

Hi,

PCI: map device bus 0, bfd 0x28
  bar 0, addr febe, size 1 [mem]
  bar 1, addr 0, size 2000 [mem]

Somehow SeaBIOS didn't recognise the BAR correctly, it seems (both the 512 and 256 MB cases look the same). For the 256 MB case SeaBIOS should have mapped the BAR @ 0xe000. ... and it should also have figured out that it is prefetchable memory.

Was PCI config space messed up somehow? What does 'lspci -v' say once you've booted the machine with Linux? What qemu version are you running? What kind of device is this? Emulated? Code somewhere? Or a real device passed through to the guest?

cheers,
  Gerd
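For context on Gerd's remark that SeaBIOS should have recognised a prefetchable BAR, this sketch decodes a memory BAR from the values read back after the usual write-all-ones sizing probe, using the bit layout from the PCI Local Bus Specification (bit 0 selects I/O vs. memory, bits 2:1 give the type with 10b meaning 64-bit, bit 3 marks prefetchable). It is not SeaBIOS's code; ToyBarInfo and toy_decode_mem_bar() are hypothetical, and the config-space accesses themselves are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory BAR bits from the PCI Local Bus Specification. */
#define BAR_SPACE_IO      0x1u  /* bit 0: 1 = I/O, 0 = memory */
#define BAR_MEM_TYPE_MASK 0x6u  /* bits 2:1 */
#define BAR_MEM_TYPE_64   0x4u  /* 10b = 64-bit, also uses the next BAR */
#define BAR_MEM_PREFETCH  0x8u  /* bit 3 */
#define BAR_MEM_ADDR_MASK (~(uint64_t)0xF)

typedef struct ToyBarInfo {
    bool is_64bit;
    bool prefetchable;
    uint64_t size;
} ToyBarInfo;

/* Decode a memory BAR from the values read back after writing all-ones to
 * it (lo) and, for a 64-bit BAR, to the following BAR register (hi). */
static ToyBarInfo toy_decode_mem_bar(uint32_t lo, uint32_t hi)
{
    ToyBarInfo info;
    uint64_t mask;

    assert(!(lo & BAR_SPACE_IO));
    info.is_64bit = (lo & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
    info.prefetchable = (lo & BAR_MEM_PREFETCH) != 0;

    if (info.is_64bit) {
        mask = ((uint64_t)hi << 32) | lo;
    } else {
        /* Treat the upper half as all-ones so the size stays 32-bit. */
        mask = 0xFFFFFFFF00000000ull | lo;
    }
    info.size = ~(mask & BAR_MEM_ADDR_MASK) + 1;
    return info;
}
```

A BAR larger than 4 GiB only decodes correctly when the upper dword is taken into account, which is one reason 64-bit BARs need separate handling in firmware.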
Re: [Qemu-devel] [PATCH v4 2/3] Extract code to nbd_setup function to be used for many purposes
2011/12/3 Paolo Bonzini pbonz...@redhat.com
On 12/02/2011 04:27 PM, Chunyan Liu wrote:

@@ -42,6 +42,18 @@ static int verbose;
 static char *device;
 static char *srcpath;
 static char *sockpath;
+static int is_sockpath_option;
+static int sigterm_fd[2];
+static off_t dev_offset;
+static uint32_t nbdflags;
+static bool disconnect;
+static const char *bindto = "0.0.0.0";
+static int port = NBD_DEFAULT_PORT;
+static int li;
+static int flags = BDRV_O_RDWR;
+static int partition = -1;
+static int shared = 1;
+static int persistent;

A lot of statics... li seems unused.

I used these statics because most of them are global parameters taken from the command-line options and used later; otherwise nbd_setup() would have to take many parameters. Ahh, li could be defined in main(). After the options have been parsed, later code can use port:

    case 'p':
        li = strtol(optarg, &end, 0);
        if (*end) {
            errx(EXIT_FAILURE, "Invalid port `%s'", optarg);
        }
        if (li < 1 || li > 65535) {
            errx(EXIT_FAILURE, "Port out of range `%s'", optarg);
        }
        port = (uint16_t)li;

I took patch 1/3 in my tree (git://github.com/bonzini/qemu.git, branch nbd-server). I'll post it together with my patches next week.

Paolo
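Following Paolo's remark that li is only needed while parsing -p, here is a minimal sketch that keeps the temporary local. toy_parse_port() is a hypothetical helper that returns false instead of calling errx() so it can be tested. It also rejects an empty argument explicitly (with the quoted code an empty string passes the `*end` check and is only caught by the range check) and checks errno for overflow. Because base 0 is passed to strtol(), hex (0x...) and octal (leading 0) forms are accepted too.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a TCP port from a command-line argument; false on any error. */
static bool toy_parse_port(const char *arg, uint16_t *port)
{
    char *end;
    long li;    /* local, rather than a file-scope static */

    assert(arg != NULL && port != NULL);
    errno = 0;
    li = strtol(arg, &end, 0);
    if (end == arg || *end != '\0' || errno == ERANGE) {
        return false;   /* empty, trailing junk, or overflow */
    }
    if (li < 1 || li > 65535) {
        return false;   /* port out of range */
    }
    *port = (uint16_t)li;
    return true;
}
```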
Re: [Qemu-devel] Improve QEMU performance with LLVM codegen and other techniques
We ask TCG to disassemble the guest binary where the trace begins _again_, to get a set of TCG blocks, and then send them to the LLVM translator.

So you have two TCG backends? One to generate real host code and one that goes into your LLVM generator?

Ah..., I should say we ask the QEMU frontend to disassemble the guest binary into TCG again.

Regards,
chenwj

--
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj
Re: [Qemu-devel] [BUG] [Seabios] PCI 64bit BARs on Win2008 - unable to start the device. (ACPI lacks the _DSM method)
On Mon, Dec 05, 2011 at 05:20:32PM +1300, Alexey Korolev wrote:

Hi Michael,

Thank you for the good advice; you are right. When I added a new range above 4GB in _CRS, the problem went away.

QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite,
    0x,   // Address Space Granularity
    0x1,  // Address Range Minimum
    0x3,  // Address Range Maximum
    0x,   // Address Translation Offset
    0x4,  // Address Length
    ,, , AddressRangeMemory, TypeStatic)

The only big problem with this range: as soon as the guest has more than 3GB of RAM, Windows boots into a BSOD. The problem is that the memory ranges intersect. Unfortunately it is not possible to predict how many GB of RAM the virtual machine will have, so it's difficult to specify a particular region. Do you have any ideas what can be done to solve this problem?

Regards,
Alexey

Two possible ideas:
1. Pass the value in from qemu.
2. Get a range toward the upper end of the memory, around 1 << 40.

On Thu, Dec 01, 2011 at 06:49:54PM +1300, Alexey Korolev wrote:

Isaku san, I've just added you to the discussion. There are some issues with PCI 64bit support in Windows. Windows fails to assign the resource if it doesn't fit in the first 4GB window. I really don't know why it happens. One possibility is the lack of a _DSM method in ACPI. Another guess is that the 440FX only supports a 32bit PCI bus interface, and Windows may limit the PCI address range to the first 4GB for PCI devices under this bridge. I remember you were working on Q35 chipset simulation; I wonder whether it is working and whether it would be possible to try it?

Thanks,
Alexey

Maybe the range above 4G needs to be declared in the _CRS resource?
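Michael's first idea, having QEMU pass the value in, amounts to deriving the 64-bit window from the guest's RAM layout instead of hard-coding it. A hypothetical sketch of that calculation follows. It assumes RAM above 4 GiB is mapped contiguously from 4 GiB upward and that the window should start at the next aligned address after it; ToyWindow, toy_pci64_window() and the alignment choice are illustrative only, and how the numbers would reach SeaBIOS or the _CRS is not covered. The three fields correspond to the Minimum, Maximum and Length entries of the QWordMemory descriptor, with Length = Maximum - Minimum + 1.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ull << 30)

typedef struct ToyWindow {
    uint64_t min;   /* _CRS "Address Range Minimum" */
    uint64_t max;   /* _CRS "Address Range Maximum" */
    uint64_t len;   /* _CRS "Address Length" = max - min + 1 */
} ToyWindow;

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t toy_align_up(uint64_t x, uint64_t align)
{
    assert(align && !(align & (align - 1)));
    return (x + align - 1) & ~(align - 1);
}

/* Place a 64-bit PCI window of `len` bytes just above the guest RAM that
 * is mapped from 4 GiB upward (ram_above_4g bytes), aligned to `align`. */
static ToyWindow toy_pci64_window(uint64_t ram_above_4g, uint64_t len,
                                  uint64_t align)
{
    ToyWindow w;

    w.min = toy_align_up(4 * GiB + ram_above_4g, align);
    w.len = len;
    w.max = w.min + len - 1;
    return w;
}
```

Because the window always starts above the highest RAM address, it cannot intersect guest memory regardless of how much RAM is configured, which is the overlap Alexey ran into with a fixed range.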