Re: [PATCH v3 2/2] virtio-rng: skip reading when we start to remove the device

2014-09-12 Thread Rusty Russell
Amit Shah  writes:
> On (Wed) 10 Sep 2014 [14:11:37], Amos Kong wrote:
>> Before we really unregister the hwrng device, reading will get stuck if
>> the virtio device is reset. We should return an error for reads once we
>> start to remove the device.
>> 
>> Signed-off-by: Amos Kong 
>> Cc: sta...@vger.kernel.org
>
> Reviewed-by: Amit Shah 

Thanks, applied.

They're sitting in my fixes branch.  If there are no screams from
linux-next, I'll push to Linus Monday.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mm: export symbol dependencies of is_zero_pfn()

2014-09-12 Thread Ard Biesheuvel
On 12 September 2014 23:14, Andrew Morton  wrote:
> On Fri, 12 Sep 2014 22:17:23 +0200 Ard Biesheuvel  
> wrote:
>
>> In order to make the static inline function is_zero_pfn() callable by
>> modules, export its symbol dependencies 'zero_pfn' and (for s390 and
>> mips) 'zero_page_mask'.
>
> So hexagon and score get the export if/when needed.
>

Exactly.

>> We need this for KVM, as CONFIG_KVM is a tristate for all supported
>> architectures except ARM and arm64, and testing whether a pfn refers
>> to the zero page is required to correctly distinguish the zero page
>> from other special RAM ranges that may also have the PG_reserved bit
>> set, but need to be treated as MMIO memory.
>>
>> Signed-off-by: Ard Biesheuvel 
>> ---
>>  arch/mips/mm/init.c | 1 +
>>  arch/s390/mm/init.c | 1 +
>>  mm/memory.c | 2 ++
>
> Looks OK to me.  Please include the patch in whichever tree it is that
> needs it, and merge it up via that tree.
>

Thanks.

@Paolo: could you please take this (with Andrew's ack), and put it
before the patch you took earlier today?

Thanks,
Ard.


Re: [PATCH] mm: export symbol dependencies of is_zero_pfn()

2014-09-12 Thread Andrew Morton
On Fri, 12 Sep 2014 22:17:23 +0200 Ard Biesheuvel  
wrote:

> In order to make the static inline function is_zero_pfn() callable by
> modules, export its symbol dependencies 'zero_pfn' and (for s390 and
> mips) 'zero_page_mask'.

So hexagon and score get the export if/when needed.

> We need this for KVM, as CONFIG_KVM is a tristate for all supported
> architectures except ARM and arm64, and testing whether a pfn refers
> to the zero page is required to correctly distinguish the zero page
> from other special RAM ranges that may also have the PG_reserved bit
> set, but need to be treated as MMIO memory.
> 
> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/mips/mm/init.c | 1 +
>  arch/s390/mm/init.c | 1 +
>  mm/memory.c | 2 ++

Looks OK to me.  Please include the patch in whichever tree it is that
needs it, and merge it up via that tree.



[PATCH] Using the tlb flush util function where applicable

2014-09-12 Thread Liang Chen
Use kvm_mmu_flush_tlb(), as the other call sites do, to make sure the
vcpu TLB-flush stat is incremented.

Signed-off-by: Liang Chen 
---
 arch/x86/kvm/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..439682e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1810,7 +1810,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct desc_ptr *gdt = &__get_cpu_var(host_gdt);
unsigned long sysenter_esp;
 
-   kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+   kvm_mmu_flush_tlb(vcpu);
local_irq_disable();
crash_disable_local_vmclear(cpu);
 
-- 
1.9.1



[PATCH] mm: export symbol dependencies of is_zero_pfn()

2014-09-12 Thread Ard Biesheuvel
In order to make the static inline function is_zero_pfn() callable by
modules, export its symbol dependencies 'zero_pfn' and (for s390 and
mips) 'zero_page_mask'.

We need this for KVM, as CONFIG_KVM is a tristate for all supported
architectures except ARM and arm64, and testing whether a pfn refers
to the zero page is required to correctly distinguish the zero page
from other special RAM ranges that may also have the PG_reserved bit
set, but need to be treated as MMIO memory.

Signed-off-by: Ard Biesheuvel 
---
 arch/mips/mm/init.c | 1 +
 arch/s390/mm/init.c | 1 +
 mm/memory.c | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 571aab064936..f42e35e42790 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -53,6 +53,7 @@
  */
 unsigned long empty_zero_page, zero_page_mask;
 EXPORT_SYMBOL_GPL(empty_zero_page);
+EXPORT_SYMBOL(zero_page_mask);
 
 /*
  * Not static inline because used by IP27 special magic initialization code
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 0c1073ed1e84..c7235e01fd67 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -43,6 +43,7 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] 
__attribute__((__aligned__(PAGE_SIZE)));
 
 unsigned long empty_zero_page, zero_page_mask;
 EXPORT_SYMBOL(empty_zero_page);
+EXPORT_SYMBOL(zero_page_mask);
 
 static void __init setup_zero_pages(void)
 {
diff --git a/mm/memory.c b/mm/memory.c
index adeac306610f..d17f1bcd2a91 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -118,6 +118,8 @@ __setup("norandmaps", disable_randmaps);
 unsigned long zero_pfn __read_mostly;
 unsigned long highest_memmap_pfn __read_mostly;
 
+EXPORT_SYMBOL(zero_pfn);
+
 /*
  * CONFIG_MMU architectures set up ZERO_PAGE in their paging_init()
  */
-- 
1.8.3.2



Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-12 Thread Christian Borntraeger
On 09/12/2014 01:54 PM, Ming Lei wrote:
> On Thu, Sep 11, 2014 at 6:26 PM, Christian Borntraeger
>  wrote:
>> Folks,
>>
>> we have seen the following bug with 3.16 as a KVM guest. I suspect the
>> blk-mq rework that happened between 3.15 and 3.16, but it could be
>> something completely different.
>>
> 
> Care to share how you reproduce the issue?

Host with 16GB RAM and 32GB swap. 15 guests, all with 2 GB RAM (and a
varying number of CPUs). All do heavy file I/O.
It did not happen with 3.15/3.15 in guest/host and does happen with
3.16/3.16, so our next step is to check 3.15/3.16 and 3.16/3.15 to
identify whether it's host memory management or the guest block layer.

Christian

> 
>> [   65.992022] Unable to handle kernel pointer dereference in virtual kernel 
>> address space
>> [   65.992187] failing address: d000 TEID: d803
>> [   65.992363] Fault in home space mode while using kernel ASCE.
>> [   65.992365] AS:00a7c007 R3:0024
>> [   65.993754] Oops: 0038 [#1] SMP
>> [   65.993923] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi 
>> scsi_transport_iscsi virtio_balloon vhost_net vhost macvtap macvlan kvm 
>> dm_multipath virtio_net virtio_blk sunrpc
>> [   65.994274] CPU: 0 PID: 44 Comm: kworker/u6:2 Not tainted 
>> 3.16.0-20140814.0.c66c84c.fc18-s390xfrob #1
>> [   65.996043] Workqueue: writeback bdi_writeback_workfn (flush-251:32)
>> [   65.996222] task: 0225 ti: 02258000 task.ti: 
>> 02258000
>> [   65.996228] Krnl PSW : 0704f0018000 003ed114 
>> (blk_mq_tag_to_rq+0x20/0x38)
>> [   65.997299]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 
>> EA:3
>>Krnl GPRS: 0040  01619000 
>> 004e
>> [   65.997301]004e  0001 
>> 00a0de18
>> [   65.997302]77ffbe18 77ffbd50 6d72d620 
>> 004f
>> [   65.997304]01a99400 0080 003eddee 
>> 77ffbc28
>> [   65.997864] Krnl Code: 003ed106: e3102034lg  
>> %r1,48(%r2)
>>   003ed10c: 91082044tm  
>> 68(%r2),8
>>  #003ed110: a7840009brc 
>> 8,3ed122
>>  >003ed114: e34016880004lg  
>> %r4,1672(%r1)
>>   003ed11a: 59304100c   
>> %r3,256(%r4)
>>   003ed11e: a7840003brc 
>> 8,3ed124
>>   003ed122: 07febcr 
>> 15,%r14
>>   003ed124: b9040024lgr 
>> %r2,%r4
>> [   65.998221] Call Trace:
>> [   65.998224] ([<0001>] 0x1)
>> [   65.998227]  [<003f17b6>] blk_mq_tag_busy_iter+0x7a/0xc4
>> [   65.998228]  [<003edcd6>] blk_mq_rq_timer+0x96/0x13c
>> [   65.999226]  [<0013ee60>] call_timer_fn+0x40/0x110
>> [   65.999230]  [<0013f642>] run_timer_softirq+0x2de/0x3d0
>> [   65.999238]  [<00135b70>] __do_softirq+0x124/0x2ac
>> [   65.999241]  [<00136000>] irq_exit+0xc4/0xe4
>> [   65.999435]  [<0010bc08>] do_IRQ+0x64/0x84
>> [   66.437533]  [<0067ccd8>] ext_skip+0x42/0x46
>> [   66.437541]  [<003ed7b4>] __blk_mq_alloc_request+0x58/0x1e8
>> [   66.437544] ([<003ed788>] __blk_mq_alloc_request+0x2c/0x1e8)
>> [   66.437547]  [<003eef82>] blk_mq_map_request+0xc2/0x208
>> [   66.437549]  [<003ef860>] blk_sq_make_request+0xac/0x350
>> [   66.437721]  [<003e2d6c>] generic_make_request+0xc4/0xfc
>> [   66.437723]  [<003e2e56>] submit_bio+0xb2/0x1a8
>> [   66.438373]  [<0031e8aa>] ext4_io_submit+0x52/0x80
>> [   66.438375]  [<0031ccfa>] ext4_writepages+0x7c6/0xd0c
>> [   66.438378]  [<002aea20>] __writeback_single_inode+0x54/0x274
>> [   66.438379]  [<002b0134>] writeback_sb_inodes+0x28c/0x4ec
>> [   66.438380]  [<002b042e>] __writeback_inodes_wb+0x9a/0xe4
>> [   66.438382]  [<002b06a2>] wb_writeback+0x22a/0x358
>> [   66.438383]  [<002b0cd0>] bdi_writeback_workfn+0x354/0x538
>> [   66.438618]  [<0014e3aa>] process_one_work+0x1aa/0x418
>> [   66.438621]  [<0014ef94>] worker_thread+0x48/0x524
>> [   66.438625]  [<001560ca>] kthread+0xee/0x108
>> [   66.438627]  [<0067c76e>] kernel_thread_starter+0x6/0xc
>> [   66.438628]  [<0067c768>] kernel_thread_starter+0x0/0xc
>> [   66.438629] Last Breaking-Event-Address:
>> [   66.438631]  [<003edde8>] blk_mq_timeout_check+0x6c/0xb8
>>
>> I looked into the dump; the full function is (annotated by me to match
>> the source code):
>> r2= tags
>> r3= tag (4e)
>> Dump of assembler code for function blk_mq_tag_to_rq:
>>0x003ed0f4 <+0>: lg  %r1,96(%r2) # 

Re: [kvm:master 11/11] ERROR: "zero_pfn" [arch/powerpc/kvm/kvm.ko] undefined!

2014-09-12 Thread Ard Biesheuvel
On 12 September 2014 19:25, kbuild test robot  wrote:
> tree:   git://git.kernel.org/pub/scm/virt/kvm/kvm.git master
> head:   e20e1bde3bb158cd3d08b9d94a90d3cabf1ba7cb
> commit: e20e1bde3bb158cd3d08b9d94a90d3cabf1ba7cb [11/11] KVM: check for 
> !is_zero_pfn() in kvm_is_mmio_pfn()
> config: powerpc-defconfig
> reproduce:
>   wget 
> https://github.com/fengguang/reproduce-kernel-bug/raw/master/cross-build/make.cross
>  -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout e20e1bde3bb158cd3d08b9d94a90d3cabf1ba7cb
>   make.cross ARCH=powerpc  defconfig
>   make.cross ARCH=powerpc
>
> All error/warnings:
>
>>> ERROR: "zero_pfn" [arch/powerpc/kvm/kvm.ko] undefined!
>

OK, so apparently zero_pfn, which is used by the inline is_zero_pfn(), is
not exported, which is unfortunate.

I will go ahead and propose a patch to add this EXPORT_SYMBOL(), but
unfortunately, s390's and mips's definitions of is_zero_pfn() require
zero_page_mask to be exported as well.


Re: [PATCH] KVM: PPC: Convert openpic lock to raw_spinlock

2014-09-12 Thread Scott Wood
On Fri, 2014-09-12 at 09:12 -0500, Purcareata Bogdan-B43198 wrote:
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Thursday, September 11, 2014 9:19 PM
> > To: Purcareata Bogdan-B43198
> > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org
> > Subject: Re: [PATCH] KVM: PPC: Convert openpic lock to raw_spinlock
> > 
> > On Thu, 2014-09-11 at 15:25 -0400, Bogdan Purcareata wrote:
> > > This patch enables running intensive I/O workloads, e.g. netperf, in a 
> > > guest
> > > deployed on a RT host. No change for !RT kernels.
> > >
> > > The openpic spinlock becomes a sleeping mutex on a RT system. This no 
> > > longer
> > > guarantees that EPR is atomic with exception delivery. The guest VCPU 
> > > thread
> > > fails due to a BUG_ON(preemptible()) when running netperf.
> > >
> > > In order to make the kvmppc_mpic_set_epr() call safe on RT from non-atomic
> > > context, convert the openpic lock to a raw_spinlock. A similar approach 
> > > can
> > > be seen for x86 platforms in the following commit [1].
> > >
> > > Here are some comparative cyclictest measurements run inside a high
> > > priority RT guest running on an RT host. The guest has 1 VCPU and the
> > > test has been run for 15 minutes. The guest runs ~750 hackbench
> > > processes as background stress.
> > 
> > Does hackbench involve triggering interrupts that would go through the
> > MPIC?  You may want to try an I/O-heavy benchmark to stress the MPIC
> > code (the more interrupt sources are active at once, the "better").
> 
> Before this patch, running netperf/iperf in the guest always resulted
> in hitting the aforementioned BUG_ON when the host was RT. This is
> why I can't provide comparative cyclictest measurements before and
> after the patch with heavy I/O stress. Since I had no problem running
> hackbench before, I'm assuming it doesn't involve interrupts passing
> through the MPIC. The measurements were posted just to show that the
> patch doesn't mess up anything somewhere else.

I know you can't provide before/after, but it would be nice to see what
the after numbers are with heavy MPIC activity.

> > Also try a guest with many vcpus.
> 
> AFAIK, without the MSI affinity patches [1], all vfio interrupts will
> go to core 0 in the guest. In this case, I guess there won't be
> contention-induced latencies due to multiple VCPUs expecting to have
> their interrupts delivered. Am I getting it wrong?

It's not about contention, but about loops in the MPIC code that iterate
over the entire set of vcpus.

-Scott



Re: [PATCH v4 2/8] arm/arm64: KVM: vgic: switch to dynamic allocation

2014-09-12 Thread Christoffer Dall
On Fri, Sep 12, 2014 at 10:13:11AM +0100, Marc Zyngier wrote:
> On 11/09/14 23:36, Christoffer Dall wrote:
> > On Thu, Sep 11, 2014 at 12:09:09PM +0100, Marc Zyngier wrote:
> >> So far, all the VGIC data structures are statically defined by the
> >> *maximum* number of vcpus and interrupts it supports. It means that
> >> we always have to oversize it to cater for the worst case.
> >>
> >> Start by changing the data structures to be dynamically sizeable,
> >> and allocate them at runtime.
> >>
> >> The sizes are still very static though.
> >>
> >> Signed-off-by: Marc Zyngier 
> >> ---
> >>  arch/arm/kvm/arm.c |   3 +
> >>  include/kvm/arm_vgic.h |  76 
> >>  virt/kvm/arm/vgic.c| 237 
> >> ++---
> >>  3 files changed, 267 insertions(+), 49 deletions(-)
> >>
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index a99e0cd..923a01d 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -172,6 +172,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> >>   kvm->vcpus[i] = NULL;
> >>   }
> >>   }
> >> +
> >> + kvm_vgic_destroy(kvm);
> >>  }
> >>
> >>  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >> @@ -253,6 +255,7 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
> >>  {
> >>   kvm_mmu_free_memory_caches(vcpu);
> >>   kvm_timer_vcpu_terminate(vcpu);
> >> + kvm_vgic_vcpu_destroy(vcpu);
> >>   kmem_cache_free(kvm_vcpu_cache, vcpu);
> >>  }
> >>
> >> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> >> index f074539..bdaac57 100644
> >> --- a/include/kvm/arm_vgic.h
> >> +++ b/include/kvm/arm_vgic.h
> >> @@ -54,19 +54,33 @@
> >>   * - a bunch of shared interrupts (SPI)
> >>   */
> >>  struct vgic_bitmap {
> >> - union {
> >> - u32 reg[VGIC_NR_PRIVATE_IRQS / 32];
> >> - DECLARE_BITMAP(reg_ul, VGIC_NR_PRIVATE_IRQS);
> >> - } percpu[VGIC_MAX_CPUS];
> >> - union {
> >> - u32 reg[VGIC_NR_SHARED_IRQS / 32];
> >> - DECLARE_BITMAP(reg_ul, VGIC_NR_SHARED_IRQS);
> >> - } shared;
> >> + /*
> >> +  * - One UL per VCPU for private interrupts (assumes UL is at
> >> +  *   least 32 bits)
> >> +  * - As many UL as necessary for shared interrupts.
> >> +  *
> >> +  * The private interrupts are accessed via the "private"
> >> +  * field, one UL per vcpu (the state for vcpu n is in
> >> +  * private[n]). The shared interrupts are accessed via the
> >> +  * "shared" pointer (IRQn state is at bit n-32 in the bitmap).
> >> +  */
> >> + unsigned long *private;
> >> + unsigned long *shared;
> > 
> > the comment above the define for REG_OFFSET_SWIZZLE still talks about
> > the unions in struct vgic_bitmap, which is no longer true.  Mind
> > updating that comment?
> 
> Damned, thought I fixed that. Will update it.
> 
> >>  };
> >>
> >>  struct vgic_bytemap {
> >> - u32 percpu[VGIC_MAX_CPUS][VGIC_NR_PRIVATE_IRQS / 4];
> >> - u32 shared[VGIC_NR_SHARED_IRQS  / 4];
> >> + /*
> >> +  * - 8 u32 per VCPU for private interrupts
> >> +  * - As many u32 as necessary for shared interrupts.
> >> +  *
> >> +  * The private interrupts are accessed via the "private"
> >> +  * field, (the state for vcpu n is in private[n*8] to
> >> +  * private[n*8 + 7]). The shared interrupts are accessed via
> >> +  * the "shared" pointer (IRQn state is at byte (n-32)%4 of the
> >> +  * shared[(n-32)/4] word).
> >> +  */
> >> + u32 *private;
> >> + u32 *shared;
> >>  };
> >>
> >>  struct kvm_vcpu;
> >> @@ -127,6 +141,9 @@ struct vgic_dist {
> >>   boolin_kernel;
> >>   boolready;
> >>
> >> + int nr_cpus;
> >> + int nr_irqs;
> >> +
> >>   /* Virtual control interface mapping */
> >>   void __iomem*vctrl_base;
> >>
> >> @@ -166,15 +183,36 @@ struct vgic_dist {
> >>   /* Level/edge triggered */
> >>   struct vgic_bitmap  irq_cfg;
> >>
> >> - /* Source CPU per SGI and target CPU */
> >> - u8  irq_sgi_sources[VGIC_MAX_CPUS][VGIC_NR_SGIS];
> >> + /*
> >> +  * Source CPU per SGI and target CPU:
> >> +  *
> >> +  * Each byte represent a SGI observable on a VCPU, each bit of
> >> +  * this byte indicating if the corresponding VCPU has
> >> +  * generated this interrupt. This is a GICv2 feature only.
> >> +  *
> >> +  * For VCPUn (n < 8), irq_sgi_sources[n*16] to [n*16 + 15] are
> >> +  * the SGIs observable on VCPUn.
> >> +  */
> >> + u8  *irq_sgi_sources;
> >>
> >> - /* Target CPU for each IRQ */
> >> - u8  irq_spi_cpu[VGIC_NR_SHARED_IRQS];
> >> - struct vgic_bitmap  irq_spi_target[VGIC_MAX_CPUS];
> >> + /*
> >> +  * Target CPU for each SPI:
> >> +  *
> >> +  * Array of

Re: [Qemu-devel] QEMU with KVM does not start Win8 on kernel 3.4.67 and core2duo

2014-09-12 Thread Jan Kiszka
On 2014-09-12 19:15, Jan Kiszka wrote:
> On 2014-09-12 14:29, Erik Rull wrote:
>>> On September 11, 2014 at 3:32 PM Jan Kiszka  wrote:
>>>
>>>
>>> On 2014-09-11 15:25, Erik Rull wrote:
> On August 6, 2014 at 1:19 PM Erik Rull  wrote:
>
>
> Hi all,
>
> I have already done several tests and I'm not completely sure what's
> going wrong, but here is my scenario:
>
> When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a vanilla
> kernel
> 3.4.67 to run a Windows 8.0 guest, the guest freezes at boot without any
> error.
> When I dump the CPU registers via "info registers", nothing changes,
> which means the system has really stalled. The same happens with QEMU
> 2.0.0.
>
> But - when I run the very same guest using Kernel 2.6.32.12 and QEMU
> 1.7.0 on the host side, it works on the Core2Duo. The system above also
> works when fitted with an i3 or i5 CPU.
>
> I already disabled networking and USB for the guest and changed the
> graphics
> card - no effect. I assume that some mean bits and bytes have to be set up
> properly to get the thing running.
>
> Any hint what to change / test would be really appreciated.
>
> Thanks in advance,
>
> Best regards,
>
> Erik
>

 Hi all,

 I opened a qemu bug report on that and Jan helped me create a kvm
 trace. I attached it to the bug report.
 https://bugs.launchpad.net/qemu/+bug/1366836

 If you have further questions, please let me know.
>>>
>>> "File possibly truncated. Need at least 346583040, but file size is
>>> 133414912."
>>>
>>> Does "trace-cmd report" work for you? Is your file larger?
>>>
>>> Again, please also validate the behavior on latest next branch from kvm.git.
>>>
>>> Jan
>>>
>>
>> Hi all,
>>
>> Confirmed. The issue still exists in the kvm.git version of the kernel.
>> The trace.tgz was uploaded to the bug tracker.
> 
> Thanks. Could you provide a good-case of your setup as well, i.e. with
> that older kernel version? At least I'm not yet seeing something
> obviously wrong.

Well, except that we have continuous EXTERNAL_INTERRUPTs, vector 0xf6,
throughout most of the trace. Maybe a self-IPI (this is single-core),
maybe something external that is stuck. You could do a full trace (-e
all) and check for what happens after things like

kvm_exit: reason EXTERNAL_INTERRUPT rip 0x8168ed83 info 0 80ef

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [Qemu-devel] QEMU with KVM does not start Win8 on kernel 3.4.67 and core2duo

2014-09-12 Thread Jan Kiszka
On 2014-09-12 14:29, Erik Rull wrote:
>> On September 11, 2014 at 3:32 PM Jan Kiszka  wrote:
>>
>>
>> On 2014-09-11 15:25, Erik Rull wrote:
 On August 6, 2014 at 1:19 PM Erik Rull  wrote:


 Hi all,

 I have already done several tests and I'm not completely sure what's going
 wrong, but here is my scenario:

 When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a vanilla
 kernel
 3.4.67 to run a Windows 8.0 guest, the guest freezes at boot without any
 error.
 When I dump the CPU registers via "info registers", nothing changes, which
 means the system has really stalled. The same happens with QEMU 2.0.0.

 But - when I run the very same guest using Kernel 2.6.32.12 and QEMU 1.7.0
 on the host side, it works on the Core2Duo. The system above also works
 when fitted with an i3 or i5 CPU.

 I already disabled networking and USB for the guest and changed the
 graphics
 card - no effect. I assume that some mean bits and bytes have to be set up
 properly to get the thing running.

 Any hint what to change / test would be really appreciated.

 Thanks in advance,

 Best regards,

 Erik

>>>
>>> Hi all,
>>>
>>> I opened a qemu bug report on that and Jan helped me create a kvm
>>> trace. I attached it to the bug report.
>>> https://bugs.launchpad.net/qemu/+bug/1366836
>>>
>>> If you have further questions, please let me know.
>>
>> "File possibly truncated. Need at least 346583040, but file size is
>> 133414912."
>>
>> Does "trace-cmd report" work for you? Is your file larger?
>>
>> Again, please also validate the behavior on latest next branch from kvm.git.
>>
>> Jan
>>
> 
> Hi all,
> 
> Confirmed. The issue still exists in the kvm.git version of the kernel.
> The trace.tgz was uploaded to the bug tracker.

Thanks. Could you provide a good-case of your setup as well, i.e. with
that older kernel version? At least I'm not yet seeing something
obviously wrong.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH] kvm: ioapic: conditionally delay irq delivery duringeoi broadcast

2014-09-12 Thread Paolo Bonzini
On 11/09/2014 10:47, Zhang Haoyu wrote:
> Currently, we call ioapic_service() immediately when we find the irq is
> still active during eoi broadcast. But on real hardware, there's some
> delay between the EOI write and irq delivery (system bus latency?), so we
> need to emulate this behavior. Otherwise, for a guest that hasn't
> registered a proper irq handler, the irq would be re-injected immediately
> after the guest enables interrupts, so the guest would stay in the
> interrupt routine, unable to move forward, and might miss the chance to
> get a proper irq handler registered (one example is a Windows guest
> resuming from hibernation).
> 
> As there's no way to distinguish an unhandled irq from newly raised ones,
> this patch solves the problem by scheduling a delayed work item when the
> count of irqs injected during eoi broadcast exceeds a threshold value.
> After this patch, the guest can make some forward progress when there's
> no suitable irq handler, in case it registers one very soon; and for a
> guest with a bad-irq detection routine (such as note_interrupt() in
> Linux), the bad irq will be recognized as quickly as before.
> 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 
> Signed-off-by: Zhang Haoyu 
> ---
>  include/trace/events/kvm.h | 20 +++
>  virt/kvm/ioapic.c  | 50 
> --
>  virt/kvm/ioapic.h  |  6 ++
>  3 files changed, 74 insertions(+), 2 deletions(-)
> 
> diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
> index 908925a..ab679c3 100644
> --- a/include/trace/events/kvm.h
> +++ b/include/trace/events/kvm.h
> @@ -95,6 +95,26 @@ TRACE_EVENT(kvm_ioapic_set_irq,
> __entry->coalesced ? " (coalesced)" : "")
>  );
>  
> +TRACE_EVENT(kvm_ioapic_delayed_eoi_inj,
> + TP_PROTO(__u64 e),
> + TP_ARGS(e),
> +
> + TP_STRUCT__entry(
> + __field(__u64,  e   )
> + ),
> +
> + TP_fast_assign(
> + __entry->e  = e;
> + ),
> +
> + TP_printk("dst %x vec=%u (%s|%s|%s%s)",
> +   (u8)(__entry->e >> 56), (u8)__entry->e,
> +   __print_symbolic((__entry->e >> 8 & 0x7), kvm_deliver_mode),
> +   (__entry->e & (1<<11)) ? "logical" : "physical",
> +   (__entry->e & (1<<15)) ? "level" : "edge",
> +   (__entry->e & (1<<16)) ? "|masked" : "")
> +);
> +
>  TRACE_EVENT(kvm_msi_set_irq,
>   TP_PROTO(__u64 address, __u64 data),
>   TP_ARGS(address, data),
> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> index e8ce34c..8e1dc67 100644
> --- a/virt/kvm/ioapic.c
> +++ b/virt/kvm/ioapic.c
> @@ -405,6 +405,24 @@ void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int 
> irq_source_id)
>   spin_unlock(&ioapic->lock);
>  }
>  
> +static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
> +{
> + int i;
> + struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic,
> +  eoi_inject.work);
> + spin_lock(&ioapic->lock);
> + for (i = 0; i < IOAPIC_NUM_PINS; i++) {
> + union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
> +
> + if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG)
> + continue;
> +
> + if (ioapic->irr & (1 << i) && !ent->fields.remote_irr)
> + ioapic_service(ioapic, i, false);
> + }
> + spin_unlock(&ioapic->lock);
> +}
> +
>  static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>   struct kvm_ioapic *ioapic, int vector, int trigger_mode)
>  {
> @@ -435,8 +453,32 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu 
> *vcpu,
>  
>   ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>   ent->fields.remote_irr = 0;
> - if (ioapic->irr & (1 << i))
> - ioapic_service(ioapic, i, false);
> + if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
> + ++ioapic->irq_eoi[i];
> + if (ioapic->irq_eoi[i] == 
> IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT) {
> + /*
> +  * Real hardware does not deliver the irq so
> +  * immediately during eoi broadcast, so we need
> +  * to emulate this behavior. Otherwise, for
> +  * guests who has not registered handler of a
> +  * level irq, this irq would be injected
> +  * immediately after guest enables interrupt
> +  * (which happens usually at the end of the
> +  * common interrupt routine). This would lead
> +  * guest can't move forward and may miss the
> +  * possibility to get proper irq handle

Re: Howto connect to a terminal in an emalated linux-livecd?

2014-09-12 Thread Kashyap Chamarthy
On Fri, Sep 12, 2014 at 01:43:18PM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 11, 2014 at 01:48:51PM +0200, Oliver Rath wrote:
> > after hours of searching the Google world, I didn't find anything
> > appropriate for this problem:
> > 
> > I want to boot a live CD (i.e. ubuntu 14.04.1-desktop) in qemu, which
> > starts with a graphical interface, done e.g. by
> > 
> > qemu-system-x86_64 -m 3G -smp 2 -drive
> > file=ubuntu-14.04.1-desktop-i386.iso,media=cdrom,if=virtio --enable-kvm
> > 
> > Now I want access to the console of the ubuntu live CD. At the moment
> > I can do this by changing to text mode via
> > 
> > sendkey ctrl-alt-f1
> > 
> > in the qemu console (Alt-2), then switching back to the qemu window
> > (Alt-1). Now I have access to tty1 of my live CD.
> > 
> > But IMHO there should be a simpler way to access such a console with
> > qemu, e.g. through a pipe, a serial console, etc., but I didn't find
> > anything that works. The best I got was with -chardev pty,id=myid,
> > which resulted in a "char device redirected to /dev/pts/0 (label
> > myid)". But with "screen /dev/pts/0" I wasn't able to see any input
> > or output.
> > 
> > ssh is unfortunately not available at this point on the live CD (so I
> > could connect e.g. via -net user,hostfwd=tcp::10022-:22)
> > 
> > Any hints on connecting directly to a console in an emulated Linux?
> 
> I use the serial console:
> 
>   $ qemu-system-x86_64 -serial stdio ...
> 
> Make sure the guest has console=ttyS0 on the kernel command-line.

Just to add a little more to what Stefan wrote, here's a working CLI
(maybe not the most optimal) that I use with a serial console:

  $ /usr/bin/qemu-system-x86_64 -m 2048 \
  -nographic -nodefconfig -nodefaults \
  -machine accel=kvm \
  -drive file=./snap1-f20vm.qcow2,if=ide,format=qcow2 \
  -serial stdio

And a little more info here [1].

  [1]  
http://rwmj.wordpress.com/2011/07/08/setting-up-a-serial-console-in-qemu-and-libvirt/

--
/kashyap


RE: [PATCH] KVM: PPC: Convert openpic lock to raw_spinlock

2014-09-12 Thread bogdan.purcare...@freescale.com
> -Original Message-
> From: Wood Scott-B07421
> Sent: Thursday, September 11, 2014 9:19 PM
> To: Purcareata Bogdan-B43198
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org
> Subject: Re: [PATCH] KVM: PPC: Convert openpic lock to raw_spinlock
> 
> On Thu, 2014-09-11 at 15:25 -0400, Bogdan Purcareata wrote:
> > This patch enables running intensive I/O workloads, e.g. netperf, in a guest
> > deployed on a RT host. No change for !RT kernels.
> >
> > The openpic spinlock becomes a sleeping mutex on a RT system. This no longer
> > guarantees that EPR is atomic with exception delivery. The guest VCPU thread
> > fails due to a BUG_ON(preemptible()) when running netperf.
> >
> > In order to make the kvmppc_mpic_set_epr() call safe on RT from non-atomic
> > context, convert the openpic lock to a raw_spinlock. A similar approach can
> > be seen for x86 platforms in the following commit [1].
> >
> > Here are some comparative cyclictest measurements run inside a high
> > priority RT guest running on an RT host. The guest has 1 VCPU and the
> > test has been run for 15 minutes. The guest runs ~750 hackbench
> > processes as background stress.
> 
> Does hackbench involve triggering interrupts that would go through the
> MPIC?  You may want to try an I/O-heavy benchmark to stress the MPIC
> code (the more interrupt sources are active at once, the "better").

Before this patch, running netperf/iperf in the guest always resulted in 
hitting the afore-mentioned BUG_ON, when the host was RT. This is why I can't 
provide comparative cyclictest measurements before and after the patch, with 
heavy I/O stress. Since I had no problem running hackbench before, I'm assuming 
it doesn't involve interrupts passing through the MPIC. The measurements were 
posted just to show that the patch doesn't mess up anything somewhere else.

> Also try a guest with many vcpus.

AFAIK, without the MSI affinity patches [1], all vfio interrupts will go to 
core 0 in the guest. In this case, I guess there won't be contention-induced 
latencies due to multiple VCPUs expecting to have their interrupts delivered. 
Am I getting it wrong?

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-August/120247.html

Thanks,
Bogdan P.
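The distinction driving the patch under discussion — a lock that must never sleep, even on PREEMPT_RT, so that EPR delivery stays atomic — can be illustrated in userspace with a C11 atomic_flag spin loop. All model_* names are invented for illustration; this is not the openpic code, and the real fix is the raw_spinlock conversion described above.

```c
#include <stdatomic.h>
#include <assert.h>

/* Userspace analogue: on a PREEMPT_RT kernel spinlock_t becomes a
 * sleeping mutex, while raw_spinlock_t keeps true busy-wait
 * semantics. The atomic_flag loop below has the same never-sleeps
 * flavour. Illustration only; not the kernel implementation. */
struct model_openpic {
	atomic_flag lock;	/* stands in for raw_spinlock_t */
	int epr;
};

static void model_openpic_init(struct model_openpic *opp)
{
	atomic_flag_clear(&opp->lock);
	opp->epr = 0;
}

static void model_raw_spin_lock(struct model_openpic *opp)
{
	while (atomic_flag_test_and_set_explicit(&opp->lock,
						 memory_order_acquire))
		;		/* busy-wait: never sleeps */
}

static void model_raw_spin_unlock(struct model_openpic *opp)
{
	atomic_flag_clear_explicit(&opp->lock, memory_order_release);
}

static void model_set_epr(struct model_openpic *opp, int val)
{
	model_raw_spin_lock(opp);
	opp->epr = val;		/* critical section stays atomic on RT too */
	model_raw_spin_unlock(opp);
}
```

The point of the sketch is only the locking flavour: a sleeping lock here would trip the BUG_ON(preemptible()) mentioned in the commit message.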


Re: [PATCH] KVM: Refactor making request to makes it meaningful

2014-09-12 Thread Paolo Bonzini
Il 12/09/2014 07:43, guohliu ha scritto:
> This patch replaces the set_bit calls with kvm_make_request
> to make the code more readable and consistent.
> 
> Signed-off-by: Guo Hui Liu 
> ---
>  arch/x86/kvm/x86.c | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 916e895..5fed2de 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1518,7 +1518,7 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
>   pvclock_update_vm_gtod_copy(kvm);
>  
>   kvm_for_each_vcpu(i, vcpu, kvm)
> - set_bit(KVM_REQ_CLOCK_UPDATE, &vcpu->requests);
> + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>  
>   /* guest entries allowed */
>   kvm_for_each_vcpu(i, vcpu, kvm)
> @@ -1661,7 +1661,7 @@ static void kvmclock_update_fn(struct work_struct *work)
>   struct kvm_vcpu *vcpu;
>  
>   kvm_for_each_vcpu(i, vcpu, kvm) {
> - set_bit(KVM_REQ_CLOCK_UPDATE, &vcpu->requests);
> + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>   kvm_vcpu_kick(vcpu);
>   }
>  }
> @@ -1670,7 +1670,7 @@ static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
>  {
>   struct kvm *kvm = v->kvm;
>  
> - set_bit(KVM_REQ_CLOCK_UPDATE, &v->requests);
> + kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
>   schedule_delayed_work(&kvm->arch.kvmclock_update_work,
>   KVMCLOCK_UPDATE_DELAY);
>  }
> @@ -2846,7 +2846,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>   if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
>   adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
>   vcpu->arch.tsc_offset_adjustment = 0;
> - set_bit(KVM_REQ_CLOCK_UPDATE, &vcpu->requests);
> + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>   }
>  
>   if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
> @@ -5600,7 +5600,7 @@ static void pvclock_gtod_update_fn(struct work_struct 
> *work)
>   spin_lock(&kvm_lock);
>   list_for_each_entry(kvm, &vm_list, vm_list)
>   kvm_for_each_vcpu(i, vcpu, kvm)
> - set_bit(KVM_REQ_MASTERCLOCK_UPDATE, &vcpu->requests);
> + kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
>   atomic_set(&kvm_guest_has_master_clock, 0);
>   spin_unlock(&kvm_lock);
>  }
> @@ -6978,7 +6978,7 @@ int kvm_arch_hardware_enable(void)
>   list_for_each_entry(kvm, &vm_list, vm_list) {
>   kvm_for_each_vcpu(i, vcpu, kvm) {
>   if (!stable && vcpu->cpu == smp_processor_id())
> - set_bit(KVM_REQ_CLOCK_UPDATE, &vcpu->requests);
> + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>   if (stable && vcpu->arch.last_host_tsc > local_tsc) {
>   backwards_tsc = true;
>   if (vcpu->arch.last_host_tsc > max_tsc)
> @@ -7032,8 +7032,7 @@ int kvm_arch_hardware_enable(void)
>   kvm_for_each_vcpu(i, vcpu, kvm) {
>   vcpu->arch.tsc_offset_adjustment += delta_cyc;
>   vcpu->arch.last_host_tsc = local_tsc;
> - set_bit(KVM_REQ_MASTERCLOCK_UPDATE,
> - &vcpu->requests);
> + kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, 
> vcpu);
>   }
>  
>   /*
> 

Thanks, applied to kvm/queue.

Paolo
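For context, kvm_make_request() is a thin wrapper around setting a bit in the vcpu's request bitmap, which is why the patch above is a pure readability change. A userspace model of that relationship — the names mirror the kernel, but the request constants and code here are illustrative only, not the kernel implementation:

```c
#include <assert.h>

/* Model: request numbers are bit positions in a per-vcpu bitmap.
 * The constant values are arbitrary for this sketch. */
#define MODEL_REQ_CLOCK_UPDATE		8
#define MODEL_REQ_MASTERCLOCK_UPDATE	19

struct model_vcpu {
	unsigned long requests;
};

/* Equivalent of set_bit(req, &vcpu->requests), which is all the
 * kernel helper does. */
static inline void model_make_request(int req, struct model_vcpu *vcpu)
{
	vcpu->requests |= 1UL << req;
}

static inline int model_check_request(int req, struct model_vcpu *vcpu)
{
	return !!(vcpu->requests & (1UL << req));
}
```

Since the wrapper and the open-coded set_bit are bit-for-bit identical, the patch changes no behavior.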


Re: [PATCH] KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()

2014-09-12 Thread Paolo Bonzini
Il 12/09/2014 15:16, Ard Biesheuvel ha scritto:
> Read-only memory ranges may be backed by the zero page, so avoid
> misidentifying it as an MMIO pfn.
> 
> Signed-off-by: Ard Biesheuvel 
> Fixes: b88657674d39 ("ARM: KVM: user_mem_abort: support stage 2 MMIO page 
> mapping")
> ---
> 
> This fixes another issue I identified when testing QEMU+KVM_UEFI, where
> a read to an uninitialized emulated NOR flash brought in the zero page,
> but mapped as a read-write device region, because kvm_is_mmio_pfn()
> misidentifies it as an MMIO pfn due to its PG_reserved bit being set.
> 
>  virt/kvm/kvm_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 36b887dd0c84..f8adaabeac13 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -110,7 +110,7 @@ static bool largepages_enabled = true;
>  bool kvm_is_mmio_pfn(pfn_t pfn)
>  {
>   if (pfn_valid(pfn))
> - return PageReserved(pfn_to_page(pfn));
> + return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn));
>  
>   return true;
>  }
> 

Thanks, applying to kvm/master.

Paolo


[PATCH] KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()

2014-09-12 Thread Ard Biesheuvel
Read-only memory ranges may be backed by the zero page, so avoid
misidentifying it as an MMIO pfn.

Signed-off-by: Ard Biesheuvel 
Fixes: b88657674d39 ("ARM: KVM: user_mem_abort: support stage 2 MMIO page 
mapping")
---

This fixes another issue I identified when testing QEMU+KVM_UEFI, where
a read to an uninitialized emulated NOR flash brought in the zero page,
but mapped as a read-write device region, because kvm_is_mmio_pfn()
misidentifies it as an MMIO pfn due to its PG_reserved bit being set.

 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 36b887dd0c84..f8adaabeac13 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -110,7 +110,7 @@ static bool largepages_enabled = true;
 bool kvm_is_mmio_pfn(pfn_t pfn)
 {
if (pfn_valid(pfn))
-   return PageReserved(pfn_to_page(pfn));
+   return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn));
 
return true;
 }
-- 
1.8.3.2
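The one-line change reads naturally as a truth table: a valid pfn counts as MMIO only if its page is reserved *and* it is not the zero page; invalid pfns are still treated as MMIO. A userspace model with stubbed predicates — every MODEL_* name and value below is invented for illustration, not the kernel code:

```c
#include <stdbool.h>
#include <assert.h>

/* Stand-ins for pfn_valid(), is_zero_pfn() and PageReserved(). */
#define MODEL_ZERO_PFN	42	/* arbitrary pfn backing the zero page */
#define MODEL_MAX_PFN	1024

static bool model_pfn_valid(unsigned long pfn)
{
	return pfn < MODEL_MAX_PFN;
}

static bool model_is_zero_pfn(unsigned long pfn)
{
	return pfn == MODEL_ZERO_PFN;
}

static bool model_page_reserved(unsigned long pfn)
{
	/* In this model, the zero page and one "device" pfn are reserved. */
	return pfn == MODEL_ZERO_PFN || pfn == 100;
}

/* The patched predicate: reserved-but-zero-page is RAM, not MMIO. */
static bool model_kvm_is_mmio_pfn(unsigned long pfn)
{
	if (model_pfn_valid(pfn))
		return !model_is_zero_pfn(pfn) && model_page_reserved(pfn);
	return true;
}
```

The zero-page case is exactly the NOR-flash scenario in the commit message: PG_reserved is set, but the page must still be mapped as normal memory.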



Re: Howto connect to a terminal in an emalated linux-livecd?

2014-09-12 Thread Stefan Hajnoczi
On Thu, Sep 11, 2014 at 01:48:51PM +0200, Oliver Rath wrote:
> after hours of searching the web, I didn't find anything appropriate
> for this problem:
> 
> I want to boot a live CD (e.g. ubuntu 14.04.1-desktop) in qemu, which
> starts with a graphical interface, e.g. via
> 
> qemu-system-x86_64 -m 3G -smp 2 -drive
> file=ubuntu-14.04.1-desktop-i386.iso,media=cdrom,if=virtio --enable-kvm
> 
> Now I want to access the console of the ubuntu live CD. At the moment
> i can do this over changing to text mode via
> 
> sendkey ctrl-alt-f1
> 
> in qemu-console (Alt-2), then switching back to qemu-window (alt-1). Now
> I have access to tty1 of my live CD.
> 
> But IMHO there should be a simpler way to access such a console
> with qemu, e.g. through a pipe, a serial console etc., but I didn't find
> anything that works. The best I got was with -chardev pty,id=myid, which
> resulted in a "char device redirected to /dev/pts/0 (label myid)".  But
> with "screen /dev/pts/0" I wasn't able to see any input or output.
> 
> ssh is unfortunately not available at this time on the live CD (so I could
> connect e.g. via -net user,hostfwd:tcp:10022-:22)
> 
> Any hints on how to connect directly to a console in an emulated Linux?

I use the serial console:

  $ qemu-system-x86_64 -serial stdio ...

Make sure the guest has console=ttyS0 on the kernel command-line.

Stefan




Re: Recommended Kernel and KVM version

2014-09-12 Thread Stefan Hajnoczi
On Thu, Sep 11, 2014 at 04:31:40PM -0300, Flávio Ramalho wrote:
> I am running an OpenStack infrastructure and some compute nodes are
> frequently having kernel panics; as far as I can see, the panics are
> related to KVM.

Can you post the kernel panic?

If you can't easily get at the text, take a picture of the screen with
your phone.

> Do you guys have any recommendation about the kernel and KVM version
> to be used in a production environment?

The latest stable packages from your Linux distribution.  File bugs or
request support from your distribution.

This mailing list is for discussing KVM kernel module upstream
development.  Distributions may ship old versions or apply patches so
they are the first point of contact for support.

Stefan




Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial

2014-09-12 Thread Stefan Hajnoczi
On Fri, Sep 12, 2014 at 11:21:37AM +0800, Zhang Haoyu wrote:
> >>> > > If virtio-blk and virtio-serial share an IRQ, the guest operating 
> >>> > > system has to check each virtqueue for activity. Maybe there is some 
> >>> > > inefficiency doing that.
> >>> > > AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) 
> >>> > > even if everything is unused.
> >>> > 
> >>> > That could be the case if MSI is disabled.
> >>> 
> >>> Do the windows virtio drivers enable MSIs, in their inf file?
> >>
> >>It depends on the version of the drivers, but it is a reasonable guess
> >>at what differs between Linux and Windows.  Haoyu, can you give us the
> >>output of lspci from a Linux guest?
> >>
> >I made a test with fio on rhel-6.5 guest, the same degradation happened too, 
> > this degradation can be reproduced on rhel6.5 guest 100%.
> >virtio_console module installed:
> >64K-write-sequence: 285 MBPS, 4380 IOPS
> >virtio_console module uninstalled:
> >64K-write-sequence: 370 MBPS, 5670 IOPS
> >
> I use top -d 1 -H -p  to monitor the cpu usage, and found that,
> virtio_console module installed:
> qemu main thread cpu usage: 98%
> virtio_console module uninstalled:
> qemu main thread cpu usage: 60%
> 
> perf top -p  result,
> virtio_console module installed:
>PerfTop:9868 irqs/sec  kernel:76.4%  exact:  0.0% [4000Hz cycles],  
> (target_pid: 88381)
> --
> 
> 11.80%  [kernel] [k] _raw_spin_lock_irqsave
>  8.42%  [kernel] [k] _raw_spin_unlock_irqrestore
>  7.33%  [kernel] [k] fget_light
>  6.28%  [kernel] [k] fput
>  3.61%  [kernel] [k] do_sys_poll
>  3.30%  qemu-system-x86_64   [.] qcow2_check_metadata_overlap
>  3.10%  [kernel] [k] __pollwait
>  2.15%  qemu-system-x86_64   [.] qemu_iohandler_poll
>  1.44%  libglib-2.0.so.0.3200.4  [.] g_array_append_vals
>  1.36%  libc-2.13.so [.] 0x0011fc2a
>  1.31%  libpthread-2.13.so   [.] pthread_mutex_lock
>  1.24%  libglib-2.0.so.0.3200.4  [.] 0x0001f961
>  1.20%  libpthread-2.13.so   [.] __pthread_mutex_unlock_usercnt
>  0.99%  [kernel] [k] eventfd_poll
>  0.98%  [vdso]   [.] 0x0771
>  0.97%  [kernel] [k] remove_wait_queue
>  0.96%  qemu-system-x86_64   [.] qemu_iohandler_fill
>  0.95%  [kernel] [k] add_wait_queue
>  0.69%  [kernel] [k] __srcu_read_lock
>  0.58%  [kernel] [k] poll_freewait
>  0.57%  [kernel] [k] _raw_spin_lock_irq
>  0.54%  [kernel] [k] __srcu_read_unlock
>  0.47%  [kernel] [k] copy_user_enhanced_fast_string
>  0.46%  [kvm_intel]  [k] vmx_vcpu_run
>  0.46%  [kvm][k] vcpu_enter_guest
>  0.42%  [kernel] [k] tcp_poll
>  0.41%  [kernel] [k] system_call_after_swapgs
>  0.40%  libglib-2.0.so.0.3200.4  [.] g_slice_alloc
>  0.40%  [kernel] [k] system_call
>  0.38%  libpthread-2.13.so   [.] 0xe18d
>  0.38%  libglib-2.0.so.0.3200.4  [.] g_slice_free1
>  0.38%  qemu-system-x86_64   [.] address_space_translate_internal
>  0.38%  [kernel] [k] _raw_spin_lock
>  0.37%  qemu-system-x86_64   [.] phys_page_find
>  0.36%  [kernel] [k] get_page_from_freelist
>  0.35%  [kernel] [k] sock_poll
>  0.34%  [kernel] [k] fsnotify
>  0.31%  libglib-2.0.so.0.3200.4  [.] g_main_context_check
>  0.30%  [kernel] [k] do_direct_IO
>  0.29%  libpthread-2.13.so   [.] pthread_getspecific
> 
> virtio_console module uninstalled:
>PerfTop:9138 irqs/sec  kernel:71.7%  exact:  0.0% [4000Hz cycles],  
> (target_pid: 88381)
> --
> 
>  5.72%  qemu-system-x86_64   [.] qcow2_check_metadata_overlap
>  4.51%  [kernel] [k] fget_light
>  3.98%  [kernel] [k] _raw_spin_lock_irqsave
>  2.55%  [kernel] [k] fput
>  2.48%  libpthread-2.13.so   [.] pthread_mutex_lock
>  2.46%  [kernel] [k] _raw_spin_unlock_irqrestore
>  2.21%  libpthread-2.13.so   [.] __pthread_mutex_unlock_usercnt
>  1.71%  [vdso]   [.] 0x060c
>  1.68%  libc-2.13.so [.] 0x000e751f
>  1.64%  libglib-2.0.so.0.3200.4  [.] 0x0004fca0
>  1.20%  [kernel] [k] __srcu_read_lock
>  1.14%  [kernel] [k] do_s
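The overhead being discussed — a shared interrupt forcing the guest to inspect every registered virtqueue to find the one with work — can be modeled in a few lines. All model_* names are invented for illustration; this is not virtio driver code:

```c
#include <assert.h>

/* Model of shared-IRQ dispatch: with one shared line, the handler
 * must scan every virtqueue, so per-interrupt cost grows with the
 * number of queues (virtio-serial registers up to 64). */
struct model_vq {
	int pending;
};

static int model_shared_irq_scan(struct model_vq *vqs, int n, int *checks)
{
	int i, handled = 0;

	for (i = 0; i < n; i++) {
		(*checks)++;		/* every vq is inspected */
		if (vqs[i].pending) {
			vqs[i].pending = 0;
			handled++;
		}
	}
	return handled;
}
```

With MSI-X, each virtqueue gets its own vector, so this scan is avoided — which is why MSI enablement is the first suspect in the thread.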

Re: [Qemu-devel] QEMU with KVM does not start Win8 on kernel 3.4.67 and core2duo

2014-09-12 Thread Erik Rull
> On September 11, 2014 at 3:32 PM Jan Kiszka  wrote:
>
>
> On 2014-09-11 15:25, Erik Rull wrote:
> >> On August 6, 2014 at 1:19 PM Erik Rull  wrote:
> >>
> >>
> >> Hi all,
> >>
> >> I did already several tests and I'm not completely sure what's going wrong,
> >> but
> >> here my scenario:
> >>
> >> When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a vanilla
> >> kernel
> >> 3.4.67 to run a Windows 8.0 guest, the guest freezes at boot without any
> >> error.
> >> When I dump the CPU registers via "info registers", nothing changes, that
> >> means
> >> the system really stalled. Same happens with QEMU 2.0.0.
> >>
> >> But - when I run the very same guest using Kernel 2.6.32.12 and QEMU 1.7.0
> >> on
> >> the host side it works on the Core2Duo. Also the system above but just with
> >> an
> >> i3 or i5 CPU it works, too.
> >>
> >> I already disabled networking and USB for the guest and changed the
> >> graphics
> >> card - no effect. I assume that some mean bits and bytes have to be set up
> >> properly to get the thing running.
> >>
> >> Any hint what to change / test would be really appreciated.
> >>
> >> Thanks in advance,
> >>
> >> Best regards,
> >>
> >> Erik
> >>
> >
> > Hi all,
> >
> > I opened a qemu bug report on that and Jan helped me creating a kvm trace. I
> > attached it to the bug report.
> > https://bugs.launchpad.net/qemu/+bug/1366836
> >
> > If you have further questions, please let me know.
>
> "File possibly truncated. Need at least 346583040, but file size is
> 133414912."
>
> Does "trace-cmd report" work for you? Is your file larger?
>
> Again, please also validate the behavior on latest next branch from kvm.git.
>
> Jan
>

Hi all,

Confirmed. The issue still exists in the kvm.git version of the kernel.
The trace.tgz was uploaded to the bugtracker.

Best regards,

Erik


Re: [RFC PATCH 11/17] COLO ctl: implement colo checkpoint protocol

2014-09-12 Thread Dr. David Alan Gilbert
* Hongyang Yang (yan...@cn.fujitsu.com) wrote:
> 
> 
> >On 09/12/2014 07:17 PM, Dr. David Alan Gilbert wrote:
> >* Hongyang Yang (yan...@cn.fujitsu.com) wrote:
> >>
> >>
> >>On 08/01/2014 11:03 PM, Dr. David Alan Gilbert wrote:
> >>>* Yang Hongyang (yan...@cn.fujitsu.com) wrote:
> >
> >
> >
> +static int do_colo_transaction(MigrationState *s, QEMUFile *control,
> +   QEMUFile *trans)
> +{
> +int ret;
> +
> +ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
> +if (ret) {
> +goto out;
> +}
> +
> +ret = colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED);
> >>>
> >>>What happens at this point if the slave just doesn't respond?
> >>>(i.e. the socket doesn't drop - you just don't get the byte).
> >>
> >>If the socket returns bytes that were not expected, exit. If the
> >>socket returns an error, do some cleanup and quit the COLO process.
> >>refer to: colo_ctl_get() and colo_ctl_get_value()
> >
> >But what happens if the slave just doesn't respond at all; e.g.
> >if the slave host loses power, it'll take a while (many seconds)
> >before the socket will timeout.
> 
> It will wait until the call returns a timeout error, and then do some
> cleanup and quit the COLO process.

If it were to wait here for ~30 seconds for the timeout, what would happen
to the primary? Would it be stopped from sending any network traffic
for those 30 seconds? I think that's too long to fail over.

> Is there a better way to handle this?

In postcopy I always take reads coming back from the destination
in a separate thread, because that thread can't block the main thread
going out (I originally did that using async reads but the thread
is nicer).  You could also use something like a poll() with a shorter
timeout, set to however long you are happy for COLO to go before it fails.
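The poll()-with-a-bounded-timeout idea can be sketched in plain C. The helper name and the timeout handling are assumptions for illustration, not code from the COLO series:

```c
#include <poll.h>
#include <unistd.h>
#include <assert.h>

/* Instead of blocking indefinitely on the slave's control byte, wait
 * with poll() and a bounded timeout so the primary can fail over
 * quickly if the slave host dies silently. */
static int wait_for_reply(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret == 0)
		return -1;	/* timed out: trigger failover */
	if (ret < 0)
		return -2;	/* poll error */
	return 0;		/* data ready: safe to read the byte */
}
```

The caller would pick timeout_ms to match how long COLO may stall before failover; the ~30 s TCP default discussed above is exactly what this avoids.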

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-12 Thread Ming Lei
On Thu, Sep 11, 2014 at 6:26 PM, Christian Borntraeger
 wrote:
> Folks,
>
> we have seen the following bug with 3.16 as a KVM guest. I suspect the 
> blk-mq rework that happened between 3.15 and 3.16, but it can be something 
> completely different.
>

Care to share how you reproduce the issue?

> [   65.992022] Unable to handle kernel pointer dereference in virtual kernel 
> address space
> [   65.992187] failing address: d000 TEID: d803
> [   65.992363] Fault in home space mode while using kernel ASCE.
> [   65.992365] AS:00a7c007 R3:0024
> [   65.993754] Oops: 0038 [#1] SMP
> [   65.993923] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi 
> scsi_transport_iscsi virtio_balloon vhost_net vhost macvtap macvlan kvm 
> dm_multipath virtio_net virtio_blk sunrpc
> [   65.994274] CPU: 0 PID: 44 Comm: kworker/u6:2 Not tainted 
> 3.16.0-20140814.0.c66c84c.fc18-s390xfrob #1
> [   65.996043] Workqueue: writeback bdi_writeback_workfn (flush-251:32)
> [   65.996222] task: 0225 ti: 02258000 task.ti: 
> 02258000
> [   65.996228] Krnl PSW : 0704f0018000 003ed114 
> (blk_mq_tag_to_rq+0x20/0x38)
> [   65.997299]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 
> EA:3
>Krnl GPRS: 0040  01619000 
> 004e
> [   65.997301]004e  0001 
> 00a0de18
> [   65.997302]77ffbe18 77ffbd50 6d72d620 
> 004f
> [   65.997304]01a99400 0080 003eddee 
> 77ffbc28
> [   65.997864] Krnl Code: 003ed106: e3102034lg  
> %r1,48(%r2)
>   003ed10c: 91082044tm  
> 68(%r2),8
>  #003ed110: a7840009brc 
> 8,3ed122
>  >003ed114: e34016880004lg  
> %r4,1672(%r1)
>   003ed11a: 59304100c   
> %r3,256(%r4)
>   003ed11e: a7840003brc 
> 8,3ed124
>   003ed122: 07febcr 
> 15,%r14
>   003ed124: b9040024lgr 
> %r2,%r4
> [   65.998221] Call Trace:
> [   65.998224] ([<0001>] 0x1)
> [   65.998227]  [<003f17b6>] blk_mq_tag_busy_iter+0x7a/0xc4
> [   65.998228]  [<003edcd6>] blk_mq_rq_timer+0x96/0x13c
> [   65.999226]  [<0013ee60>] call_timer_fn+0x40/0x110
> [   65.999230]  [<0013f642>] run_timer_softirq+0x2de/0x3d0
> [   65.999238]  [<00135b70>] __do_softirq+0x124/0x2ac
> [   65.999241]  [<00136000>] irq_exit+0xc4/0xe4
> [   65.999435]  [<0010bc08>] do_IRQ+0x64/0x84
> [   66.437533]  [<0067ccd8>] ext_skip+0x42/0x46
> [   66.437541]  [<003ed7b4>] __blk_mq_alloc_request+0x58/0x1e8
> [   66.437544] ([<003ed788>] __blk_mq_alloc_request+0x2c/0x1e8)
> [   66.437547]  [<003eef82>] blk_mq_map_request+0xc2/0x208
> [   66.437549]  [<003ef860>] blk_sq_make_request+0xac/0x350
> [   66.437721]  [<003e2d6c>] generic_make_request+0xc4/0xfc
> [   66.437723]  [<003e2e56>] submit_bio+0xb2/0x1a8
> [   66.438373]  [<0031e8aa>] ext4_io_submit+0x52/0x80
> [   66.438375]  [<0031ccfa>] ext4_writepages+0x7c6/0xd0c
> [   66.438378]  [<002aea20>] __writeback_single_inode+0x54/0x274
> [   66.438379]  [<002b0134>] writeback_sb_inodes+0x28c/0x4ec
> [   66.438380]  [<002b042e>] __writeback_inodes_wb+0x9a/0xe4
> [   66.438382]  [<002b06a2>] wb_writeback+0x22a/0x358
> [   66.438383]  [<002b0cd0>] bdi_writeback_workfn+0x354/0x538
> [   66.438618]  [<0014e3aa>] process_one_work+0x1aa/0x418
> [   66.438621]  [<0014ef94>] worker_thread+0x48/0x524
> [   66.438625]  [<001560ca>] kthread+0xee/0x108
> [   66.438627]  [<0067c76e>] kernel_thread_starter+0x6/0xc
> [   66.438628]  [<0067c768>] kernel_thread_starter+0x0/0xc
> [   66.438629] Last Breaking-Event-Address:
> [   66.438631]  [<003edde8>] blk_mq_timeout_check+0x6c/0xb8
>
> I looked into the dump, and the full function is  (annotated by me to match 
> the source code)
> r2= tags
> r3= tag (4e)
> Dump of assembler code for function blk_mq_tag_to_rq:
>0x003ed0f4 <+0>: lg  %r1,96(%r2) # r1 
> has now tags->rqs
>0x003ed0fa <+6>: sllg%r2,%r3,3   # r2 
> has tag*8
>0x003ed100 <+12>:lg  %r2,0(%r2,%r1)  # r2 
> now has rq (=tags->rqs[tag])
>0x003ed106 <+18>:lg  %r1,48(%r2) # r1 
> now has rq->q
>0x003ed10c <+24>:tm  68(%r2),8   # 
> test for rq->cmd_flags & REQ_FLUSH_SEQ
>0x000
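For readers following the annotated disassembly: blk_mq_tag_to_rq() is essentially an array lookup, rq = tags->rqs[tag], followed by a dereference of rq->q — which is exactly where the PSW points. A userspace model of that lookup shows why a stale or not-yet-populated slot faults; the NULL handling below is purely illustrative, not the upstream fix:

```c
#include <stddef.h>
#include <assert.h>

/* Model of the tag-to-request table: the tag indexes tags->rqs[]. */
struct model_request {
	int queue_id;		/* stands in for rq->q */
};

struct model_tags {
	struct model_request **rqs;
	unsigned int nr_tags;
};

static struct model_request *model_tag_to_rq(struct model_tags *tags,
					     unsigned int tag)
{
	if (tag >= tags->nr_tags)
		return NULL;
	/* A slot may legitimately be NULL (or stale) while a request is
	 * still being set up — dereferencing it blindly is the crash. */
	return tags->rqs[tag];
}
```

In the oops, the timer path (blk_mq_tag_busy_iter) raced with __blk_mq_alloc_request, so the slot for tag 0x4e did not yet point at a valid request.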

Re: [RFC PATCH 11/17] COLO ctl: implement colo checkpoint protocol

2014-09-12 Thread Hongyang Yang



On 09/12/2014 07:17 PM, Dr. David Alan Gilbert wrote:

* Hongyang Yang (yan...@cn.fujitsu.com) wrote:



On 08/01/2014 11:03 PM, Dr. David Alan Gilbert wrote:

* Yang Hongyang (yan...@cn.fujitsu.com) wrote:





+static int do_colo_transaction(MigrationState *s, QEMUFile *control,
+   QEMUFile *trans)
+{
+int ret;
+
+ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
+if (ret) {
+goto out;
+}
+
+ret = colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED);


What happens at this point if the slave just doesn't respond?
(i.e. the socket doesn't drop - you just don't get the byte).


If the socket returns bytes that were not expected, exit. If the
socket returns an error, do some cleanup and quit the COLO process.
refer to: colo_ctl_get() and colo_ctl_get_value()


But what happens if the slave just doesn't respond at all; e.g.
if the slave host loses power, it'll take a while (many seconds)
before the socket will timeout.


It will wait until the call returns a timeout error, and then do some
cleanup and quit the COLO process. Is there a better way to handle
this?



Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
.



--
Thanks,
Yang.


Re: [RFC PATCH 11/17] COLO ctl: implement colo checkpoint protocol

2014-09-12 Thread Dr. David Alan Gilbert
* Hongyang Yang (yan...@cn.fujitsu.com) wrote:
> 
> 
> On 08/01/2014 11:03 PM, Dr. David Alan Gilbert wrote:
> >* Yang Hongyang (yan...@cn.fujitsu.com) wrote:



> >>+static int do_colo_transaction(MigrationState *s, QEMUFile *control,
> >>+   QEMUFile *trans)
> >>+{
> >>+int ret;
> >>+
> >>+ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
> >>+if (ret) {
> >>+goto out;
> >>+}
> >>+
> >>+ret = colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED);
> >
> >What happens at this point if the slave just doesn't respond?
> >(i.e. the socket doesn't drop - you just don't get the byte).
> 
> If the socket returns bytes that were not expected, exit. If the
> socket returns an error, do some cleanup and quit the COLO process.
> refer to: colo_ctl_get() and colo_ctl_get_value()

But what happens if the slave just doesn't respond at all; e.g.
if the slave host loses power, it'll take a while (many seconds)
before the socket will timeout.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-12 Thread Christian Borntraeger
On 09/11/2014 12:26 PM, Christian Borntraeger wrote:
> Folks,
> 
> we have seen the following bug with 3.16 as a KVM guest. I suspect the 
> blk-mq rework that happened between 3.15 and 3.16, but it can be something 
> completely different.
> 
> 
> [   65.992022] Unable to handle kernel pointer dereference in virtual kernel 
> address space
> [   65.992187] failing address: d000 TEID: d803
> [   65.992363] Fault in home space mode while using kernel ASCE.
> [   65.992365] AS:00a7c007 R3:0024 
> [   65.993754] Oops: 0038 [#1] SMP 
> [   65.993923] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi 
> scsi_transport_iscsi virtio_balloon vhost_net vhost macvtap macvlan kvm 
> dm_multipath virtio_net virtio_blk sunrpc
> [   65.994274] CPU: 0 PID: 44 Comm: kworker/u6:2 Not tainted 
> 3.16.0-20140814.0.c66c84c.fc18-s390xfrob #1
> [   65.996043] Workqueue: writeback bdi_writeback_workfn (flush-251:32)
> [   65.996222] task: 0225 ti: 02258000 task.ti: 
> 02258000
> [   65.996228] Krnl PSW : 0704f0018000 003ed114 
> (blk_mq_tag_to_rq+0x20/0x38)
> [   65.997299]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 
> EA:3
>Krnl GPRS: 0040  01619000 
> 004e
> [   65.997301]004e  0001 
> 00a0de18
> [   65.997302]77ffbe18 77ffbd50 6d72d620 
> 004f
> [   65.997304]01a99400 0080 003eddee 
> 77ffbc28
> [   65.997864] Krnl Code: 003ed106: e3102034lg  
> %r1,48(%r2)
>   003ed10c: 91082044tm  
> 68(%r2),8
>  #003ed110: a7840009brc 
> 8,3ed122
>  >003ed114: e34016880004lg  
> %r4,1672(%r1)
>   003ed11a: 59304100c   
> %r3,256(%r4)
>   003ed11e: a7840003brc 
> 8,3ed124
>   003ed122: 07febcr 
> 15,%r14
>   003ed124: b9040024lgr 
> %r2,%r4
> [   65.998221] Call Trace:
> [   65.998224] ([<0001>] 0x1)
> [   65.998227]  [<003f17b6>] blk_mq_tag_busy_iter+0x7a/0xc4
> [   65.998228]  [<003edcd6>] blk_mq_rq_timer+0x96/0x13c
> [   65.999226]  [<0013ee60>] call_timer_fn+0x40/0x110
> [   65.999230]  [<0013f642>] run_timer_softirq+0x2de/0x3d0
> [   65.999238]  [<00135b70>] __do_softirq+0x124/0x2ac
> [   65.999241]  [<00136000>] irq_exit+0xc4/0xe4
> [   65.999435]  [<0010bc08>] do_IRQ+0x64/0x84
> [   66.437533]  [<0067ccd8>] ext_skip+0x42/0x46
> [   66.437541]  [<003ed7b4>] __blk_mq_alloc_request+0x58/0x1e8
> [   66.437544] ([<003ed788>] __blk_mq_alloc_request+0x2c/0x1e8)
> [   66.437547]  [<003eef82>] blk_mq_map_request+0xc2/0x208

I am currently asking myself if blk_mq_map_request should protect against
softirq here, but I can't say for sure, as I have never looked into that code
before.

Christian

> [   66.437549]  [<003ef860>] blk_sq_make_request+0xac/0x350
> [   66.437721]  [<003e2d6c>] generic_make_request+0xc4/0xfc
> [   66.437723]  [<003e2e56>] submit_bio+0xb2/0x1a8
> [   66.438373]  [<0031e8aa>] ext4_io_submit+0x52/0x80
> [   66.438375]  [<0031ccfa>] ext4_writepages+0x7c6/0xd0c
> [   66.438378]  [<002aea20>] __writeback_single_inode+0x54/0x274
> [   66.438379]  [<002b0134>] writeback_sb_inodes+0x28c/0x4ec
> [   66.438380]  [<002b042e>] __writeback_inodes_wb+0x9a/0xe4
> [   66.438382]  [<002b06a2>] wb_writeback+0x22a/0x358
> [   66.438383]  [<002b0cd0>] bdi_writeback_workfn+0x354/0x538
> [   66.438618]  [<0014e3aa>] process_one_work+0x1aa/0x418
> [   66.438621]  [<0014ef94>] worker_thread+0x48/0x524
> [   66.438625]  [<001560ca>] kthread+0xee/0x108
> [   66.438627]  [<0067c76e>] kernel_thread_starter+0x6/0xc
> [   66.438628]  [<0067c768>] kernel_thread_starter+0x0/0xc
> [   66.438629] Last Breaking-Event-Address:
> [   66.438631]  [<003edde8>] blk_mq_timeout_check+0x6c/0xb8
> 
> I looked into the dump, and the full function is  (annotated by me to match 
> the source code)
> r2= tags
> r3= tag (4e)
> Dump of assembler code for function blk_mq_tag_to_rq:
>0x003ed0f4 <+0>: lg  %r1,96(%r2)   # r1 
> has now tags->rqs
>0x003ed0fa <+6>: sllg%r2,%r3,3 # r2 
> has tag*8
>0x003ed100 <+12>:lg  %r2,0(%r2,%r1)
> # r2 now has rq (=tags->rqs[tag])
>0x003ed106 <+18>:lg  %r1,48(%r2)   # r1 
> now has rq->q
>  

Re: [PATCH v4 2/8] arm/arm64: KVM: vgic: switch to dynamic allocation

2014-09-12 Thread Marc Zyngier
On 11/09/14 23:36, Christoffer Dall wrote:
> On Thu, Sep 11, 2014 at 12:09:09PM +0100, Marc Zyngier wrote:
>> So far, all the VGIC data structures are statically defined by the
>> *maximum* number of vcpus and interrupts it supports. It means that
>> we always have to oversize it to cater for the worst case.
>>
>> Start by changing the data structures to be dynamically sizeable,
>> and allocate them at runtime.
>>
>> The sizes are still very static though.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm/kvm/arm.c |   3 +
>>  include/kvm/arm_vgic.h |  76 
>>  virt/kvm/arm/vgic.c| 237 
>> ++---
>>  3 files changed, 267 insertions(+), 49 deletions(-)
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index a99e0cd..923a01d 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -172,6 +172,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>   kvm->vcpus[i] = NULL;
>>   }
>>   }
>> +
>> + kvm_vgic_destroy(kvm);
>>  }
>>
>>  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> @@ -253,6 +255,7 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
>>  {
>>   kvm_mmu_free_memory_caches(vcpu);
>>   kvm_timer_vcpu_terminate(vcpu);
>> + kvm_vgic_vcpu_destroy(vcpu);
>>   kmem_cache_free(kvm_vcpu_cache, vcpu);
>>  }
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index f074539..bdaac57 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -54,19 +54,33 @@
>>   * - a bunch of shared interrupts (SPI)
>>   */
>>  struct vgic_bitmap {
>> - union {
>> - u32 reg[VGIC_NR_PRIVATE_IRQS / 32];
>> - DECLARE_BITMAP(reg_ul, VGIC_NR_PRIVATE_IRQS);
>> - } percpu[VGIC_MAX_CPUS];
>> - union {
>> - u32 reg[VGIC_NR_SHARED_IRQS / 32];
>> - DECLARE_BITMAP(reg_ul, VGIC_NR_SHARED_IRQS);
>> - } shared;
>> + /*
>> +  * - One UL per VCPU for private interrupts (assumes UL is at
>> +  *   least 32 bits)
>> +  * - As many UL as necessary for shared interrupts.
>> +  *
>> +  * The private interrupts are accessed via the "private"
>> +  * field, one UL per vcpu (the state for vcpu n is in
>> +  * private[n]). The shared interrupts are accessed via the
>> +  * "shared" pointer (IRQn state is at bit n-32 in the bitmap).
>> +  */
>> + unsigned long *private;
>> + unsigned long *shared;
> 
> the comment above the define for REG_OFFSET_SWIZZLE still talks about
> the unions in struct vgic_bitmap, which is no longer true.  Mind
> updating that comment?

Damned, thought I fixed that. Will update it.
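The sizing rule spelled out in the structure comments — one unsigned long per VCPU for private interrupts, and as many unsigned longs as needed for the shared interrupts (IRQs >= 32) — can be sketched in userspace C. The helper names and error handling here are ours for illustration, not the patch's code:

```c
#include <stdlib.h>
#include <limits.h>
#include <assert.h>

#define MODEL_NR_PRIVATE_IRQS	32
#define BITS_PER_UL		(sizeof(unsigned long) * CHAR_BIT)

/* Model of the dynamically sized vgic_bitmap from the patch. */
struct model_vgic_bitmap {
	unsigned long *private;	/* nr_cpus entries, one UL per VCPU */
	unsigned long *shared;	/* enough bits for nr_irqs - 32 */
};

static int model_bitmap_init(struct model_vgic_bitmap *b,
			     int nr_cpus, int nr_irqs)
{
	/* Round the shared bitmap up to whole unsigned longs. */
	size_t shared_longs =
		(nr_irqs - MODEL_NR_PRIVATE_IRQS + BITS_PER_UL - 1) /
		BITS_PER_UL;

	b->private = calloc(nr_cpus, sizeof(unsigned long));
	b->shared  = calloc(shared_longs, sizeof(unsigned long));
	if (!b->private || !b->shared) {
		free(b->private);
		free(b->shared);
		return -1;
	}
	return 0;
}
```

This is the essence of the change: sizes follow the runtime nr_cpus/nr_irqs instead of the compile-time VGIC_MAX_CPUS/VGIC_NR_SHARED_IRQS maxima.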

>>  };
>>
>>  struct vgic_bytemap {
>> - u32 percpu[VGIC_MAX_CPUS][VGIC_NR_PRIVATE_IRQS / 4];
>> - u32 shared[VGIC_NR_SHARED_IRQS  / 4];
>> + /*
>> +  * - 8 u32 per VCPU for private interrupts
>> +  * - As many u32 as necessary for shared interrupts.
>> +  *
>> +  * The private interrupts are accessed via the "private"
>> +  * field, (the state for vcpu n is in private[n*8] to
>> +  * private[n*8 + 7]). The shared interrupts are accessed via
>> +  * the "shared" pointer (IRQn state is at byte (n-32)%4 of the
>> +  * shared[(n-32)/4] word).
>> +  */
>> + u32 *private;
>> + u32 *shared;
>>  };
>>
>>  struct kvm_vcpu;
>> @@ -127,6 +141,9 @@ struct vgic_dist {
>>   boolin_kernel;
>>   boolready;
>>
>> + int nr_cpus;
>> + int nr_irqs;
>> +
>>   /* Virtual control interface mapping */
>>   void __iomem*vctrl_base;
>>
>> @@ -166,15 +183,36 @@ struct vgic_dist {
>>   /* Level/edge triggered */
>>   struct vgic_bitmap  irq_cfg;
>>
>> - /* Source CPU per SGI and target CPU */
>> - u8  irq_sgi_sources[VGIC_MAX_CPUS][VGIC_NR_SGIS];
>> + /*
>> +  * Source CPU per SGI and target CPU:
>> +  *
>> +  * Each byte represent a SGI observable on a VCPU, each bit of
>> +  * this byte indicating if the corresponding VCPU has
>> +  * generated this interrupt. This is a GICv2 feature only.
>> +  *
>> +  * For VCPUn (n < 8), irq_sgi_sources[n*16] to [n*16 + 15] are
>> +  * the SGIs observable on VCPUn.
>> +  */
>> + u8  *irq_sgi_sources;
>>
>> - /* Target CPU for each IRQ */
>> - u8  irq_spi_cpu[VGIC_NR_SHARED_IRQS];
>> - struct vgic_bitmap  irq_spi_target[VGIC_MAX_CPUS];
>> + /*
>> +  * Target CPU for each SPI:
>> +  *
>> +  * Array of available SPI, each byte indicating the target
>> +  * VCPU for SPI. IRQn (n >=32) is at irq_spi_cpu[n-32].
>> +  */
>> + u8  *irq_spi_cpu;
>> +
>> + /*
>> +  * Reverse lookup of irq_spi_cpu for faster compute pending:
>> +  *
>> +  * Array of bitmaps, one per VCPU, descri