RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-22 Thread Gonglei (Arei)
> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Tuesday, December 22, 2015 11:51 PM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Tue, Dec 22, 2015 at 02:14:12AM +, Gonglei (Arei) wrote:
> > > From: Kevin O'Connor [mailto:ke...@koconnor.net]
> > > Sent: Tuesday, December 22, 2015 2:47 AM
> > > To: Gonglei (Arei)
> > > Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> > > Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> > > Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> > > problem on qemu-kvm platform
> > >
> > > On Mon, Dec 21, 2015 at 09:41:32AM +, Gonglei (Arei) wrote:
> > > > When the OS's grub is booting, the softirq and the C function
> > > > send_disk_op() may use SeaBIOS's extra stack. If we inject an NMI,
> > > > romlayout.S: irqentry_extrastack is invoked, and the extra stack is
> > > > used again. The first caller's stack is corrupted, so SeaBIOS gets
> > > > stuck.
> > > >
> > > > You can easily reproduce the problem.
> > > >
> > > > 1. Start a guest.
> > > > 2. Reset the guest.
> > > > 3. Inject an NMI when the guest shows the grub screen.
> > > > 4. The guest gets stuck.
> > >
> > > Does the SeaBIOS patch below help?
> >
> > Sorry, it doesn't work. What's worse, after applying this patch we can
> > no longer avoid the SeaBIOS hang even by setting "CONFIG_ENTRY_EXTRASTACK=n".
> 
> Oops, can you try with the patch below instead?
> 

It works now. Thanks!

But do we need to check for other situations that might cause the
*extra stack* to be corrupted or overwritten?


> > > I'm not familiar with how to "inject a
> > > NMI" - can you describe the process in more detail?
> > >
> >
> > 1. Qemu Command line:
> >
> > #: /home/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 \
> > -smp 8 -name suse -vnc 0.0.0.0:10 \
> > -device virtio-scsi-pci,id=scsi0 \
> > -drive file=/home/suse11_sp3_32_2,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native \
> > -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
> > -chardev file,id=seabios,path=/home/seabios.log \
> > -device isa-debugcon,iobase=0x402,chardev=seabios \
> > -monitor stdio -qmp unix:/tmp/qmp,server,nowait
> >
> > 2. Inject a NMI by QMP:
> >
> > #: /home/qemu/scripts/qmp # ./qmp-shell /tmp/qmp
> > Welcome to the QMP low-level shell!
> > Connected to QEMU 2.5.0
> >
> > (QEMU) system_reset
> > {"return": {}}
> > (QEMU) inject-nmi
> > {"return": {}}
> > (QEMU) inject-nmi
> > {"return": {}}
> >
> 
> I tried a few simple tests but was not able to reproduce.
> 
After resetting the guest, inject an NMI as soon as you see the grub screen.

Kevin, I sent you a picture in private. :)


Regards,
-Gonglei

> -Kevin
> 
> 
> --- a/src/romlayout.S
> +++ b/src/romlayout.S
> @@ -548,7 +548,10 @@ entry_post:
>  ENTRY_INTO32 _cfunc32flat_handle_post   // Normal entry point
> 
>  ORG 0xe2c3
> -IRQ_ENTRY 02
> +.global entry_02
> +entry_02:
> +ENTRY handle_02  // NMI handler does not switch onto extra stack
> +iretw
> 
>  ORG 0xe3fe
>  .global entry_13_official
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-21 Thread Gonglei (Arei)
> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Tuesday, December 22, 2015 2:47 AM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Mon, Dec 21, 2015 at 09:41:32AM +, Gonglei (Arei) wrote:
> > When the OS's grub is booting, the softirq and the C function
> > send_disk_op() may use SeaBIOS's extra stack. If we inject an NMI,
> > romlayout.S: irqentry_extrastack is invoked, and the extra stack is
> > used again. The first caller's stack is corrupted, so SeaBIOS gets
> > stuck.
> >
> > You can easily reproduce the problem.
> >
> > 1. Start a guest.
> > 2. Reset the guest.
> > 3. Inject an NMI when the guest shows the grub screen.
> > 4. The guest gets stuck.
> 
> Does the SeaBIOS patch below help?  

Sorry, it doesn't work. What's worse, after applying this patch we can
no longer avoid the SeaBIOS hang even by setting "CONFIG_ENTRY_EXTRASTACK=n".


> I'm not familiar with how to "inject a
> NMI" - can you describe the process in more detail?
> 

1. Qemu Command line:

#: /home/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -smp 8 \
-name suse -vnc 0.0.0.0:10 \
-device virtio-scsi-pci,id=scsi0 \
-drive file=/home/suse11_sp3_32_2,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native \
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
-chardev file,id=seabios,path=/home/seabios.log \
-device isa-debugcon,iobase=0x402,chardev=seabios \
-monitor stdio -qmp unix:/tmp/qmp,server,nowait

2. Inject a NMI by QMP:

#: /home/qemu/scripts/qmp # ./qmp-shell /tmp/qmp
Welcome to the QMP low-level shell!
Connected to QEMU 2.5.0

(QEMU) system_reset
{"return": {}}
(QEMU) inject-nmi  
{"return": {}}
(QEMU) inject-nmi
{"return": {}}


Regards,
-Gonglei

> -Kevin
> 
> 
> --- a/src/romlayout.S
> +++ b/src/romlayout.S
> @@ -548,7 +548,9 @@ entry_post:
>  ENTRY_INTO32 _cfunc32flat_handle_post   // Normal entry point
> 
>  ORG 0xe2c3
> -IRQ_ENTRY 02
> +.global entry_02
> +entry_02:
> +ENTRY handle_02  // NMI handler does not switch onto extra stack
> 
>  ORG 0xe3fe
>  .global entry_13_official


RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-21 Thread Gonglei (Arei)
Dear Kevin,

> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Sunday, December 20, 2015 10:33 PM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Sun, Dec 20, 2015 at 09:49:54AM +, Gonglei (Arei) wrote:
> > > From: Kevin O'Connor [mailto:ke...@koconnor.net]
> > > Sent: Saturday, December 19, 2015 11:12 PM
> > > On Sat, Dec 19, 2015 at 12:03:15PM +, Gonglei (Arei) wrote:
> > > > Maybe the root cause is not NMI but INTR: yield() enables hardware
> > > > interrupts, and then an interrupt handler runs, but that handler
> > > > corrupts the SeaBIOS stack, so the BSP cannot execute the next
> > > > instruction, takes an exception, and VM-exits to the kernel module,
> > > > which loops forever. But I have no proof beyond the surface
> > > > symptoms.
> > >
> > > I can't see any reason why allowing interrupts at this location would
> > > be a problem.
> > >
> > Does it have any relationship with SeaBIOS's *extra stack*?
> 
> None that I can see.  Also, the kvm trace seems to show the code
> trying to execute at rip=0x03 - that will crash long before the extra
> stack is used.
> 
When the OS's grub is booting, the softirq and the C function send_disk_op()
may use SeaBIOS's extra stack. If we inject an NMI, romlayout.S:
irqentry_extrastack is invoked, and the extra stack is used again. The first
caller's stack is corrupted, so SeaBIOS gets stuck.

You can easily reproduce the problem.

1. Start a guest.
2. Reset the guest.
3. Inject an NMI when the guest shows the grub screen.
4. The guest gets stuck.

If we disable the extra stack by setting

 CONFIG_ENTRY_EXTRASTACK=n

then the problem goes away.

Besides, I have another thought:

Is it possible that while one CPU is using the extra stack, other CPUs (APs)
are woken up by a hardware interrupt after yield() or br->flags = F_IF
and use the extra stack again?


Regards,
-Gonglei

> > > > Kevin, can we drop yield() in smp_setup() ?
> > >
> > > It's possible to eliminate this instance of yield, but I think it
> > > would just push the crash to the next time interrupts are enabled.
> > >
> > Perhaps. I'm not sure.
> >
> > > > Is it really useful and allowable for SeaBIOS? Maybe for other
> components?
> > > > I'm not sure. Because we found that when SeaBIOS is booting, if we 
> > > > inject
> a
> > > > NMI by QMP, the guest will *stuck*. And the kvm tracing log is the same
> with
> > > > the current problem.
> > >
> > > If you apply the patches you had to prevent that NMI crash problem,
> > > does it also prevent the above crash?
> > >
> > Yes, but we cannot prevent the NMI injection (though I'll submit some
> > patches to forbid user NMI injection once NMI_EN is disabled via bit 7
> > of RTC port 0x70).
> >
> 
> -Kevin


RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-20 Thread Gonglei (Arei)

> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Saturday, December 19, 2015 11:12 PM
> On Sat, Dec 19, 2015 at 12:03:15PM +, Gonglei (Arei) wrote:
> > Maybe the root cause is not NMI but INTR: yield() enables hardware
> > interrupts, and then an interrupt handler runs, but that handler
> > corrupts the SeaBIOS stack, so the BSP cannot execute the next
> > instruction, takes an exception, and VM-exits to the kernel module,
> > which loops forever. But I have no proof beyond the surface symptoms.
> 
> I can't see any reason why allowing interrupts at this location would
> be a problem.
> 
Does it have any relationship with SeaBIOS's *extra stack*?

> > Kevin, can we drop yield() in smp_setup() ?
> 
> It's possible to eliminate this instance of yield, but I think it
> would just push the crash to the next time interrupts are enabled.
> 
Perhaps. I'm not sure.

> > Is it really useful and safe for SeaBIOS? Maybe it matters for other
> > components? I'm not sure, because we found that while SeaBIOS is booting,
> > if we inject an NMI via QMP the guest gets *stuck*, and the kvm trace log
> > is the same as for the current problem.
> 
> If you apply the patches you had to prevent that NMI crash problem,
> does it also prevent the above crash?
> 
Yes, but we cannot prevent the NMI injection (though I'll submit some patches to
forbid user NMI injection once NMI_EN is disabled via bit 7 of RTC port 0x70).


Regards,
-Gonglei


RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-19 Thread Gonglei (Arei)
Hi Kevin,


> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> 
> On Fri, Dec 18, 2015 at 03:04:58AM +, Gonglei (Arei) wrote:
> > Hi Kevin & Paolo,
> >
> > Luckily, I reproduced this problem last night. And I got the below log when
> SeaBIOS is stuck.
> [...]
> > [2015-12-18 10:38:10]  gonglei: finish while
> [...]
> > <...>-31509 [035] 154753.180077: kvm_exit: reason EXCEPTION_NMI rip 0x3
> info 0 8306
> > <...>-31509 [035] 154753.180077: kvm_emulate_insn: 0:3:f0 53 (real)
> > <...>-31509 [035] 154753.180077: kvm_inj_exception: #UD (0x0)
> > <...>-31509 [035] 154753.180077: kvm_entry: vcpu 0
> 
> This is an odd finding.  It seems to indicate that the code is caught
> in an infinite irq loop once irqs are enabled.  What doesn't make
> sense is that an NMI shouldn't depend on the cpu irq enable flag.

Maybe the root cause is not NMI but INTR: yield() enables hardware interrupts,
and then an interrupt handler runs, but that handler corrupts the SeaBIOS
stack, so the BSP cannot execute the next instruction, takes an exception, and
VM-exits to the kernel module, which loops forever. But I have no proof beyond
the surface symptoms.

Kevin, can we drop the yield() in smp_setup()?

diff --git a/src/fw/smp.c b/src/fw/smp.c
index 579acdb..dd23eda 100644
--- a/src/fw/smp.c
+++ b/src/fw/smp.c
@@ -136,7 +136,6 @@ smp_setup(void)
 "  jc 1b\n"
 : "+m" (SMPLock), "+m" (SMPStack)
 : : "cc", "memory");
-yield();
 
 // Restore memory.
 *(u64*)BUILD_AP_BOOT_ADDR = old;

Is it really useful and safe for SeaBIOS? Maybe it matters for other
components? I'm not sure, because we found that while SeaBIOS is booting, if
we inject an NMI via QMP the guest gets *stuck*, and the kvm trace log is the
same as for the current problem.


Regards,
-Gonglei

> Also, I can't explain why rip would be 0x03, nor why a #UD in an
> exception handler wouldn't result in a triple fault.  Maybe someone
> with more kvm knowledge could help here.
> 
> I did notice that you appear to be running with SeaBIOS v1.8.1 - I
> recommend you upgrade to the latest.  There were two important fixes
> in this area (8b9942fa and 3156b71a).  I don't think either of these
> fixes would explain the log above, but it would be best to eliminate
> the possibility.
> 
> -Kevin


RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-18 Thread Gonglei (Arei)
>
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Saturday, December 19, 2015 7:13 AM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Fri, Dec 18, 2015 at 03:04:58AM +, Gonglei (Arei) wrote:
> > Hi Kevin & Paolo,
> >
> > Luckily, I reproduced this problem last night. And I got the below log when
> SeaBIOS is stuck.
> [...]
> > [2015-12-18 10:38:10]  gonglei: finish while
> [...]
> > <...>-31509 [035] 154753.180077: kvm_exit: reason EXCEPTION_NMI rip 0x3
> info 0 8306
> > <...>-31509 [035] 154753.180077: kvm_emulate_insn: 0:3:f0 53 (real)
> > <...>-31509 [035] 154753.180077: kvm_inj_exception: #UD (0x0)
> > <...>-31509 [035] 154753.180077: kvm_entry: vcpu 0
> 
> This is an odd finding.  It seems to indicate that the code is caught
> in an infinite irq loop once irqs are enabled.  What doesn't make
> sense is that an NMI shouldn't depend on the cpu irq enable flag.
> Also, I can't explain why rip would be 0x03, nor why a #UD in an
> exception handler wouldn't result in a triple fault.  Maybe someone
> with more kvm knowledge could help here.
> 

Ccing Paolo and Radim.

> I did notice that you appear to be running with SeaBIOS v1.8.1 - I
> recommend you upgrade to the latest.  There were two important fixes
> in this area (8b9942fa and 3156b71a).  I don't think either of these
> fixes would explain the log above, but it would be best to eliminate
> the possibility.
> 
We can reproduce the problem with the latest SeaBIOS too. :(


Regards,
-Gonglei


RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-18 Thread Gonglei (Arei)
Hi Kevin & Paolo,

Luckily, I reproduced this problem last night, and I got the log below when
SeaBIOS got stuck.

[BTW, the whole SeaBIOS log attached]

[2015-12-18 10:38:10] >>>>>gonglei: enter smp_setup()...
[2015-12-18 10:38:10] >>>>>gonglei: begine to enable local APIC...
[2015-12-18 10:38:10] >>>>>gonglei: finish enable local APIC...
[2015-12-18 10:38:10] >>>gonglei: cmos_smp_count=8
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=5
[2015-12-18 10:38:10] ===: CountCPUs=2, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=7
[2015-12-18 10:38:10] ===: CountCPUs=3, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=1
[2015-12-18 10:38:10] ===: CountCPUs=4, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=2
[2015-12-18 10:38:10] ===: CountCPUs=5, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=4
[2015-12-18 10:38:10] ===: CountCPUs=6, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=3
[2015-12-18 10:38:10] ===: CountCPUs=7, SMPStack=0x6d84
[2015-12-18 10:38:10] >>> enter handle_smp...
[2015-12-18 10:38:10] handle_smp: apic_id=6
[2015-12-18 10:38:10] ===: CountCPUs=8, SMPStack=0x6d84
[2015-12-18 10:38:10]  gonglei: finish while   

[pid 31509 is a vcpu thread using 100% CPU]

# cat /proc/31509/stack  
[] vmx_vcpu_run+0x35c/0x580 [kvm_intel]
[] em_push+0x0/0x20 [kvm]
[] x86_emulate_instruction+0x20c/0x440 [kvm]
[] handle_exception+0xe4/0x1b58 [kvm_intel]
[] vcpu_enter_guest+0x565/0x790 [kvm]
[] vmx_get_segment_base+0x0/0xb0 [kvm_intel]
[] __vcpu_run+0x198/0x260 [kvm]
[] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
[] vcpu_load+0x4e/0x80 [kvm]
[] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
[] futex_wake+0xfd/0x110
[] security_file_permission+0x1c/0xa0
[] do_vfs_ioctl+0x8b/0x3b0
[] sys_ioctl+0xa1/0xb0
[] system_call_fastpath+0x16/0x1b
[] 0x

And kvm tracing information:

<...>-31509 [035] 154753.180077: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 
8306
<...>-31509 [035] 154753.180077: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180077: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180077: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180078: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 
8306
<...>-31509 [035] 154753.180078: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180079: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180079: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180079: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 
8306
<...>-31509 [035] 154753.180080: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180080: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180080: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180081: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 
8306
<...>-31509 [035] 154753.180081: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180081: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180081: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180082: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 
8306
<...>-31509 [035] 154753.180083: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180083: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180083: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180084: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 
8306
<...>-31509 [035] 154753.180084: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180084: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180084: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180085: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 
8306
<...>-31509 [035] 154753.180085: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180085: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180085: kvm_entry: vcpu 0
<...>-31509 [035] 154753.180086: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 
8306

Now it's very clear that the guest is stuck in yield(), and then kvm
encounters the #UD exception.

Do you have any thoughts? Thanks!


The SeaBIOS debug patch below was applied:

diff --git a/roms/seabios/src/boot.c b/roms/seabios/src/boot.c
index f23e9e1..552914a 100644
--- a/roms/seabios/src/boot.c
+++ b/roms/seabios/src/boot.c
@@ -93,7 +93,7 @@ glob_prefix(const char *glob, const char *str)
 static int
 find_prio(const char *glob)
 {
-dprintf(1, "Searching bootorder for: %s\n", glob);
+//dprintf(1, "Searching bootorder for: %s\n", glob);
 int i;
 for (i = 0; i <

Re: [PATCH RFC 3/3] pci-testdev: add RO pages for ioeventfd

2015-08-30 Thread Gonglei
On 2015/8/30 17:20, Michael S. Tsirkin wrote:
> This seems hackish - would it be better to create this region
> automatically within kvm? Suggestions are welcome.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/misc/pci-testdev.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/hw/misc/pci-testdev.c b/hw/misc/pci-testdev.c
> index 94141a3..55efc32 100644
> --- a/hw/misc/pci-testdev.c
> +++ b/hw/misc/pci-testdev.c
> @@ -21,6 +21,7 @@
>  #include "hw/pci/pci.h"
>  #include "qemu/event_notifier.h"
>  #include "qemu/osdep.h"
> +#include 
>  
>  typedef struct PCITestDevHdr {
>  uint8_t test;
> @@ -82,11 +83,13 @@ typedef struct PCITestDevState {
>  PCIDevice parent_obj;
>  /*< public >*/
>  
> +MemoryRegion zeromr;
>  MemoryRegion mmio;
>  MemoryRegion mbar;
>  MemoryRegion portio;
>  IOTest *tests;
>  int current;
> +void *zero;
>  } PCITestDevState;
>  
>  #define TYPE_PCI_TEST_DEV "pci-testdev"
> @@ -242,6 +245,11 @@ static void pci_testdev_realize(PCIDevice *pci_dev, 
> Error **errp)
>  uint8_t *pci_conf;
>  char *name;
>  int r, i;
> +d->zero = mmap(NULL, IOTEST_MEMSIZE * 2, PROT_READ,
> + MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> +

Do we need to think about hot-plugging pci-testdev? If yes, we should release
some resources when hot-unplugging a pci-testdev device:
munmap(d->zero, ...)
memory_region_del_subregion(&d->mbar, &d->mmio)
...

Regards,
-Gonglei

> +memory_region_init_ram_ptr(&d->zeromr, OBJECT(d), "pci-testdev-zero", 
> 0x1000, d->zero);
> +memory_region_set_readonly(&d->zeromr, true);
>  
>  pci_conf = pci_dev->config;
>  
> @@ -286,6 +294,11 @@ static void pci_testdev_realize(PCIDevice *pci_dev, 
> Error **errp)
>  test->hasnotifier = false;
>  continue;
>  }
> +
> +if (test->hasnotifier && !test->size) {
> +memory_region_add_subregion_overlap(&d->mbar, 
> le32_to_cpu(test->hdr->offset),
> +&d->zeromr, 2 /* prio */);
> +}
>  r = event_notifier_init(&test->notifier, 0);
>  assert(r >= 0);
>  test->hasnotifier = true;
> 




[RFC] Why KVM don't support last branch recording for intel CPUs?

2015-06-27 Thread Gonglei (Arei)
Hi all,

Xen has supported last branch recording (LBR) since 2007, and KVM supports
vPMU, but it doesn't support LBR for Intel CPUs yet. May I ask why? Are there
any considerations blocking its implementation? Thanks.

PS: I tried to Google for reasons, but found nothing. :(

Regards,
-Gonglei




Re: [PATCH] vhost-scsi: introduce an ioctl to get the minimum tpgt

2015-01-30 Thread Gonglei
On 2015/1/26 21:13, Gonglei (Arei) wrote:

> From: Gonglei 
> 
> In order to support assigning a boot order to a
> vhost-scsi device, we need to expose the tpgt to
> user level (such as QEMU). At present, only the
> minimum tpgt can boot.

Ping...

> Signed-off-by: Gonglei 
> Signed-off-by: Bo Su 
> ---
>  drivers/vhost/scsi.c   | 41 +
>  include/uapi/linux/vhost.h |  2 ++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index d695b16..12e79b9 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -1522,6 +1522,38 @@ err_dev:
>   return ret;
>  }
>  
> +static int vhost_scsi_get_first_tpgt(
> + struct vhost_scsi *vs,
> + struct vhost_scsi_target *t)
> +{
> + struct tcm_vhost_tpg *tv_tpg;
> + struct tcm_vhost_tport *tv_tport;
> + int tpgt = -1;
> +
> + mutex_lock(&tcm_vhost_mutex);
> + mutex_lock(&vs->dev.mutex);
> +
> + list_for_each_entry(tv_tpg, &tcm_vhost_list, tv_tpg_list) {
> + tv_tport = tv_tpg->tport;
> +
> + if (!strcmp(tv_tport->tport_name, t->vhost_wwpn)) {
> + if (tpgt < 0)
> + tpgt = tv_tpg->tport_tpgt;
> + else if (tpgt > tv_tpg->tport_tpgt)
> + tpgt = tv_tpg->tport_tpgt;
> + }
> + }
> +
> + mutex_unlock(&vs->dev.mutex);
> + mutex_unlock(&tcm_vhost_mutex);
> +
> + if (tpgt < 0)
> + return -ENXIO;
> +
> + t->vhost_tpgt = tpgt;
> + return 0;
> +}
> +
>  static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
>  {
>   struct vhost_virtqueue *vq;
> @@ -1657,6 +1689,15 @@ vhost_scsi_ioctl(struct file *f,
>   if (put_user(events_missed, eventsp))
>   return -EFAULT;
>   return 0;
> + case VHOST_SCSI_GET_TPGT:
> + if (copy_from_user(&backend, argp, sizeof(backend)))
> + return -EFAULT;
> + r = vhost_scsi_get_first_tpgt(vs, &backend);
> + if (r < 0)
> + return r;
> + if (copy_to_user(argp, &backend, sizeof(backend)))
> + return -EFAULT;
> + return 0;
>   case VHOST_GET_FEATURES:
>   features = VHOST_SCSI_FEATURES;
>   if (copy_to_user(featurep, &features, sizeof features))
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index bb6a5b4..5d350f7 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -155,4 +155,6 @@ struct vhost_scsi_target {
>  #define VHOST_SCSI_SET_EVENTS_MISSED _IOW(VHOST_VIRTIO, 0x43, __u32)
>  #define VHOST_SCSI_GET_EVENTS_MISSED _IOW(VHOST_VIRTIO, 0x44, __u32)
>  
> +#define VHOST_SCSI_GET_TPGT _IOW(VHOST_VIRTIO, 0x45, struct 
> vhost_scsi_target)
> +
>  #endif





RE: [Qemu-devel] [PATCH] kvm: ioapic: conditionally delay irq delivery duringeoi broadcast

2014-09-11 Thread Gonglei (Arei)
> Subject: [Qemu-devel] [PATCH] kvm: ioapic: conditionally delay irq delivery
> duringeoi broadcast
> 
> Currently, we call ioapic_service() immediately when we find the irq is still
> active during eoi broadcast. But on real hardware, there's some delay between
> the EOI write and irq delivery (system bus latency?), so we need to emulate
> this behavior. Otherwise, a guest that hasn't registered a proper irq handler
> would stay in the interrupt routine, as the irq would be re-injected
> immediately after the guest enables interrupts. The guest then can't move
> forward and may miss the chance to get a proper irq handler registered (one
> example is a Windows guest resuming from hibernation).
> 
> As there's no way to distinguish an unhandled irq from newly raised ones,
> this patch solves the problem by scheduling delayed work when the count of
> irqs injected during eoi broadcast exceeds a threshold value. After this
> patch, the guest can move a little forward when there's no suitable irq
> handler, in case it registers one very soon; and for a guest with a bad-irq
> detection routine (such as note_interrupt() in Linux), the bad irq will be
> recognized soon, as in the past.
> 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 
> Signed-off-by: Zhang Haoyu 
> ---
>  include/trace/events/kvm.h | 20 +++
>  virt/kvm/ioapic.c  | 50
> --
>  virt/kvm/ioapic.h  |  6 ++
>  3 files changed, 74 insertions(+), 2 deletions(-)
> 
If this is a new version, please add a v2/v3 suffix and describe the changes
between versions.

You can get more information from:
http://wiki.qemu.org/Contribute/SubmitAPatch

Best regards,
-Gonglei


RE: [Qemu-devel] [PATCH v3 1/2] contrib: add ivshmem client and server

2014-08-09 Thread Gonglei
strerror(errno));
Why not use debug_log() here?

> +return -1;
> +}
> +return 0;
> +}
> +
> +/* send a notification to all vectors of a peer */
> +int
> +ivshmem_client_notify_all_vects(const struct ivshmem_client *client,
> +const struct ivshmem_client_peer
> *peer)
> +{
> +unsigned vector;
> +int ret = 0;
> +
> +for (vector = 0; vector < peer->vectors_count; vector++) {
> +    if (ivshmem_client_notify(client, peer, vector) < 0) {
> +ret = -1;
The value of ret will be overwritten when multiple clients fail. Do we need
to store the failure status for the server?

> +}
> +}
> +
> +return ret;
> +}
> +
> +/* send a notification to all peers */
> +int
> +ivshmem_client_notify_broadcast(const struct ivshmem_client *client)
> +{
> +struct ivshmem_client_peer *peer;
> +int ret = 0;
> +
> +TAILQ_FOREACH(peer, &client->peer_list, next) {
> +if (ivshmem_client_notify_all_vects(client, peer) < 0) {
> +ret = -1;
> +}
> +}
> +
> +return ret;
> +}
> +
> +/* lookup peer from its id */
> +struct ivshmem_client_peer *
> +ivshmem_client_search_peer(struct ivshmem_client *client, long peer_id)
> +{
> +struct ivshmem_client_peer *peer;
> +
> +if (peer_id == client->local.id) {
> +return &client->local;
> +}
> +
> +TAILQ_FOREACH(peer, &client->peer_list, next) {
> +if (peer->id == peer_id) {
> +return peer;
> +}
> +}
> +return NULL;
> +}
> +
> +/* dump our info, the list of peers their vectors on stdout */
> +void
> +ivshmem_client_dump(const struct ivshmem_client *client)
> +{
> +const struct ivshmem_client_peer *peer;
> +unsigned vector;
> +
> +/* dump local infos */
> +peer = &client->local;
> +printf("our_id = %ld\n", peer->id);
> +for (vector = 0; vector < peer->vectors_count; vector++) {
> +printf("  vector %d is enabled (fd=%d)\n", vector,
> +   peer->vectors[vector]);
> +}
> +
> +/* dump peers */
> +TAILQ_FOREACH(peer, &client->peer_list, next) {
> +printf("peer_id = %ld\n", peer->id);
> +
> +for (vector = 0; vector < peer->vectors_count; vector++) {
> +printf("  vector %d is enabled (fd=%d)\n", vector,
> +   peer->vectors[vector]);
> +}
> +}
> +}

To be continued...

Best regards,
-Gonglei



RE: [Qemu-devel] [PATCH v3 0/2] ivshmem: update documentation, add client/server tools

2014-08-08 Thread Gonglei (Arei)
Hi,

> Subject: Re: [Qemu-devel] [PATCH v3 0/2] ivshmem: update documentation,
> add client/server tools
> 
> Hello Gonglei,
> 
> On 08/08/2014 11:30 AM, Gonglei (Arei) wrote:
> > A worked example describing the steps to use your
> > ivshmem-client and ivshmem-server would be great IMHO.
> 
> I already have included a note in the qemu-doc.texi file on how to start
> the ivshmem-server.
> The (debug) client is started by only specifying -S /path/to/ivshmem_socket.
> 
> We made comments into the source code, so I am not sure what could be
> added. What do you miss ?
> 
OK, thanks. 
I will test it and review the patch sets during the next few days.

Best regards,
-Gonglei

RE: [Qemu-devel] [PATCH v3 0/2] ivshmem: update documentation, add client/server tools

2014-08-08 Thread Gonglei (Arei)
Hi,

> Subject: [Qemu-devel] [PATCH v3 0/2] ivshmem: update documentation, add
> client/server tools
> 
> Here is a patchset containing an update on ivshmem specs documentation and
> importing ivshmem server and client tools.
> These tools have been written from scratch and are not related to what is
> available in nahanni repository.
> I put them in contrib/ directory as the qemu-doc.texi was already telling the
> server was supposed to be there.
> 
> Changes since v2:
> - fixed license issues in ivshmem client/server (I took hw/virtio/virtio-rng.c
>   file as a reference).
> 
> Changes since v1:
> - moved client/server import patch before doc update,
> - tried to re-organise the ivshmem_device_spec.txt file based on Claudio
>   comments (still not sure if the result is that great, comments welcome),
> - incorporated comments from Claudio, Eric and Cam,
> - added more details on the server <-> client messages exchange (but sorry, no
>   ASCII art here).
> 
> By the way, there are still some functionalities that need description (use of
> ioeventfd, the lack of irqfd support) and some parts of the ivshmem code
> clearly need cleanup. I will try to address this in future patches once these
> first patches are ok.
> 
> 
It would be great IMHO if you could describe the steps of a usage example
for your ivshmem-client and ivshmem-server.

Best regards,
-Gonglei

> --
> David Marchand
> 
> David Marchand (2):
>   contrib: add ivshmem client and server
>   docs: update ivshmem device spec
> 
>  contrib/ivshmem-client/Makefile |   29 +++
>  contrib/ivshmem-client/ivshmem-client.c |  418
> ++
>  contrib/ivshmem-client/ivshmem-client.h |  238 ++
>  contrib/ivshmem-client/main.c   |  246 ++
>  contrib/ivshmem-server/Makefile |   29 +++
>  contrib/ivshmem-server/ivshmem-server.c |  420
> +++
>  contrib/ivshmem-server/ivshmem-server.h |  185 ++
>  contrib/ivshmem-server/main.c   |  296
> ++
>  docs/specs/ivshmem_device_spec.txt  |  124 ++---
>  qemu-doc.texi   |   10 +-
>  10 files changed, 1961 insertions(+), 34 deletions(-)
>  create mode 100644 contrib/ivshmem-client/Makefile
>  create mode 100644 contrib/ivshmem-client/ivshmem-client.c
>  create mode 100644 contrib/ivshmem-client/ivshmem-client.h
>  create mode 100644 contrib/ivshmem-client/main.c
>  create mode 100644 contrib/ivshmem-server/Makefile
>  create mode 100644 contrib/ivshmem-server/ivshmem-server.c
>  create mode 100644 contrib/ivshmem-server/ivshmem-server.h
>  create mode 100644 contrib/ivshmem-server/main.c
> 
> --
> 1.7.10.4
> 



RE: [RFC PATCH]pci-assign: Fix memory out of bound when MSI-X table not fit in a single page

2014-04-01 Thread Gonglei (Arei)
> > Hi,
> >
> > I have a problem with SR-IOV pass-through.
> >
> > The PF is Emulex Corporation OneConnect NIC (Lancer)(rev 10),
> > and the VF pci config is as follow:
> >
> > LINUX:/sys/bus/pci/devices/:04:00.6 # hexdump config
> > 000    0010 0010 0200  0080
> > 010        
> > 020       10df e264
> > 030   0054     
> > 040   0008     
> > 050   6009 0008 2b41 c002  
> > 060 7805 018a      
> > 070     8411 03ff 4000 
> > 080 3400  9403     
> > 090   0010 0002 8724 1000  
> > 0a0 dc83 0041      
> > 0b0     001f 0010  
> > 0c0 000e       
> > 0d0        
> >
> > We can see the msix_max is 0x3ff and the MSI-X table size is 0x4000 (4 pages).
> > But QEMU only mmaps MSIX_PAGE_SIZE memory for all pci devices in function
> > assigned_dev_register_msix_mmio, and it zeroes only that one page, so the
> > rest of the memory will contain random values (maybe entry.data is not 0).
> >
> > In function assigned_dev_update_msix_mmio the case entry_nr > 256 may occur,
> > and the kmod reports the EINVAL error.
> >
> > My patch fixes this issue by allocating memory according to the real size
> > of the pci device config.
> >
> > Any ideas? Thanks.
> >
> > Signed-off-by: Gonglei 
> > ---
> >  hw/i386/kvm/pci-assign.c |   24 +++-
> >  1 files changed, 19 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> > index a825871..daa191c 100644
> > --- a/hw/i386/kvm/pci-assign.c
> > +++ b/hw/i386/kvm/pci-assign.c
> > @@ -1591,10 +1591,6 @@ static void assigned_dev_msix_reset(AssignedDevice *dev)
> >  MSIXTableEntry *entry;
> >  int i;
> >
> > -if (!dev->msix_table) {
> > -return;
> > -}
> > -
> >  memset(dev->msix_table, 0, MSIX_PAGE_SIZE);
> >
> >  for (i = 0, entry = dev->msix_table; i < dev->msix_max; i++, entry++) {
> > @@ -1604,13 +1600,31 @@ static void assigned_dev_msix_reset(AssignedDevice *dev)
> >
> >  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> >  {
> > -dev->msix_table = mmap(NULL, MSIX_PAGE_SIZE, PROT_READ|PROT_WRITE,
> > +int nr_pages;
> > +int size;
> > +int entry_per_page = MSIX_PAGE_SIZE / sizeof(struct MSIXTableEntry);
> > +
> > +if (dev->msix_max > entry_per_page) {
> > +nr_pages = dev->msix_max / entry_per_page;
> > +if (dev->msix_max % entry_per_page) {
> > +nr_pages += 1;
> > +}
> > +} else {
> > +nr_pages = 1;
> > +}
> 
> It's usually not a good idea to special-case corner-cases like this.
> 
IMHO, we should ensure the memory is page-aligned, so I will use ROUND_UP
based on dev->msix_max.

> 
> > +
> > +size = MSIX_PAGE_SIZE * nr_pages;
> 
> Just use ROUND_UP?
> 
> > +dev->msix_table = mmap(NULL, size, PROT_READ|PROT_WRITE,
> > MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> 
> Need to fix unmap as well?
> 
Yep, the new size should be unmapped in assigned_dev_unregister_msix_mmio
as well. Thanks, Michael.

BTW, do you think KVM should raise the maximum supported number of MSI-X
entries to 2048? MSI-X supports a maximum table size of 2048 entries, as
described in PCI specification 3.0, section 6.8.3.2: "MSI-X Configuration".

The historical patch that downsized the MSI-X entry limit to 256:
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/38852/focus=38849

Best regards,
-Gonglei
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC]Two ideas to optimize updating irq routing table

2014-03-26 Thread Gonglei (Arei)
> On my system I have HZ=100 and lots of CPUs, so RCU's "every cpu has
> scheduled" approach is certainly slower than SRCU's algorithm
> (/*
>  * We use an adaptive strategy for synchronize_srcu() and especially for
>  * synchronize_srcu_expedited().  We spin for a fixed time period
>  * (defined below) to allow SRCU readers to exit their read-side critical
>  * sections.  If there are still some readers after 10 microseconds,
>  * we repeatedly block for 1-millisecond time periods.  This approach
>  * has done well in testing, so there is no need for a config parameter.
>  */
> )
> 
> With HZ==1000 and a small number of CPUs, SRCU's spinning might be in the
> same delay range as classic RCU, depending on how long the read-side
> critical section is (if we move from spinning to blocking).
> So using synchronize_srcu_expedited is certainly something to test, as it
> increases the spinning time.
> 
> Christian
Yes, after we changed to synchronize_srcu_expedited, grace period latency
improves a lot, and overall this is good. However, as I mentioned in another
mail, in our setting-IRQ-affinity-and-ping test we can still see some impact
from the KVM_SET_GSI_ROUTING ioctl. I wrote another patch in that mail and
would like it to be examined to see whether it is acceptable or has any
problems. Thank you.


Best regards,
-Gonglei



RE: [RFC]Two ideas to optimize updating irq routing table

2014-03-26 Thread Gonglei (Arei)
+   memcpy(entries, kvm->to_update_entries, nr * sizeof(*entries));
+
+   atomic_set(&kvm->have_new, 0);
+   mutex_unlock(&kvm->irq_routing_lock);
+
+   kvm_set_irq_routing(kvm, entries, nr, flags);
+
+   return 0;
+}
+
+static int do_irq_routing_rcu(void *data)
+{
+   struct kvm *kvm = (struct kvm *)data;
+
+   while (1) {
+   wait_event_interruptible(kvm->wq,
+   atomic_read(&kvm->have_new) || kthread_should_stop());
+
+   if (kthread_should_stop())
+   break;
+
+   do_irq_routing_table_update(kvm);
+   }
+
+   return 0;
+}
+
 static struct kvm *kvm_create_vm(unsigned long type)
 {
int r, i;
@@ -529,6 +573,12 @@ static struct kvm *kvm_create_vm(unsigne
kvm_init_memslots_id(kvm);
if (init_srcu_struct(&kvm->srcu))
goto out_err_nosrcu;
+
+   atomic_set(&kvm->have_new, 0);
+   init_waitqueue_head(&kvm->wq);
+   mutex_init(&kvm->irq_routing_lock);
+   kvm->kthread = kthread_run(do_irq_routing_rcu, kvm, "irq_routing");
+
for (i = 0; i < KVM_NR_BUSES; i++) {
kvm->buses[i] = kzalloc(sizeof(struct kvm_io_bus),
GFP_KERNEL);
@@ -635,6 +685,11 @@ static void kvm_destroy_vm(struct kvm *k
list_del(&kvm->vm_list);
raw_spin_unlock(&kvm_lock);
kvm_free_irq_routing(kvm);
+
+   kthread_stop(kvm->kthread);
+   if (kvm->to_update_entries)
+   vfree(kvm->to_update_entries);
+
for (i = 0; i < KVM_NR_BUSES; i++)
kvm_io_bus_destroy(kvm->buses[i]);
kvm_coalesced_mmio_free(kvm);


Best regards,
-Gonglei


[RFC]Two ideas to optimize updating irq routing table

2014-03-24 Thread Gonglei (Arei)
Hi, 

Based on discussions in:
http://lists.gnu.org/archive/html/qemu-devel/2013-11/threads.html#03322

About the KVM_SET_GSI_ROUTING ioctl: I tested changing RCU to SRCU, but
unfortunately it looks like SRCU's grace period is no better than RCU's. I
haven't figured out why, but I suppose the test suggests that SRCU is not
ideal here. This article (https://lwn.net/Articles/264090/) also says that
SRCU's grace period is about the same as RCU's. Although QRCU may have good
grace-period latency, it has not been merged into the Linux kernel yet.

So I have come up with these two ideas.
1) Do rate limiting in kmod's kvm_set_irq_routing: if the ioctl rate is OK, we
do call_rcu; otherwise we do synchronize_rcu, and thus avoid OOM.

Or
2) Start a kthread for each VM, and let the kthread wait for a notification
from the ioctl, fetch the newest irq routing table, and do the RCU update. In
the ioctl we simply copy the routing table from user space, but without the
RCU update; instead, we notify the kernel thread to do it. Since the ioctls
may be very frequent, irq routings that the kthread has not yet applied are
overridden with the newest irq tables from user space. This way we don't have
to set a threshold for the ioctl frequency, and the ioctl may return sooner
than synchronizing RCU, giving the vCPU issuing the ioctl better response
times.

What do you think? Or do you have any better ideas? Thanks in advance.


Best regards,
-Gonglei

