Re: How KVM hypervisor allocates physical pages to the VM.

2014-09-17 Thread Paolo Bonzini
On 17/09/2014 05:56, Steven wrote: When size = 10MB and 20MB, it looks like KVM uses kmem_cache_alloc_node and kmalloc_node to allocate physical pages. However, when size = 40MB, KVM hypervisor uses mm_page_alloc to allocate physical pages. The former is based on the slab allocator,

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Paolo Bonzini
On 16/09/2014 20:42, Andres Lagar-Cavilla wrote: On Tue, Sep 16, 2014 at 11:29 AM, Paolo Bonzini pbonz...@redhat.com wrote: I think a first patch should introduce kvm_get_user_page_retry (Retry a fault after a gup with FOLL_NOWAIT.) and the second would add FOLL_TRIED (This

Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread Christian Borntraeger
On 09/12/2014 10:09 PM, Christian Borntraeger wrote: On 09/12/2014 01:54 PM, Ming Lei wrote: On Thu, Sep 11, 2014 at 6:26 PM, Christian Borntraeger borntrae...@de.ibm.com wrote: Folks, we have seen the following bug with 3.16 as a KVM guest. I suspect the blk-mq rework that happened

Re: [PATCH v6 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().

2014-09-17 Thread Tang Chen
On 09/16/2014 07:24 PM, Paolo Bonzini wrote: On 16/09/2014 12:42, Tang Chen wrote: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 33712fb..0df82c1 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -210,6 +210,11 @@ void kvm_make_scan_ioapic_request(struct kvm

Re: [PATCH v6 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().

2014-09-17 Thread Paolo Bonzini
On 17/09/2014 10:13, Tang Chen wrote: Please add a new function kvm_arch_mmu_notifier_invalidate_page, and call it outside the mmu_lock. Then I think we need a macro to control the calling of this arch function since other architectures do not have it. You can add an inline

Re: [PATCH v2 0/3] fix stuck in accessing hwrng attributes

2014-09-17 Thread Herbert Xu
On Tue, Sep 16, 2014 at 12:02:26AM +0800, Amos Kong wrote: If we read hwrng with a long-running dd process, it takes too much CPU time and holds the mutex lock almost continuously. When we check hwrng attributes from sysfs with cat, it gets stuck waiting for the lock to be released. The problem can only be reproduced

Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread Ming Lei
On Wed, Sep 17, 2014 at 3:59 PM, Christian Borntraeger borntrae...@de.ibm.com wrote: On 09/12/2014 10:09 PM, Christian Borntraeger wrote: On 09/12/2014 01:54 PM, Ming Lei wrote: On Thu, Sep 11, 2014 at 6:26 PM, Christian Borntraeger borntrae...@de.ibm.com wrote: Folks, we have seen the

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Gleb Natapov
On Mon, Sep 15, 2014 at 01:11:25PM -0700, Andres Lagar-Cavilla wrote: When KVM handles a tdp fault it uses FOLL_NOWAIT. If the guest memory has been swapped out or is behind a filemap, this will trigger async readahead and return immediately. The rationale is that KVM will kick back the guest

Re: [PATCH] Using the tlb flush util function where applicable

2014-09-17 Thread Radim Krčmář
2014-09-17 08:15+0800, Wanpeng Li: Hi Radim, On Mon, Sep 15, 2014 at 09:33:52PM +0200, Radim Krčmář wrote: Do you prefer the current behavior? --- 8 --- KVM: x86: count actual tlb flushes - we count KVM_REQ_TLB_FLUSH requests, not actual flushes So there may be multiple requests

Re: [PATCH] Using the tlb flush util function where applicable

2014-09-17 Thread Paolo Bonzini
On 17/09/2014 12:45, Radim Krčmář wrote: a) count local KVM_REQ_TLB_FLUSH requests b) count all TLB flushes c) both (a) and (b) I was thinking that when you look at /sys/kernel/debug/kvm/tlb_flushes, you are interested in the number of TLB flushes that VMs did, not requests, so you

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Radim Krčmář
2014-09-17 13:26+0300, Gleb Natapov: For async_pf_execute() you do not need to even retry. Next guest's page fault will retry it for you. Wouldn't that be a waste of vmentries? The guest might be able to handle interrupts while we are waiting, so if we used async-io-done notifier, this could

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Radim Krčmář
[Repost for lists, the last mail was eaten by a security troll.] 2014-09-16 14:01-0700, Andres Lagar-Cavilla: On Tue, Sep 16, 2014 at 1:51 PM, Radim Krčmář rkrc...@redhat.com wrote: 2014-09-15 13:11-0700, Andres Lagar-Cavilla: +int kvm_get_user_page_retry(struct task_struct *tsk, struct

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Gleb Natapov
On Wed, Sep 17, 2014 at 01:27:14PM +0200, Radim Krčmář wrote: 2014-09-17 13:26+0300, Gleb Natapov: For async_pf_execute() you do not need to even retry. Next guest's page fault will retry it for you. Wouldn't that be a waste of vmentries? This is how it will work with or without this

Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread David Hildenbrand
Does anyone have an idea? The request itself is completely filled with cc That is very weird, the 'rq' is got from hctx->tags, and rq should be valid, and rq->q shouldn't have been changed even though it was double free or double allocation. I am currently asking myself if

Re: [PATCH 0/3] x86: structs for cpuid info in x86

2014-09-17 Thread Ingo Molnar
* Nadav Amit nadav.a...@gmail.com wrote: On 9/16/14 4:22 PM, Ingo Molnar wrote: * Nadav Amit na...@cs.technion.ac.il wrote: The code that deals with x86 cpuid fields is hard to follow since it performs many bit operations and does not refer to cpuid field explicitly. To

Re: [PATCH 0/3] x86: structs for cpuid info in x86

2014-09-17 Thread Borislav Petkov
On Wed, Sep 17, 2014 at 02:37:10PM +0200, Ingo Molnar wrote: Opinions, objections? Can I see those patches please? I can't find them on lkml or on the net - I only see this sub-thread... Thanks. -- Regards/Gruss, Boris.

[RESEND PATCH 0/3] x86: structs for cpuid info in x86

2014-09-17 Thread Nadav Amit
The code that deals with x86 cpuid fields is hard to follow since it performs many bit operations and does not refer to cpuid fields explicitly. To eliminate the need to open a spec whenever dealing with cpuid fields, this patch-set introduces structs that reflect the various cpuid functions.

[RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-17 Thread Nadav Amit
Adding structs that reflect various cpuid fields in x86 architecture. Structs were added only for functions that are not pure bitmaps. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/include/asm/cpuid_def.h | 163 +++ 1 file changed, 163
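For readers skimming the thread, here is a rough sketch of the idea only, not the contents of the patch: a bitfield struct overlaid on a raw register value, next to the equivalent shift-and-mask code. The union name, the field names, and the two helper functions are hypothetical; only the `split` member name echoes usage quoted later in this thread. The bit positions are those of CPUID leaf 1 EAX, and the bitfield layout assumes a little-endian GCC/Clang-style ABI, since C leaves bitfield ordering implementation-defined.

```c
#include <stdint.h>

/* Hypothetical layout for CPUID leaf 1, EAX (not the patch's definition). */
union cpuid1_eax {
    struct {
        uint32_t stepping   : 4;  /* bits 3:0   */
        uint32_t model      : 4;  /* bits 7:4   */
        uint32_t family     : 4;  /* bits 11:8  */
        uint32_t type       : 2;  /* bits 13:12 */
        uint32_t reserved1  : 2;  /* bits 15:14 */
        uint32_t ext_model  : 4;  /* bits 19:16 */
        uint32_t ext_family : 8;  /* bits 27:20 */
        uint32_t reserved2  : 4;  /* bits 31:28 */
    } split;
    uint32_t full;
};

/* Bit-operation style: the reader needs the spec to decode the constants. */
static unsigned int family_by_shifts(uint32_t eax)
{
    unsigned int family = (eax >> 8) & 0xf;

    if (family == 0xf)
        family += (eax >> 20) & 0xff;
    return family;
}

/* Struct style: the field names carry the meaning. */
static unsigned int family_by_struct(uint32_t raw)
{
    union cpuid1_eax eax = { .full = raw };
    unsigned int family = eax.split.family;

    if (family == 0xf)
        family += eax.split.ext_family;
    return family;
}
```

Both helpers compute the same display family; the trade-off discussed in the replies is readability versus reliance on compiler-specific bitfield layout.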

[RESEND PATCH 2/3] x86: Use new cpuid structs in cpuid functions

2014-09-17 Thread Nadav Amit
The current code that decodes cpuid fields is somewhat cryptic, since it uses many bit operations. Use cpuid structs instead to clarify the code. Introducing no functional change. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kernel/cpu/common.c | 56

[RESEND PATCH 3/3] KVM: x86: Using cpuid structs in KVM

2014-09-17 Thread Nadav Amit
Using cpuid structs in KVM to eliminate cryptic code with many bit operations. The code does not introduce functional changes. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/cpuid.c | 36 ++-- 1 file changed, 22 insertions(+), 14 deletions(-)

Re: [RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-17 Thread Borislav Petkov
On Wed, Sep 17, 2014 at 03:54:12PM +0300, Nadav Amit wrote: Adding structs that reflect various cpuid fields in x86 architecture. Structs were added only for functions that are not pure bitmaps. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/include/asm/cpuid_def.h | 163

Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread Ming Lei
On Wed, 17 Sep 2014 14:00:34 +0200 David Hildenbrand d...@linux.vnet.ibm.com wrote: Does anyone have an idea? The request itself is completely filled with cc That is very weird, the 'rq' is got from hctx->tags, and rq should be valid, and rq->q shouldn't have been changed even

Re: [RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-17 Thread Nadav Amit
Boris, Thanks for your comments - please see inline. On Wed, Sep 17, 2014 at 4:21 PM, Borislav Petkov b...@alien8.de wrote: On Wed, Sep 17, 2014 at 03:54:12PM +0300, Nadav Amit wrote: Adding structs that reflect various cpuid fields in x86 architecture. Structs were added only for functions

Re: [RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-17 Thread Borislav Petkov
On Wed, Sep 17, 2014 at 04:53:39PM +0300, Nadav Amit wrote: AFAIK backward compatibility is usually maintained in x86. I did not see in Intel SDM anything that says this CPUID field means something for CPU X and something else for CPU Y. Anyhow, it is not different than bitmasks in this

Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread David Hildenbrand
On Wed, 17 Sep 2014 14:00:34 +0200 David Hildenbrand d...@linux.vnet.ibm.com wrote: Does anyone have an idea? The request itself is completely filled with cc That is very weird, the 'rq' is got from hctx->tags, and rq should be valid, and rq->q shouldn't have been changed

Re: [RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-17 Thread Peter Zijlstra
On Wed, Sep 17, 2014 at 03:54:12PM +0300, Nadav Amit wrote: Adding structs that reflect various cpuid fields in x86 architecture. Structs were added only for functions that are not pure bitmaps. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/include/asm/cpuid_def.h | 163

Re: [PATCH 0/3] x86: structs for cpuid info in x86

2014-09-17 Thread Peter Zijlstra
On Wed, Sep 17, 2014 at 02:37:10PM +0200, Ingo Molnar wrote: If hpa, tglx or Linus objects I'll yield to that objection though. Opinions, objections? They generally look fine to me. I appreciate the bitfields for readability. I often use the same when having to deal with hardware bitfields.

Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread Jens Axboe
On 2014-09-17 07:52, Ming Lei wrote: On Wed, 17 Sep 2014 14:00:34 +0200 David Hildenbrand d...@linux.vnet.ibm.com wrote: Does anyone have an idea? The request itself is completely filled with cc That is very weird, the 'rq' is got from hctx->tags, and rq should be valid, and rq->q shouldn't

Re: [RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-17 Thread Radim Krčmář
2014-09-17 16:06+0200, Borislav Petkov: On Wed, Sep 17, 2014 at 04:53:39PM +0300, Nadav Amit wrote: AFAIK backward compatibility is usually maintained in x86. I did not see in Intel SDM anything that says this CPUID field means something for CPU X and something else for CPU Y. Anyhow, it is

Re: [RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-17 Thread Borislav Petkov
On Wed, Sep 17, 2014 at 05:04:33PM +0200, Radim Krčmář wrote: which would result in a similar if-else hack if (family X) ebx.split.max_monitor_line_size_after_family_X = 0 else ebx.split.max_monitor_line_size = 0 other options are

Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread Ming Lei
On Wed, Sep 17, 2014 at 10:22 PM, Jens Axboe ax...@kernel.dk wrote: Another way would be to ensure that the timeout handler doesn't touch hw_ctx or tag_sets that aren't fully initialized yet. But I think this is safer/cleaner. That may not be easy or enough to check if hw_ctx/tag_sets are

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Andres Lagar-Cavilla
On Wed, Sep 17, 2014 at 12:43 AM, Paolo Bonzini pbonz...@redhat.com wrote: On 16/09/2014 20:42, Andres Lagar-Cavilla wrote: On Tue, Sep 16, 2014 at 11:29 AM, Paolo Bonzini pbonz...@redhat.com wrote: I think a first patch should introduce kvm_get_user_page_retry (Retry a fault

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Andres Lagar-Cavilla
On Wed, Sep 17, 2014 at 4:42 AM, Gleb Natapov g...@kernel.org wrote: On Wed, Sep 17, 2014 at 01:27:14PM +0200, Radim Krčmář wrote: 2014-09-17 13:26+0300, Gleb Natapov: For async_pf_execute() you do not need to even retry. Next guest's page fault will retry it for you. Wouldn't that be a

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Gleb Natapov
On Wed, Sep 17, 2014 at 10:00:32AM -0700, Andres Lagar-Cavilla wrote: On Wed, Sep 17, 2014 at 4:42 AM, Gleb Natapov g...@kernel.org wrote: On Wed, Sep 17, 2014 at 01:27:14PM +0200, Radim Krčmář wrote: 2014-09-17 13:26+0300, Gleb Natapov: For async_pf_execute() you do not need to even

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Andres Lagar-Cavilla
On Wed, Sep 17, 2014 at 10:08 AM, Gleb Natapov g...@kernel.org wrote: On Wed, Sep 17, 2014 at 10:00:32AM -0700, Andres Lagar-Cavilla wrote: On Wed, Sep 17, 2014 at 4:42 AM, Gleb Natapov g...@kernel.org wrote: On Wed, Sep 17, 2014 at 01:27:14PM +0200, Radim Krčmář wrote: 2014-09-17

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Gleb Natapov
On Wed, Sep 17, 2014 at 10:13:45AM -0700, Andres Lagar-Cavilla wrote: On Wed, Sep 17, 2014 at 10:08 AM, Gleb Natapov g...@kernel.org wrote: On Wed, Sep 17, 2014 at 10:00:32AM -0700, Andres Lagar-Cavilla wrote: On Wed, Sep 17, 2014 at 4:42 AM, Gleb Natapov g...@kernel.org wrote: On Wed, Sep

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Andres Lagar-Cavilla
On Wed, Sep 17, 2014 at 10:21 AM, Gleb Natapov g...@kernel.org wrote: On Wed, Sep 17, 2014 at 10:13:45AM -0700, Andres Lagar-Cavilla wrote: On Wed, Sep 17, 2014 at 10:08 AM, Gleb Natapov g...@kernel.org wrote: On Wed, Sep 17, 2014 at 10:00:32AM -0700, Andres Lagar-Cavilla wrote: On Wed, Sep

[PATCH v2] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Andres Lagar-Cavilla
When KVM handles a tdp fault it uses FOLL_NOWAIT. If the guest memory has been swapped out or is behind a filemap, this will trigger async readahead and return immediately. The rationale is that KVM will kick back the guest with an async page fault and allow for some other guest process to take

[PATCH v2] KVM: x86: count actual tlb flushes

2014-09-17 Thread Liang Chen
- we count KVM_REQ_TLB_FLUSH requests, not actual flushes (KVM can have multiple requests for one flush) - flushes from kvm_flush_remote_tlbs aren't counted - it's easy to make a direct request by mistake Solve these by postponing the counting to kvm_check_request(), and refactor the code to use
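The difference between counting requests and counting flushes can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct vcpu`, `make_request`, `check_request`, and the `REQ_TLB_FLUSH` bit number are hypothetical stand-ins, and plain integer operations replace the atomic bitops the kernel would use. The point it shows is that several requests collapse into one pending bit, so incrementing the statistic where the bit is consumed counts one flush.

```c
#include <stdbool.h>
#include <stdint.h>

#define REQ_TLB_FLUSH 0u  /* hypothetical request bit number */

/* Hypothetical, simplified per-vCPU state. */
struct vcpu {
    uint64_t requests;        /* pending request bits */
    uint64_t tlb_flush_stat;  /* flushes actually performed */
};

/* Setting an already pending bit is a no-op, so requests coalesce. */
static void make_request(struct vcpu *v, unsigned int req)
{
    v->requests |= UINT64_C(1) << req;
}

/*
 * Test and clear a pending request. The statistic is bumped here,
 * at the point where the flush is about to be carried out, rather
 * than in make_request().
 */
static bool check_request(struct vcpu *v, unsigned int req)
{
    uint64_t bit = UINT64_C(1) << req;

    if (!(v->requests & bit))
        return false;
    v->requests &= ~bit;
    if (req == REQ_TLB_FLUSH)
        v->tlb_flush_stat++;
    return true;
}
```

With this shape, a caller that sets the request bit directly, bypassing any counting wrapper, is still accounted for, which is the "direct request by mistake" case the patch description mentions.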

Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread David Hildenbrand
On Wed, Sep 17, 2014 at 10:22 PM, Jens Axboe ax...@kernel.dk wrote: Another way would be to ensure that the timeout handler doesn't touch hw_ctx or tag_sets that aren't fully initialized yet. But I think this is safer/cleaner. That may not be easy or enough to check if hw_ctx/tag_sets

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Paolo Bonzini
On 17/09/2014 18:58, Andres Lagar-Cavilla wrote: Understood. So in patch 1, would kvm_gup_retry be ... just a wrapper around gup? That looks thin to me, and the naming of the function will not be accurate. Depends on how you interpret retry (with retry vs. retry after _fast). :) My point

Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread Jens Axboe
On 09/17/2014 01:09 PM, David Hildenbrand wrote: 0. That should already be sufficient to hinder blk_mq_tag_to_rq and the calling method to do the wrong thing. Yes, clearing rq->cmd_flags should be enough. And looks better to move rq initialization to __blk_mq_free_request() too, otherwise

Re: [PATCH v2] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Wanpeng Li
Hi Andres, On Wed, Sep 17, 2014 at 10:51:48AM -0700, Andres Lagar-Cavilla wrote: [...] static inline int check_user_page_hwpoison(unsigned long addr) { int rc, flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_WRITE; @@ -1177,9 +1214,15 @@ static int hva_to_pfn_slow(unsigned long addr, bool

Re: [RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-17 Thread Radim Krčmář
2014-09-17 17:22+0200, Borislav Petkov: On Wed, Sep 17, 2014 at 05:04:33PM +0200, Radim Krčmář wrote: which would result in a similar if-else hack if (family X) ebx.split.max_monitor_line_size_after_family_X = 0 else ebx.split.max_monitor_line_size = 0 other options

Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread Ming Lei
On Thu, Sep 18, 2014 at 3:09 AM, David Hildenbrand d...@linux.vnet.ibm.com wrote: On Wed, Sep 17, 2014 at 10:22 PM, Jens Axboe ax...@kernel.dk wrote: Another way would be to ensure that the timeout handler doesn't touch hw_ctx or tag_sets that aren't fully initialized yet. But I think

Re: [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes

2014-09-17 Thread Rusty Russell
Amos Kong ak...@redhat.com writes: I started a QEMU (non-smp) guest with one virtio-rng device, and read random data from /dev/hwrng by dd: # dd if=/dev/hwrng of=/dev/null At the same time, if I check hwrng attributes from sysfs by cat: # cat /sys/class/misc/hw_random/rng_* The cat

Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-17 Thread Andy Lutomirski
Hi all- I would like to standardize on a very simple protocol by which a guest OS can obtain an RNG seed early in boot. The main design requirements are: - The interface should be very easy to use. Linux, at least, will want to use it extremely early in boot as part of kernel ASLR. This

[PATCH 2/5] hw_random: use reference counts on each struct hwrng.

2014-09-17 Thread Rusty Russell
current_rng holds one reference, and we bump it every time we want to do a read from it. This means we only hold the rng_mutex to grab or drop a reference, so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't block on read of /dev/hwrng. Using a kref is overkill (we're always
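The locking pattern described in this snippet can be sketched in userspace as follows. This is an illustration under assumptions, not the patch: `struct rng`, `set_current_rng`, `get_current_rng`, and `put_rng` are hypothetical names, a pthread mutex stands in for `rng_mutex`, and replacing or freeing an old device is left out. What it shows is that the mutex is held only long enough to take or drop a reference, so a slow read that happens between get and put does not block other users of the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified device object. */
struct rng {
    int refcount;  /* protected by rng_mutex */
};

static pthread_mutex_t rng_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct rng *current_rng;  /* holds one reference while set */

/* Install a device as current; current_rng takes its own reference. */
static void set_current_rng(struct rng *r)
{
    pthread_mutex_lock(&rng_mutex);
    r->refcount++;
    current_rng = r;
    pthread_mutex_unlock(&rng_mutex);
}

/* Grab a reference for the duration of a read; the lock is held briefly. */
static struct rng *get_current_rng(void)
{
    struct rng *r;

    pthread_mutex_lock(&rng_mutex);
    r = current_rng;
    if (r)
        r->refcount++;
    pthread_mutex_unlock(&rng_mutex);
    return r;
}

/* Drop a reference taken by get_current_rng(). */
static void put_rng(struct rng *r)
{
    pthread_mutex_lock(&rng_mutex);
    assert(r->refcount > 0);
    r->refcount--;
    pthread_mutex_unlock(&rng_mutex);
}
```

A reader would call `get_current_rng()`, perform the possibly slow read without holding `rng_mutex`, and then call `put_rng()`. A sysfs-style query only needs the short critical section.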

[PATCH 5/5] hw_random: don't init list element we're about to add to list.

2014-09-17 Thread Rusty Russell
Another interesting anti-pattern. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/char/hw_random/core.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 6a34feca6b43..96fa06716e95 100644 ---

[PATCH 1/5] hw_random: place mutex around read functions and buffers.

2014-09-17 Thread Rusty Russell
There's currently a big lock around everything, and it means that we can't query sysfs (eg /sys/devices/virtual/misc/hw_random/rng_current) while the rng is reading. This is a real problem when the rng is slow, or blocked (eg. virtio_rng with qemu's default /dev/random backend). This doesn't help

[PATCH 4/5] hw_random: don't double-check old_rng.

2014-09-17 Thread Rusty Russell
Interesting anti-pattern. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/char/hw_random/core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index b4a21e9521cf..6a34feca6b43 100644 ---

[PATCH 3/5] hw_random: fix unregister race.

2014-09-17 Thread Rusty Russell
The previous patch added one potential problem: we can still be reading from a hwrng when it's unregistered. Add a wait for zero in the hwrng_unregister path. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/char/hw_random/core.c | 5 + 1 file changed, 5 insertions(+) diff

Re: [PATCH v2] KVM: x86: count actual tlb flushes

2014-09-17 Thread Xiao Guangrong
On 09/18/2014 02:35 AM, Liang Chen wrote: - we count KVM_REQ_TLB_FLUSH requests, not actual flushes (KVM can have multiple requests for one flush) - flushes from kvm_flush_remote_tlbs aren't counted - it's easy to make a direct request by mistake Solve these by postponing the counting to