On Fri, 30 Mar 2018 17:52:07 +0800 Shannon Zhao <zhaoshengl...@huawei.com> wrote:
> On 2018/3/30 17:01, Marc Zyngier wrote:
> > On Fri, 30 Mar 2018 09:56:10 +0800
> > Shannon Zhao <zhaoshengl...@huawei.com> wrote:
> >
> >> On 2018/3/30 0:48, Marc Zyngier wrote:
> >>> On Thu, 29 Mar 2018 16:27:58 +0100,
> >>> Mark Rutland wrote:
> >>>>
> >>>> On Thu, Mar 29, 2018 at 11:00:24PM +0800, Shannon Zhao wrote:
> >>>>> From: zhaoshenglong <zhaoshengl...@huawei.com>
> >>>>>
> >>>>> Currently the VMID for a VM is allocated in the VCPU entry/exit
> >>>>> path and gets updated when kvm_next_vmid wraps around. This causes
> >>>>> the existing VMs to exit from the guest and flush the TLB and
> >>>>> icache.
> >>>>>
> >>>>> Also, while a platform with an 8-bit VMID supports only 255 VMs,
> >>>>> it can still create more than 255 VMs, and if we create e.g. 256
> >>>>> VMs, some VMs will hit page faults because at some point two VMs
> >>>>> hold the same VMID.
> >>>>
> >>>> Have you seen this happen?
> >>>>
> >> Yes, we've started 256 VMs on D05. We saw kernel page faults in some
> >> guests.
> >
> > What kind of fault? Kernel configuration? Can you please share some
> > traces with us? What is the workload? What happens if all the guests
> > are running on the same NUMA node?
> >
> > We need all the information we can get.
>
> All 256 VMs run without any special workload. The test case simply
> starts 256 VMs and then shuts them down. We found that several VMs
> would not shut down because the guest kernel had crashed. If we only
> start 255 VMs, everything works well.
>
> We didn't run the test case with all VMs pinned to the same NUMA node.
> I'll try that.
>
> The fault is:
>
> [ 2204.633871] Unable to handle kernel NULL pointer dereference at
> virtual address 00000008
> [ 2204.633875] Unable to handle kernel paging request at virtual
> address a57f4a9095032
>
> Please see the attachment for the detailed log.

Thanks. It looks pretty ugly indeed. Can you please share your host
kernel config (and version number -- I really hope the host is something
more recent than the 4.1.44 stuff you run as a guest...)?

For the record, I'm currently running 5 concurrent Debian installs, each
with 2 vcpus, on a 4-CPU system artificially configured to have only 2
bits of VMID (and thus at most 3 running VMs at any given time), which
is quite similar to what you're doing, only on a smaller scale. It is
pretty slow (as you'd expect), but so far I haven't seen any issues.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
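For readers following the thread, the generation-based VMID scheme being
discussed can be modelled roughly as below. This is a simplified,
self-contained sketch under the thread's premise (8-bit VMIDs, VMID 0
reserved, so 255 usable VMIDs per generation), not the actual KVM/ARM
code: the real logic lives around update_vttbr()/kvm_next_vmid in the
kernel, and the struct, variable and function names used here are
placeholders.

/*
 * Illustrative model only: a global counter hands out VMIDs; when it
 * wraps, the generation is bumped and (in the real code) every running
 * VM is forced to exit and the TLBs/icache are flushed before guests
 * can re-enter with recycled VMIDs.
 */
#include <stdint.h>
#include <stdio.h>

#define VMID_BITS	8
#define VMID_MASK	((1u << VMID_BITS) - 1)

struct vm {
	uint32_t vmid;
	uint64_t vmid_gen;
};

static uint64_t vmid_gen = 1;
static uint32_t next_vmid = 1;		/* VMID 0 reserved for the host */

static void update_vmid(struct vm *vm)
{
	if (vm->vmid_gen == vmid_gen)
		return;			/* VMID still valid this generation */

	if (next_vmid == 0) {		/* counter wrapped: new generation */
		vmid_gen++;
		next_vmid = 1;
		/* real code: force_vm_exit() + __kvm_flush_vm_context() */
		printf("rollover -> generation %llu\n",
		       (unsigned long long)vmid_gen);
	}

	vm->vmid_gen = vmid_gen;
	vm->vmid = next_vmid++;
	next_vmid &= VMID_MASK;
}

int main(void)
{
	struct vm vms[300] = { 0 };
	int i;

	/* Asking for VMIDs for 300 VMs with 8-bit VMIDs forces a rollover. */
	for (i = 0; i < 300; i++)
		update_vmid(&vms[i]);

	/*
	 * vms[0] and vms[255] now hold the same 8-bit VMID but belong to
	 * different generations; vms[0] must be refreshed (and the TLBs
	 * flushed) before it runs again, otherwise two live VMs would
	 * share a VMID -- the situation described in the report above.
	 */
	printf("vm0:   vmid=%u gen=%llu\n", vms[0].vmid,
	       (unsigned long long)vms[0].vmid_gen);
	printf("vm255: vmid=%u gen=%llu\n", vms[255].vmid,
	       (unsigned long long)vms[255].vmid_gen);
	return 0;
}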