Re: VM boot failure on nodes not having DMA32 zone

2018-07-25 Thread Liang C
Thank you very much for the quick reply and confirmation!  I just made
and submitted a patch according to your advice.

Thanks,
Liang

On Wed, Jul 25, 2018 at 2:05 AM, Paolo Bonzini  wrote:
> On 24/07/2018 09:53, Liang C wrote:
>> Hi,
>>
>> We have a situation where our qemu processes need to be launched
>> under cgroup cpuset.mems control. This introduces an issue similar
>> to one discussed a few years ago. The difference is that in our
>> case, not being able to allocate from the DMA32 zone is the result
>> of a cgroup restriction, not mempolicy enforcement. Here are the
>> steps to reproduce the failure:
>>
>> mkdir /sys/fs/cgroup/cpuset/nodeX (where X is a node not having DMA32 zone)
>> echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.mems
>> echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.cpus
>> echo 1 > /sys/fs/cgroup/cpuset/nodeX/cpuset.mem_hardwall
>> echo $$ > /sys/fs/cgroup/cpuset/nodeX/tasks
>>
>> #launch a virtual machine
>> kvm_init_vcpu failed: Cannot allocate memory
>>
>> There are workarounds, like always placing qemu processes onto the
>> node that has the DMA32 zone, or not restricting a qemu process's
>> memory allocation until the DMA32 allocation finishes (difficult to
>> time precisely). But we would like to find a way to address the
>> root cause.
>>
>> Considering that the pae_root shadow page should not be needed when
>> EPT is in use, which is indeed our case (EPT is always available
>> for us, and we suspect the same holds for most other users), we
>> made a patch roughly like this:
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index d594690..1d1b61e 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -5052,7 +5052,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
>>          vcpu->arch.mmu.translate_gpa = translate_gpa;
>>          vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa;
>>
>> -        return alloc_mmu_pages(vcpu);
>> +        return tdp_enabled ? 0 : alloc_mmu_pages(vcpu);
>>  }
>>
>>  void kvm_mmu_setup(struct kvm_vcpu *vcpu)
>>
>>
>> It passes all of our test cases, but we would really like your
>> insight on this patch before applying it in a production
>> environment and contributing it back to the community. Thanks in
>> advance for any help you may provide!
>
> Yes, this looks good.  However, I'd place the "if" in alloc_mmu_pages
> itself.
>
> Thanks,
>
> Paolo
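
Paolo's suggestion would look roughly like the sketch below: the
tdp_enabled check moves into alloc_mmu_pages() itself, so
kvm_mmu_create() stays unchanged. The function body approximates the
mmu.c code of that era and is illustrative, not the patch that was
actually submitted.

static int alloc_mmu_pages(struct kvm_vcpu *vcpu)
{
        struct page *page;
        int i;

        /*
         * With TDP/EPT in use the PAE root shadow table is never
         * consulted, so the DMA32 allocation below (the one that
         * fails under a cpuset confined to nodes without a DMA32
         * zone) can be skipped entirely.
         */
        if (tdp_enabled)
                return 0;

        /*
         * When emulating 32-bit mode, cr3 is only 32 bits even on
         * x86_64, so the shadow page table must sit in the first 4GB
         * of memory, which is what the DMA32 zone covers.
         */
        page = alloc_page(GFP_KERNEL | __GFP_DMA32);
        if (!page)
                return -ENOMEM;

        vcpu->arch.mmu.pae_root = page_address(page);
        for (i = 0; i < 4; ++i)
                vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;

        return 0;
}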


VM boot failure on nodes not having DMA32 zone

2018-07-24 Thread Liang C
Hi,

We have a situation where our qemu processes need to be launched under
cgroup cpuset.mems control. This introduces an issue similar to one
discussed a few years ago. The difference is that in our case, not
being able to allocate from the DMA32 zone is the result of a cgroup
restriction, not mempolicy enforcement. Here are the steps to
reproduce the failure:

mkdir /sys/fs/cgroup/cpuset/nodeX (where X is a node not having DMA32 zone)
echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.mems
echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.cpus
echo 1 > /sys/fs/cgroup/cpuset/nodeX/cpuset.mem_hardwall
echo $$ > /sys/fs/cgroup/cpuset/nodeX/tasks

#launch a virtual machine
kvm_init_vcpu failed: Cannot allocate memory

There are workarounds, like always placing qemu processes onto the
node that has the DMA32 zone, or not restricting a qemu process's
memory allocation until the DMA32 allocation finishes (difficult to
time precisely). But we would like to find a way to address the root
cause.
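
For reference, applying the first workaround requires knowing which
node actually carries the DMA32 zone. /proc/zoneinfo lists the zones
owned by each node; on typical x86_64 machines only node 0 has DMA32,
since that zone covers the lowest 4GB of physical memory. A quick
check, with illustrative output from a hypothetical two-node machine:

# Show which NUMA node each memory zone belongs to.
grep -E '^Node [0-9]+, zone' /proc/zoneinfo
#   Node 0, zone      DMA
#   Node 0, zone    DMA32
#   Node 0, zone   Normal
#   Node 1, zone   Normal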

Considering that the pae_root shadow page should not be needed when
EPT is in use, which is indeed our case (EPT is always available for
us, and we suspect the same holds for most other users), we made a
patch roughly like this:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d594690..1d1b61e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5052,7 +5052,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
         vcpu->arch.mmu.translate_gpa = translate_gpa;
         vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa;

-        return alloc_mmu_pages(vcpu);
+        return tdp_enabled ? 0 : alloc_mmu_pages(vcpu);
 }

 void kvm_mmu_setup(struct kvm_vcpu *vcpu)


It passes all of our test cases, but we would really like your insight
on this patch before applying it in a production environment and
contributing it back to the community. Thanks in advance for any help
you may provide!

Thanks,
Liang Chen


Re: [PATCH v2] bcache: explicitly destroy mutex while exiting

2017-10-30 Thread Liang C
Hi Michael,
Would you please include this patch in your tree for the next
release? It seems to have passed review. Thank you.

Thanks,
Liang

On Tue, Oct 10, 2017 at 11:44 PM, Michael Lyle  wrote:
> On 10/10/2017 05:25 AM, Coly Li wrote:
>> On 2017/10/10 5:00 PM, Liang Chen wrote:
>>> mutex_destroy does nothing most of the time, but it's better to
>>> call it to make the code future proof, and it also matters for
>>> things like mutex debugging.
>>>
>>> As Coly pointed out in a previous review, bcache_exit() may not be
>>> able to handle all the references properly if userspace registers
>>> cache and backing devices right before bch_debug_init runs and
>>> bch_debug_init fails later. So we avoid that issue by not exposing
>>> the userspace interface until everything is ready.
>>>
>>> Signed-off-by: Liang Chen 
>>
>> Hi Liang,
>>
>> No more comment from me, it looks good. Thanks.
>>
>> Reviewed-by: Coly Li 
>
> Looks good to me too.
>
> Reviewed-by: Michael Lyle 
>
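
For context, the init/exit pairing the patch describes looks roughly
like the sketch below. The module and lock names are hypothetical,
standing in for the actual bcache code. With CONFIG_DEBUG_MUTEXES
enabled, mutex_destroy() marks the lock unusable so a
use-after-destroy is caught; without it, the call compiles away to
nothing.

#include <linux/init.h>
#include <linux/module.h>
#include <linux/mutex.h>

/* Hypothetical module-wide lock, standing in for a lock like
 * bcache's bch_register_lock. */
static DEFINE_MUTEX(example_lock);

static int __init example_init(void)
{
        /* Set up all internal state first and expose any userspace
         * interface only as the last step, so a failure partway
         * through never leaves a half-initialized interface visible. */
        return 0;
}

static void __exit example_exit(void)
{
        /* A no-op unless mutex debugging is enabled, but it keeps
         * the teardown path future proof. */
        mutex_destroy(&example_lock);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");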

