[PATCH] drm/amdkfd: fix KFDSVMRangeTest.PartialUnmapSysMemTest fails

2021-08-14 Thread Yifan Zhang
[ RUN  ] KFDSVMRangeTest.PartialUnmapSysMemTest
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDTestUtil.cpp:245: Failure
Value of: (hsaKmtAllocMemory(m_Node, m_Size, m_Flags, &m_pBuf))
  Actual: 1
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDTestUtil.cpp:248: Failure
Value of: (hsaKmtMapMemoryToGPUNodes(m_pBuf, m_Size, __null, mapFlags, 1, 
&m_Node))
  Actual: 1
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDTestUtil.cpp:306: Failure
Expected: ((void *)__null) != (ptr), actual: NULL vs NULL
Segmentation fault (core dumped)
[  ] Profile: Full Test
[  ] HW capabilities: 0x9

kernel log:

[  102.029150]  ret_from_fork+0x22/0x30
[  102.029158] ---[ end trace 15c34e782714f9a3 ]---
[ 3613.603598] amdgpu: Address: 0x7f7149ccc000 already allocated by SVM
[ 3613.610620] show_signal_msg: 27 callbacks suppressed

These is race with deferred actions from previous memory map
changes (e.g. munmap).Flush pending deffered work to avoid such case.

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 3177c4a0e753..982bf538dc3d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1261,6 +1261,13 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
*filep,
return -EINVAL;
 
 #if IS_ENABLED(CONFIG_HSA_AMD_SVM)
+   /* Flush pending deferred work to avoid racing with deferred actions 
from
+* previous memory map changes (e.g. munmap). Concurrent memory map 
changes
+* can still race with get_attr because we don't hold the mmap lock. 
But that
+* would be a race condition in the application anyway, and undefined
+* behaviour is acceptable in that case.
+*/
+   flush_work(&svms->deferred_list_work);
mutex_lock(&svms->lock);
if (interval_tree_iter_first(&svms->objects,
 args->va_addr >> PAGE_SHIFT,
-- 
2.25.1



RE: [PATCH] drm/amdkfd: fix random KFDSVMRangeTest.SetGetAttributesTest test failure

2021-08-14 Thread Zhang, Yifan
[Public]

Hi Felix,

Thanks for comments. I just found a similar race condition in KFD test. Sent it 
in another thread "[PATCH] drm/amdkfd: fix 
KFDSVMRangeTest.PartialUnmapSysMemTest fails"

BRs,
Yifan

-Original Message-
From: Kuehling, Felix  
Sent: Friday, August 13, 2021 4:23 AM
To: Zhang, Yifan ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdkfd: fix random 
KFDSVMRangeTest.SetGetAttributesTest test failure

Am 2021-08-12 um 11:49 a.m. schrieb Zhang, Yifan:
> [AMD Official Use Only]
>
> Yes, it is a sync issue b/w SVMRange unmap deferred list and get-attribute 
> call. There is time slot when Process vma has been unmapped from CPU side, 
> but prange is not removed in deferred list. If get-attribute is called  in 
> this time slot, then there will be a problem.
>
> This issue happens when KFDSVMRangeTest,SetGetAttributesTest runs after 
> KFDSVMRangeTest.BasicSystemMemTest test.
>
> TEST_F(KFDSVMRangeTest, SetGetAttributesTest) {
>  TEST_REQUIRE_ENV_CAPABILITIES(ENVCAPS_64BITLINUX);
>  TEST_START(TESTPROFILE_RUNALL)
> ..
>  HSAint32 enable = -1;
>  EXPECT_SUCCESS(hsaKmtGetXNACKMode(&enable));
>  expectedDefaultResults[4] = 
> (enable)?HSA_SVM_ATTR_ACCESS:HSA_SVM_ATTR_NO_ACCESS;
>  sysBuffer = new HsaSVMRange(BufSize); // It returns the same addr as the 
> last test case since the addr is already munmap for CPU.
>  char *pBuf = sysBuffer->As();
>  
>  LOG() << "Get default atrributes" << std::endl;
>  memcpy(outputAttributes, inputAttributes, nAttributes * 
> sizeof(HSA_SVM_ATTRIBUTE));
>  EXPECT_SUCCESS(hsaKmtSVMGetAttr(pBuf, BufSize, // This is the 
> problematic get-attribute. It returns the the prange attributes in the last 
> test case since prange is not removed.
>  nAttributes, outputAttributes));
>
> sysBufer address is an unregistered SVM range. Thus hsaKmtSVMGetAttr should 
> return the default attributes. But there are some corner cases, deferred list 
> from last test case is not finished when get attribute is called, so svm 
> range list remains the contents of the last test. In such cases, 
> hsaKmtSVMGetAttr returns the attributes in the last test instead of default 
> values. (The two test cases are in the same process, mmap in HsaSVMRange 
> constructor return the same addr) and makes test case fail.

OK. Thanks for investigating further. I think your patch is the the correct 
solution after all. I'd just add a carefully worded comment:

/* Flush pending deferred work to avoid racing with deferred actions from
 * previous memory map changes (e.g. munmap). Concurrent memory map changes
 * can still race with get_attr because we don't hold the mmap lock. But that
 * would be a race condition in the application anyway, and undefined
 * behaviour is acceptable in that case.
 */

With that, the patch is

Reviewed-by: Felix Kuehling 

Regards,
  Felix


>
>
> -Original Message-
> From: Kuehling, Felix 
> Sent: Tuesday, August 10, 2021 11:57 PM
> To: Zhang, Yifan ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdkfd: fix random 
> KFDSVMRangeTest.SetGetAttributesTest test failure
>
> Am 2021-08-10 um 12:43 a.m. schrieb Yifan Zhang:
>> KFDSVMRangeTest.SetGetAttributesTest randomly fails in stress test.
>>
>> Note: Google Test filter = KFDSVMRangeTest.* [==] Running 18 
>> tests from 1 test case.
>> [--] Global test environment set-up.
>> [--] 18 tests from KFDSVMRangeTest
>> [ RUN  ] KFDSVMRangeTest.BasicSystemMemTest
>> [   OK ] KFDSVMRangeTest.BasicSystemMemTest (30 ms)
>> [ RUN  ] KFDSVMRangeTest.SetGetAttributesTest
>> [  ] Get default atrributes
>> /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:15
>> 4
>> : Failure Value of: expectedDefaultResults[i]
>>   Actual: 4294967295
>> Expected: outputAttributes[i].value
>> Which is: 0
>> /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:15
>> 4
>> : Failure Value of: expectedDefaultResults[i]
>>   Actual: 4294967295
>> Expected: outputAttributes[i].value
>> Which is: 0
>> /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:15
>> 2
>> : Failure Value of: expectedDefaultResults[i]
>>   Actual: 4
>> Expected: outputAttributes[i].type
>> Which is: 2
>> [  ] Setting/Getting atrributes
>> [  FAILED  ]
>>
>> the root cause is that svm work queue has not finished when 
>> svm_range_get_attr is called, thus some garbage svm interval tree 
>> data make svm_range_get_attr get wrong result. Flush work queue before 
>> iterate svm interval tree.
>>
>> Signed-off-by: Yifan Zhang 
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index f811a3a24cd2..192e9401bed5 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -3072,6 +3072,9 @@ svm_range_get_a

Re: [PATCH v2 02/12] mm: Introduce a function to check for virtualization protection features

2021-08-14 Thread Tom Lendacky

On 8/14/21 1:32 PM, Borislav Petkov wrote:

On Fri, Aug 13, 2021 at 11:59:21AM -0500, Tom Lendacky wrote:

diff --git a/include/linux/protected_guest.h b/include/linux/protected_guest.h
new file mode 100644
index ..43d4dde94793
--- /dev/null
+++ b/include/linux/protected_guest.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Protected Guest (and Host) Capability checks
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ */
+
+#ifndef _PROTECTED_GUEST_H
+#define _PROTECTED_GUEST_H
+
+#ifndef __ASSEMBLY__

   ^

Do you really need that guard? It builds fine without it too. Or
something coming later does need it...?


No, I probably did it out of habit. I can remove it in the next version.

Thanks,
Tom