RE: [PATCH] drm/amd/powerplay: fix the confusing ppfeature mask calculations

2019-02-18 Thread Feng, Kenneth
Reviewed-by: Kenneth Feng 

-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of Evan 
Quan
Sent: Tuesday, February 19, 2019 12:28 PM
To: amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix ; Quan, Evan 
Subject: [PATCH] drm/amd/powerplay: fix the confusing ppfeature mask 
calculations

Simplify the ppfeature mask calculations.

Change-Id: I41dee48c2ac8ed1fd7736bc8bb1f832da22cba13
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 4 ++--  
drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c | 4 ++--  
drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
index 5479125ff4f6..6d8e9609e900 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
@@ -4407,9 +4407,9 @@ static int vega10_set_ppfeature_status(struct pp_hwmgr 
*hwmgr, uint64_t new_ppfe
return ret;
 
features_to_disable =
-   (features_enabled ^ new_ppfeature_masks) & features_enabled;
+   features_enabled & ~new_ppfeature_masks;
features_to_enable =
-   (features_enabled ^ new_ppfeature_masks) ^ features_to_disable;
+   ~features_enabled & new_ppfeature_masks;
 
pr_debug("features_to_disable 0x%llx\n", features_to_disable);
pr_debug("features_to_enable 0x%llx\n", features_to_enable); diff --git 
a/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c
index 6c8e78611c03..bdb48e94eff6 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c
@@ -2009,9 +2009,9 @@ static int vega12_set_ppfeature_status(struct pp_hwmgr 
*hwmgr, uint64_t new_ppfe
return ret;
 
features_to_disable =
-   (features_enabled ^ new_ppfeature_masks) & features_enabled;
+   features_enabled & ~new_ppfeature_masks;
features_to_enable =
-   (features_enabled ^ new_ppfeature_masks) ^ features_to_disable;
+   ~features_enabled & new_ppfeature_masks;
 
pr_debug("features_to_disable 0x%llx\n", features_to_disable);
pr_debug("features_to_enable 0x%llx\n", features_to_enable); diff --git 
a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
index aad79affb081..2cf45a69faf7 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
@@ -3088,9 +3088,9 @@ static int vega20_set_ppfeature_status(struct pp_hwmgr 
*hwmgr, uint64_t new_ppfe
return ret;
 
features_to_disable =
-   (features_enabled ^ new_ppfeature_masks) & features_enabled;
+   features_enabled & ~new_ppfeature_masks;
features_to_enable =
-   (features_enabled ^ new_ppfeature_masks) ^ features_to_disable;
+   ~features_enabled & new_ppfeature_masks;
 
pr_debug("features_to_disable 0x%llx\n", features_to_disable);
pr_debug("features_to_enable 0x%llx\n", features_to_enable);
--
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amd/powerplay: fix the confusing ppfeature mask calculations

2019-02-18 Thread Evan Quan
Simplify the ppfeature mask calculations.

Change-Id: I41dee48c2ac8ed1fd7736bc8bb1f832da22cba13
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 4 ++--
 drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c | 4 ++--
 drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
index 5479125ff4f6..6d8e9609e900 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
@@ -4407,9 +4407,9 @@ static int vega10_set_ppfeature_status(struct pp_hwmgr 
*hwmgr, uint64_t new_ppfe
return ret;
 
features_to_disable =
-   (features_enabled ^ new_ppfeature_masks) & features_enabled;
+   features_enabled & ~new_ppfeature_masks;
features_to_enable =
-   (features_enabled ^ new_ppfeature_masks) ^ features_to_disable;
+   ~features_enabled & new_ppfeature_masks;
 
pr_debug("features_to_disable 0x%llx\n", features_to_disable);
pr_debug("features_to_enable 0x%llx\n", features_to_enable);
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c
index 6c8e78611c03..bdb48e94eff6 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c
@@ -2009,9 +2009,9 @@ static int vega12_set_ppfeature_status(struct pp_hwmgr 
*hwmgr, uint64_t new_ppfe
return ret;
 
features_to_disable =
-   (features_enabled ^ new_ppfeature_masks) & features_enabled;
+   features_enabled & ~new_ppfeature_masks;
features_to_enable =
-   (features_enabled ^ new_ppfeature_masks) ^ features_to_disable;
+   ~features_enabled & new_ppfeature_masks;
 
pr_debug("features_to_disable 0x%llx\n", features_to_disable);
pr_debug("features_to_enable 0x%llx\n", features_to_enable);
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
index aad79affb081..2cf45a69faf7 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
@@ -3088,9 +3088,9 @@ static int vega20_set_ppfeature_status(struct pp_hwmgr 
*hwmgr, uint64_t new_ppfe
return ret;
 
features_to_disable =
-   (features_enabled ^ new_ppfeature_masks) & features_enabled;
+   features_enabled & ~new_ppfeature_masks;
features_to_enable =
-   (features_enabled ^ new_ppfeature_masks) ^ features_to_disable;
+   ~features_enabled & new_ppfeature_masks;
 
pr_debug("features_to_disable 0x%llx\n", features_to_disable);
pr_debug("features_to_enable 0x%llx\n", features_to_enable);
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH libdrm] tests/amdgpu: add dispatch test

2019-02-18 Thread Zhang, Hawking
Although the shader is simple enough, please work with CQE to test it on all 
gfx9 ASICs before push it.

The patch is Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Cui, Flora
Sent: 2019年2月18日 12:56
To: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org
Cc: Cui, Flora 
Subject: [PATCH libdrm] tests/amdgpu: add dispatch test

From: Flora Cui 

Change-Id: I6f5dfa4379cb21c41c68757fae0105527a03e54f
Signed-off-by: Flora Cui 
---
 tests/amdgpu/basic_tests.c | 175 -
 1 file changed, 173 insertions(+), 2 deletions(-)

diff --git a/tests/amdgpu/basic_tests.c b/tests/amdgpu/basic_tests.c index 
d79859a..649c5a4 100644
--- a/tests/amdgpu/basic_tests.c
+++ b/tests/amdgpu/basic_tests.c
@@ -49,6 +49,7 @@ static void amdgpu_userptr_test(void);  static void 
amdgpu_semaphore_test(void);  static void amdgpu_sync_dependency_test(void);
 static void amdgpu_bo_eviction_test(void);
+static void amdgpu_memset_dispatch_test(void);
 static void amdgpu_direct_gma_test(void);
 
 static void amdgpu_command_submission_write_linear_helper(unsigned ip_type); 
@@ -71,6 +72,7 @@ CU_TestInfo basic_tests[] = {
{ "Command submission Test (SDMA)", amdgpu_command_submission_sdma },
{ "SW semaphore Test",  amdgpu_semaphore_test },
{ "Sync dependency Test",  amdgpu_sync_dependency_test },
+   { "Memset dispatch Test",  amdgpu_memset_dispatch_test },
{ "Direct GMA", amdgpu_direct_gma_test },
CU_TEST_INFO_NULL,
 };
@@ -119,6 +121,7 @@ CU_TestInfo basic_tests[] = {
 #define PACKET3(op, n) ((PACKET_TYPE3 << 30) | \
 (((op) & 0xFF) << 8) | \
 ((n) & 0x3FFF) << 16)
+#define PACKET3_COMPUTE(op, n) PACKET3(op, n) | (1 << 1)
 
 /* Packet 3 types */
 #definePACKET3_NOP 0x10
@@ -247,8 +250,8 @@ CU_TestInfo basic_tests[] = {
 #definePACKET3_SET_SH_REG_START
0x2c00
 
 #definePACKET3_DISPATCH_DIRECT 0x15
-
-
+#define PACKET3_EVENT_WRITE0x46
+#define PACKET3_ACQUIRE_MEM0x58
 /* gfx 8 */
 #define mmCOMPUTE_PGM_LO   
 0x2e0c
 #define mmCOMPUTE_PGM_RSRC1
 0x2e12
@@ -1945,6 +1948,174 @@ static void amdgpu_sync_dependency_test(void)
free(ibs_request.dependencies);
 }
 
+static const uint32_t bufferclear_cs_shader_gfx9[] = {
+0xD1FD, 0x04010C08, 0x7E020204, 0x7E040205,
+0x7E060206, 0x7E080207, 0xE01C2000, 0x8100,
+0xBF81
+};
+static const uint32_t bufferclear_cs_shader_registers_gfx9[][2] = {
+   {0x2e12, 0x000C0041},   //{ mmCOMPUTE_PGM_RSRC1,  0x000C0041 },
+   {0x2e13, 0x0090},   //{ mmCOMPUTE_PGM_RSRC2,  0x0090 },
+   {0x2e07, 0x0040},   //{ mmCOMPUTE_NUM_THREAD_X, 0x0040 },
+   {0x2e08, 0x0001},   //{ mmCOMPUTE_NUM_THREAD_Y, 0x0001 },
+   {0x2e09, 0x0001},   //{ mmCOMPUTE_NUM_THREAD_Z, 0x0001 }
+};
+static void amdgpu_memset_dispatch_gfx_test_gfx9()
+{
+   amdgpu_context_handle context_handle;
+   amdgpu_bo_handle bo_dst, bo_shader, resources[2];
+   volatile unsigned char *ptr_dst;
+   void *ptr_shader;
+   uint64_t mc_address_dst, mc_address_shader;
+   amdgpu_va_handle va_dst, va_shader;
+   int i, j, r;
+   uint32_t *ptr;
+   int bo_dst_size = 16384;
+   struct amdgpu_cs_request ibs_request;
+   struct amdgpu_cs_ib_info ib_info;
+
+   ptr = calloc(256, sizeof(*ptr));
+   CU_ASSERT_NOT_EQUAL(ptr, NULL);
+   memset(ptr, 0, 256);
+
+   r = amdgpu_cs_ctx_create(device_handle, _handle);
+   CU_ASSERT_EQUAL(r, 0);
+
+   r = amdgpu_bo_alloc_and_map(device_handle, 4096, 4096,
+   AMDGPU_GEM_DOMAIN_VRAM, 0,
+   _shader, _shader,
+   _address_shader, _shader);
+   CU_ASSERT_EQUAL(r, 0);
+
+   r = amdgpu_bo_alloc_and_map(device_handle, bo_dst_size, 4096,
+   AMDGPU_GEM_DOMAIN_VRAM, 0,
+   _dst, _dst,
+   _address_dst, _dst);
+   CU_ASSERT_EQUAL(r, 0);
+
+   memcpy(ptr_shader, bufferclear_cs_shader_gfx9, 
+sizeof(bufferclear_cs_shader_gfx9));
+
+   i = 0;
+   ptr[i++] = PACKET3(PKT3_CONTEXT_CONTROL, 1);
+   ptr[i++] = 0x8000;
+   ptr[i++] = 0x8000;
+
+   /* Issue commands to set default compute state. */
+   /* clear mmCOMPUTE_START_Z - mmCOMPUTE_START_X */
+   ptr[i++] = PACKET3_COMPUTE(PKT3_SET_SH_REG, 3);
+   ptr[i++] = 0x204;
+   ptr[i++] = 0;
+   ptr[i++] = 0;
+   ptr[i++] = 0;
+   /* clear 

Re: [PATCH] drm/amdgpu/powerplay/polaris10_smumgr: Mark expected switch fall-through

2019-02-18 Thread Gustavo A. R. Silva


On 2/18/19 4:40 PM, Alex Deucher wrote:
> On Fri, Feb 15, 2019 at 1:50 PM Gustavo A. R. Silva
>  wrote:
>>
>> In preparation to enabling -Wimplicit-fallthrough, mark switch
>> cases where we are expecting to fall through.
>>
>> Warning level 3 was used: -Wimplicit-fallthrough=3
>>
>> This patch is part of the ongoing efforts to enable
>> -Wimplicit-fallthrough.
>>
>> Signed-off-by: Gustavo A. R. Silva 
>> ---
>>  drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c 
>> b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
>> index 52abca065764..92de1bbb2e33 100644
>> --- a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
>> +++ b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
>> @@ -2330,6 +2330,7 @@ static uint32_t polaris10_get_offsetof(uint32_t type, 
>> uint32_t member)
>> case DRAM_LOG_BUFF_SIZE:
>> return offsetof(SMU74_SoftRegisters, 
>> DRAM_LOG_BUFF_SIZE);
>> }
>> +   /* fall through */
> 
> These should be breaks, although I don't think we ever currently hit
> this case.  I've sent out a patch to fix it and applied the rest of
> the radeon and amdgpu patches.  Thanks!
> 

Awesome!

Thanks, Alex.

--
Gustavo
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu/powerplay/polaris10_smumgr: Mark expected switch fall-through

2019-02-18 Thread Alex Deucher
On Fri, Feb 15, 2019 at 1:50 PM Gustavo A. R. Silva
 wrote:
>
> In preparation to enabling -Wimplicit-fallthrough, mark switch
> cases where we are expecting to fall through.
>
> Warning level 3 was used: -Wimplicit-fallthrough=3
>
> This patch is part of the ongoing efforts to enable
> -Wimplicit-fallthrough.
>
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c 
> b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
> index 52abca065764..92de1bbb2e33 100644
> --- a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
> @@ -2330,6 +2330,7 @@ static uint32_t polaris10_get_offsetof(uint32_t type, 
> uint32_t member)
> case DRAM_LOG_BUFF_SIZE:
> return offsetof(SMU74_SoftRegisters, 
> DRAM_LOG_BUFF_SIZE);
> }
> +   /* fall through */

These should be breaks, although I don't think we ever currently hit
this case.  I've sent out a patch to fix it and applied the rest of
the radeon and amdgpu patches.  Thanks!

Alex

> case SMU_Discrete_DpmTable:
> switch (member) {
> case UvdBootLevel:
> --
> 2.20.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu/powerplay: add missing breaks in polaris10_smumgr

2019-02-18 Thread Alex Deucher
This was noticed by Gustavo and his -Wimplicit-fallthrough
patches.  However, in this case, I believe we should have breaks
rather than falling though, that said, in practice we should
never fall through in the first place so there should be no
change in behavior.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c 
b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
index 52abca065764..2d4cfe14f72e 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
@@ -2330,6 +2330,7 @@ static uint32_t polaris10_get_offsetof(uint32_t type, 
uint32_t member)
case DRAM_LOG_BUFF_SIZE:
return offsetof(SMU74_SoftRegisters, 
DRAM_LOG_BUFF_SIZE);
}
+   break;
case SMU_Discrete_DpmTable:
switch (member) {
case UvdBootLevel:
@@ -2339,6 +2340,7 @@ static uint32_t polaris10_get_offsetof(uint32_t type, 
uint32_t member)
case LowSclkInterruptThreshold:
return offsetof(SMU74_Discrete_DpmTable, 
LowSclkInterruptThreshold);
}
+   break;
}
pr_warn("can't get the offset of type %x member %x\n", type, member);
return 0;
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/radeon/evergreen_cs: fix missing break in switch statement

2019-02-18 Thread Alex Deucher
On Fri, Feb 15, 2019 at 5:40 PM Gustavo A. R. Silva
 wrote:
>
> Add missing break statement in order to prevent the code from falling
> through to case CB_TARGET_MASK.
>
> This bug was found thanks to the ongoing efforts to enable
> -Wimplicit-fallthrough.
>
> Fixes: dd220a00e8bd ("drm/radeon/kms: add support for streamout v7")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Gustavo A. R. Silva 
> ---
> NOTE: Notice that this code has been out there since 2012. So, it
> would be helpful if someone can double-check this.

Applied.  thanks!

Alex

>
>  drivers/gpu/drm/radeon/evergreen_cs.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
> b/drivers/gpu/drm/radeon/evergreen_cs.c
> index f471537c852f..1e14c6921454 100644
> --- a/drivers/gpu/drm/radeon/evergreen_cs.c
> +++ b/drivers/gpu/drm/radeon/evergreen_cs.c
> @@ -1299,6 +1299,7 @@ static int evergreen_cs_handle_reg(struct 
> radeon_cs_parser *p, u32 reg, u32 idx)
> return -EINVAL;
> }
> ib[idx] += (u32)((reloc->gpu_offset >> 8) & 0x);
> +   break;
> case CB_TARGET_MASK:
> track->cb_target_mask = radeon_get_ib_value(p, idx);
> track->cb_dirty = true;
> --
> 2.20.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] gpu: drm: radeon: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime

2019-02-18 Thread Alex Deucher
On Sun, Feb 17, 2019 at 4:26 PM Rafael J. Wysocki  wrote:
>
> On Sun, Feb 17, 2019 at 12:37 AM Alex Deucher  wrote:
> >
> > On Sat, Feb 16, 2019 at 1:01 AM Lukas Wunner  wrote:
> > >
> > > On Fri, Feb 15, 2019 at 11:01:04AM -0500, Alex Deucher wrote:
> > > > On Fri, Feb 15, 2019 at 10:39 AM Rafael J. Wysocki  
> > > > wrote:
> > > > > On HP ProBook 4540s, if PM-runtime is enabled in the radeon driver
> > > > > and the direct-complete optimization is used for the radeon device
> > > > > during system-wide suspend, the system doesn't resume.
> > > > >
> > > > > Preventing direct-complete from being used with the radeon device by
> > > > > setting the DPM_FLAG_NEVER_SKIP driver flag for it makes the problem
> > > > > go away, which indicates that direct-complete is not safe for the
> > > > > radeon driver in general and should not be used with it (at least
> > > > > for now).
> > > > >
> > > > > This fixes a regression introduced by commit c62ec4610c40
> > > > > ("PM / core: Fix direct_complete handling for devices with no
> > > > > callbacks") which allowed direct-complete to be applied to
> > > > > devices without PM callbacks (again) which in turn unlocked
> > > > > direct-complete for radeon on HP ProBook 4540s.
> > > >
> > > > Do other similar drivers like amdgpu and nouveau need the same fix?
> > > > I'm not too familiar with the direct_complete feature in general.
> > >
> > > direct_complete means that a discrete GPU which is in D3cold upon
> > > entering system sleep is left as is, i.e. it is not woken.  It is
> > > also expected to still be in D3cold when resuming from system sleep
> > > from the PM core's point of view.  (If it is in D0uninitialized, the
> > > GPU's driver needs to ensure it is transitioned to D3cold again.)
> > >
> > > I know for a fact that resuming the discrete GPU is not necessary
> > > on my MacBook Pro with Nvidia GPU.  I'd expect those with AMD GPUs
> > > to behave the same.  The apple-gmux driver takes care of putting
> > > the GPU into D3cold on resume from system sleep if it was in D3cold
> > > when entering system sleep (see drivers/platform/x86/apple-gmux.c,
> > > gmux_resume()).
> > >
> > > I think it is desirable to use direct_complete because it saves power
> > > (no need to gratuitously wake the GPU upon entering system sleep,
> > > only to immediately cut its power) and it also speeds up the suspend
> > > process by about half a second.
> >
> > Thanks for the info.  It sounds like we need a similar patch for
> > amdgpu.  With dGPUs controlled by the ACPI ATPX method, I believe the
> > dGPU is powered by automatically on resume from S3/S4.  I think there
> > may be a way to change that behavior in some revisions of ATPX (i.e.,
> > to keep the state across suspend cycles), but it's not the default.
> > I'm not sure about the newer _PR3 stuff in Hybrid Graphics laptops.  I
> > think it retains state.  In both radeon and amdgpu we probably need to
> > check if the system is using ATPX or _PR3 and disable direct complete
> > for ATPX at least.
>
> I would disable direct-complete entirely for them then and possibly
> consider using DPM_FLAG_SMART_SUSPEND in the cases when that would be
> safe.
>
> Anyway, I posted this patch for radeon, because it addresses a
> specific regression and I'm not super-familiar with GPU drivers in
> general.

Thanks.  I've applied this patch and sent out a patch for amdgpu.

Alex
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime

2019-02-18 Thread Alex Deucher
Based on a similar patch from Rafael for radeon.

When using ATPX to control dGPU power, the state is not retained
across suspend and resume cycles by default.  This can probably
be loosened for Hybrid Graphics (_PR3) laptops where I think the
state is properly retained.

Fixes: c62ec4610c40 ("PM / core: Fix direct_complete handling for devices with 
no callbacks")
Cc: Rafael J. Wysocki 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index d63cb53ff2bb..eaf90cdc848d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -212,6 +212,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
}
 
if (amdgpu_device_is_px(dev)) {
+   dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NEVER_SKIP);
pm_runtime_use_autosuspend(dev->dev);
pm_runtime_set_autosuspend_delay(dev->dev, 5000);
pm_runtime_set_active(dev->dev);
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-18 Thread Thomas Hellstrom
On Mon, 2019-02-18 at 18:07 +0100, Christian König wrote:
> Am 18.02.19 um 10:47 schrieb Thomas Hellstrom:
> > On Mon, 2019-02-18 at 09:20 +, Koenig, Christian wrote:
> > > Another good question is also why the heck the acc_size counts
> > > towards
> > > the DMA32 zone?
> > DMA32 TTM pages are accounted in the DMA32 zone. Other pages are
> > not.
> 
> Yeah, I'm perfectly aware of this. But this is for the accounting
> size!
> 
> We have an accounting for the stuff needed additional to the pages 
> backing the BO (e.g. the page and DMA addr array).
> 
> And from the bug description it sounds like we use the DMA32 zone
> for 
> this accounting which of course is completely nonsense.

It's actually accounted in all available zones, since it would be
pretty hard to determine exactly where that memory should be accounted.
In particular if it's vmalloced. It might be DMA32, it might not. Given
the objective of stopping malicious user-space from exhausting the
DMA32 zone it was, at the time the code was written, a reasonable
approximation. With ever increasing memory sizes, there might be better
solutions?

/Thomas

> 
> Christian.
> 
> > For small persistent allocations using ttm_mem_global_alloc(), they
> > are
> > accounted also in the DMA32 zone, which may cause over-accounting
> > of
> > that zone, but that's pretty unlikely to be a big problem..
> > 
> > /Thomas
> > 
> > 
> > 
> > 
> > 
> > > In other words why should the internal bookkeeping pages be
> > > allocated
> > > in
> > > the DMA32 zone?
> > > 
> > > That doesn't sounds valid to me in any way,
> > > Christian.
> > > 
> > > Am 18.02.19 um 09:02 schrieb Thomas Hellstrom:
> > > > Hmm,
> > > > 
> > > > This zone was intended to stop TTM page allocations from
> > > > exhausting
> > > > the DMA32 zone. IIRC dma_alloc_coherent() uses DMA32 by
> > > > default,
> > > > which
> > > > means if we drop this check, other devices may stop functioning
> > > > unexpectedly?
> > > > 
> > > > However, in the end I'd expect the kernel page allocation
> > > > system
> > > > to
> > > > make sure there are some pages left in the DMA32 zone,
> > > > otherwise
> > > > random non-IO page allocations would also potentially exhaust
> > > > the
> > > > DMA32 zone without anybody caring, which means removing this
> > > > zone
> > > > wouldn't be any worse than whatever other subsystems may be
> > > > doing
> > > > already...
> > > > 
> > > > /Thomas
> > > > 
> > > > 
> > > > On 2/16/19 12:02 AM, Kuehling, Felix wrote:
> > > > > This is an RFC. I'm not sure this is the right solution, but
> > > > > it
> > > > > highlights the problem I'm trying to solve.
> > > > > 
> > > > > The dma32_zone limits the acc_size of all allocated BOs to
> > > > > 2GB.
> > > > > On a
> > > > > 64-bit system with hundreds of GB of system memory and GPU
> > > > > memory,
> > > > > this can become a bottle neck. We're seeing TTM memory
> > > > > allocation
> > > > > failures not because we're truly out of memory, but because
> > > > > we're
> > > > > out of space in the dma32_zone for the acc_size needed for
> > > > > our BO
> > > > > book-keeping.
> > > > > 
> > > > > Signed-off-by: Felix Kuehling 
> > > > > CC: thellst...@vmware.com
> > > > > CC: christian.koe...@amd.com
> > > > > ---
> > > > >drivers/gpu/drm/ttm/ttm_memory.c | 4 ++--
> > > > >1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/ttm/ttm_memory.c
> > > > > b/drivers/gpu/drm/ttm/ttm_memory.c
> > > > > index f1567c3..bb05365 100644
> > > > > --- a/drivers/gpu/drm/ttm/ttm_memory.c
> > > > > +++ b/drivers/gpu/drm/ttm/ttm_memory.c
> > > > > @@ -363,7 +363,7 @@ static int
> > > > > ttm_mem_init_highmem_zone(struct
> > > > > ttm_mem_global *glob,
> > > > >glob->zones[glob->num_zones++] = zone;
> > > > >return 0;
> > > > >}
> > > > > -#else
> > > > > +#elifndef CONFIG_64BIT
> > > > >static int ttm_mem_init_dma32_zone(struct ttm_mem_global
> > > > > *glob,
> > > > >   const struct sysinfo *si)
> > > > >{
> > > > > @@ -441,7 +441,7 @@ int ttm_mem_global_init(struct
> > > > > ttm_mem_global
> > > > > *glob)
> > > > >ret = ttm_mem_init_highmem_zone(glob, );
> > > > >if (unlikely(ret != 0))
> > > > >goto out_no_zone;
> > > > > -#else
> > > > > +#elifndef CONFIG_64BIT
> > > > >ret = ttm_mem_init_dma32_zone(glob, );
> > > > >if (unlikely(ret != 0))
> > > > >goto out_no_zone;
> > ___
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=02%7C01%7Cthellstrom%40vmware.com%7C1a84a2a700cd48b3980a08d695c38ade%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636861064581303965sdata=6FCAP6YbostgKfEEhRKKh9eQSrfXR%2BfCczYB8WwlPVY%3Dreserved=0
___
amd-gfx mailing list

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-18 Thread Christian König

Am 18.02.19 um 10:47 schrieb Thomas Hellstrom:

On Mon, 2019-02-18 at 09:20 +, Koenig, Christian wrote:

Another good question is also why the heck the acc_size counts
towards
the DMA32 zone?

DMA32 TTM pages are accounted in the DMA32 zone. Other pages are not.


Yeah, I'm perfectly aware of this. But this is for the accounting size!

We have an accounting for the stuff needed additional to the pages 
backing the BO (e.g. the page and DMA addr array).


And from the bug description it sounds like we use the DMA32 zone for 
this accounting which of course is completely nonsense.


Christian.



For small persistent allocations using ttm_mem_global_alloc(), they are
accounted also in the DMA32 zone, which may cause over-accounting of
that zone, but that's pretty unlikely to be a big problem..

/Thomas






In other words why should the internal bookkeeping pages be allocated
in
the DMA32 zone?

That doesn't sounds valid to me in any way,
Christian.

Am 18.02.19 um 09:02 schrieb Thomas Hellstrom:

Hmm,

This zone was intended to stop TTM page allocations from
exhausting
the DMA32 zone. IIRC dma_alloc_coherent() uses DMA32 by default,
which
means if we drop this check, other devices may stop functioning
unexpectedly?

However, in the end I'd expect the kernel page allocation system
to
make sure there are some pages left in the DMA32 zone, otherwise
random non-IO page allocations would also potentially exhaust the
DMA32 zone without anybody caring, which means removing this zone
wouldn't be any worse than whatever other subsystems may be doing
already...

/Thomas


On 2/16/19 12:02 AM, Kuehling, Felix wrote:

This is an RFC. I'm not sure this is the right solution, but it
highlights the problem I'm trying to solve.

The dma32_zone limits the acc_size of all allocated BOs to 2GB.
On a
64-bit system with hundreds of GB of system memory and GPU
memory,
this can become a bottle neck. We're seeing TTM memory allocation
failures not because we're truly out of memory, but because we're
out of space in the dma32_zone for the acc_size needed for our BO
book-keeping.

Signed-off-by: Felix Kuehling 
CC: thellst...@vmware.com
CC: christian.koe...@amd.com
---
   drivers/gpu/drm/ttm/ttm_memory.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_memory.c
b/drivers/gpu/drm/ttm/ttm_memory.c
index f1567c3..bb05365 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -363,7 +363,7 @@ static int ttm_mem_init_highmem_zone(struct
ttm_mem_global *glob,
   glob->zones[glob->num_zones++] = zone;
   return 0;
   }
-#else
+#elifndef CONFIG_64BIT
   static int ttm_mem_init_dma32_zone(struct ttm_mem_global *glob,
  const struct sysinfo *si)
   {
@@ -441,7 +441,7 @@ int ttm_mem_global_init(struct ttm_mem_global
*glob)
   ret = ttm_mem_init_highmem_zone(glob, );
   if (unlikely(ret != 0))
   goto out_no_zone;
-#else
+#elifndef CONFIG_64BIT
   ret = ttm_mem_init_dma32_zone(glob, );
   if (unlikely(ret != 0))
   goto out_no_zone;

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-18 Thread Koenig, Christian
Am 18.02.19 um 13:07 schrieb Lionel Landwerlin:
Thanks guys :)

You mentioned that signaling out of order is illegal.
Is this illegal with regard to the vulkan spec or to the syncobj implementation?

David is the expert on that, but as far as I know that is forbidden by the 
vulkan spec.

I'm not finding anything in the vulkan spec that makes out of order signaling 
illegal.
That's why I came up with this test, just verifying that the timeline does not 
go backward in term of its payload.

Well we need to handle this case gracefully in the kernel, so it is still a 
good testcase.

Christian.


-Lionel

On 18/02/2019 11:01, Koenig, Christian wrote:
Hi David,

well I think Lionel is testing the invalid signal order on purpose :)

Anyway we really need to handle invalid order graceful here. E.g. either the 
same way as during CS or we abort and return an error message.

I think just using the same approach as during CS ist the best we can do.

Regards,
Christian


Am 18.02.2019 11:35 schrieb "Zhou, David(ChunMing)" 
:

Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if signal isn't in-order, and we 
shouldn't lead to deadlock.

Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just give a warning?

Otherwise like Lionel's unexpected use cases, which easily leads to deadlock.


-David

On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for point me to the tests you've added in IGT.
While adding a test with that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G
U5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016
[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f
00 <0f> 85 05 fe ff ff 4c 89 ef e8 56 ea ff ff 48 89 d8 5b 5d 41 5c 41
[   60.452888] RSP: 0018:9a5804653ca8 EFLAGS: 00010296 ORIG_RAX:
ff13
[   60.452889] RAX:  RBX: 8f5690fb2480 RCX:
8f5690fb2f00
[   60.452890] RDX: 003e3730 RSI:  RDI:
8f5690fb2180
[   60.452891] RBP: 8f5690fb2180 R08:  R09:
8f5690fb2eb0
[   60.452891] R10:  R11: 8f5660469860 R12:
8f5690fb2f68
[   60.452892] R13: 8f5690fb2f00 R14: 0003 R15:
8f5655a45fc0
[   60.452913] FS:  7fdc5c459980() GS:8f569eb8()
knlGS:
[   60.452913] CS:  0010 DS:  ES:  CR0: 80050033
[   60.452914] CR2: 7f9d74336dd8 CR3: 00084a67e004 CR4:
003606e0
[   60.452915] DR0:  DR1:  DR2:

[   60.452915] DR3:  DR6: fffe0ff0 DR7:
0400
[   60.452916] Call Trace:
[   60.452958]  drm_syncobj_add_point+0x102/0x160 [drm]
[   60.452965]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452971]  drm_syncobj_transfer_ioctl+0x10f/0x180 [drm]
[   60.452978]  drm_ioctl_kernel+0xac/0xf0 [drm]
[   60.452984]  drm_ioctl+0x2eb/0x3b0 [drm]
[   60.452990]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452992]  ? sw_sync_ioctl+0x347/0x370
[   60.452994]  do_vfs_ioctl+0xa4/0x640
[   60.452995]  ? 

[PATCH -next] drm/radeon: remove set but not used variable 'vbi_time_out'

2019-02-18 Thread YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/gpu/drm/radeon/si_dpm.c: In function 'si_program_response_times':
drivers/gpu/drm/radeon/si_dpm.c:3640:29: warning:
 variable 'backbias_response_time' set but not used [-Wunused-but-set-variable]

It never used since introduction.

Signed-off-by: YueHaibing 
---
 drivers/gpu/drm/radeon/si_dpm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/si_dpm.c b/drivers/gpu/drm/radeon/si_dpm.c
index 0a785ef0ab66..de8bd245497f 100644
--- a/drivers/gpu/drm/radeon/si_dpm.c
+++ b/drivers/gpu/drm/radeon/si_dpm.c
@@ -3637,14 +3637,13 @@ static int si_notify_smc_display_change(struct 
radeon_device *rdev,
 
 static void si_program_response_times(struct radeon_device *rdev)
 {
-   u32 voltage_response_time, backbias_response_time, acpi_delay_time, 
vbi_time_out;
+   u32 voltage_response_time, acpi_delay_time, vbi_time_out;
u32 vddc_dly, acpi_dly, vbi_dly;
u32 reference_clock;
 
si_write_smc_soft_register(rdev, SI_SMC_SOFT_REGISTER_mvdd_chg_time, 1);
 
voltage_response_time = (u32)rdev->pm.dpm.voltage_response_time;
-   backbias_response_time = (u32)rdev->pm.dpm.backbias_response_time;
 
if (voltage_response_time == 0)
voltage_response_time = 1000;



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-18 Thread Lionel Landwerlin

Thanks guys :)

You mentioned that signaling out of order is illegal.
Is this illegal with regard to the vulkan spec or to the syncobj 
implementation?


I'm not finding anything in the vulkan spec that makes out of order 
signaling illegal.
That's why I came up with this test, just verifying that the timeline 
does not go backward in term of its payload.


-Lionel

On 18/02/2019 11:01, Koenig, Christian wrote:

Hi David,

well I think Lionel is testing the invalid signal order on purpose :)

Anyway we really need to handle invalid order graceful here. E.g. 
either the same way as during CS or we abort and return an error message.


I think just using the same approach as during CS ist the best we can do.

Regards,
Christian


Am 18.02.2019 11:35 schrieb "Zhou, David(ChunMing)" :

Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if signal isn't in-order, 
and we shouldn't lead to deadlock.


Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just give a 
warning?


Otherwise like Lionel's unexpected use cases, which easily leads to 
deadlock.



-David


On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for point me to the tests you've added in IGT.
While adding a test with that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G
U    5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016
[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f
00 <0f> 85 05 fe ff ff 4c 89 ef e8 56 ea ff ff 48 89 d8 5b 5d 41 5c 41
[   60.452888] RSP: 0018:9a5804653ca8 EFLAGS: 00010296 ORIG_RAX:
ff13
[   60.452889] RAX:  RBX: 8f5690fb2480 RCX:
8f5690fb2f00
[   60.452890] RDX: 003e3730 RSI:  RDI:
8f5690fb2180
[   60.452891] RBP: 8f5690fb2180 R08:  R09:
8f5690fb2eb0
[   60.452891] R10:  R11: 8f5660469860 R12:
8f5690fb2f68
[   60.452892] R13: 8f5690fb2f00 R14: 0003 R15:
8f5655a45fc0
[   60.452913] FS:  7fdc5c459980() GS:8f569eb8()
knlGS:
[   60.452913] CS:  0010 DS:  ES:  CR0: 80050033
[   60.452914] CR2: 7f9d74336dd8 CR3: 00084a67e004 CR4:
003606e0
[   60.452915] DR0:  DR1:  DR2:

[   60.452915] DR3:  DR6: fffe0ff0 DR7:
0400
[   60.452916] Call Trace:
[   60.452958]  drm_syncobj_add_point+0x102/0x160 [drm]
[   60.452965]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452971]  drm_syncobj_transfer_ioctl+0x10f/0x180 [drm]
[   60.452978]  drm_ioctl_kernel+0xac/0xf0 [drm]
[   60.452984]  drm_ioctl+0x2eb/0x3b0 [drm]
[   60.452990]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452992]  ? sw_sync_ioctl+0x347/0x370
[   60.452994]  do_vfs_ioctl+0xa4/0x640
[   60.452995]  ? __fput+0x134/0x220
[   60.452997]  ? do_fcntl+0x1a5/0x650
[   60.452998]  ksys_ioctl+0x70/0x80
[   60.452999]  __x64_sys_ioctl+0x16/0x20
[   60.453002]  do_syscall_64+0x55/0x110
[   60.453004]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   60.453005] RIP: 0033:0x7fdc5b6e45d7
[ 

Re: [PATCH 06/11] drm/syncobj: add timeline payload query ioctl v4

2019-02-18 Thread Lionel Landwerlin

On 18/02/2019 07:28, Koenig, Christian wrote:

Am 18.02.19 um 04:10 schrieb zhoucm1:


On 2019年02月17日 03:22, Christian König wrote:

Am 15.02.19 um 20:31 schrieb Lionel Landwerlin via amd-gfx:

On 07/12/2018 09:55, Chunming Zhou wrote:

user mode can query timeline payload.
v2: check return value of copy_to_user
v3: handle querying entry by entry
v4: rebase on new chain container, simplify interface

Signed-off-by: Chunming Zhou 
Cc: Daniel Rakos 
Cc: Jason Ekstrand 
Cc: Bas Nieuwenhuizen 
Cc: Dave Airlie 
Cc: Christian König 
Cc: Chris Wilson 
---
   drivers/gpu/drm/drm_internal.h |  2 ++
   drivers/gpu/drm/drm_ioctl.c    |  2 ++
   drivers/gpu/drm/drm_syncobj.c  | 43
++
   include/uapi/drm/drm.h | 10 
   4 files changed, 57 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h
b/drivers/gpu/drm/drm_internal.h
index 18b41e10195c..dab4d5936441 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -184,6 +184,8 @@ int drm_syncobj_reset_ioctl(struct drm_device
*dev, void *data,
   struct drm_file *file_private);
   int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
    struct drm_file *file_private);
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+    struct drm_file *file_private);
     /* drm_framebuffer.c */
   void drm_framebuffer_print_info(struct drm_printer *p, unsigned
int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index a9a17ed35cc4..7578ef6dc1d1 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -681,6 +681,8 @@ static const struct drm_ioctl_desc drm_ioctls[]
= {
     DRM_UNLOCKED|DRM_RENDER_ALLOW),
   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL,
drm_syncobj_signal_ioctl,
     DRM_UNLOCKED|DRM_RENDER_ALLOW),
+    DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
+  DRM_UNLOCKED|DRM_RENDER_ALLOW),
   DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE,
drm_crtc_get_sequence_ioctl, DRM_UNLOCKED),
   DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE,
drm_crtc_queue_sequence_ioctl, DRM_UNLOCKED),
   DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE,
drm_mode_create_lease_ioctl, DRM_MASTER|DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c
b/drivers/gpu/drm/drm_syncobj.c
index 348079bb0965..f97fa00ca1d0 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1061,3 +1061,46 @@ drm_syncobj_signal_ioctl(struct drm_device
*dev, void *data,
     return ret;
   }
+
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+    struct drm_file *file_private)
+{
+    struct drm_syncobj_timeline_array *args = data;
+    struct drm_syncobj **syncobjs;
+    uint64_t __user *points = u64_to_user_ptr(args->points);
+    uint32_t i;
+    int ret;
+
+    if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+    return -ENODEV;
+
+    if (args->pad != 0)
+    return -EINVAL;
+
+    if (args->count_handles == 0)
+    return -EINVAL;
+
+    ret = drm_syncobj_array_find(file_private,
+ u64_to_user_ptr(args->handles),
+ args->count_handles,
+ );
+    if (ret < 0)
+    return ret;
+
+    for (i = 0; i < args->count_handles; i++) {
+    struct dma_fence_chain *chain;
+    struct dma_fence *fence;
+    uint64_t point;
+
+    fence = drm_syncobj_fence_get(syncobjs[i]);
+    chain = to_dma_fence_chain(fence);
+    point = chain ? fence->seqno : 0;


Sorry, I don' t want to sound annoying, but this looks like this
could report values going backward.

Well please be annoying as much as you can :) But yeah all that stuff
has been discussed before as well.


Anything add a point X to a timeline that has reached value Y with X
< Y would trigger that.

Yes, that can indeed happen.

trigger what? when adding x (x < y), then return 0 when query?
Why would this happen?
No, syncobj->fence should always be there and the last chain node, if
it is ever added.

Well maybe Lionel should clarify a bit what he means?

I thought he is concerned that the call could return values where X < Y,
but that doesn't seem to be the case.

Christian.



I meant something like this :


t = create_timeline_syncobj()

signal(t, 2)

query(t) => 2

signal(t, 1)

query(t) => 1


-Lionel






-David

But adding a timeline point X which is before the already added point
Y is illegal in the first place :)

So when the application does something stupid and breaks it can just
keep the pieces.

In the kernel we still do the most defensive thing and sync to
everything in this case.

I'm just not sure if we should print an error into syslog or just
continue silently.

Regards,
Christian.


Either through the submission or userspace signaling or importing
another syncpoint's fence.


-Lionel



+    ret = copy_to_user([i], , sizeof(uint64_t));
+    ret = 

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-18 Thread Koenig, Christian
Hi David,

well I think Lionel is testing the invalid signal order on purpose :)

Anyway we really need to handle invalid order graceful here. E.g. either the 
same way as during CS or we abort and return an error message.

I think just using the same approach as during CS ist the best we can do.

Regards,
Christian


Am 18.02.2019 11:35 schrieb "Zhou, David(ChunMing)" :

Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if signal isn't in-order, and we 
shouldn't lead to deadlock.

Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just give a warning?

Otherwise like Lionel's unexpected use cases, which easily leads to deadlock.


-David

On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for point me to the tests you've added in IGT.
While adding a test with that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G
U5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016
[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f
00 <0f> 85 05 fe ff ff 4c 89 ef e8 56 ea ff ff 48 89 d8 5b 5d 41 5c 41
[   60.452888] RSP: 0018:9a5804653ca8 EFLAGS: 00010296 ORIG_RAX:
ff13
[   60.452889] RAX:  RBX: 8f5690fb2480 RCX:
8f5690fb2f00
[   60.452890] RDX: 003e3730 RSI:  RDI:
8f5690fb2180
[   60.452891] RBP: 8f5690fb2180 R08:  R09:
8f5690fb2eb0
[   60.452891] R10:  R11: 8f5660469860 R12:
8f5690fb2f68
[   60.452892] R13: 8f5690fb2f00 R14: 0003 R15:
8f5655a45fc0
[   60.452913] FS:  7fdc5c459980() GS:8f569eb8()
knlGS:
[   60.452913] CS:  0010 DS:  ES:  CR0: 80050033
[   60.452914] CR2: 7f9d74336dd8 CR3: 00084a67e004 CR4:
003606e0
[   60.452915] DR0:  DR1:  DR2:

[   60.452915] DR3:  DR6: fffe0ff0 DR7:
0400
[   60.452916] Call Trace:
[   60.452958]  drm_syncobj_add_point+0x102/0x160 [drm]
[   60.452965]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452971]  drm_syncobj_transfer_ioctl+0x10f/0x180 [drm]
[   60.452978]  drm_ioctl_kernel+0xac/0xf0 [drm]
[   60.452984]  drm_ioctl+0x2eb/0x3b0 [drm]
[   60.452990]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452992]  ? sw_sync_ioctl+0x347/0x370
[   60.452994]  do_vfs_ioctl+0xa4/0x640
[   60.452995]  ? __fput+0x134/0x220
[   60.452997]  ? do_fcntl+0x1a5/0x650
[   60.452998]  ksys_ioctl+0x70/0x80
[   60.452999]  __x64_sys_ioctl+0x16/0x20
[   60.453002]  do_syscall_64+0x55/0x110
[   60.453004]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   60.453005] RIP: 0033:0x7fdc5b6e45d7
[   60.453006] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
[   60.453007] RSP: 002b:7fff25c4d198 EFLAGS: 0206 ORIG_RAX:
0010
[   60.453008] RAX: ffda RBX:  RCX:
7fdc5b6e45d7
[   60.453008] RDX: 7fff25c4d200 RSI: 

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-18 Thread zhoucm1

Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if signal isn't in-order, 
and we shouldn't lead to deadlock.


Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just give a 
warning?


Otherwise like Lionel's unexpected use cases, which easily leads to 
deadlock.



-David


On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for point me to the tests you've added in IGT.
While adding a test with that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G
U    5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016
[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f
00 <0f> 85 05 fe ff ff 4c 89 ef e8 56 ea ff ff 48 89 d8 5b 5d 41 5c 41
[   60.452888] RSP: 0018:9a5804653ca8 EFLAGS: 00010296 ORIG_RAX:
ff13
[   60.452889] RAX:  RBX: 8f5690fb2480 RCX:
8f5690fb2f00
[   60.452890] RDX: 003e3730 RSI:  RDI:
8f5690fb2180
[   60.452891] RBP: 8f5690fb2180 R08:  R09:
8f5690fb2eb0
[   60.452891] R10:  R11: 8f5660469860 R12:
8f5690fb2f68
[   60.452892] R13: 8f5690fb2f00 R14: 0003 R15:
8f5655a45fc0
[   60.452913] FS:  7fdc5c459980() GS:8f569eb8()
knlGS:
[   60.452913] CS:  0010 DS:  ES:  CR0: 80050033
[   60.452914] CR2: 7f9d74336dd8 CR3: 00084a67e004 CR4:
003606e0
[   60.452915] DR0:  DR1:  DR2:

[   60.452915] DR3:  DR6: fffe0ff0 DR7:
0400
[   60.452916] Call Trace:
[   60.452958]  drm_syncobj_add_point+0x102/0x160 [drm]
[   60.452965]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452971]  drm_syncobj_transfer_ioctl+0x10f/0x180 [drm]
[   60.452978]  drm_ioctl_kernel+0xac/0xf0 [drm]
[   60.452984]  drm_ioctl+0x2eb/0x3b0 [drm]
[   60.452990]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452992]  ? sw_sync_ioctl+0x347/0x370
[   60.452994]  do_vfs_ioctl+0xa4/0x640
[   60.452995]  ? __fput+0x134/0x220
[   60.452997]  ? do_fcntl+0x1a5/0x650
[   60.452998]  ksys_ioctl+0x70/0x80
[   60.452999]  __x64_sys_ioctl+0x16/0x20
[   60.453002]  do_syscall_64+0x55/0x110
[   60.453004]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   60.453005] RIP: 0033:0x7fdc5b6e45d7
[   60.453006] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
[   60.453007] RSP: 002b:7fff25c4d198 EFLAGS: 0206 ORIG_RAX:
0010
[   60.453008] RAX: ffda RBX:  RCX:
7fdc5b6e45d7
[   60.453008] RDX: 7fff25c4d200 RSI: c02064cc RDI:
0003
[   60.453009] RBP: 7fff25c4d1d0 R08:  R09:
001e
[   60.453010] R10:  R11: 0206 R12:
563d3959e4d0
[   60.453010] R13: 7fff25c4d620 R14:  R15:

[   88.447359] watchdog: BUG: soft lockup - CPU#6 stuck for 22s!
[syncobj_timelin:2021]

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-18 Thread Thomas Hellstrom
On Mon, 2019-02-18 at 09:20 +, Koenig, Christian wrote:
> Another good question is also why the heck the acc_size counts
> towards 
> the DMA32 zone?

DMA32 TTM pages are accounted in the DMA32 zone. Other pages are not.

For small persistent allocations using ttm_mem_global_alloc(), they are
accounted also in the DMA32 zone, which may cause over-accounting of
that zone, but that's pretty unlikely to be a big problem..

/Thomas





> 
> In other words why should the internal bookkeeping pages be allocated
> in 
> the DMA32 zone?
> 
> That doesn't sounds valid to me in any way,
> Christian.
> 
> Am 18.02.19 um 09:02 schrieb Thomas Hellstrom:
> > Hmm,
> > 
> > This zone was intended to stop TTM page allocations from
> > exhausting 
> > the DMA32 zone. IIRC dma_alloc_coherent() uses DMA32 by default,
> > which 
> > means if we drop this check, other devices may stop functioning 
> > unexpectedly?
> > 
> > However, in the end I'd expect the kernel page allocation system
> > to 
> > make sure there are some pages left in the DMA32 zone, otherwise 
> > random non-IO page allocations would also potentially exhaust the 
> > DMA32 zone without anybody caring, which means removing this zone 
> > wouldn't be any worse than whatever other subsystems may be doing 
> > already...
> > 
> > /Thomas
> > 
> > 
> > On 2/16/19 12:02 AM, Kuehling, Felix wrote:
> > > This is an RFC. I'm not sure this is the right solution, but it
> > > highlights the problem I'm trying to solve.
> > > 
> > > The dma32_zone limits the acc_size of all allocated BOs to 2GB.
> > > On a
> > > 64-bit system with hundreds of GB of system memory and GPU
> > > memory,
> > > this can become a bottle neck. We're seeing TTM memory allocation
> > > failures not because we're truly out of memory, but because we're
> > > out of space in the dma32_zone for the acc_size needed for our BO
> > > book-keeping.
> > > 
> > > Signed-off-by: Felix Kuehling 
> > > CC: thellst...@vmware.com
> > > CC: christian.koe...@amd.com
> > > ---
> > >   drivers/gpu/drm/ttm/ttm_memory.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/ttm/ttm_memory.c 
> > > b/drivers/gpu/drm/ttm/ttm_memory.c
> > > index f1567c3..bb05365 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_memory.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_memory.c
> > > @@ -363,7 +363,7 @@ static int ttm_mem_init_highmem_zone(struct 
> > > ttm_mem_global *glob,
> > >   glob->zones[glob->num_zones++] = zone;
> > >   return 0;
> > >   }
> > > -#else
> > > +#elifndef CONFIG_64BIT
> > >   static int ttm_mem_init_dma32_zone(struct ttm_mem_global *glob,
> > >  const struct sysinfo *si)
> > >   {
> > > @@ -441,7 +441,7 @@ int ttm_mem_global_init(struct ttm_mem_global
> > > *glob)
> > >   ret = ttm_mem_init_highmem_zone(glob, );
> > >   if (unlikely(ret != 0))
> > >   goto out_no_zone;
> > > -#else
> > > +#elifndef CONFIG_64BIT
> > >   ret = ttm_mem_init_dma32_zone(glob, );
> > >   if (unlikely(ret != 0))
> > >   goto out_no_zone;
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-18 Thread Koenig, Christian
Another good question is also why the heck the acc_size counts towards 
the DMA32 zone?

In other words why should the internal bookkeeping pages be allocated in 
the DMA32 zone?

That doesn't sounds valid to me in any way,
Christian.

Am 18.02.19 um 09:02 schrieb Thomas Hellstrom:
> Hmm,
>
> This zone was intended to stop TTM page allocations from exhausting 
> the DMA32 zone. IIRC dma_alloc_coherent() uses DMA32 by default, which 
> means if we drop this check, other devices may stop functioning 
> unexpectedly?
>
> However, in the end I'd expect the kernel page allocation system to 
> make sure there are some pages left in the DMA32 zone, otherwise 
> random non-IO page allocations would also potentially exhaust the 
> DMA32 zone without anybody caring, which means removing this zone 
> wouldn't be any worse than whatever other subsystems may be doing 
> already...
>
> /Thomas
>
>
> On 2/16/19 12:02 AM, Kuehling, Felix wrote:
>> This is an RFC. I'm not sure this is the right solution, but it
>> highlights the problem I'm trying to solve.
>>
>> The dma32_zone limits the acc_size of all allocated BOs to 2GB. On a
>> 64-bit system with hundreds of GB of system memory and GPU memory,
>> this can become a bottle neck. We're seeing TTM memory allocation
>> failures not because we're truly out of memory, but because we're
>> out of space in the dma32_zone for the acc_size needed for our BO
>> book-keeping.
>>
>> Signed-off-by: Felix Kuehling 
>> CC: thellst...@vmware.com
>> CC: christian.koe...@amd.com
>> ---
>>   drivers/gpu/drm/ttm/ttm_memory.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_memory.c 
>> b/drivers/gpu/drm/ttm/ttm_memory.c
>> index f1567c3..bb05365 100644
>> --- a/drivers/gpu/drm/ttm/ttm_memory.c
>> +++ b/drivers/gpu/drm/ttm/ttm_memory.c
>> @@ -363,7 +363,7 @@ static int ttm_mem_init_highmem_zone(struct 
>> ttm_mem_global *glob,
>>   glob->zones[glob->num_zones++] = zone;
>>   return 0;
>>   }
>> -#else
>> +#elifndef CONFIG_64BIT
>>   static int ttm_mem_init_dma32_zone(struct ttm_mem_global *glob,
>>  const struct sysinfo *si)
>>   {
>> @@ -441,7 +441,7 @@ int ttm_mem_global_init(struct ttm_mem_global *glob)
>>   ret = ttm_mem_init_highmem_zone(glob, );
>>   if (unlikely(ret != 0))
>>   goto out_no_zone;
>> -#else
>> +#elifndef CONFIG_64BIT
>>   ret = ttm_mem_init_dma32_zone(glob, );
>>   if (unlikely(ret != 0))
>>   goto out_no_zone;
>
>

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu/powerplay: fix spelling mistake: "Attemp" -> "Attempt"

2019-02-18 Thread Colin King
From: Colin Ian King 

There are multiple occurrances of the same spelling mistake in
messages in PP_ASSERT_WITH_CODE macros. Fix these.

Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/powerplay/smumgr/vega12_smumgr.c | 12 ++--
 drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c | 12 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/vega12_smumgr.c 
b/drivers/gpu/drm/amd/powerplay/smumgr/vega12_smumgr.c
index ddb801517667..7331a5ef199f 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/vega12_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/vega12_smumgr.c
@@ -121,20 +121,20 @@ int vega12_enable_smc_features(struct pp_hwmgr *hwmgr,
if (enable) {
PP_ASSERT_WITH_CODE(smu9_send_msg_to_smc_with_parameter(hwmgr,
PPSMC_MSG_EnableSmuFeaturesLow, 
smu_features_low) == 0,
-   "[EnableDisableSMCFeatures] Attemp to enable 
SMU features Low failed!",
+   "[EnableDisableSMCFeatures] Attempt to enable 
SMU features Low failed!",
return -EINVAL);
PP_ASSERT_WITH_CODE(smu9_send_msg_to_smc_with_parameter(hwmgr,
PPSMC_MSG_EnableSmuFeaturesHigh, 
smu_features_high) == 0,
-   "[EnableDisableSMCFeatures] Attemp to enable 
SMU features High failed!",
+   "[EnableDisableSMCFeatures] Attempt to enable 
SMU features High failed!",
return -EINVAL);
} else {
PP_ASSERT_WITH_CODE(smu9_send_msg_to_smc_with_parameter(hwmgr,
PPSMC_MSG_DisableSmuFeaturesLow, 
smu_features_low) == 0,
-   "[EnableDisableSMCFeatures] Attemp to disable 
SMU features Low failed!",
+   "[EnableDisableSMCFeatures] Attempt to disable 
SMU features Low failed!",
return -EINVAL);
PP_ASSERT_WITH_CODE(smu9_send_msg_to_smc_with_parameter(hwmgr,
PPSMC_MSG_DisableSmuFeaturesHigh, 
smu_features_high) == 0,
-   "[EnableDisableSMCFeatures] Attemp to disable 
SMU features High failed!",
+   "[EnableDisableSMCFeatures] Attempt to disable 
SMU features High failed!",
return -EINVAL);
}
 
@@ -151,13 +151,13 @@ int vega12_get_enabled_smc_features(struct pp_hwmgr 
*hwmgr,
 
PP_ASSERT_WITH_CODE(smu9_send_msg_to_smc(hwmgr,
PPSMC_MSG_GetEnabledSmuFeaturesLow) == 0,
-   "[GetEnabledSMCFeatures] Attemp to get SMU features Low 
failed!",
+   "[GetEnabledSMCFeatures] Attempt to get SMU features 
Low failed!",
return -EINVAL);
smc_features_low = smu9_get_argument(hwmgr);
 
PP_ASSERT_WITH_CODE(smu9_send_msg_to_smc(hwmgr,
PPSMC_MSG_GetEnabledSmuFeaturesHigh) == 0,
-   "[GetEnabledSMCFeatures] Attemp to get SMU features 
High failed!",
+   "[GetEnabledSMCFeatures] Attempt to get SMU features 
High failed!",
return -EINVAL);
smc_features_high = smu9_get_argument(hwmgr);
 
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c 
b/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c
index b7ff7d4d6f44..be8a49702e82 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c
@@ -304,20 +304,20 @@ int vega20_enable_smc_features(struct pp_hwmgr *hwmgr,
if (enable) {
PP_ASSERT_WITH_CODE((ret = 
vega20_send_msg_to_smc_with_parameter(hwmgr,
PPSMC_MSG_EnableSmuFeaturesLow, 
smu_features_low)) == 0,
-   "[EnableDisableSMCFeatures] Attemp to enable 
SMU features Low failed!",
+   "[EnableDisableSMCFeatures] Attempt to enable 
SMU features Low failed!",
return ret);
PP_ASSERT_WITH_CODE((ret = 
vega20_send_msg_to_smc_with_parameter(hwmgr,
PPSMC_MSG_EnableSmuFeaturesHigh, 
smu_features_high)) == 0,
-   "[EnableDisableSMCFeatures] Attemp to enable 
SMU features High failed!",
+   "[EnableDisableSMCFeatures] Attempt to enable 
SMU features High failed!",
return ret);
} else {
PP_ASSERT_WITH_CODE((ret = 
vega20_send_msg_to_smc_with_parameter(hwmgr,
PPSMC_MSG_DisableSmuFeaturesLow, 
smu_features_low)) == 0,
-   "[EnableDisableSMCFeatures] Attemp to disable 
SMU features Low failed!",
+   

Re: [PATCH] drm: prefix header search paths with $(srctree)/

2019-02-18 Thread Masahiro Yamada
On Thu, Jan 31, 2019 at 1:01 PM Masahiro Yamada
 wrote:
>
> Currently, the Kbuild core manipulates header search paths in a crazy
> way [1].
>
> To fix this mess, I want all Makefiles to add explicit $(srctree)/ to
> the search paths in the srctree. Some Makefiles are already written in
> that way, but not all. The goal of this work is to make the notation
> consistent, and finally get rid of the gross hacks.
>
> Having whitespaces after -I does not matter since commit 48f6e3cf5bc6
> ("kbuild: do not drop -I without parameter").
>
> [1]: https://patchwork.kernel.org/patch/9632347/
>
> Signed-off-by: Masahiro Yamada 
> ---


Could you take a look at this series, please?

Thanks.





> I put all gpu/drm changes into a single patch because
> they are trivial conversion.
>
> Please let me know if I need to split this into per-driver patches.
>
>
>  drivers/gpu/drm/amd/amdgpu/Makefile | 2 +-
>  drivers/gpu/drm/amd/lib/Makefile| 2 +-
>  drivers/gpu/drm/i915/gvt/Makefile   | 2 +-
>  drivers/gpu/drm/msm/Makefile| 6 +++---
>  drivers/gpu/drm/nouveau/Kbuild  | 8 
>  5 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
> b/drivers/gpu/drm/amd/amdgpu/Makefile
> index f76bcb9..b21ebb0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -23,7 +23,7 @@
>  # Makefile for the drm device driver.  This driver provides support for the
>  # Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher.
>
> -FULL_AMD_PATH=$(src)/..
> +FULL_AMD_PATH=$(srctree)/$(src)/..
>  DISPLAY_FOLDER_NAME=display
>  FULL_AMD_DISPLAY_PATH = $(FULL_AMD_PATH)/$(DISPLAY_FOLDER_NAME)
>
> diff --git a/drivers/gpu/drm/amd/lib/Makefile 
> b/drivers/gpu/drm/amd/lib/Makefile
> index 6902430..d534992 100644
> --- a/drivers/gpu/drm/amd/lib/Makefile
> +++ b/drivers/gpu/drm/amd/lib/Makefile
> @@ -27,6 +27,6 @@
>  # driver components or later moved to kernel/lib for sharing with
>  # other drivers.
>
> -ccflags-y := -I$(src)/../include
> +ccflags-y := -I $(srctree)/$(src)/../include
>
>  obj-$(CONFIG_CHASH) += chash.o
> diff --git a/drivers/gpu/drm/i915/gvt/Makefile 
> b/drivers/gpu/drm/i915/gvt/Makefile
> index b016dc7..a4a5a96 100644
> --- a/drivers/gpu/drm/i915/gvt/Makefile
> +++ b/drivers/gpu/drm/i915/gvt/Makefile
> @@ -5,6 +5,6 @@ GVT_SOURCE := gvt.o aperture_gm.o handlers.o vgpu.o 
> trace_points.o firmware.o \
> execlist.o scheduler.o sched_policy.o mmio_context.o cmd_parser.o 
> debugfs.o \
> fb_decoder.o dmabuf.o page_track.o
>
> -ccflags-y  += -I$(src) -I$(src)/$(GVT_DIR)
> +ccflags-y  += -I $(srctree)/$(src) -I 
> $(srctree)/$(src)/$(GVT_DIR)/
>  i915-y += $(addprefix $(GVT_DIR)/, 
> $(GVT_SOURCE))
>  obj-$(CONFIG_DRM_I915_GVT_KVMGT)   += $(GVT_DIR)/kvmgt.o
> diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
> index 56a70c7..b7b1ebd 100644
> --- a/drivers/gpu/drm/msm/Makefile
> +++ b/drivers/gpu/drm/msm/Makefile
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
> -ccflags-y := -Idrivers/gpu/drm/msm
> -ccflags-y += -Idrivers/gpu/drm/msm/disp/dpu1
> -ccflags-$(CONFIG_DRM_MSM_DSI) += -Idrivers/gpu/drm/msm/dsi
> +ccflags-y := -I $(srctree)/$(src)
> +ccflags-y += -I $(srctree)/$(src)/disp/dpu1
> +ccflags-$(CONFIG_DRM_MSM_DSI) += -I $(srctree)/$(src)/dsi
>
>  msm-y := \
> adreno/adreno_device.o \
> diff --git a/drivers/gpu/drm/nouveau/Kbuild b/drivers/gpu/drm/nouveau/Kbuild
> index b17843d..b4bc88ad 100644
> --- a/drivers/gpu/drm/nouveau/Kbuild
> +++ b/drivers/gpu/drm/nouveau/Kbuild
> @@ -1,7 +1,7 @@
> -ccflags-y += -I$(src)/include
> -ccflags-y += -I$(src)/include/nvkm
> -ccflags-y += -I$(src)/nvkm
> -ccflags-y += -I$(src)
> +ccflags-y += -I $(srctree)/$(src)/include
> +ccflags-y += -I $(srctree)/$(src)/include/nvkm
> +ccflags-y += -I $(srctree)/$(src)/nvkm
> +ccflags-y += -I $(srctree)/$(src)
>
>  # NVKM - HW resource manager
>  #- code also used by various userspace tools/tests
> --
> 2.7.4
>


-- 
Best Regards
Masahiro Yamada
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-18 Thread Thomas Hellstrom

Hmm,

This zone was intended to stop TTM page allocations from exhausting the 
DMA32 zone. IIRC dma_alloc_coherent() uses DMA32 by default, which means 
if we drop this check, other devices may stop functioning unexpectedly?


However, in the end I'd expect the kernel page allocation system to make 
sure there are some pages left in the DMA32 zone, otherwise random 
non-IO page allocations would also potentially exhaust the DMA32 zone 
without anybody caring, which means removing this zone wouldn't be any 
worse than whatever other subsystems may be doing already...


/Thomas


On 2/16/19 12:02 AM, Kuehling, Felix wrote:

This is an RFC. I'm not sure this is the right solution, but it
highlights the problem I'm trying to solve.

The dma32_zone limits the acc_size of all allocated BOs to 2GB. On a
64-bit system with hundreds of GB of system memory and GPU memory,
this can become a bottle neck. We're seeing TTM memory allocation
failures not because we're truly out of memory, but because we're
out of space in the dma32_zone for the acc_size needed for our BO
book-keeping.

Signed-off-by: Felix Kuehling 
CC: thellst...@vmware.com
CC: christian.koe...@amd.com
---
  drivers/gpu/drm/ttm/ttm_memory.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c
index f1567c3..bb05365 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -363,7 +363,7 @@ static int ttm_mem_init_highmem_zone(struct ttm_mem_global 
*glob,
glob->zones[glob->num_zones++] = zone;
return 0;
  }
-#else
+#elifndef CONFIG_64BIT
  static int ttm_mem_init_dma32_zone(struct ttm_mem_global *glob,
   const struct sysinfo *si)
  {
@@ -441,7 +441,7 @@ int ttm_mem_global_init(struct ttm_mem_global *glob)
ret = ttm_mem_init_highmem_zone(glob, );
if (unlikely(ret != 0))
goto out_no_zone;
-#else
+#elifndef CONFIG_64BIT
ret = ttm_mem_init_dma32_zone(glob, );
if (unlikely(ret != 0))
goto out_no_zone;



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx