[PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-15 Thread Kuehling, Felix
This is an RFC. I'm not sure this is the right solution, but it
highlights the problem I'm trying to solve.

The dma32_zone limits the total acc_size of all allocated BOs to 2GB. On a
64-bit system with hundreds of GB of system memory and GPU memory, this
can become a bottleneck. We're seeing TTM memory allocation failures not
because we're truly out of memory, but because we're out of space in the
dma32_zone for the acc_size needed for our BO bookkeeping.

Signed-off-by: Felix Kuehling 
CC: thellst...@vmware.com
CC: christian.koe...@amd.com
---
 drivers/gpu/drm/ttm/ttm_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c
index f1567c3..bb05365 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -363,7 +363,7 @@ static int ttm_mem_init_highmem_zone(struct ttm_mem_global 
*glob,
glob->zones[glob->num_zones++] = zone;
return 0;
 }
-#else
+#elif !defined(CONFIG_64BIT)
 static int ttm_mem_init_dma32_zone(struct ttm_mem_global *glob,
   const struct sysinfo *si)
 {
@@ -441,7 +441,7 @@ int ttm_mem_global_init(struct ttm_mem_global *glob)
ret = ttm_mem_init_highmem_zone(glob, &si);
if (unlikely(ret != 0))
goto out_no_zone;
-#else
+#elif !defined(CONFIG_64BIT)
ret = ttm_mem_init_dma32_zone(glob, &si);
if (unlikely(ret != 0))
goto out_no_zone;
-- 
2.7.4
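
For context on the failure mode described above, a minimal sketch
(simplified, invented names; the real logic lives in ttm_memory.c around
ttm_mem_global_alloc()) of how per-zone accounting can reject an
allocation even though plenty of memory is free overall: every BO charges
its acc_size against each zone it belongs to, so the smallest zone
(dma32, capped at 2GB) becomes the effective global limit.

#include <stdint.h>
#include <errno.h>

struct sketch_zone {
	const char *name;
	uint64_t max_mem;	/* e.g. 2GB for the dma32 zone */
	uint64_t used_mem;	/* acc_size charged so far */
};

static int sketch_zone_reserve(struct sketch_zone *zone, uint64_t acc_size)
{
	/* Fails once this zone's budget is exhausted, regardless of how
	 * much memory the other zones still have available. */
	if (zone->used_mem + acc_size > zone->max_mem)
		return -ENOMEM;
	zone->used_mem += acc_size;
	return 0;
}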

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 6/6] drm/amdgpu: use BACO on vega12 if platform supports it

2019-02-15 Thread Alex Deucher via amd-gfx
Use BACO for reset if the platform supports it.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 99ebcf29dcb0..b2cbe4b42a3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -461,6 +461,7 @@ static int soc15_asic_reset(struct amdgpu_device *adev)
 
switch (adev->asic_type) {
case CHIP_VEGA10:
+   case CHIP_VEGA12:
case CHIP_VEGA20:
soc15_asic_get_baco_capability(adev, &baco_reset);
break;
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 3/6] drm/amdgpu/powerplay: fix typo in BACO header guards

2019-02-15 Thread Alex Deucher via amd-gfx
s/BOCO/BACO/g

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.h | 4 ++--
 drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.h 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.h
index a93b1e6d1c66..f7a3ffa744b3 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.h
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.h
@@ -20,8 +20,8 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  */
-#ifndef __VEGA10_BOCO_H__
-#define __VEGA10_BOCO_H__
+#ifndef __VEGA10_BACO_H__
+#define __VEGA10_BACO_H__
 #include "hwmgr.h"
 #include "common_baco.h"
 
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.h 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.h
index c51988a9ed77..51c7f8392925 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.h
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.h
@@ -20,8 +20,8 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  */
-#ifndef __VEGA20_BOCO_H__
-#define __VEGA20_BOCO_H__
+#ifndef __VEGA20_BACO_H__
+#define __VEGA20_BACO_H__
 #include "hwmgr.h"
 #include "common_baco.h"
 
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 5/6] drm/amdgpu/powerplay: split out common smu9 BACO code

2019-02-15 Thread Alex Deucher via amd-gfx
Several of the BACO functions are common across smu9-based
asics.  Split the common code out.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/Makefile  |  2 +-
 .../gpu/drm/amd/powerplay/hwmgr/smu9_baco.c   | 66 +++
 .../gpu/drm/amd/powerplay/hwmgr/smu9_baco.h   | 31 +
 .../gpu/drm/amd/powerplay/hwmgr/vega10_baco.c | 39 +--
 .../gpu/drm/amd/powerplay/hwmgr/vega10_baco.h |  5 +-
 .../drm/amd/powerplay/hwmgr/vega10_hwmgr.c|  4 +-
 .../gpu/drm/amd/powerplay/hwmgr/vega12_baco.c | 39 +--
 .../gpu/drm/amd/powerplay/hwmgr/vega12_baco.h |  5 +-
 .../drm/amd/powerplay/hwmgr/vega12_hwmgr.c|  4 +-
 9 files changed, 106 insertions(+), 89 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/powerplay/hwmgr/smu9_baco.c
 create mode 100644 drivers/gpu/drm/amd/powerplay/hwmgr/smu9_baco.h

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/Makefile 
b/drivers/gpu/drm/amd/powerplay/hwmgr/Makefile
index d1adf68f4c64..cc63705920dc 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/Makefile
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/Makefile
@@ -36,7 +36,7 @@ HARDWARE_MGR = hwmgr.o processpptables.o \
pp_overdriver.o smu_helper.o \
vega20_processpptables.o vega20_hwmgr.o vega20_powertune.o \
vega20_thermal.o common_baco.o vega10_baco.o  vega20_baco.o \
-   vega12_baco.o
+   vega12_baco.o smu9_baco.o
 
 AMD_PP_HWMGR = $(addprefix $(AMD_PP_PATH)/hwmgr/,$(HARDWARE_MGR))
 
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu9_baco.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/smu9_baco.c
new file mode 100644
index ..de0a37f7c632
--- /dev/null
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu9_baco.c
@@ -0,0 +1,66 @@
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include "amdgpu.h"
+#include "soc15.h"
+#include "soc15_hw_ip.h"
+#include "vega10_ip_offset.h"
+#include "soc15_common.h"
+#include "vega10_inc.h"
+#include "smu9_baco.h"
+
+int smu9_baco_get_capability(struct pp_hwmgr *hwmgr, bool *cap)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)(hwmgr->adev);
+   uint32_t reg, data;
+
+   *cap = false;
+   if (!phm_cap_enabled(hwmgr->platform_descriptor.platformCaps, 
PHM_PlatformCaps_BACO))
+   return 0;
+
+   WREG32(0x12074, 0xFFF0003B);
+   data = RREG32(0x12075);
+
+   if (data == 0x1) {
+   reg = RREG32_SOC15(NBIF, 0, mmRCC_BIF_STRAP0);
+
+   if (reg & RCC_BIF_STRAP0__STRAP_PX_CAPABLE_MASK)
+   *cap = true;
+   }
+
+   return 0;
+}
+
+int smu9_baco_get_state(struct pp_hwmgr *hwmgr, enum BACO_STATE *state)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)(hwmgr->adev);
+   uint32_t reg;
+
+   reg = RREG32_SOC15(NBIF, 0, mmBACO_CNTL);
+
+   if (reg & BACO_CNTL__BACO_MODE_MASK)
+   /* gfx has already entered BACO state */
+   *state = BACO_STATE_IN;
+   else
+   *state = BACO_STATE_OUT;
+   return 0;
+}
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu9_baco.h 
b/drivers/gpu/drm/amd/powerplay/hwmgr/smu9_baco.h
new file mode 100644
index ..84e90f801ac3
--- /dev/null
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu9_baco.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be inclu
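 
The header hunk above is cut off by the archive. Based on the two
functions defined in smu9_baco.c and the include pattern of the existing
vega10_baco.h, the remainder plausibly reduces to the declarations below
(a reconstruction, not the verbatim patch; license header omitted):

#ifndef __SMU9_BACO_H__
#define __SMU9_BACO_H__
#include "hwmgr.h"
#include "common_baco.h"

int smu9_baco_get_capability(struct pp_hwmgr *hwmgr, bool *cap);
int smu9_baco_get_state(struct pp_hwmgr *hwmgr, enum BACO_STATE *state);

#endif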

[PATCH 4/6] drm/amdgpu/powerplay: add BACO support for vega12

2019-02-15 Thread Alex Deucher via amd-gfx
This implements BACO (Bus Active, Chip Off) support
for vega12.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/Makefile  |   3 +-
 .../gpu/drm/amd/powerplay/hwmgr/vega12_baco.c | 156 ++
 .../gpu/drm/amd/powerplay/hwmgr/vega12_baco.h |  32 
 .../drm/amd/powerplay/hwmgr/vega12_hwmgr.c|   5 +
 .../gpu/drm/amd/powerplay/hwmgr/vega12_inc.h  |   2 +
 5 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/powerplay/hwmgr/vega12_baco.c
 create mode 100644 drivers/gpu/drm/amd/powerplay/hwmgr/vega12_baco.h

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/Makefile 
b/drivers/gpu/drm/amd/powerplay/hwmgr/Makefile
index 0b3c6d1d52e4..d1adf68f4c64 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/Makefile
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/Makefile
@@ -35,7 +35,8 @@ HARDWARE_MGR = hwmgr.o processpptables.o \
vega12_thermal.o \
pp_overdriver.o smu_helper.o \
vega20_processpptables.o vega20_hwmgr.o vega20_powertune.o \
-   vega20_thermal.o common_baco.o vega10_baco.o  vega20_baco.o
+   vega20_thermal.o common_baco.o vega10_baco.o  vega20_baco.o \
+   vega12_baco.o
 
 AMD_PP_HWMGR = $(addprefix $(AMD_PP_PATH)/hwmgr/,$(HARDWARE_MGR))
 
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_baco.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_baco.c
new file mode 100644
index ..c2cc15385012
--- /dev/null
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega12_baco.c
@@ -0,0 +1,156 @@
+/*
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include "amdgpu.h"
+#include "soc15.h"
+#include "soc15_hw_ip.h"
+#include "vega10_ip_offset.h"
+#include "soc15_common.h"
+#include "vega12_inc.h"
+#include "vega12_ppsmc.h"
+#include "vega12_baco.h"
+
+static const struct soc15_baco_cmd_entry  pre_baco_tbl[] =
+{
+   { CMD_READMODIFYWRITE, NBIF_HWID, 0, mmBIF_DOORBELL_CNTL_BASE_IDX, 
mmBIF_DOORBELL_CNTL, BIF_DOORBELL_CNTL__DOORBELL_MONITOR_EN_MASK, 
BIF_DOORBELL_CNTL__DOORBELL_MONITOR_EN__SHIFT, 0, 0 },
+   { CMD_WRITE, NBIF_HWID, 0, mmBIF_FB_EN_BASE_IDX, mmBIF_FB_EN, 0, 0, 0, 
0 },
+   { CMD_READMODIFYWRITE, NBIF_HWID, 0, mmRCC_BACO_CNTL_MISC_BASE_IDX, 
mmBACO_CNTL, BACO_CNTL__BACO_DSTATE_BYPASS_MASK, 
BACO_CNTL__BACO_DSTATE_BYPASS__SHIFT, 0, 1 },
+   { CMD_READMODIFYWRITE, NBIF_HWID, 0, mmRCC_BACO_CNTL_MISC_BASE_IDX, 
mmBACO_CNTL, BACO_CNTL__BACO_RST_INTR_MASK_MASK, 
BACO_CNTL__BACO_RST_INTR_MASK__SHIFT, 0, 1 }
+};
+
+static const struct soc15_baco_cmd_entry enter_baco_tbl[] =
+{
+   { CMD_WAITFOR, THM_HWID, 0, mmTHM_BACO_CNTL_BASE_IDX, mmTHM_BACO_CNTL, 
THM_BACO_CNTL__SOC_DOMAIN_IDLE_MASK, THM_BACO_CNTL__SOC_DOMAIN_IDLE__SHIFT, 
0x, 0x8000 },
+   { CMD_READMODIFYWRITE, NBIF_HWID, 0, mmRCC_BACO_CNTL_MISC_BASE_IDX, 
mmBACO_CNTL, BACO_CNTL__BACO_EN_MASK, BACO_CNTL__BACO_EN__SHIFT, 0, 1 },
+   { CMD_READMODIFYWRITE, NBIF_HWID, 0, mmRCC_BACO_CNTL_MISC_BASE_IDX, 
mmBACO_CNTL, BACO_CNTL__BACO_BIF_LCLK_SWITCH_MASK, 
BACO_CNTL__BACO_BIF_LCLK_SWITCH__SHIFT, 0, 1 },
+   { CMD_READMODIFYWRITE, NBIF_HWID, 0, mmRCC_BACO_CNTL_MISC_BASE_IDX, 
mmBACO_CNTL, BACO_CNTL__BACO_DUMMY_EN_MASK, BACO_CNTL__BACO_DUMMY_EN__SHIFT, 0, 
1 },
+   { CMD_READMODIFYWRITE, THM_HWID, 0, mmTHM_BACO_CNTL_BASE_IDX, 
mmTHM_BACO_CNTL, THM_BACO_CNTL__BACO_SOC_VDCI_RESET_MASK, 
THM_BACO_CNTL__BACO_SOC_VDCI_RESET__SHIFT, 0, 1 },
+   { CMD_READMODIFYWRITE, THM_HWID, 0, mmTHM_BACO_CNTL_BASE_IDX, 
mmTHM_BACO_CNTL, THM_BACO_CNTL__BACO_SMNCLK_MUX_MASK, 
THM_BACO_CNTL__BACO_SMNCLK_MUX__SHIFT, 0, 1 },
+   { CMD_READMODIFYWRITE, THM_HWID, 0, mmTHM_BACO_CNTL_BASE_IDX, 
mmTHM_BACO_CNTL, THM_BACO_CNTL__BACO_ISO_EN_MASK, 
THM_BACO_CNTL__BACO_ISO_EN__SHIFT, 0, 1 },
+   { CMD_READMODIFYWRITE, THM_HWID, 0, mmTHM_BACO_CNTL_BASE_IDX, 
mmTHM_BACO_CNTL, THM_BACO_CNTL__BACO_AEB_ISO
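 
The command table above is cut off by the archive. To clarify how such
tables are consumed, here is a hedged sketch of an interpreter for one
entry; the field meanings are inferred from the initializers above
(assumed layout { cmd, hwip, inst, seg, reg_offset, mask, shift, timeout,
val }), and the real implementation is soc15_baco_program_registers() in
the common BACO code.

/* Sketch only, not the driver code. */
static bool baco_exec_entry(struct amdgpu_device *adev, uint32_t reg,
			    const struct soc15_baco_cmd_entry *entry)
{
	uint32_t val;

	switch (entry->cmd) {
	case CMD_WRITE:
		WREG32(reg, entry->val << entry->shift);
		return true;
	case CMD_READMODIFYWRITE:
		val = RREG32(reg);
		val = (val & ~entry->mask) |
		      ((entry->val << entry->shift) & entry->mask);
		WREG32(reg, val);
		return true;
	case CMD_WAITFOR: {
		uint32_t tries = entry->timeout ? entry->timeout : 1;

		/* Poll until the masked field reaches the expected value. */
		do {
			if (((RREG32(reg) & entry->mask) >> entry->shift) ==
			    entry->val)
				return true;
		} while (--tries);
		return false;	/* timed out */
	}
	default:
		return false;
	}
}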

[PATCH 1/6] drm/amdgpu: add missing license on baco files

2019-02-15 Thread Alex Deucher via amd-gfx
Trivial.

Signed-off-by: Alex Deucher 
---
 .../gpu/drm/amd/powerplay/hwmgr/vega10_baco.c | 22 +++
 .../gpu/drm/amd/powerplay/hwmgr/vega20_baco.c | 22 +++
 2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.c
index f94dab27f486..d5232110ec84 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.c
@@ -1,3 +1,25 @@
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
 #include "amdgpu.h"
 #include "soc15.h"
 #include "soc15_hw_ip.h"
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.c
index 0d883b358df2..edf00da8424b 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.c
@@ -1,3 +1,25 @@
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
 #include "amdgpu.h"
 #include "soc15.h"
 #include "soc15_hw_ip.h"
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 2/6] drm/amdgpu/powerplay: fix return codes in BACO code

2019-02-15 Thread Alex Deucher via amd-gfx
Use a proper return code rather than -1.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.c | 4 ++--
 drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.c
index d5232110ec84..7337be5602e4 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.c
@@ -136,7 +136,7 @@ int vega10_baco_set_state(struct pp_hwmgr *hwmgr, enum 
BACO_STATE state)
if (soc15_baco_program_registers(hwmgr, pre_baco_tbl,
 ARRAY_SIZE(pre_baco_tbl))) {
if (smum_send_msg_to_smc(hwmgr, PPSMC_MSG_EnterBaco))
-   return -1;
+   return -EINVAL;
 
if (soc15_baco_program_registers(hwmgr, enter_baco_tbl,
   ARRAY_SIZE(enter_baco_tbl)))
@@ -154,5 +154,5 @@ int vega10_baco_set_state(struct pp_hwmgr *hwmgr, enum 
BACO_STATE state)
}
}
 
-   return -1;
+   return -EINVAL;
 }
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.c
index edf00da8424b..5e8602a79b1c 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.c
@@ -89,14 +89,14 @@ int vega20_baco_set_state(struct pp_hwmgr *hwmgr, enum 
BACO_STATE state)
 
 
if(smum_send_msg_to_smc_with_parameter(hwmgr, 
PPSMC_MSG_EnterBaco, 0))
-   return -1;
+   return -EINVAL;
 
} else if (state == BACO_STATE_OUT) {
if (smum_send_msg_to_smc(hwmgr, PPSMC_MSG_ExitBaco))
-   return -1;
+   return -EINVAL;
if (!soc15_baco_program_registers(hwmgr, clean_baco_tbl,
 
ARRAY_SIZE(clean_baco_tbl)))
-   return -1;
+   return -EINVAL;
}
 
return 0;
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH RFC] drm: add func to better detect whether swiotlb is needed

2019-02-15 Thread Michael D Labriola via amd-gfx
This commit fixes DRM failures on Xen PV systems that were introduced in
v4.17 by the following commits:

82626363 drm: add func to get max iomem address v2
fd5fd480 drm/amdgpu: only enable swiotlb alloc when need v2
1bc3d3cc drm/radeon: only enable swiotlb path when need v2

The introduction of ->need_swiotlb to the ttm_dma_populate() conditionals
in the radeon and amdgpu device drivers causes Gnome to immediately crash
on Xen PV systems, returning the user to the login screen.  The following
kernel errors get logged:

[   28.554259] radeon_dp_aux_transfer_native: 200 callbacks suppressed
[   31.219821] radeon :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[   31.220030] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to 
allocate GEM object (16384000, 2, 4096, -14)
[   31.226109] radeon :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[   31.226300] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to 
allocate GEM object (16384000, 2, 4096, -14)
[   31.300734] gnome-shell[1935]: segfault at 88 ip 7f39151cd904 sp 
7ffc97611ad8 error 4 in libmutter-cogl.so[7f3915178000+aa000]
[   31.300745] Code: 5f c3 0f 1f 40 00 48 8b 47 78 48 8b 40 40 ff e0 66 0f 1f 
44 00 00 48 8b 47 78 48 8b 40 48 ff e0 66 0f 1f 44 00 00 48 8b 47 78 <48> 8b 80 
88 00 00 00 ff e0 0f 1f 00 48 8b 47 78 48 8b 40 68 ff e0
[   38.193302] radeon_dp_aux_transfer_native: 116 callbacks suppressed
[   40.009317] radeon :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[   40.009488] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to 
allocate GEM object (16384000, 2, 4096, -14)
[   40.015114] radeon :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[   40.015297] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to 
allocate GEM object (16384000, 2, 4096, -14)
[   40.028302] gnome-shell[2431]: segfault at 2dadf40 ip 02dadf40 sp 
7ffcd24ea5f8 error 15
[   40.028306] Code: 20 6e 31 00 00 00 00 00 00 00 00 37 e3 3d 2d 7f 00 00 80 
f4 e6 3d 2d 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 
00 00 00 00 00 c1 00 00 00 00 00 00 00 80 e1 d2 03 00 00

This commit renames drm_get_max_iomem() to drm_need_swiotlb(), adds a
xen_pv_domain() check to it, and moves the bit shifting comparison that
always follows its usage into the function (simplifying the drm driver
code).
---
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  2 +-
 drivers/gpu/drm/drm_memory.c   | 19 ---
 drivers/gpu/drm/radeon/radeon_device.c |  2 +-
 include/drm/drm_cache.h|  2 +-
 6 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 910c4ce..6bc0266 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -1029,7 +1029,7 @@ static int gmc_v7_0_sw_init(void *handle)
pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
pr_warn("amdgpu: No coherent DMA available\n");
}
-   adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+   adev->need_swiotlb = drm_need_swiotlb(dma_bits);
 
r = gmc_v7_0_init_microcode(adev);
if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 747c068..8638adf 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1155,7 +1155,7 @@ static int gmc_v8_0_sw_init(void *handle)
pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
pr_warn("amdgpu: No coherent DMA available\n");
}
-   adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+   adev->need_swiotlb = drm_need_swiotlb(dma_bits);
 
r = gmc_v8_0_init_microcode(adev);
if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index f35d7a5..4f67093 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -989,7 +989,7 @@ static int gmc_v9_0_sw_init(void *handle)
pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
}
-   adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+   adev->need_swiotlb = drm_need_swiotlb(dma_bits);
 
if (adev->asic_type == CHIP_VEGA20) {
r = gfxhub_v1_1_get_xgmi_info(adev);
diff --git a/drivers/gpu/drm/drm_memory.c b/drivers/gpu/drm/drm_memory.c
index d69e4fc..6af59a6 100644
--- a/drivers/gpu/drm/drm_memory.c
+++ b/drivers/gpu/drm/drm_memory.c
@@ -35,6 +35,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include "drm_legacy.h"
 
@@ -150,15 +151,27 @@ void drm_legacy_ioremapfree(struct drm_local_map *map, 
struct drm_device *dev)
 }
 EXPORT_SYMBOL(d
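 
The drm_memory.c hunk is cut off by the archive right at the new helper.
Based on the commit message (rename drm_get_max_iomem() to
drm_need_swiotlb(), add a xen_pv_domain() check, fold in the bit-shift
comparison), the function plausibly looks like the sketch below; the
stripped include at the top of the hunk is presumably <xen/xen.h>.

bool drm_need_swiotlb(int dma_bits)
{
	struct resource *tmp;
	resource_size_t max_iomem = 0;

	/* Xen PV guests always need bounce buffers: pages that look
	 * contiguous to the guest need not be machine-contiguous. */
	if (xen_pv_domain())
		return true;

	/* The old drm_get_max_iomem() walk. */
	for (tmp = iomem_resource.child; tmp; tmp = tmp->sibling)
		max_iomem = max(max_iomem, tmp->end);

	/* The comparison every caller used to open-code. */
	return max_iomem > ((u64)1 << dma_bits);
}
EXPORT_SYMBOL(drm_need_swiotlb);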

[PATCH] drm/radeon/evergreen_cs: fix missing break in switch statement

2019-02-15 Thread Gustavo A. R. Silva
Add missing break statement in order to prevent the code from falling
through to case CB_TARGET_MASK.

This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.

Fixes: dd220a00e8bd ("drm/radeon/kms: add support for streamout v7")
Cc: sta...@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva 
---
NOTE: Notice that this code has been out there since 2012. So, it
would be helpful if someone can double-check this.

 drivers/gpu/drm/radeon/evergreen_cs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
b/drivers/gpu/drm/radeon/evergreen_cs.c
index f471537c852f..1e14c6921454 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -1299,6 +1299,7 @@ static int evergreen_cs_handle_reg(struct 
radeon_cs_parser *p, u32 reg, u32 idx)
return -EINVAL;
}
ib[idx] += (u32)((reloc->gpu_offset >> 8) & 0x);
+   break;
case CB_TARGET_MASK:
track->cb_target_mask = radeon_get_ib_value(p, idx);
track->cb_dirty = true;
-- 
2.20.1
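
The bug class being fixed, in miniature (a contrived standalone example,
not the driver code): without the break, the streamout register's case
would keep executing the CB_TARGET_MASK case's statements and clobber its
tracker state.

#include <stdio.h>

static int target_mask;

static void handle_reg(int reg, int value)
{
	switch (reg) {
	case 1:				/* streamout-style register */
		value += 0x100;		/* register-specific fixup */
		break;			/* the added break; without it,
					 * control falls into case 2 */
	case 2:				/* CB_TARGET_MASK-style register */
		target_mask = value;
		break;
	}
}

int main(void)
{
	handle_reg(1, 5);
	/* prints 0x0 with the break in place; 0x105 without it */
	printf("target_mask = 0x%x\n", target_mask);
	return 0;
}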

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/radeon/si_dpm: Mark expected switch fall-throughs

2019-02-15 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/radeon/si_dpm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/radeon/si_dpm.c b/drivers/gpu/drm/radeon/si_dpm.c
index 0a785ef0ab66..c9f6cb77e857 100644
--- a/drivers/gpu/drm/radeon/si_dpm.c
+++ b/drivers/gpu/drm/radeon/si_dpm.c
@@ -5762,10 +5762,12 @@ static void 
si_request_link_speed_change_before_state_change(struct radeon_devic
si_pi->force_pcie_gen = RADEON_PCIE_GEN2;
if (current_link_speed == RADEON_PCIE_GEN2)
break;
+   /* fall through */
case RADEON_PCIE_GEN2:
if (radeon_acpi_pcie_performance_request(rdev, 
PCIE_PERF_REQ_PECI_GEN2, false) == 0)
break;
 #endif
+   /* fall through */
default:
si_pi->force_pcie_gen = si_get_current_pcie_speed(rdev);
break;
-- 
2.20.1
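
For reference, the convention this and the following fall-through patches
apply: at -Wimplicit-fallthrough=3, GCC treats a comment such as
/* fall through */ placed immediately before the next case label as
confirmation that the fall-through is intentional. A minimal standalone
example:

#include <stdio.h>

static void countdown(int n)
{
	switch (n) {
	case 2:
		printf("two\n");
		/* fall through */
	case 1:
		printf("one\n");
		break;
	default:
		break;
	}
}

int main(void)
{
	countdown(2);	/* prints "two" then "one" */
	return 0;
}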

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu/si_dpm: Mark expected switch fall-throughs

2019-02-15 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/amd/amdgpu/si_dpm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/si_dpm.c 
b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
index da58040fdbdc..41e01a7f57a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_dpm.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
@@ -6216,10 +6216,12 @@ static void 
si_request_link_speed_change_before_state_change(struct amdgpu_devic
si_pi->force_pcie_gen = AMDGPU_PCIE_GEN2;
if (current_link_speed == AMDGPU_PCIE_GEN2)
break;
+   /* fall through */
case AMDGPU_PCIE_GEN2:
if (amdgpu_acpi_pcie_performance_request(adev, 
PCIE_PERF_REQ_PECI_GEN2, false) == 0)
break;
 #endif
+   /* fall through */
default:
si_pi->force_pcie_gen = si_get_current_pcie_speed(adev);
break;
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/radeon/ci_dpm: Mark expected switch fall-throughs

2019-02-15 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/radeon/ci_dpm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/radeon/ci_dpm.c b/drivers/gpu/drm/radeon/ci_dpm.c
index a97294ac96d5..a12439266bb0 100644
--- a/drivers/gpu/drm/radeon/ci_dpm.c
+++ b/drivers/gpu/drm/radeon/ci_dpm.c
@@ -4869,10 +4869,12 @@ static void 
ci_request_link_speed_change_before_state_change(struct radeon_devic
pi->force_pcie_gen = RADEON_PCIE_GEN2;
if (current_link_speed == RADEON_PCIE_GEN2)
break;
+   /* fall through */
case RADEON_PCIE_GEN2:
if (radeon_acpi_pcie_performance_request(rdev, 
PCIE_PERF_REQ_PECI_GEN2, false) == 0)
break;
 #endif
+   /* fall through */
default:
pi->force_pcie_gen = ci_get_current_pcie_speed(rdev);
break;
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu/powerplay/polaris10_smumgr: Mark expected switch fall-through

2019-02-15 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c 
b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
index 52abca065764..92de1bbb2e33 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
@@ -2330,6 +2330,7 @@ static uint32_t polaris10_get_offsetof(uint32_t type, 
uint32_t member)
case DRAM_LOG_BUFF_SIZE:
return offsetof(SMU74_SoftRegisters, 
DRAM_LOG_BUFF_SIZE);
}
+   /* fall through */
case SMU_Discrete_DpmTable:
switch (member) {
case UvdBootLevel:
-- 
2.20.1
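
This hunk shows the subtler variant of the pattern: the fall-through
happens after a nested switch whose cases all return, so when the inner
switch matches nothing, control reaches the end of the outer case body
and falls into the next label. An illustration with invented names:

enum { TYPE_SOFT_REGS, TYPE_DPM_TABLE };
enum { MEMBER_A, MEMBER_B, MEMBER_C };

static unsigned int get_offset(unsigned int type, unsigned int member)
{
	switch (type) {
	case TYPE_SOFT_REGS:
		switch (member) {
		case MEMBER_A: return 10;
		case MEMBER_B: return 20;
		}
		/* fall through */
	case TYPE_DPM_TABLE:
		switch (member) {
		case MEMBER_C: return 30;
		}
		/* fall through */
	default:
		return 0;
	}
}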

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/powerplay: Get fix clock info when dpm is disabled for the clock

2019-02-15 Thread Liu, Shaoyun
When DPM for a specific clock is disabled, the driver should still be able
to get the fixed clock info from the pptable

Change-Id: Ic609203b3b87aa75b0cfd57b57717b3bb89daf48
Signed-off-by: shaoyunl 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 16 
 1 file changed, 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
index aad79aff..2eae0b4 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
@@ -2641,10 +2641,6 @@ static int vega20_get_sclks(struct pp_hwmgr *hwmgr,
struct vega20_single_dpm_table *dpm_table = 
&(data->dpm_table.gfx_table);
int i, count;
 
-   PP_ASSERT_WITH_CODE(data->smu_features[GNLD_DPM_GFXCLK].enabled,
-   "[GetSclks]: gfxclk dpm not enabled!\n",
-   return -EPERM);
-
count = (dpm_table->count > MAX_NUM_CLOCKS) ? MAX_NUM_CLOCKS : 
dpm_table->count;
clocks->num_levels = count;
 
@@ -2670,10 +2666,6 @@ static int vega20_get_memclocks(struct pp_hwmgr *hwmgr,
struct vega20_single_dpm_table *dpm_table = 
&(data->dpm_table.mem_table);
int i, count;
 
-   PP_ASSERT_WITH_CODE(data->smu_features[GNLD_DPM_UCLK].enabled,
-   "[GetMclks]: uclk dpm not enabled!\n",
-   return -EPERM);
-
count = (dpm_table->count > MAX_NUM_CLOCKS) ? MAX_NUM_CLOCKS : 
dpm_table->count;
clocks->num_levels = data->mclk_latency_table.count = count;
 
@@ -2696,10 +2688,6 @@ static int vega20_get_dcefclocks(struct pp_hwmgr *hwmgr,
struct vega20_single_dpm_table *dpm_table = 
&(data->dpm_table.dcef_table);
int i, count;
 
-   PP_ASSERT_WITH_CODE(data->smu_features[GNLD_DPM_DCEFCLK].enabled,
-   "[GetDcfclocks]: dcefclk dpm not enabled!\n",
-   return -EPERM);
-
count = (dpm_table->count > MAX_NUM_CLOCKS) ? MAX_NUM_CLOCKS : 
dpm_table->count;
clocks->num_levels = count;
 
@@ -2719,10 +2707,6 @@ static int vega20_get_socclocks(struct pp_hwmgr *hwmgr,
struct vega20_single_dpm_table *dpm_table = 
&(data->dpm_table.soc_table);
int i, count;
 
-   PP_ASSERT_WITH_CODE(data->smu_features[GNLD_DPM_SOCCLK].enabled,
-   "[GetSocclks]: socclk dpm not enabled!\n",
-   return -EPERM);
-
count = (dpm_table->count > MAX_NUM_CLOCKS) ? MAX_NUM_CLOCKS : 
dpm_table->count;
clocks->num_levels = count;
 
-- 
2.7.4
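
A hedged sketch of the caller-visible effect (assuming the usual
pp_clock_levels_with_latency out-parameter; the real entry point is the
hwmgr function table, not a direct call): instead of failing with -EPERM
while DPM for the clock is disabled, the query now reports whatever
levels the dpm_table holds, typically the single fixed clock set up at
init.

struct pp_clock_levels_with_latency clocks = {0};
int ret = vega20_get_memclocks(hwmgr, &clocks);

if (!ret) {
	uint32_t i;

	/* With uclk DPM disabled this may be a single fixed level. */
	for (i = 0; i < clocks.num_levels; i++)
		pr_info("mclk level %u: %u kHz\n",
			i, clocks.data[i].clocks_in_khz);
}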

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/4] drm/amdkfd: Move a constant definition around

2019-02-15 Thread Zhao, Yong
Pushed. Thanks.

On 2019-02-15 4:14 p.m., Kuehling, Felix wrote:
> The series is Reviewed-by: Felix Kuehling 
>
> On 2019-02-14 6:45 p.m., Zhao, Yong wrote:
>> Similar definitions should be kept consecutive.
>>
>> Change-Id: I936cf076363e641c60e0704d8405ae9493718e18
>> Signed-off-by: Yong Zhao 
>> ---
>>drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 11 ++-
>>1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> index 12b66330fc6d..e5ebcca7f031 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> @@ -97,17 +97,18 @@
>>#define KFD_CWSR_TBA_TMA_SIZE (PAGE_SIZE * 2)
>>#define KFD_CWSR_TMA_OFFSET PAGE_SIZE
>>
>> +#define KFD_MAX_NUM_OF_QUEUES_PER_DEVICE\
>> +(KFD_MAX_NUM_OF_PROCESSES * \
>> +KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
>> +
>> +#define KFD_KERNEL_QUEUE_SIZE 2048
>> +
>>/*
>> * Kernel module parameter to specify maximum number of supported queues 
>> per
>> * device
>> */
>>extern int max_num_of_queues_per_device;
>>
>> -#define KFD_MAX_NUM_OF_QUEUES_PER_DEVICE\
>> -(KFD_MAX_NUM_OF_PROCESSES * \
>> -KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
>> -
>> -#define KFD_KERNEL_QUEUE_SIZE 2048
>>
>>/* Kernel module parameter to specify the scheduling policy */
>>extern int sched_policy;
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/4] drm/amdkfd: Move a constant definition around

2019-02-15 Thread Kuehling, Felix
The series is Reviewed-by: Felix Kuehling 

On 2019-02-14 6:45 p.m., Zhao, Yong wrote:
> Similar definitions should be kept consecutive.
>
> Change-Id: I936cf076363e641c60e0704d8405ae9493718e18
> Signed-off-by: Yong Zhao 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 11 ++-
>   1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 12b66330fc6d..e5ebcca7f031 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -97,17 +97,18 @@
>   #define KFD_CWSR_TBA_TMA_SIZE (PAGE_SIZE * 2)
>   #define KFD_CWSR_TMA_OFFSET PAGE_SIZE
>   
> +#define KFD_MAX_NUM_OF_QUEUES_PER_DEVICE \
> + (KFD_MAX_NUM_OF_PROCESSES * \
> + KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
> +
> +#define KFD_KERNEL_QUEUE_SIZE 2048
> +
>   /*
>* Kernel module parameter to specify maximum number of supported queues per
>* device
>*/
>   extern int max_num_of_queues_per_device;
>   
> -#define KFD_MAX_NUM_OF_QUEUES_PER_DEVICE \
> - (KFD_MAX_NUM_OF_PROCESSES * \
> - KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
> -
> -#define KFD_KERNEL_QUEUE_SIZE 2048
>   
>   /* Kernel module parameter to specify the scheduling policy */
>   extern int sched_policy;
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 06/11] drm/syncobj: add timeline payload query ioctl v4

2019-02-15 Thread Lionel Landwerlin via amd-gfx

On 07/12/2018 09:55, Chunming Zhou wrote:

User mode can query the timeline payload.
v2: check return value of copy_to_user
v3: handle querying entry by entry
v4: rebase on new chain container, simplify interface

Signed-off-by: Chunming Zhou 
Cc: Daniel Rakos 
Cc: Jason Ekstrand 
Cc: Bas Nieuwenhuizen 
Cc: Dave Airlie 
Cc: Christian König 
Cc: Chris Wilson 
---
  drivers/gpu/drm/drm_internal.h |  2 ++
  drivers/gpu/drm/drm_ioctl.c|  2 ++
  drivers/gpu/drm/drm_syncobj.c  | 43 ++
  include/uapi/drm/drm.h | 10 
  4 files changed, 57 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 18b41e10195c..dab4d5936441 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -184,6 +184,8 @@ int drm_syncobj_reset_ioctl(struct drm_device *dev, void 
*data,
struct drm_file *file_private);
  int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private);
  
  /* drm_framebuffer.c */

  void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index a9a17ed35cc4..7578ef6dc1d1 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -681,6 +681,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 
DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, 
drm_crtc_queue_sequence_ioctl, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, 
DRM_MASTER|DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 348079bb0965..f97fa00ca1d0 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1061,3 +1061,46 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void 
*data,
  
  	return ret;

  }
+
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private)
+{
+   struct drm_syncobj_timeline_array *args = data;
+   struct drm_syncobj **syncobjs;
+   uint64_t __user *points = u64_to_user_ptr(args->points);
+   uint32_t i;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -ENODEV;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL;
+
+   ret = drm_syncobj_array_find(file_private,
+u64_to_user_ptr(args->handles),
+args->count_handles,
+&syncobjs);
+   if (ret < 0)
+   return ret;
+
+   for (i = 0; i < args->count_handles; i++) {
+   struct dma_fence_chain *chain;
+   struct dma_fence *fence;
+   uint64_t point;
+
+   fence = drm_syncobj_fence_get(syncobjs[i]);
+   chain = to_dma_fence_chain(fence);
+   point = chain ? fence->seqno : 0;



Sorry, I don't want to sound annoying, but it looks like this could
report values going backward.


Anything adding a point X to a timeline that has reached value Y with X < Y
would trigger that.


That could happen through submission, userspace signaling, or importing
another syncpoint's fence.



-Lionel



+   ret = copy_to_user(&points[i], &point, sizeof(uint64_t));
+   ret = ret ? -EFAULT : 0;
+   if (ret)
+   break;
+   }
+   drm_syncobj_array_free(syncobjs, args->count_handles);
+
+   return ret;
+}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 0092111d002c..b2c36f2b2599 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -767,6 +767,14 @@ struct drm_syncobj_array {
__u32 pad;
  };
  
+struct drm_syncobj_timeline_array {

+   __u64 handles;
+   __u64 points;
+   __u32 count_handles;
+   __u32 pad;
+};
+
+
  /* Query current scanout sequence number */
  struct drm_crtc_get_sequence {
__u32 crtc_id;  /* requested crtc_id */
@@ -924,6 +932,8 @@ extern "C" {
  #define DRM_IOCTL_MODE_REVOKE_LEASE   DRM_IOWR(0xC9, struct 
drm_mode_revoke_lease)
  
  #define DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT	DRM_IOWR(0xCA, struct drm_syncobj_timeline_wait)

+#define DRM_IOCTL_SYNCOBJ_QUERY	DRM_IOWR(0xCB, struct drm_syncobj_timeline_array)
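 
A sketch of how userspace would consume the new ioctl (libdrm-style
direct usage; error handling elided; assumes a kernel carrying this
patch):

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>

static int query_points(int fd, const uint32_t *handles,
			uint64_t *points, uint32_t count)
{
	struct drm_syncobj_timeline_array args;

	memset(&args, 0, sizeof(args));	/* pad must be zero */
	args.handles = (uintptr_t)handles;
	args.points = (uintptr_t)points;
	args.count_handles = count;

	/* On success, points[i] holds the current payload of handles[i]. */
	return drmIoctl(fd, DRM_IOCTL_SYNCOBJ_QUERY, &args);
}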

Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container v4

2019-02-15 Thread Jason Ekstrand
On Fri, Feb 15, 2019 at 12:33 PM Koenig, Christian 
wrote:

> On 15.02.19 at 19:16, Jason Ekstrand wrote:
>
> On Fri, Feb 15, 2019 at 11:51 AM Christian König <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> On 15.02.19 at 17:49, Jason Ekstrand wrote:
>>
>> On Fri, Feb 15, 2019 at 9:52 AM Lionel Landwerlin via dri-devel <
>> dri-de...@lists.freedesktop.org> wrote:
>>
>>> On 15/02/2019 14:32, Koenig, Christian wrote:
>>> > On 15.02.19 at 15:23, Lionel Landwerlin wrote:
>>> >> Hi Christian, David,
>>> >>
>>> >> For timeline semaphores we need points to be signaled in order.
>>> >> I'm struggling to understand how this fence-chain implementation
>>> >> preserves ordering of the seqnos.
>>> >>
>>> >> One of the scenario I can see an issue happening is when you have a
>>> >> timeline with points 1 & 2 and userspace submits for 2 different
>>> >> engines :
>>> >>  - first with let's say a blitter style engine on point 2
>>> >>  - then a 3d style engine on point 1
>>> > Yeah, and where exactly is the problem?
>>> >
>>> > Seqno 1 will signal when the 3d style engine finishes work.
>>> >
>>> > And seqno 2 will signal when both seqno 1 is signaled and the blitter
>>> > style engine has finished its work.
>>>
>>
>> That's an interesting interpretation of the spec.  I think it's legal and
>> I could see that behavior may be desirable in some ways.
>>
>>
>> Well we actually had this discussion multiple times now, both internally
>> as well as on the mailing list. Please also see the previous mails with
>> Daniel on this topic.
>>
>
> I dug through dri-devel and read everything I could find with a search for
> "timeline semaphore"  I didn't find all that much but this did come up once.
>
>
> Need to dig through my mails as well, that was back in November/December
> last year.
>
>
>
>> My initial suggestion was actually to do exactly what Leonid suggested as
>> well.
>>
>> And following this I used a rather simple container for the
>> implementation, e.g. just a ring buffer indexed by the sequence number. In
>> this scenario userspace can specify on syncobj creation time how big the
>> window for sequence numbers should be, e.g. in this implementation how big
>> the ring buffer would be.
>>
>> This was rejected by our guys who actually wrote a good part of the
>> Vulkan specification. Daniel then has gone into the same direction during
>> the public discussion.
>>
>
> I agree with whoever said that specifying a ringbuffer size is
> unacceptable.  I'm not really sure how that's relevant though.  Is a
> ringbuffer required to implement the behavior that is being suggested
> here?  Genuine question; I'm trying to get back up to speed.
>
>
> Using a ring buffer was just an example how we could do it if we follow my
> and Lionel's suggestion.
>
> Key point is that we could simplify the implementation massively if
> sequence numbers don't need to depend on each other.
>
> In other words we just see the syncobj as container where fences are added
> and retrieved from instead of something actively involved in the signaling.
>

In principle, I think this is a reasonable argument.  Having it involved in
signalling doesn't seem terrible to me but it would mean that a driver
wouldn't be able to detect that the fence it's waiting on actually belongs
to itself and optimize things.


> Main reason we didn't do it this way is because the AMD Vulkan team has
> rejected this approach.
>

Clearly, there's not quite as much agreement as I'd thought there was.  Oh,
well, that's why we have these discussions.


> Additional to that chaining sequence numbers together is really the more
> defensive approach, e.g. it is less likely that applications can shoot
> themselves in the foot.
>

Yeah, I can see how the "everything prior to n must be signalled" could be
safer.  I think both wait-any and wait-all have their ups and downs.  It
just took me by surprise.


>
>  4. If you do get into a sticky situation, you can unblock an entire
>> timeline by using the CPU signal ioctl to set it to a high value.
>>
>>
>> Well I think that this could be problematic as well. Keep in mind that
>> main use case for this is sharing timelines between processes.
>>
>> In other words you don't want applications to be able to mess with it to
>> much.
>>
>
> Cross-process is exactly why you want it.  Suppose you're a compositor and
> you have a timeline shared with another application and you've submitted
> work which waits on it.  Then you get a notification somehow (SIGHUP?) that
> the client has died leaving you hanging.  What do you do?  You take the
> semaphore that's shared with you and the client and whack it to UINT64_MAX
> to unblock yourself.  Of course, this can be abused and that's always the
> risk you take with timelines.
>
>
> My last status is that basically everybody agrees now that wait before
> signal in the kernel is forbidden.
>

Agreed.  I'm not saying that wait before signal in the kernel should be a
thing.  I think we're all agreed that wa
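 
To make the disagreement concrete, a toy model (plain C11, not kernel
code) of the two signaling semantics under discussion: with raw writes,
signaling point 2 and then point 1 moves the observed payload backward,
so a waiter for 2 that samples late can block forever; a monotonic update
(closer to what the chain construction guarantees) makes "point reached"
stable once true.

#include <stdint.h>
#include <stdatomic.h>

static _Atomic uint64_t payload;

/* Raw-write model: a signal is just a store of the point value. */
static void signal_raw(uint64_t point)
{
	atomic_store(&payload, point);
}

/* Monotonic model: the payload never decreases. */
static void signal_monotonic(uint64_t point)
{
	uint64_t cur = atomic_load(&payload);

	while (cur < point &&
	       !atomic_compare_exchange_weak(&payload, &cur, point))
		;	/* cur is reloaded by the failed CAS; retry */
}

/* In both models a wait on point p polls payload >= p. */
static int reached(uint64_t point)
{
	return atomic_load(&payload) >= point;
}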

[PATCH] drm/amd/display/dce_mem_input: Mark expected switch fall-through

2019-02-15 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

Warning level 3 was used: -Wimplicit-fallthrough=3

Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c 
b/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
index 85686d917636..a24a2bda8656 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
@@ -479,7 +479,7 @@ static void program_grph_pixel_format(
case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
sign = 1;
floating = 1;
-   /* no break */
+   /* fall through */
case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F: /* shouldn't this get 
float too? */
case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
grph_depth = 3;
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amd/powerplay/smu7_hwmgr: Mark expected switch fall-throughs

2019-02-15 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
index c8f5c00dd1e7..48187acac59e 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
@@ -3681,10 +3681,12 @@ static int 
smu7_request_link_speed_change_before_state_change(
data->force_pcie_gen = PP_PCIEGen2;
if (current_link_speed == PP_PCIEGen2)
break;
+   /* fall through */
case PP_PCIEGen2:
if (0 == 
amdgpu_acpi_pcie_performance_request(hwmgr->adev, PCIE_PERF_REQ_GEN2, false))
break;
 #endif
+   /* fall through */
default:
data->force_pcie_gen = 
smu7_get_current_pcie_speed(hwmgr);
break;
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amd/display/dc/bios_parser2: Mark expected switch fall-throughs

2019-02-15 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c 
b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
index a1c56f29cfeb..fd5266a58297 100644
--- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
+++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
@@ -265,6 +265,7 @@ static struct atom_display_object_path_v2 *get_bios_object(
&& id.enum_id == obj_id.enum_id)
return 
&bp->object_info_tbl.v1_4->display_path[i];
}
+   /* fall through */
case OBJECT_TYPE_CONNECTOR:
case OBJECT_TYPE_GENERIC:
/* Both Generic and Connector Object ID
@@ -277,6 +278,7 @@ static struct atom_display_object_path_v2 *get_bios_object(
&& id.enum_id == obj_id.enum_id)
return 
&bp->object_info_tbl.v1_4->display_path[i];
}
+   /* fall through */
default:
return NULL;
}
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container v4

2019-02-15 Thread Koenig, Christian
On 15.02.19 at 19:16, Jason Ekstrand wrote:
On Fri, Feb 15, 2019 at 11:51 AM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:
On 15.02.19 at 17:49, Jason Ekstrand wrote:
On Fri, Feb 15, 2019 at 9:52 AM Lionel Landwerlin via dri-devel
<dri-de...@lists.freedesktop.org> wrote:
On 15/02/2019 14:32, Koenig, Christian wrote:
> On 15.02.19 at 15:23, Lionel Landwerlin wrote:
>> Hi Christian, David,
>>
>> For timeline semaphores we need points to be signaled in order.
>> I'm struggling to understand how this fence-chain implementation
>> preserves ordering of the seqnos.
>>
>> One of the scenario I can see an issue happening is when you have a
>> timeline with points 1 & 2 and userspace submits for 2 different
>> engines :
>>  - first with let's say a blitter style engine on point 2
>>  - then a 3d style engine on point 1
> Yeah, and where exactly is the problem?
>
> Seqno 1 will signal when the 3d style engine finishes work.
>
> And seqno 2 will signal when both seqno 1 is signaled and the blitter
> style engine has finished its work.

That's an interesting interpretation of the spec.  I think it's legal and I 
could see that behavior may be desirable in some ways.

Well we actually had this discussion multiple times now, both internally as 
well as on the mailing list. Please also see the previous mails with Daniel on 
this topic.

I dug through dri-devel and read everything I could find with a search for 
"timeline semaphore"  I didn't find all that much but this did come up once.

Need to dig through my mails as well, that was back in November/December last 
year.


My initial suggestion was actually to do exactly what Leonid suggested as well.

And following this I used a rather simple container for the implementation, 
e.g. just a ring buffer indexed by the sequence number. In this scenario 
userspace can specify on syncobj creation time how big the window for sequence 
numbers should be, e.g. in this implementation how big the ring buffer would be.

This was rejected by our guys who actually wrote a good part of the Vulkan 
specification. Daniel then has gone into the same direction during the public 
discussion.

I agree with whoever said that specifying a ringbuffer size is unacceptable.  
I'm not really sure how that's relevant though.  Is a ringbuffer required to 
implement the behavior that is being suggested here?  Genuine question; I'm 
trying to get back up to speed.

Using a ring buffer was just an example how we could do it if we follow my and 
Lionel's suggestion.

Key point is that we could simplify the implementation massively if sequence 
numbers don't need to depend on each other.

In other words we just see the syncobj as container where fences are added and 
retrieved from instead of something actively involved in the signaling.

Main reason we didn't do it this way is because the AMD Vulkan team has 
rejected this approach.

Additional to that chaining sequence numbers together is really the more 
defensive approach, e.g. it is less likely that applications can shoot 
themselves in the foot.


 4. If you do get into a sticky situation, you can unblock an entire timeline 
by using the CPU signal ioctl to set it to a high value.

Well I think that this could be problematic as well. Keep in mind that main use 
case for this is sharing timelines between processes.

In other words you don't want applications to be able to mess with it to much.

Cross-process is exactly why you want it.  Suppose you're a compositor and you 
have a timeline shared with another application and you've submitted work which 
waits on it.  Then you get a notification somehow (SIGHUP?) that the client has 
died leaving you hanging.  What do you do?  You take the semaphore that's 
shared with you and the client and whack it to UINT64_MAX to unblock yourself.  
Of course, this can be abused and that's always the risk you take with 
timelines.

My last status is that basically everybody agrees now that wait before signal 
in the kernel is forbidden.

So when you get a SIGHUP because your client is dead you just kill your thread 
waiting on it.

Regards,
Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: Update sdma golden setting for vega20

2019-02-15 Thread Deucher, Alexander
Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Liu, Shaoyun 

Sent: Friday, February 15, 2019 11:56 AM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Shaoyun
Subject: [PATCH] drm/amdgpu: Update sdma golden setting for vega20

According to the hardware engineers, WRITE_BURST_LENGTH [9:8] in register
SDMA0_CHICKEN_BITS needs to change to 3 for better performance

Change-Id: I32121ac19a62c0794b43755078e89d447724bf07
Signed-off-by: shaoyunl 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 127b859..c816e55 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -128,7 +128,7 @@ static const struct soc15_reg_golden 
golden_settings_sdma0_4_2_init[] = {

 static const struct soc15_reg_golden golden_settings_sdma0_4_2[] =
 {
-   SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_CHICKEN_BITS, 0xfe931f07, 
0x02831d07),
+   SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_CHICKEN_BITS, 0xfe931f07, 
0x02831f07),
 SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_CLK_CTRL, 0x, 
0x3f000100),
 SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_GB_ADDR_CONFIG, 0x773f, 
0x4002),
 SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_GB_ADDR_CONFIG_READ, 
0x773f, 0x4002),
@@ -158,7 +158,7 @@ static const struct soc15_reg_golden 
golden_settings_sdma0_4_2[] =
 };

 static const struct soc15_reg_golden golden_settings_sdma1_4_2[] = {
-   SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_CHICKEN_BITS, 0xfe931f07, 
0x02831d07),
+   SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_CHICKEN_BITS, 0xfe931f07, 
0x02831f07),
 SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_CLK_CTRL, 0x, 
0x3f000100),
 SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_GB_ADDR_CONFIG, 0x773f, 
0x4002),
 SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_GB_ADDR_CONFIG_READ, 
0x773f, 0x4002),
--
2.7.4
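
A worked check of the golden-value change (field position taken from the
commit message; the program below is standalone): WRITE_BURST_LENGTH
occupies bits [9:8], so moving the value from 0x02831d07 to 0x02831f07
changes that field from 1 to 3.

#include <stdint.h>
#include <stdio.h>

#define WRITE_BURST_LENGTH_SHIFT 8
#define WRITE_BURST_LENGTH_MASK  (0x3u << WRITE_BURST_LENGTH_SHIFT)

static uint32_t burst_length(uint32_t chicken_bits)
{
	return (chicken_bits & WRITE_BURST_LENGTH_MASK)
		>> WRITE_BURST_LENGTH_SHIFT;
}

int main(void)
{
	printf("old: %u\n", burst_length(0x02831d07));	/* prints 1 */
	printf("new: %u\n", burst_length(0x02831f07));	/* prints 3 */
	return 0;
}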

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container v4

2019-02-15 Thread Jason Ekstrand
On Fri, Feb 15, 2019 at 11:51 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> On 15.02.19 at 17:49, Jason Ekstrand wrote:
>
> On Fri, Feb 15, 2019 at 9:52 AM Lionel Landwerlin via dri-devel <
> dri-de...@lists.freedesktop.org> wrote:
>
>> On 15/02/2019 14:32, Koenig, Christian wrote:
>> > On 15.02.19 at 15:23, Lionel Landwerlin wrote:
>> >> Hi Christian, David,
>> >>
>> >> For timeline semaphores we need points to be signaled in order.
>> >> I'm struggling to understand how this fence-chain implementation
>> >> preserves ordering of the seqnos.
>> >>
>> >> One of the scenario I can see an issue happening is when you have a
>> >> timeline with points 1 & 2 and userspace submits for 2 different
>> >> engines :
>> >>  - first with let's say a blitter style engine on point 2
>> >>  - then a 3d style engine on point 1
>> > Yeah, and where exactly is the problem?
>> >
>> > Seqno 1 will signal when the 3d style engine finishes work.
>> >
>> > And seqno 2 will signal when both seqno 1 is signaled and the blitter
>> > style engine has finished its work.
>>
>
> That's an interesting interpretation of the spec.  I think it's legal and
> I could see that behavior may be desirable in some ways.
>
>
> Well we actually had this discussion multiple times now, both internally
> as well as on the mailing list. Please also see the previous mails with
> Daniel on this topic.
>

I dug through dri-devel and read everything I could find with a search for
"timeline semaphore". I didn't find all that much, but this did come up once.


> My initial suggestion was actually exactly what Lionel suggested as
> well.
>
> And following this I used a rather simple container for the
> implementation, e.g. just a ring buffer indexed by the sequence number. In
> this scenario userspace can specify on syncobj creation time how big the
> window for sequence numbers should be, e.g. in this implementation how big
> the ring buffer would be.
>
> This was rejected by our guys who actually wrote a good part of the Vulkan
> specification. Daniel then has gone into the same direction during the
> public discussion.
>

I agree with whoever said that specifying a ringbuffer size is
unacceptable.  I'm not really sure how that's relevant though.  Is a
ringbuffer required to implement the behavior that is being suggested
here?  Genuine question; I'm trying to get back up to speed.


> [SNIP]
>
> I think what Christian is suggesting is a valid interpretation of the spec
> though it is rather unconventional.  The Vulkan spec, as it stands today,
> requires that the application ensure that at the time of signaling, the
> timeline semaphore value increases.  This means that all of the above
> possible cases are technically illegal in Vulkan and so it doesn't really
> matter what we do as long as we don't do anything especially stupid.
>
>
> And exactly that's the point. When an application does something stupid
> with its own submissions then this is not much of a problem.
>
> But this interface is meant to be made for communication between
> processes, and here we want to be sure that nobody can do anything stupid.
>
> My understanding of how this works on Windows is that a wait operation on
> 3 is a wait until x >= 3 where x is a 64-bit value and a signal operation
> is simply a write to x.  This means that, in the above cases, waits on 1
> will be triggered immediately when 2 is written but waits on 2 may or may
> not happen at all depending on whether the GPU write which overwrites x to
> 1 or the CPU (or potentially GPU in a different context) read gets there
> first such that the reader observes 2.  If you mess this up and something
> isn't signaled, that's your fault.
>
>
> Yeah and I think that this is actually not a good idea at all.
> Implementing it like this ultimately means that you can only use polling on
> the number.
>

Yeah, there are problems with it.  I'm just putting it out there for
reference and because it's what developers expect regardless of whether
that's a good thing or not.

 4. If you do get into a sticky situation, you can unblock an entire
> timeline by using the CPU signal ioctl to set it to a high value.
>
>
> Well I think that this could be problematic as well. Keep in mind that
> main use case for this is sharing timelines between processes.
>
> In other words you don't want applications to be able to mess with it too
> much.
>

Cross-process is exactly why you want it.  Suppose you're a compositor and
you have a timeline shared with another application and you've submitted
work which waits on it.  Then you get a notification somehow (SIGHUP?) that
the client has died leaving you hanging.  What do you do?  You take the
semaphore that's shared with you and the client and whack it to UINT64_MAX
to unblock yourself.  Of course, this can be abused and that's always the
risk you take with timelines.
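
As a sketch of that recovery path (assuming the timeline syncobj uapi from
this series ends up with a libdrm wrapper shaped roughly like
drmSyncobjTimelineSignal(); the name and signature are assumptions, not a
stable API):

#include <stdint.h>
#include <xf86drm.h>

/* Force a shared timeline to its maximum value, releasing every waiter.
 * Hypothetical wrapper; see the caveat above. */
static void unblock_timeline(int fd, uint32_t handle)
{
        uint64_t point = UINT64_MAX;

        drmSyncobjTimelineSignal(fd, &handle, &point, 1);
}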


>
> Of all these reasons, I think 1 and 2 carry the most weight.  2, in
> particular, is interesting if we one day want to implement the same
> behavior with a simple 64-bit value like Windows does.

Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container v4

2019-02-15 Thread Christian König via amd-gfx

On 15.02.19 at 17:49, Jason Ekstrand wrote:
On Fri, Feb 15, 2019 at 9:52 AM Lionel Landwerlin via dri-devel 
> wrote:


On 15/02/2019 14:32, Koenig, Christian wrote:
> On 15.02.19 at 15:23, Lionel Landwerlin wrote:
>> Hi Christian, David,
>>
>> For timeline semaphores we need points to signal in order.
>> I'm struggling to understand how this fence-chain implementation
>> preserves ordering of the seqnos.
>>
>> One of the scenarios where I can see an issue happening is when you have a
>> timeline with points 1 & 2 and userspace submits for 2 different
>> engines:
>>      - first with let's say a blitter style engine on point 2
>>      - then a 3d style engine on point 1
> Yeah, and where exactly is the problem?
>
> Seqno 1 will signal when the 3d style engine finishes work.
>
> And seqno 2 will signal when both seqno 1 is signaled and the
blitter
> style engine has finished its work.


That's an interesting interpretation of the spec.  I think it's legal 
and I could see that behavior may be desirable in some ways.


Well we actually had this discussion multiple times now, both internally 
as well as on the mailing list. Please also see the previous mails with 
Daniel on this topic.


My initial suggestion was actually exactly what Lionel suggested as well.

And following this I used a rather simple container for the 
implementation, e.g. just a ring buffer indexed by the sequence number. 
In this scenario userspace can specify on syncobj creation time how big 
the window for sequence numbers should be, e.g. in this implementation 
how big the ring buffer would be.


This was rejected by our guys who actually wrote a good part of the 
Vulkan specification. Daniel then has gone into the same direction 
during the public discussion.


[SNIP]
I think what Christian is suggesting is a valid interpretation of the 
spec though it is rather unconventional.  The Vulkan spec, as it 
stands today, requires that the application ensure that at the time of 
signaling, the timeline semaphore value increases.  This means that 
all of the above possible cases are technically illegal in Vulkan and 
so it doesn't really matter what we do as long as we don't do anything
especially stupid.


And exactly that's the point. When an application does something stupid 
with its own submissions then this is not much of a problem.


But this interface is meant to be made for communication between 
processes, and here we want to be sure that nobody can do anything stupid.


My understanding of how this works on Windows is that a wait operation 
on 3 is a wait until x >= 3 where x is a 64-bit value and a signal 
operation is simply a write to x. This means that, in the above cases, 
waits on 1 will be triggered immediately when 2 is written but waits 
on 2 may or may not happen at all depending on whether the GPU write 
which overwrites x to 1 or the CPU (or potentially GPU in a different 
context) read gets there first such that the reader observes 2.  If 
you mess this up and something isn't signaled, that's your fault.


Yeah and I think that this is actually not a good idea at all. 
Implementing it like this ultimately means that you can only use polling 
on the number.


 4. If you do get into a sticky situation, you can unblock an entire 
timeline by using the CPU signal ioctl to set it to a high value.


Well I think that this could be problematic as well. Keep in mind that 
main use case for this is sharing timelines between processes.


In other words you don't want applications to be able to mess with it too
much.




Of all these reasons, I think 1 and 2 carry the most weight.  2, in 
particular, is interesting if we one day want to implement the same 
behavior with a simple 64-bit value like Windows does.  Imagine, for
instance, a scenario where the GPU is doing its own scheduling or
command buffers are submitted ahead of the signal operation being
available and told to just sit on the GPU until they see x >= 3.
(Yes, there are issues here with residency, contention, etc.  I'm
asking you to use your imagination.)  Assuming you can do 64-bit
atomics (there are apparently issues here with PCIe that make things
sticky), the behavior I'm suggesting is completely implementable in 
that way whereas the behavior Christian is suggesting is only 
implementable if you're maintaining a CPU-side list of fences.  I 
don't think we want to paint ourselves into that corner.


Actually we already had such an implementation with radeon. And I can 
only say that it was a total PAIN IN THE A* to maintain.


This is one of the reason why we are not using hardware semaphores any 
more with amdgpu.


Regards,
Christian.



--Jason

>
> Regards,
> Christian.
>
>> -Lionel
>>
>> On 07/12/2018 09:55, Chunming Zhou wrote:
>>> From: Christian König mailto:ckoenig.leichtzumer...@gmail.com>>

[PATCH] drm/amdgpu: Update sdma golden setting for vega20

2019-02-15 Thread Liu, Shaoyun
According to the hardware engineer, WRITE_BURST_LENGTH [9:8] in register
SDMA0_CHICKEN_BITS needs to be set to 3 for better performance
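
The new golden value can be sanity-checked against the field position with
a few lines of C (the field name and bit range are taken from the commit
message above):

#include <stdio.h>

int main(void)
{
        unsigned int old_val = 0x02831d07;
        unsigned int new_val = 0x02831f07;
        unsigned int mask = 0x3 << 8;   /* WRITE_BURST_LENGTH, bits [9:8] */

        printf("old WRITE_BURST_LENGTH = %u\n", (old_val & mask) >> 8); /* 1 */
        printf("new WRITE_BURST_LENGTH = %u\n", (new_val & mask) >> 8); /* 3 */
        return 0;
}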

Change-Id: I32121ac19a62c0794b43755078e89d447724bf07
Signed-off-by: shaoyunl 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 127b859..c816e55 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -128,7 +128,7 @@ static const struct soc15_reg_golden 
golden_settings_sdma0_4_2_init[] = {
 
 static const struct soc15_reg_golden golden_settings_sdma0_4_2[] =
 {
-   SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_CHICKEN_BITS, 0xfe931f07, 0x02831d07),
+   SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_CHICKEN_BITS, 0xfe931f07, 0x02831f07),
SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_CLK_CTRL, 0x, 
0x3f000100),
SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_GB_ADDR_CONFIG, 0x773f, 
0x4002),
SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_GB_ADDR_CONFIG_READ, 
0x773f, 0x4002),
@@ -158,7 +158,7 @@ static const struct soc15_reg_golden 
golden_settings_sdma0_4_2[] =
 };
 
 static const struct soc15_reg_golden golden_settings_sdma1_4_2[] = {
-   SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_CHICKEN_BITS, 0xfe931f07, 0x02831d07),
+   SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_CHICKEN_BITS, 0xfe931f07, 0x02831f07),
SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_CLK_CTRL, 0x, 
0x3f000100),
SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_GB_ADDR_CONFIG, 0x773f, 
0x4002),
SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_GB_ADDR_CONFIG_READ, 
0x773f, 0x4002),
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH v2 0/3] Make DRM DSC helpers more generally usable

2019-02-15 Thread David Francis
drm_dsc could use some work so that drm drivers other than
i915 can make use of it in their own DSC implementations.

Move rc compute, a function that forms part of the DSC spec,
into drm. Update it to DSC 1.2. Also split the PPS packing and
SDP header init functions, to allow for drivers with
their own SDP struct headers

v2:
Rebase onto drm-next
Refactor drm_dsc_dp_pps_header_init
Clean up documentation on new drm function

David Francis (3):
  drm/i915: Move dsc rate params compute into drm
  drm/dsc: Add native 420 and 422 support to compute_rc_params
  drm/dsc: Split DSC PPS and SDP header initialisations

 drivers/gpu/drm/drm_dsc.c | 269 +++---
 drivers/gpu/drm/i915/intel_vdsc.c | 133 +--
 include/drm/drm_dsc.h |   9 +-
 3 files changed, 219 insertions(+), 192 deletions(-)

-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH v2 2/3] drm/dsc: Add native 420 and 422 support to compute_rc_params

2019-02-15 Thread David Francis
Native 420 and 422 transfer modes are new in DSC 1.2.

In these modes, each two pixels of a slice are treated as one
pixel, so the slice width is half as large (rounded down) for
the purposes of calculating the groups per line and the chunk
size in bytes.

In native 422 mode, each pixel has four components, so the
mux component of a group is larger by one additional mux word
and one additional component.

Now that there is native 422 support, the configuration option
previously called enable422 is renamed to simple_422 to avoid
confusion.
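
The halving can be checked stand-alone (the slice width and bpp below are
illustrative assumptions, not values from the patch; bits_per_pixel is in
U6.4 fixed point, hence the 8 * 16 divisor):

#include <stdio.h>

#define DSC_RC_PIXELS_PER_GROUP 3
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
        unsigned int slice_width = 1920;
        unsigned int bits_per_pixel = 8 << 4;   /* 8 bpp, 4 fractional bits */
        unsigned int w = slice_width / 2;       /* native 4:2:0 / 4:2:2 */

        /* 960 / 3 = 320 groups per line instead of 640 */
        printf("groups_per_line = %u\n",
               DIV_ROUND_UP(w, DSC_RC_PIXELS_PER_GROUP));
        /* 960 * 128 / 128 = 960 bytes instead of 1920 */
        printf("slice_chunk_size = %u\n",
               DIV_ROUND_UP(w * bits_per_pixel, 8 * 16));
        return 0;
}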

Acked-by: Jani Nikula 
Reviewed-by: Manasi Navare 
Reviewed-by: Harry Wentland 
Signed-off-by: David Francis 
---
 drivers/gpu/drm/drm_dsc.c | 33 ++-
 drivers/gpu/drm/i915/intel_vdsc.c |  4 ++--
 include/drm/drm_dsc.h |  4 ++--
 3 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_dsc.c b/drivers/gpu/drm/drm_dsc.c
index b7f1903508a4..d77570bf6ac4 100644
--- a/drivers/gpu/drm/drm_dsc.c
+++ b/drivers/gpu/drm/drm_dsc.c
@@ -95,7 +95,7 @@ void drm_dsc_pps_infoframe_pack(struct drm_dsc_pps_infoframe 
*pps_sdp,
((dsc_cfg->bits_per_pixel & DSC_PPS_BPP_HIGH_MASK) >>
 DSC_PPS_MSB_SHIFT) |
dsc_cfg->vbr_enable << DSC_PPS_VBR_EN_SHIFT |
-   dsc_cfg->enable422 << DSC_PPS_SIMPLE422_SHIFT |
+   dsc_cfg->simple_422 << DSC_PPS_SIMPLE422_SHIFT |
dsc_cfg->convert_rgb << DSC_PPS_CONVERT_RGB_SHIFT |
dsc_cfg->block_pred_enable << DSC_PPS_BLOCK_PRED_EN_SHIFT;
 
@@ -249,7 +249,7 @@ EXPORT_SYMBOL(drm_dsc_pps_infoframe_pack);
 /**
  * drm_dsc_compute_rc_parameters() - Write rate control
  * parameters to the dsc configuration defined in
- * &struct drm_dsc_config in accordance with the DSC 1.1
+ * &struct drm_dsc_config in accordance with the DSC 1.2
  * specification. Some configuration fields must be present
  * beforehand.
  *
@@ -266,19 +266,34 @@ int drm_dsc_compute_rc_parameters(struct drm_dsc_config 
*vdsc_cfg)
unsigned long final_scale = 0;
unsigned long rbs_min = 0;
 
-   /* Number of groups used to code each line of a slice */
-   groups_per_line = DIV_ROUND_UP(vdsc_cfg->slice_width,
-  DSC_RC_PIXELS_PER_GROUP);
+   if (vdsc_cfg->native_420 || vdsc_cfg->native_422) {
+   /* Number of groups used to code each line of a slice */
+   groups_per_line = DIV_ROUND_UP(vdsc_cfg->slice_width / 2,
+  DSC_RC_PIXELS_PER_GROUP);
 
-   /* chunksize in Bytes */
-   vdsc_cfg->slice_chunk_size = DIV_ROUND_UP(vdsc_cfg->slice_width *
- vdsc_cfg->bits_per_pixel,
- (8 * 16));
+   /* chunksize in Bytes */
+   vdsc_cfg->slice_chunk_size = DIV_ROUND_UP(vdsc_cfg->slice_width / 2 *
+ vdsc_cfg->bits_per_pixel,
+ (8 * 16));
+   } else {
+   /* Number of groups used to code each line of a slice */
+   groups_per_line = DIV_ROUND_UP(vdsc_cfg->slice_width,
+  DSC_RC_PIXELS_PER_GROUP);
+
+   /* chunksize in Bytes */
+   vdsc_cfg->slice_chunk_size = DIV_ROUND_UP(vdsc_cfg->slice_width *
+ vdsc_cfg->bits_per_pixel,
+ (8 * 16));
+   }
 
if (vdsc_cfg->convert_rgb)
num_extra_mux_bits = 3 * (vdsc_cfg->mux_word_size +
  (4 * vdsc_cfg->bits_per_component + 4)
  - 2);
+   else if (vdsc_cfg->native_422)
+   num_extra_mux_bits = 4 * vdsc_cfg->mux_word_size +
+   (4 * vdsc_cfg->bits_per_component + 4) +
+   3 * (4 * vdsc_cfg->bits_per_component) - 2;
else
num_extra_mux_bits = 3 * vdsc_cfg->mux_word_size +
(4 * vdsc_cfg->bits_per_component + 4) +
diff --git a/drivers/gpu/drm/i915/intel_vdsc.c 
b/drivers/gpu/drm/i915/intel_vdsc.c
index 2d059ebc9bd0..8c8d96157333 100644
--- a/drivers/gpu/drm/i915/intel_vdsc.c
+++ b/drivers/gpu/drm/i915/intel_vdsc.c
@@ -368,7 +368,7 @@ int intel_dp_compute_dsc_params(struct intel_dp *intel_dp,
DSC_1_1_MAX_LINEBUF_DEPTH_BITS : line_buf_depth;
 
/* Gen 11 does not support YCbCr */
-   vdsc_cfg->enable422 = false;
+   vdsc_cfg->simple_422 = false;
/* Gen 11 does not support VBR */
vdsc_cfg->vbr_enable = false;
vdsc_cfg->block_pred_enable =
@@ -495,7 +495,7 @@ static void intel_configure_pps_for_dsc_encoder(struct 
intel_encoder *encoder,
pps_val |= DSC_BLOCK_PREDICTION;
   

[PATCH v2 1/3] drm/i915: Move dsc rate params compute into drm

2019-02-15 Thread David Francis
The function intel_compute_rc_parameters is part of the dsc spec
and is not driver-specific. Other drm drivers might like to use
it.  The function is not changed; just moved and renamed.

Reviewed-by: Harry Wentland 
Signed-off-by: David Francis 
---
 drivers/gpu/drm/drm_dsc.c | 135 ++
 drivers/gpu/drm/i915/intel_vdsc.c | 125 +--
 include/drm/drm_dsc.h |   1 +
 3 files changed, 137 insertions(+), 124 deletions(-)

diff --git a/drivers/gpu/drm/drm_dsc.c b/drivers/gpu/drm/drm_dsc.c
index bce99f95c1a3..b7f1903508a4 100644
--- a/drivers/gpu/drm/drm_dsc.c
+++ b/drivers/gpu/drm/drm_dsc.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -244,3 +245,137 @@ void drm_dsc_pps_infoframe_pack(struct 
drm_dsc_pps_infoframe *pps_sdp,
/* PPS 94 - 127 are O */
 }
 EXPORT_SYMBOL(drm_dsc_pps_infoframe_pack);
+
+/**
+ * drm_dsc_compute_rc_parameters() - Write rate control
+ * parameters to the dsc configuration defined in
+ * &struct drm_dsc_config in accordance with the DSC 1.1
+ * specification. Some configuration fields must be present
+ * beforehand.
+ *
+ * @vdsc_cfg:
+ * DSC Configuration data partially filled by driver
+ */
+int drm_dsc_compute_rc_parameters(struct drm_dsc_config *vdsc_cfg)
+{
+   unsigned long groups_per_line = 0;
+   unsigned long groups_total = 0;
+   unsigned long num_extra_mux_bits = 0;
+   unsigned long slice_bits = 0;
+   unsigned long hrd_delay = 0;
+   unsigned long final_scale = 0;
+   unsigned long rbs_min = 0;
+
+   /* Number of groups used to code each line of a slice */
+   groups_per_line = DIV_ROUND_UP(vdsc_cfg->slice_width,
+  DSC_RC_PIXELS_PER_GROUP);
+
+   /* chunksize in Bytes */
+   vdsc_cfg->slice_chunk_size = DIV_ROUND_UP(vdsc_cfg->slice_width *
+ vdsc_cfg->bits_per_pixel,
+ (8 * 16));
+
+   if (vdsc_cfg->convert_rgb)
+   num_extra_mux_bits = 3 * (vdsc_cfg->mux_word_size +
+ (4 * vdsc_cfg->bits_per_component + 4)
+ - 2);
+   else
+   num_extra_mux_bits = 3 * vdsc_cfg->mux_word_size +
+   (4 * vdsc_cfg->bits_per_component + 4) +
+   2 * (4 * vdsc_cfg->bits_per_component) - 2;
+   /* Number of bits in one Slice */
+   slice_bits = 8 * vdsc_cfg->slice_chunk_size * vdsc_cfg->slice_height;
+
+   while ((num_extra_mux_bits > 0) &&
+  ((slice_bits - num_extra_mux_bits) % vdsc_cfg->mux_word_size))
+   num_extra_mux_bits--;
+
+   if (groups_per_line < vdsc_cfg->initial_scale_value - 8)
+   vdsc_cfg->initial_scale_value = groups_per_line + 8;
+
+   /* scale_decrement_interval calculation according to DSC spec 1.11 */
+   if (vdsc_cfg->initial_scale_value > 8)
+   vdsc_cfg->scale_decrement_interval = groups_per_line /
+   (vdsc_cfg->initial_scale_value - 8);
+   else
+   vdsc_cfg->scale_decrement_interval = DSC_SCALE_DECREMENT_INTERVAL_MAX;
+
+   vdsc_cfg->final_offset = vdsc_cfg->rc_model_size -
+   (vdsc_cfg->initial_xmit_delay *
+vdsc_cfg->bits_per_pixel + 8) / 16 + num_extra_mux_bits;
+
+   if (vdsc_cfg->final_offset >= vdsc_cfg->rc_model_size) {
+   DRM_DEBUG_KMS("FinalOfs < RcModelSze for this InitialXmitDelay\n");
+   return -ERANGE;
+   }
+
+   final_scale = (vdsc_cfg->rc_model_size * 8) /
+   (vdsc_cfg->rc_model_size - vdsc_cfg->final_offset);
+   if (vdsc_cfg->slice_height > 1)
+   /*
+* NflBpgOffset is 16 bit value with 11 fractional bits
+* hence we multiply by 2^11 for preserving the
+* fractional part
+*/
+   vdsc_cfg->nfl_bpg_offset = DIV_ROUND_UP((vdsc_cfg->first_line_bpg_offset << 11),
+   (vdsc_cfg->slice_height - 1));
+   else
+   vdsc_cfg->nfl_bpg_offset = 0;
+
+   /* 2^16 - 1 */
+   if (vdsc_cfg->nfl_bpg_offset > 65535) {
+   DRM_DEBUG_KMS("NflBpgOffset is too large for this slice height\n");
+   return -ERANGE;
+   }
+
+   /* Number of groups used to code the entire slice */
+   groups_total = groups_per_line * vdsc_cfg->slice_height;
+
+   /* slice_bpg_offset is 16 bit value with 11 fractional bits */
+   vdsc_cfg->slice_bpg_offset = DIV_ROUND_UP(((vdsc_cfg->rc_model_size -
+   vdsc_cfg->initial_offset +
+   num_extra_mux_bits) << 11),
+ groups_total);
+
+   if (final_scale > 9) {
+

[PATCH] drm/amdgpu: Update sdma golden setting for vega20

2019-02-15 Thread Liu, Shaoyun
According to the hardware engineer, WRITE_BURST_LENGTH [9:8] in register
SDMA0_CHICKEN_BITS needs to be set to 3 for better performance

Change-Id: I32121ac19a62c0794b43755078e89d447724bf07
Signed-off-by: shaoyunl 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 127b859..c816e55 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -128,7 +128,7 @@ static const struct soc15_reg_golden 
golden_settings_sdma0_4_2_init[] = {
 
 static const struct soc15_reg_golden golden_settings_sdma0_4_2[] =
 {
-   SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_CHICKEN_BITS, 0xfe931f07, 0x02831d07),
+   SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_CHICKEN_BITS, 0xfe931f07, 0x02831f07),
SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_CLK_CTRL, 0x, 
0x3f000100),
SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_GB_ADDR_CONFIG, 0x773f, 
0x4002),
SOC15_REG_GOLDEN_VALUE(SDMA0, 0, mmSDMA0_GB_ADDR_CONFIG_READ, 
0x773f, 0x4002),
@@ -158,7 +158,7 @@ static const struct soc15_reg_golden 
golden_settings_sdma0_4_2[] =
 };
 
 static const struct soc15_reg_golden golden_settings_sdma1_4_2[] = {
-   SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_CHICKEN_BITS, 0xfe931f07, 0x02831d07),
+   SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_CHICKEN_BITS, 0xfe931f07, 0x02831f07),
SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_CLK_CTRL, 0x, 
0x3f000100),
SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_GB_ADDR_CONFIG, 0x773f, 
0x4002),
SOC15_REG_GOLDEN_VALUE(SDMA1, 0, mmSDMA1_GB_ADDR_CONFIG_READ, 
0x773f, 0x4002),
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container v4

2019-02-15 Thread Jason Ekstrand
On Fri, Feb 15, 2019 at 9:52 AM Lionel Landwerlin via dri-devel <
dri-de...@lists.freedesktop.org> wrote:

> On 15/02/2019 14:32, Koenig, Christian wrote:
> > On 15.02.19 at 15:23, Lionel Landwerlin wrote:
> >> Hi Christian, David,
> >>
> >> For timeline semaphores we need points to signal in order.
> >> I'm struggling to understand how this fence-chain implementation
> >> preserves ordering of the seqnos.
> >>
> >> One of the scenarios where I can see an issue happening is when you have a
> >> timeline with points 1 & 2 and userspace submits for 2 different
> >> engines:
> >>  - first with let's say a blitter style engine on point 2
> >>  - then a 3d style engine on point 1
> > Yeah, and where exactly is the problem?
> >
> > Seqno 1 will signal when the 3d style engine finishes work.
> >
> > And seqno 2 will signal when both seqno 1 is signaled and the blitter
> > style engine has finished its work.
>

That's an interesting interpretation of the spec.  I think it's legal and I
could see that behavior may be desirable in some ways.


> That's not really how I understood the spec, but I might be wrong.
>
> What makes me think 1 should be signaled as soon as 2 is signaled
> (regardless of whether the fence attached to point 1 has been signaled)
> is that the spec defines wait & signal operations in terms of the value
> of the timeline.
>
>
> -Lionel
>
> >
> >> Another scenario would be signaling a timeline with points 1 & 2 with
> >> those points in reverse order in the submission array.
> > That is actually illegal in the spec, but actually handled gracefully as
> > well.
> >
> > E.g. when you add seqno 1 to the syncobj container it will only signal
> > when 2 is signaled as well.
>

I think what Christian is suggesting is a valid interpretation of the spec
though it is rather unconventional.  The Vulkan spec, as it stands today,
requires that the application ensure that at the time of signaling, the
timeline semaphore value increases.  This means that all of the above
possible cases are technically illegal in Vulkan and so it doesn't really
matter what we do as long as we don't do anything especially stupid.

My understanding of how this works on Windows is that a wait operation on 3
is a wait until x >= 3 where x is a 64-bit value and a signal operation is
simply a write to x.  This means that, in the above cases, waits on 1 will
be triggered immediately when 2 is written but waits on 2 may or may not
happen at all depending on whether the GPU write which overwrites x to 1 or
the CPU (or potentially GPU in a different context) read gets there first
such that the reader observes 2.  If you mess this up and something isn't
signaled, that's your fault.

Instead of specifying things to be exactly the Windows behavior, Vulkan
says that you must only ever increase the value and anything else is
illegal and therefore leads to undefined behavior.  The usual consequences
of undefined behavior apply: anything can happen up to and including
process termination.  In other words, how we handle those cases is
completely up to us as long as we do something sane that doesn't result in
kernel crashes or anything like that.  We do have to handle it in some way
because we can't outright prevent those cases from happening.  The question
then becomes what's the best way for the behavior to degrade.

In my opinion, the smoothest degradation is if you take the Windows model
and replace the 64-bit write to x with a 64-bit atomic MAX operation.  In
other words, signaling 2 automatically unblocks 1 and any attempt to signal
a value lower than the current value is a no-op (both models are sketched
in code after the list below).  It has a few nice advantages:

 1. Signaling N is guaranteed to unblock everything waiting on n <= N
regardless of what else may be pending.
 2. It matches what I think is the next natural evolution of the Windows
model where the write is replaced with an atomic.
 3. It gracefully handles the case where the operation to signal 1 is added
after the one to signal 2.  We can also make this case illegal but this
model extends to one in which it could be legal and well-defined.
 4. If you do get into a sticky situation, you can unblock an entire
timeline by using the CPU signal ioctl to set it to a high value.
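
For reference, a minimal user-space sketch of the two signaling models (the
names and the polling/CAS details are illustrative assumptions, not any
real uapi):

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t timeline;

/* Windows-style signal as described above: a blind store, so a late
 * signal of a lower point can move the timeline backwards. */
static void signal_store(uint64_t v)
{
        atomic_store(&timeline, v);
}

/* The atomic-MAX variant: signaling can never rewind the timeline, so
 * signaling 2 unblocks every wait on n <= 2 and later signals of lower
 * values are no-ops. */
static void signal_max(uint64_t v)
{
        uint64_t cur = atomic_load(&timeline);

        /* a failed CAS reloads cur, so retry until cur >= v or we win */
        while (cur < v &&
               !atomic_compare_exchange_weak(&timeline, &cur, v))
                ;
}

/* In both models a wait on v is "block (or poll) until x >= v". */
static int point_reached(uint64_t v)
{
        return atomic_load(&timeline) >= v;
}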

Of all these reasons, I think 1 and 2 carry the most weight.  2, in
particular, is interesting if we one day want to implement the same
behavior with a simple 64-bit value like Windows does.  Imagine, for
instance, a scenario where the GPU is doing its own scheduling or command
buffers are submitted ahead of the signal operation being available and
told to just sit on the GPU until they see x >= 3.  (Yes, there are issues
here with residency, contention, etc.  I'm asking you to use your
imagination.)  Assuming you can do 64-bit atomics (there are apparently
issues here with PCIe that make things sticky), the behavior I'm suggesting
is completely implementable in that way whereas the behavior Christian is
suggesting is only implementable if you're maintaining a CPU-side list of
fences.  I don't think we want to paint ourselves into that corner.

Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container v4

2019-02-15 Thread Christian König via amd-gfx

Am 15.02.19 um 16:52 schrieb Lionel Landwerlin:

On 15/02/2019 14:32, Koenig, Christian wrote:

On 15.02.19 at 15:23, Lionel Landwerlin wrote:

Hi Christian, David,

For timeline semaphores we need points to signal in order.
I'm struggling to understand how this fence-chain implementation
preserves ordering of the seqnos.

One of the scenarios where I can see an issue happening is when you have a
timeline with points 1 & 2 and userspace submits for 2 different
engines:
 - first with let's say a blitter style engine on point 2
 - then a 3d style engine on point 1

Yeah, and where exactly is the problem?

Seqno 1 will signal when the 3d style engine finishes work.

And seqno 2 will signal when both seqno 1 is signaled and the blitter
style engine has finished its work.


That's not really how I understood the spec, but I might be wrong.

What makes me think 1 should be signaled as soon as 2 is signaled
(regardless of whether the fence attached to point 1 has been signaled)
is that the spec defines wait & signal operations in terms of the value
of the timeline.


That's what we had initially as well and it was rejected.

When 2 signals before 1 is signaled you can't call this a timeline any more.

Christian.
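
For reference, a hypothetical consumer of the chain API quoted further down
(assuming the v4 dma_fence_chain_find_seqno() from this series) would look
up the fence backing a given point roughly like this:

#include <linux/dma-fence-chain.h>

/* Find the fence backing point @seqno of a timeline whose newest chain
 * node is @head. Because every node also waits on its predecessor, the
 * returned fence only signals once all earlier points have signaled,
 * which is the ordering guarantee discussed above. */
static struct dma_fence *lookup_point(struct dma_fence *head, u64 seqno)
{
        struct dma_fence *fence = dma_fence_get(head);

        /* on success this replaces @fence with the node covering @seqno */
        if (dma_fence_chain_find_seqno(&fence, seqno) < 0) {
                dma_fence_put(fence);
                return NULL;
        }

        return fence;   /* caller drops the reference with dma_fence_put() */
}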




-Lionel




Another scenario would be signaling a timeline with points 1 & 2 with
those points in reverse order in the submission array.

That is actually illegal in the spec, but actually handled gracefully as
well.

E.g. when you add seqno 1 to the syncobj container it will only signal
when 2 is signaled as well.







Regards,
Christian.


-Lionel

On 07/12/2018 09:55, Chunming Zhou wrote:

From: Christian König 

Lockless container implementation similar to a dma_fence_array, but 
with

only two elements per node and automatic garbage collection.

v2: properly document dma_fence_chain_for_each, add
dma_fence_chain_find_seqno,
  drop prev reference during garbage collection if it's not a
chain fence.
v3: use head and iterator for dma_fence_chain_for_each
v4: fix reference count in dma_fence_chain_enable_signaling

Signed-off-by: Christian König 
---
   drivers/dma-buf/Makefile  |   3 +-
   drivers/dma-buf/dma-fence-chain.c | 241 
++

   include/linux/dma-fence-chain.h   |  81 ++
   3 files changed, 324 insertions(+), 1 deletion(-)
   create mode 100644 drivers/dma-buf/dma-fence-chain.c
   create mode 100644 include/linux/dma-fence-chain.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 0913a6ccab5a..1f006e083eb9 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,4 +1,5 @@
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o
seqno-fence.o
+obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
+ reservation.o seqno-fence.o
   obj-$(CONFIG_SYNC_FILE)    += sync_file.o
   obj-$(CONFIG_SW_SYNC)    += sw_sync.o sync_debug.o
   obj-$(CONFIG_UDMABUF)    += udmabuf.o
diff --git a/drivers/dma-buf/dma-fence-chain.c
b/drivers/dma-buf/dma-fence-chain.c
new file mode 100644
index ..0c5e3c902fa0
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -0,0 +1,241 @@
+/*
+ * fence-chain: chain fences together in a timeline
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ * Authors:
+ *    Christian König 
+ *
+ * This program is free software; you can redistribute it and/or
modify it
+ * under the terms of the GNU General Public License version 2 as
published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
License for
+ * more details.
+ */
+
+#include 
+
+static bool dma_fence_chain_enable_signaling(struct dma_fence 
*fence);

+
+/**
+ * dma_fence_chain_get_prev - use RCU to get a reference to the
previous fence
+ * @chain: chain node to get the previous node from
+ *
+ * Use dma_fence_get_rcu_safe to get a reference to the previous
fence of the
+ * chain node.
+ */
+static struct dma_fence *dma_fence_chain_get_prev(struct
dma_fence_chain *chain)
+{
+    struct dma_fence *prev;
+
+    rcu_read_lock();
+    prev = dma_fence_get_rcu_safe(&chain->prev);
+    rcu_read_unlock();
+    return prev;
+}
+
+/**
+ * dma_fence_chain_walk - chain walking function
+ * @fence: current chain node
+ *
+ * Walk the chain to the next node. Returns the next fence or NULL
if we are at
+ * the end of the chain. Garbage collects chain nodes which are 
already

+ * signaled.
+ */
+struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
+{
+    struct dma_fence_chain *chain, *prev_chain;
+    struct dma_fence *prev, *replacement, *tmp;
+
+    chain = to_dma_fence_chain(fence);
+    if (!chain) {
+    dma_fence_put(fence);
+    return NULL;
+    }
+
+    while ((prev = dma_fence_chain_get_prev(chain))) {
+
+    prev_chain = to_dma_fence_chain(prev);

Re: [PATCH] drm: Mark expected switch fall-throughs

2019-02-15 Thread Gustavo A. R. Silva


On 2/15/19 10:11 AM, Alex Deucher wrote:
> On Fri, Feb 15, 2019 at 11:08 AM Gustavo A. R. Silva
>  wrote:
>>
>> In preparation to enabling -Wimplicit-fallthrough, mark switch
>> cases where we are expecting to fall through.
>>
>> Warning level 3 was used: -Wimplicit-fallthrough=3
>>
>> Notice that, in some cases, the code comment is modified
>> in accordance with what GCC is expecting to find.
>>
>> This patch is part of the ongoing efforts to enable
>> -Wimplicit-fallthrough.
>>
>> Signed-off-by: Gustavo A. R. Silva 
> 
> Can you please split this up per driver?  A comment below as well.
> 

OK. Sure.

>> ---
>>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 1 +
>>  drivers/gpu/drm/amd/amdgpu/si_dpm.c | 2 ++
>>  drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c  | 2 ++
>>  drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c  | 2 +-
>>  drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c| 2 ++
>>  drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c | 1 +
>>  drivers/gpu/drm/drm_vm.c| 4 ++--
>>  drivers/gpu/drm/nouveau/nouveau_bo.c| 2 +-
>>  drivers/gpu/drm/radeon/ci_dpm.c | 2 ++
>>  drivers/gpu/drm/radeon/evergreen_cs.c   | 1 +
>>  drivers/gpu/drm/radeon/si_dpm.c | 2 ++
>>  11 files changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> index b8e50a34bdb3..02955e6e9dd9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> @@ -3236,6 +3236,7 @@ static void gfx_v8_0_tiling_mode_table_init(struct 
>> amdgpu_device *adev)
>> dev_warn(adev->dev,
>>  "Unknown chip type (%d) in function 
>> gfx_v8_0_tiling_mode_table_init() falling through to CHIP_CARRIZO\n",
>>  adev->asic_type);
>> +   /* fall through */
>>
>> case CHIP_CARRIZO:
>> modearray[0] = (ARRAY_MODE(ARRAY_2D_TILED_THIN1) |
>> diff --git a/drivers/gpu/drm/amd/amdgpu/si_dpm.c 
>> b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
>> index da58040fdbdc..41e01a7f57a4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/si_dpm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
>> @@ -6216,10 +6216,12 @@ static void 
>> si_request_link_speed_change_before_state_change(struct amdgpu_devic
>> si_pi->force_pcie_gen = AMDGPU_PCIE_GEN2;
>> if (current_link_speed == AMDGPU_PCIE_GEN2)
>> break;
>> +   /* fall through */
>> case AMDGPU_PCIE_GEN2:
>> if (amdgpu_acpi_pcie_performance_request(adev, 
>> PCIE_PERF_REQ_PECI_GEN2, false) == 0)
>> break;
>>  #endif
>> +   /* fall through */
>> default:
>> si_pi->force_pcie_gen = 
>> si_get_current_pcie_speed(adev);
>> break;
>> diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c 
>> b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>> index a1c56f29cfeb..fd5266a58297 100644
>> --- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>> +++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>> @@ -265,6 +265,7 @@ static struct atom_display_object_path_v2 
>> *get_bios_object(
>> && id.enum_id == obj_id.enum_id)
>> return 
>> &bp->object_info_tbl.v1_4->display_path[i];
>> }
>> +   /* fall through */
>> case OBJECT_TYPE_CONNECTOR:
>> case OBJECT_TYPE_GENERIC:
>> /* Both Generic and Connector Object ID
>> @@ -277,6 +278,7 @@ static struct atom_display_object_path_v2 
>> *get_bios_object(
>> && id.enum_id == obj_id.enum_id)
>> return 
>> &bp->object_info_tbl.v1_4->display_path[i];
>> }
>> +   /* fall through */
>> default:
>> return NULL;
>> }
>> diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c 
>> b/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
>> index 85686d917636..a24a2bda8656 100644
>> --- a/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
>> +++ b/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
>> @@ -479,7 +479,7 @@ static void program_grph_pixel_format(
>> case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
>> sign = 1;
>> floating = 1;
>> -   /* no break */
>> +   /* fall through */
>> case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F: /* shouldn't this get 
>> float too? */
>> case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
>> grph_depth = 3;
>> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c 
>> b/d

Re: [PATCH] drm: Mark expected switch fall-throughs

2019-02-15 Thread Alex Deucher via amd-gfx
On Fri, Feb 15, 2019 at 11:08 AM Gustavo A. R. Silva
 wrote:
>
> In preparation to enabling -Wimplicit-fallthrough, mark switch
> cases where we are expecting to fall through.
>
> Warning level 3 was used: -Wimplicit-fallthrough=3
>
> Notice that, in some cases, the code comment is modified
> in accordance with what GCC is expecting to find.
>
> This patch is part of the ongoing efforts to enable
> -Wimplicit-fallthrough.
>
> Signed-off-by: Gustavo A. R. Silva 

Can you please split this up per driver?  A comment below as well.

> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 1 +
>  drivers/gpu/drm/amd/amdgpu/si_dpm.c | 2 ++
>  drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c  | 2 ++
>  drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c  | 2 +-
>  drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c| 2 ++
>  drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c | 1 +
>  drivers/gpu/drm/drm_vm.c| 4 ++--
>  drivers/gpu/drm/nouveau/nouveau_bo.c| 2 +-
>  drivers/gpu/drm/radeon/ci_dpm.c | 2 ++
>  drivers/gpu/drm/radeon/evergreen_cs.c   | 1 +
>  drivers/gpu/drm/radeon/si_dpm.c | 2 ++
>  11 files changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index b8e50a34bdb3..02955e6e9dd9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -3236,6 +3236,7 @@ static void gfx_v8_0_tiling_mode_table_init(struct 
> amdgpu_device *adev)
> dev_warn(adev->dev,
>  "Unknown chip type (%d) in function 
> gfx_v8_0_tiling_mode_table_init() falling through to CHIP_CARRIZO\n",
>  adev->asic_type);
> +   /* fall through */
>
> case CHIP_CARRIZO:
> modearray[0] = (ARRAY_MODE(ARRAY_2D_TILED_THIN1) |
> diff --git a/drivers/gpu/drm/amd/amdgpu/si_dpm.c 
> b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
> index da58040fdbdc..41e01a7f57a4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/si_dpm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
> @@ -6216,10 +6216,12 @@ static void 
> si_request_link_speed_change_before_state_change(struct amdgpu_devic
> si_pi->force_pcie_gen = AMDGPU_PCIE_GEN2;
> if (current_link_speed == AMDGPU_PCIE_GEN2)
> break;
> +   /* fall through */
> case AMDGPU_PCIE_GEN2:
> if (amdgpu_acpi_pcie_performance_request(adev, 
> PCIE_PERF_REQ_PECI_GEN2, false) == 0)
> break;
>  #endif
> +   /* fall through */
> default:
> si_pi->force_pcie_gen = 
> si_get_current_pcie_speed(adev);
> break;
> diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c 
> b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
> index a1c56f29cfeb..fd5266a58297 100644
> --- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
> +++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
> @@ -265,6 +265,7 @@ static struct atom_display_object_path_v2 
> *get_bios_object(
> && id.enum_id == obj_id.enum_id)
> return 
> &bp->object_info_tbl.v1_4->display_path[i];
> }
> +   /* fall through */
> case OBJECT_TYPE_CONNECTOR:
> case OBJECT_TYPE_GENERIC:
> /* Both Generic and Connector Object ID
> @@ -277,6 +278,7 @@ static struct atom_display_object_path_v2 
> *get_bios_object(
> && id.enum_id == obj_id.enum_id)
> return 
> &bp->object_info_tbl.v1_4->display_path[i];
> }
> +   /* fall through */
> default:
> return NULL;
> }
> diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c 
> b/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
> index 85686d917636..a24a2bda8656 100644
> --- a/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
> +++ b/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
> @@ -479,7 +479,7 @@ static void program_grph_pixel_format(
> case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
> sign = 1;
> floating = 1;
> -   /* no break */
> +   /* fall through */
> case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F: /* shouldn't this get 
> float too? */
> case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
> grph_depth = 3;
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c 
> b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> index c8f5c00dd1e7..48187acac59e 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> +++ b/drivers/gpu/drm

[PATCH] drm: Mark expected switch fall-throughs

2019-02-15 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

Warning level 3 was used: -Wimplicit-fallthrough=3

Notice that, in some cases, the code comment is modified
in accordance with what GCC is expecting to find.

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
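
For illustration, the pattern GCC's checker accepts looks like this (a
minimal stand-alone example, not code from the patch): with
-Wimplicit-fallthrough=3 the warning is suppressed only when a comment
matching GCC's regex, such as "fall through", sits directly before the
next case label.

int bump(int n)
{
        switch (n) {
        case 0:
                n += 1;
                /* fall through */
        case 1:
                n += 2;
                break;
        default:
                break;
        }
        return n;
}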

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 1 +
 drivers/gpu/drm/amd/amdgpu/si_dpm.c | 2 ++
 drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c  | 2 ++
 drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c  | 2 +-
 drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c| 2 ++
 drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c | 1 +
 drivers/gpu/drm/drm_vm.c| 4 ++--
 drivers/gpu/drm/nouveau/nouveau_bo.c| 2 +-
 drivers/gpu/drm/radeon/ci_dpm.c | 2 ++
 drivers/gpu/drm/radeon/evergreen_cs.c   | 1 +
 drivers/gpu/drm/radeon/si_dpm.c | 2 ++
 11 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index b8e50a34bdb3..02955e6e9dd9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -3236,6 +3236,7 @@ static void gfx_v8_0_tiling_mode_table_init(struct 
amdgpu_device *adev)
dev_warn(adev->dev,
 "Unknown chip type (%d) in function 
gfx_v8_0_tiling_mode_table_init() falling through to CHIP_CARRIZO\n",
 adev->asic_type);
+   /* fall through */
 
case CHIP_CARRIZO:
modearray[0] = (ARRAY_MODE(ARRAY_2D_TILED_THIN1) |
diff --git a/drivers/gpu/drm/amd/amdgpu/si_dpm.c 
b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
index da58040fdbdc..41e01a7f57a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_dpm.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
@@ -6216,10 +6216,12 @@ static void 
si_request_link_speed_change_before_state_change(struct amdgpu_devic
si_pi->force_pcie_gen = AMDGPU_PCIE_GEN2;
if (current_link_speed == AMDGPU_PCIE_GEN2)
break;
+   /* fall through */
case AMDGPU_PCIE_GEN2:
if (amdgpu_acpi_pcie_performance_request(adev, 
PCIE_PERF_REQ_PECI_GEN2, false) == 0)
break;
 #endif
+   /* fall through */
default:
si_pi->force_pcie_gen = si_get_current_pcie_speed(adev);
break;
diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c 
b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
index a1c56f29cfeb..fd5266a58297 100644
--- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
+++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
@@ -265,6 +265,7 @@ static struct atom_display_object_path_v2 *get_bios_object(
&& id.enum_id == obj_id.enum_id)
return 
&bp->object_info_tbl.v1_4->display_path[i];
}
+   /* fall through */
case OBJECT_TYPE_CONNECTOR:
case OBJECT_TYPE_GENERIC:
/* Both Generic and Connector Object ID
@@ -277,6 +278,7 @@ static struct atom_display_object_path_v2 *get_bios_object(
&& id.enum_id == obj_id.enum_id)
return 
&bp->object_info_tbl.v1_4->display_path[i];
}
+   /* fall through */
default:
return NULL;
}
diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c 
b/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
index 85686d917636..a24a2bda8656 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c
@@ -479,7 +479,7 @@ static void program_grph_pixel_format(
case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
sign = 1;
floating = 1;
-   /* no break */
+   /* fall through */
case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F: /* shouldn't this get 
float too? */
case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
grph_depth = 3;
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
index c8f5c00dd1e7..48187acac59e 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
@@ -3681,10 +3681,12 @@ static int 
smu7_request_link_speed_change_before_state_change(
data->force_pcie_gen = PP_PCIEGen2;
if (current_link_speed == PP_PCIEGen2)
break;
+   /* fall through */
   

Re: [PATCH] gpu: drm: radeon: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime

2019-02-15 Thread Alex Deucher via amd-gfx
On Fri, Feb 15, 2019 at 10:39 AM Rafael J. Wysocki  wrote:
>
> From: Rafael J. Wysocki 
>
> On HP ProBook 4540s, if PM-runtime is enabled in the radeon driver
> and the direct-complete optimization is used for the radeon device
> during system-wide suspend, the system doesn't resume.
>
> Preventing direct-complete from being used with the radeon device by
> setting the DPM_FLAG_NEVER_SKIP driver flag for it makes the problem
> go away, which indicates that direct-complete is not safe for the
> radeon driver in general and should not be used with it (at least
> for now).
>
> This fixes a regression introduced by commit c62ec4610c40
> ("PM / core: Fix direct_complete handling for devices with no
> callbacks") which allowed direct-complete to be applied to
> devices without PM callbacks (again) which in turn unlocked
> direct-complete for radeon on HP ProBook 4540s.

Do other similar drivers like amdgpu and nouveau need the same fix?
I'm not too familiar with the direct_complete feature in general.

Alex

>
> Fixes: c62ec4610c40 ("PM / core: Fix direct_complete handling for devices 
> with no callbacks")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=201519
> Reported-by: Ярослав Семченко 
> Tested-by: Ярослав Семченко 
> Signed-off-by: Rafael J. Wysocki 
> ---
>  drivers/gpu/drm/radeon/radeon_kms.c |1 +
>  1 file changed, 1 insertion(+)
>
> Index: linux-pm/drivers/gpu/drm/radeon/radeon_kms.c
> ===
> --- linux-pm.orig/drivers/gpu/drm/radeon/radeon_kms.c
> +++ linux-pm/drivers/gpu/drm/radeon/radeon_kms.c
> @@ -172,6 +172,7 @@ int radeon_driver_load_kms(struct drm_de
> }
>
> if (radeon_is_px(dev)) {
> +   dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NEVER_SKIP);
> pm_runtime_use_autosuspend(dev->dev);
> pm_runtime_set_autosuspend_delay(dev->dev, 5000);
> pm_runtime_set_active(dev->dev);
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container v4

2019-02-15 Thread Lionel Landwerlin via amd-gfx

On 15/02/2019 14:32, Koenig, Christian wrote:

On 15.02.19 at 15:23, Lionel Landwerlin wrote:

Hi Christian, David,

For timeline semaphores we need points to signal in order.
I'm struggling to understand how this fence-chain implementation
preserves ordering of the seqnos.

One of the scenarios where I can see an issue happening is when you have a
timeline with points 1 & 2 and userspace submits for 2 different
engines:
     - first with let's say a blitter style engine on point 2
     - then a 3d style engine on point 1

Yeah, and where exactly is the problem?

Seqno 1 will signal when the 3d style engine finishes work.

And seqno 2 will signal when both seqno 1 is signaled and the blitter
style engine has finished its work.


That's not really how I understood the spec, but I might be wrong.

What makes me think 1 should be signaled as soon as 2 is signaled
(regardless of whether the fence attached to point 1 has been signaled)
is that the spec defines wait & signal operations in terms of the value
of the timeline.


-Lionel




Another scenario would be signaling a timeline with points 1 & 2 with
those points in reverse order in the submission array.

That is actually illegal in the spec, but actually handled gracefully as
well.

E.g. when you add seqno 1 to the syncobj container it will only signal
when 2 is signaled as well.







Regards,
Christian.


-Lionel

On 07/12/2018 09:55, Chunming Zhou wrote:

From: Christian König 

Lockless container implementation similar to a dma_fence_array, but with
only two elements per node and automatic garbage collection.

v2: properly document dma_fence_chain_for_each, add
dma_fence_chain_find_seqno,
  drop prev reference during garbage collection if it's not a
chain fence.
v3: use head and iterator for dma_fence_chain_for_each
v4: fix reference count in dma_fence_chain_enable_signaling

Signed-off-by: Christian König 
---
   drivers/dma-buf/Makefile  |   3 +-
   drivers/dma-buf/dma-fence-chain.c | 241 ++
   include/linux/dma-fence-chain.h   |  81 ++
   3 files changed, 324 insertions(+), 1 deletion(-)
   create mode 100644 drivers/dma-buf/dma-fence-chain.c
   create mode 100644 include/linux/dma-fence-chain.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 0913a6ccab5a..1f006e083eb9 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,4 +1,5 @@
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o
seqno-fence.o
+obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
+ reservation.o seqno-fence.o
   obj-$(CONFIG_SYNC_FILE)    += sync_file.o
   obj-$(CONFIG_SW_SYNC)    += sw_sync.o sync_debug.o
   obj-$(CONFIG_UDMABUF)    += udmabuf.o
diff --git a/drivers/dma-buf/dma-fence-chain.c
b/drivers/dma-buf/dma-fence-chain.c
new file mode 100644
index ..0c5e3c902fa0
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -0,0 +1,241 @@
+/*
+ * fence-chain: chain fences together in a timeline
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ * Authors:
+ *    Christian König 
+ *
+ * This program is free software; you can redistribute it and/or
modify it
+ * under the terms of the GNU General Public License version 2 as
published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
License for
+ * more details.
+ */
+
+#include 
+
+static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
+
+/**
+ * dma_fence_chain_get_prev - use RCU to get a reference to the
previous fence
+ * @chain: chain node to get the previous node from
+ *
+ * Use dma_fence_get_rcu_safe to get a reference to the previous
fence of the
+ * chain node.
+ */
+static struct dma_fence *dma_fence_chain_get_prev(struct
dma_fence_chain *chain)
+{
+    struct dma_fence *prev;
+
+    rcu_read_lock();
+    prev = dma_fence_get_rcu_safe(&chain->prev);
+    rcu_read_unlock();
+    return prev;
+}
+
+/**
+ * dma_fence_chain_walk - chain walking function
+ * @fence: current chain node
+ *
+ * Walk the chain to the next node. Returns the next fence or NULL
if we are at
+ * the end of the chain. Garbage collects chain nodes which are already
+ * signaled.
+ */
+struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
+{
+    struct dma_fence_chain *chain, *prev_chain;
+    struct dma_fence *prev, *replacement, *tmp;
+
+    chain = to_dma_fence_chain(fence);
+    if (!chain) {
+    dma_fence_put(fence);
+    return NULL;
+    }
+
+    while ((prev = dma_fence_chain_get_prev(chain))) {
+
+    prev_chain = to_dma_fence_chain(prev);
+    if (prev_chain) {
+    if (!dma_fence_is_signaled(prev_chain->fence))
+    break;
+
+    replacement = dma_fence_chain_get_prev(prev_chain);
+    } else {
+   

Re: [PATCH 0/5] Clean up TTM mmap offsets

2019-02-15 Thread Hans de Goede via amd-gfx

Hi,

On 2/7/19 9:59 AM, Thomas Zimmermann wrote:

Almost all TTM-based drivers use the same values for the mmap-able
range of BO addresses. Each driver therefore duplicates the
DRM_FILE_PAGE_OFFSET constant. OTOH, the mmap range's size is not
configurable by drivers.

This patch set replaces driver-specific configuration with a single
setup. All code is located within TTM. TTM and GEM share the same
range for mmap-able BOs.

Thomas Zimmermann (5):
   staging/vboxvideo: Use same BO mmap offset as other drivers
   drm/ttm: Define a single DRM_FILE_PAGE_OFFSET constant
   drm/ttm: Remove file_page_offset parameter from ttm_bo_device_init()
   drm/ttm: Quick-test mmap offset in ttm_bo_mmap()
   drm: Use the same mmap-range offset and size for GEM and TTM


The first patch looks good to me:

Reviewed-by: Hans de Goede 

The vboxvideo bits in the other patches look good to me too:

Acked-by: Hans de Goede 

For the other patches in the series.

Regards,

Hans

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] gpu: drm: radeon: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime

2019-02-15 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

On HP ProBook 4540s, if PM-runtime is enabled in the radeon driver
and the direct-complete optimization is used for the radeon device
during system-wide suspend, the system doesn't resume.

Preventing direct-complete from being used with the radeon device by
setting the DPM_FLAG_NEVER_SKIP driver flag for it makes the problem
go away, which indicates that direct-complete is not safe for the
radeon driver in general and should not be used with it (at least
for now).

This fixes a regression introduced by commit c62ec4610c40
("PM / core: Fix direct_complete handling for devices with no
callbacks") which allowed direct-complete to be applied to
devices without PM callbacks (again) which in turn unlocked
direct-complete for radeon on HP ProBook 4540s.

Fixes: c62ec4610c40 ("PM / core: Fix direct_complete handling for devices with 
no callbacks")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201519
Reported-by: Ярослав Семченко 
Tested-by: Ярослав Семченко 
Signed-off-by: Rafael J. Wysocki 
---
 drivers/gpu/drm/radeon/radeon_kms.c |1 +
 1 file changed, 1 insertion(+)

Index: linux-pm/drivers/gpu/drm/radeon/radeon_kms.c
===
--- linux-pm.orig/drivers/gpu/drm/radeon/radeon_kms.c
+++ linux-pm/drivers/gpu/drm/radeon/radeon_kms.c
@@ -172,6 +172,7 @@ int radeon_driver_load_kms(struct drm_de
}
 
if (radeon_is_px(dev)) {
+   dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NEVER_SKIP);
pm_runtime_use_autosuspend(dev->dev);
pm_runtime_set_autosuspend_delay(dev->dev, 5000);
pm_runtime_set_active(dev->dev);

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH libdrm] libdrm: Fix issue about differrent domainID but same BDF

2019-02-15 Thread Emil Velikov via amd-gfx
Hi Emily,

Please note that code outside of amdgpu/ is used by all open source drivers.
Thus patches should have dri-devel@ in To/Cc, as mentioned in CONTRIBUTING.

On Thu, 14 Feb 2019 at 07:53, Emily Deng  wrote:
>
> For multiple GPUs which have the same BDF but different domain IDs,
> drmOpenByBusid will return the wrong fd when starting X.
>
> The reproduction sequence is as follows:
> 1. Call drmOpenByBusid to open Card0; this returns the right fd0, and
> fd0 has master privilege.
> 2. Call drmOpenByBusid to open Card1. Inside drmOpenByBusid, Card0 is
> opened first, but this time the fd1 obtained for Card0 does not have
> master privilege. drmSetInterfaceVersion is then called to identify the
> domain ID feature; since fd1 is not master, drmSetInterfaceVersion
> fails, the domain ID is never compared, and the wrong fd is returned
> for Card1.
>
> Solution:
> In a first loop, search for the best-matching fd using drm 1.4.
>
First and foremost, I wish we could stop using these legacy APIs.
They're fairly fragile and, as you can see, there are strange things
happening.
We could instead use drmGetDevices2() to gather a list of devices and
pick the one we're interested in (sketched below).

That aside, I think we can do a slightly better fix. Have you tried:
 - resetting the pci_domain_ok=1 on each iteration, and
 - continuing to the next device when the second
drmSetInterfaceVersion() call fails

AFAICT it should produce the same result, while being shorter and faster.
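
Roughly, the drmGetDevices2() route looks like this (a sketch with error
handling trimmed; the helper name is made up for illustration):

#include <fcntl.h>
#include <unistd.h>
#include <xf86drm.h>

/* Open the primary node of the device matching the full PCI
 * domain:bus:dev.func, so devices sharing a BDF in different
 * domains cannot be confused. */
static int open_by_full_pci_id(int domain, int bus, int dev, int func)
{
    drmDevicePtr devices[64];
    int i, fd = -1;
    int n = drmGetDevices2(0, devices, 64);

    for (i = 0; i < n; i++) {
        drmDevicePtr d = devices[i];

        if (d->bustype != DRM_BUS_PCI)
            continue;
        if (d->businfo.pci->domain != domain ||
            d->businfo.pci->bus != bus ||
            d->businfo.pci->dev != dev ||
            d->businfo.pci->func != func)
            continue;
        if (d->available_nodes & (1 << DRM_NODE_PRIMARY)) {
            fd = open(d->nodes[DRM_NODE_PRIMARY], O_RDWR | O_CLOEXEC);
            break;
        }
    }
    if (n > 0)
        drmFreeDevices(devices, n);
    return fd;
}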

Thanks
-Emil
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH libdrm] libdrm: Fix issue about differrent domainID but same BDF

2019-02-15 Thread Alex Deucher via amd-gfx
Adding dri-devel.

On Thu, Feb 14, 2019 at 2:53 AM Emily Deng  wrote:
>
> For multiple GPUs which have the same BDF but different domain IDs,
> drmOpenByBusid will return the wrong fd when starting X.
>
> The reproduction sequence is as follows:
> 1. Call drmOpenByBusid to open Card0; this returns the right fd0, and
> fd0 has master privilege.
> 2. Call drmOpenByBusid to open Card1. Inside drmOpenByBusid, Card0 is
> opened first, but this time the fd1 obtained for Card0 does not have
> master privilege. drmSetInterfaceVersion is then called to identify the
> domain ID feature; since fd1 is not master, drmSetInterfaceVersion
> fails, the domain ID is never compared, and the wrong fd is returned
> for Card1.
>
> Solution:
> In a first loop, search for the best-matching fd using drm 1.4.
>
> Signed-off-by: Emily Deng 
> ---
>  xf86drm.c | 23 +++
>  1 file changed, 23 insertions(+)
>
> diff --git a/xf86drm.c b/xf86drm.c
> index 336d64d..b60e029 100644
> --- a/xf86drm.c
> +++ b/xf86drm.c
> @@ -584,11 +584,34 @@ static int drmOpenByBusid(const char *busid, int type)
>  if (base < 0)
>  return -1;
>
> +/* We need to try for 1.4 first for proper PCI domain support */
>  drmMsg("drmOpenByBusid: Searching for BusID %s\n", busid);
>  for (i = base; i < base + DRM_MAX_MINOR; i++) {
>  fd = drmOpenMinor(i, 1, type);
>  drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>  if (fd >= 0) {
> +sv.drm_di_major = 1;
> +sv.drm_di_minor = 4;
> +sv.drm_dd_major = -1;/* Don't care */
> +sv.drm_dd_minor = -1;/* Don't care */
> +if (!drmSetInterfaceVersion(fd, &sv)) {
> +buf = drmGetBusid(fd);
> +drmMsg("drmOpenByBusid: drmGetBusid reports %s\n", buf);
> +if (buf && drmMatchBusID(buf, busid, 1)) {
> +drmFreeBusid(buf);
> +return fd;
> +}
> +if (buf)
> +drmFreeBusid(buf);
> +}
> +close(fd);
> +}
> +}
> +
> +   for (i = base; i < base + DRM_MAX_MINOR; i++) {
> +fd = drmOpenMinor(i, 1, type);
> +drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
> +if (fd >= 0) {
>  /* We need to try for 1.4 first for proper PCI domain support
>   * and if that fails, we know the kernel is busted
>   */
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container v4

2019-02-15 Thread Lionel Landwerlin via amd-gfx

Hi Christian, David,

For timeline semaphores we need points to be signaled in order.
I'm struggling to understand how this fence-chain implementation
preserves ordering of the seqnos.


One of the scenarios where I can see an issue happening is when you have a
timeline with points 1 & 2 and userspace submits for 2 different engines:

    - first with let's say a blitter style engine on point 2
    - then a 3d style engine on point 1

Another scenario would be signaling a timeline with points 1 & 2 with 
those points in reverse order in the submission array.


-Lionel

On 07/12/2018 09:55, Chunming Zhou wrote:

From: Christian König 

Lockless container implementation similar to a dma_fence_array, but with
only two elements per node and automatic garbage collection.

v2: properly document dma_fence_chain_for_each, add dma_fence_chain_find_seqno,
 drop prev reference during garbage collection if it's not a chain fence.
v3: use head and iterator for dma_fence_chain_for_each
v4: fix reference count in dma_fence_chain_enable_signaling

Signed-off-by: Christian König 
---
  drivers/dma-buf/Makefile  |   3 +-
  drivers/dma-buf/dma-fence-chain.c | 241 ++
  include/linux/dma-fence-chain.h   |  81 ++
  3 files changed, 324 insertions(+), 1 deletion(-)
  create mode 100644 drivers/dma-buf/dma-fence-chain.c
  create mode 100644 include/linux/dma-fence-chain.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 0913a6ccab5a..1f006e083eb9 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,4 +1,5 @@
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-fence.o
+obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
+reservation.o seqno-fence.o
  obj-$(CONFIG_SYNC_FILE)   += sync_file.o
  obj-$(CONFIG_SW_SYNC) += sw_sync.o sync_debug.o
  obj-$(CONFIG_UDMABUF) += udmabuf.o
diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
new file mode 100644
index ..0c5e3c902fa0
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -0,0 +1,241 @@
+/*
+ * fence-chain: chain fences together in a timeline
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ * Authors:
+ * Christian König 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include 
+
+static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
+
+/**
+ * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
+ * @chain: chain node to get the previous node from
+ *
+ * Use dma_fence_get_rcu_safe to get a reference to the previous fence of the
+ * chain node.
+ */
+static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
+{
+   struct dma_fence *prev;
+
+   rcu_read_lock();
+   prev = dma_fence_get_rcu_safe(&chain->prev);
+   rcu_read_unlock();
+   return prev;
+}
+
+/**
+ * dma_fence_chain_walk - chain walking function
+ * @fence: current chain node
+ *
+ * Walk the chain to the next node. Returns the next fence or NULL if we are at
+ * the end of the chain. Garbage collects chain nodes which are already
+ * signaled.
+ */
+struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
+{
+   struct dma_fence_chain *chain, *prev_chain;
+   struct dma_fence *prev, *replacement, *tmp;
+
+   chain = to_dma_fence_chain(fence);
+   if (!chain) {
+   dma_fence_put(fence);
+   return NULL;
+   }
+
+   while ((prev = dma_fence_chain_get_prev(chain))) {
+
+   prev_chain = to_dma_fence_chain(prev);
+   if (prev_chain) {
+   if (!dma_fence_is_signaled(prev_chain->fence))
+   break;
+
+   replacement = dma_fence_chain_get_prev(prev_chain);
+   } else {
+   if (!dma_fence_is_signaled(prev))
+   break;
+
+   replacement = NULL;
+   }
+
+   tmp = cmpxchg(&chain->prev, prev, replacement);
+   if (tmp == prev)
+   dma_fence_put(tmp);
+   else
+   dma_fence_put(replacement);
+   dma_fence_put(prev);
+   }
+
+   dma_fence_put(fence);
+   return prev;
+}
+EXPORT_SYMBOL(dma_fence_chain_walk);
+
+/**
+ * dma_fence_chain_find_seqno - find fence chain node by seqno
+ * @pfence: pointer to the chain node where to start
+ * @seqno: the sequence number to search for
+ *
+ * Advance the fence point

Re: [PATCH 09/11] drm/syncobj: add transition ioctls between binary and timeline

2019-02-15 Thread Lionel Landwerlin via amd-gfx

Hi David,

Thanks a lot for pointing me to the tests you've added in IGT.
While adding a test that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test:
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9


Trying to kill the deadlocked process I got this backtrace:


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s! 
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc 
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek 
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel 
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl 
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event 
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass 
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq 
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me 
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress 
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit 
ghash_clmulni_intel prime_numbers

drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci 
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G 
U    5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS 
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016

[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0 
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f 
00 <0f> 85 05 fe ff ff 4c 89 ef e8 56 ea ff ff 48 89 d8 5b 5d 41 5c 41
[   60.452888] RSP: 0018:9a5804653ca8 EFLAGS: 00010296 ORIG_RAX: 
ff13
[   60.452889] RAX:  RBX: 8f5690fb2480 RCX: 
8f5690fb2f00
[   60.452890] RDX: 003e3730 RSI:  RDI: 
8f5690fb2180
[   60.452891] RBP: 8f5690fb2180 R08:  R09: 
8f5690fb2eb0
[   60.452891] R10:  R11: 8f5660469860 R12: 
8f5690fb2f68
[   60.452892] R13: 8f5690fb2f00 R14: 0003 R15: 
8f5655a45fc0
[   60.452913] FS:  7fdc5c459980() GS:8f569eb8() 
knlGS:

[   60.452913] CS:  0010 DS:  ES:  CR0: 80050033
[   60.452914] CR2: 7f9d74336dd8 CR3: 00084a67e004 CR4: 
003606e0
[   60.452915] DR0:  DR1:  DR2: 

[   60.452915] DR3:  DR6: fffe0ff0 DR7: 
0400

[   60.452916] Call Trace:
[   60.452958]  drm_syncobj_add_point+0x102/0x160 [drm]
[   60.452965]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452971]  drm_syncobj_transfer_ioctl+0x10f/0x180 [drm]
[   60.452978]  drm_ioctl_kernel+0xac/0xf0 [drm]
[   60.452984]  drm_ioctl+0x2eb/0x3b0 [drm]
[   60.452990]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452992]  ? sw_sync_ioctl+0x347/0x370
[   60.452994]  do_vfs_ioctl+0xa4/0x640
[   60.452995]  ? __fput+0x134/0x220
[   60.452997]  ? do_fcntl+0x1a5/0x650
[   60.452998]  ksys_ioctl+0x70/0x80
[   60.452999]  __x64_sys_ioctl+0x16/0x20
[   60.453002]  do_syscall_64+0x55/0x110
[   60.453004]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   60.453005] RIP: 0033:0x7fdc5b6e45d7
[   60.453006] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
[   60.453007] RSP: 002b:7fff25c4d198 EFLAGS: 0206 ORIG_RAX: 
0010
[   60.453008] RAX: ffda RBX:  RCX: 
7fdc5b6e45d7
[   60.453008] RDX: 7fff25c4d200 RSI: c02064cc RDI: 
0003
[   60.453009] RBP: 7fff25c4d1d0 R08:  R09: 
001e
[   60.453010] R10:  R11: 0206 R12: 
563d3959e4d0
[   60.453010] R13: 7fff25c4d620 R14:  R15: 

[   88.447359] watchdog: BUG: soft lockup - CPU#6 stuck for 22s! 
[syncobj_timelin:2021]



-Lionel


On 07/12/2018 09:55, Chunming Zhou wrote:

We need to import/export timeline points.

Signed-off-by: Chunming Zhou 
---
  drivers/gpu/drm/drm_internal.h |  4 +++
  drivers/gpu/drm/drm_ioctl.c|  6 
  drivers/gpu/drm/drm_syncobj.c  | 66 ++
  include/uapi/drm/drm.h | 10 ++
  4 files changed, 86 insertions(+)

diff --git a/drivers/gpu/drm/drm_interna

Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container v4

2019-02-15 Thread Koenig, Christian
On 15.02.19 at 15:23, Lionel Landwerlin wrote:
> Hi Christian, David,
>
> For timeline semaphores we need points to be signaled in order.
> I'm struggling to understand how this fence-chain implementation 
> preserves ordering of the seqnos.
>
> One of the scenarios where I can see an issue happening is when you have a 
> timeline with points 1 & 2 and userspace submits for 2 different 
> engines:
>     - first with let's say a blitter style engine on point 2
>     - then a 3d style engine on point 1

Yeah, and where exactly is the problem?

Seqno 1 will signal when the 3d style engine finishes work.

And seqno 2 will signal when both seqno 1 is signaled and the blitter 
style engine has finished its work.

> Another scenario would be signaling a timeline with points 1 & 2 with 
> those points in reverse order in the submission array.

That is actually illegal in the spec, but it is handled gracefully as 
well.

E.g. when you add seqno 1 to the syncobj container it will only signal 
when 2 is signaled as well.
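
As a sketch of that guarantee (illustration only; allocation and error
handling simplified, build_timeline() is made up, and dma_fence_chain_init()
is assumed to be the constructor from this series):

#include <linux/dma-fence-chain.h>
#include <linux/slab.h>

/* Chain two engine fences into timeline points 1 and 2. */
static struct dma_fence *build_timeline(struct dma_fence *f_3d,   /* point 1 */
                                        struct dma_fence *f_blit) /* point 2 */
{
    struct dma_fence_chain *p1 = kzalloc(sizeof(*p1), GFP_KERNEL);
    struct dma_fence_chain *p2 = kzalloc(sizeof(*p2), GFP_KERNEL);

    dma_fence_chain_init(p1, NULL, dma_fence_get(f_3d), 1);
    dma_fence_chain_init(p2, &p1->base, dma_fence_get(f_blit), 2);

    /* dma_fence_is_signaled(&p2->base) walks the whole chain, so it
     * only reports true once f_blit *and* everything before it (here
     * f_3d) have signaled, regardless of which engine finishes first. */
    return &p2->base;
}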

Regards,
Christian.

>
> -Lionel
>
> On 07/12/2018 09:55, Chunming Zhou wrote:
>> From: Christian König 
>>
>> Lockless container implementation similar to a dma_fence_array, but with
>> only two elements per node and automatic garbage collection.
>>
>> v2: properly document dma_fence_chain_for_each, add 
>> dma_fence_chain_find_seqno,
>>  drop prev reference during garbage collection if it's not a 
>> chain fence.
>> v3: use head and iterator for dma_fence_chain_for_each
>> v4: fix reference count in dma_fence_chain_enable_signaling
>>
>> Signed-off-by: Christian König 
>> ---
>>   drivers/dma-buf/Makefile  |   3 +-
>>   drivers/dma-buf/dma-fence-chain.c | 241 ++
>>   include/linux/dma-fence-chain.h   |  81 ++
>>   3 files changed, 324 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/dma-buf/dma-fence-chain.c
>>   create mode 100644 include/linux/dma-fence-chain.h
>>
>> diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
>> index 0913a6ccab5a..1f006e083eb9 100644
>> --- a/drivers/dma-buf/Makefile
>> +++ b/drivers/dma-buf/Makefile
>> @@ -1,4 +1,5 @@
>> -obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o 
>> seqno-fence.o
>> +obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
>> + reservation.o seqno-fence.o
>>   obj-$(CONFIG_SYNC_FILE)    += sync_file.o
>>   obj-$(CONFIG_SW_SYNC)    += sw_sync.o sync_debug.o
>>   obj-$(CONFIG_UDMABUF)    += udmabuf.o
>> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
>> new file mode 100644
>> index ..0c5e3c902fa0
>> --- /dev/null
>> +++ b/drivers/dma-buf/dma-fence-chain.c
>> @@ -0,0 +1,241 @@
>> +/*
>> + * fence-chain: chain fences together in a timeline
>> + *
>> + * Copyright (C) 2018 Advanced Micro Devices, Inc.
>> + * Authors:
>> + *    Christian König 
>> + *
>> + * This program is free software; you can redistribute it and/or 
>> modify it
>> + * under the terms of the GNU General Public License version 2 as 
>> published by
>> + * the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful, 
>> but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
>> License for
>> + * more details.
>> + */
>> +
>> +#include 
>> +
>> +static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
>> +
>> +/**
>> + * dma_fence_chain_get_prev - use RCU to get a reference to the 
>> previous fence
>> + * @chain: chain node to get the previous node from
>> + *
>> + * Use dma_fence_get_rcu_safe to get a reference to the previous 
>> fence of the
>> + * chain node.
>> + */
>> +static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
>> +{
>> +    struct dma_fence *prev;
>> +
>> +    rcu_read_lock();
>> +    prev = dma_fence_get_rcu_safe(&chain->prev);
>> +    rcu_read_unlock();
>> +    return prev;
>> +}
>> +
>> +/**
>> + * dma_fence_chain_walk - chain walking function
>> + * @fence: current chain node
>> + *
>> + * Walk the chain to the next node. Returns the next fence or NULL 
>> if we are at
>> + * the end of the chain. Garbage collects chain nodes which are already
>> + * signaled.
>> + */
>> +struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
>> +{
>> +    struct dma_fence_chain *chain, *prev_chain;
>> +    struct dma_fence *prev, *replacement, *tmp;
>> +
>> +    chain = to_dma_fence_chain(fence);
>> +    if (!chain) {
>> +    dma_fence_put(fence);
>> +    return NULL;
>> +    }
>> +
>> +    while ((prev = dma_fence_chain_get_prev(chain))) {
>> +
>> +    prev_chain = to_dma_fence_chain(prev);
>> +    if (prev_chain) {
>> +    if (!dma_fence_is_signaled(prev_chain->fence))
>> +    break;
>> +
>> +    replacement = dma_fence_chain_get_prev(prev_chain);
>> +    } else 

RE: [PATCH libdrm] libdrm: Fix issue about different domainID but same BDF

2019-02-15 Thread Deng, Emily
Ping ..

Best wishes
Emily Deng

>-Original Message-
>From: Deng, Emily 
>Sent: Friday, February 15, 2019 11:51 AM
>To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH libdrm] libdrm: Fix issue about different domainID but same BDF
>
>Ping ..
>
>>-Original Message-
>>From: amd-gfx  On Behalf Of
>>Emily Deng
>>Sent: Thursday, February 14, 2019 3:54 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily 
>>Subject: [PATCH libdrm] libdrm: Fix issue about different domainID but same BDF
>>
>>For multiple GPUs which have the same BDF but different domain IDs,
>>drmOpenByBusid will return the wrong fd when starting X.
>>
>>The reproduction sequence is as below:
>>1. Call drmOpenByBusid to open Card0; it will return the right fd0, and
>>fd0 has master privilege.
>>2. Call drmOpenByBusid to open Card1. In drmOpenByBusid, Card0 is opened
>>first; this time the fd1 returned for Card0 does not have master
>>privilege. drmSetInterfaceVersion is then called to identify the domain
>>ID feature; as fd1 lacks master privilege, drmSetInterfaceVersion fails,
>>the domain IDs are never compared, and the wrong fd is returned for
>>Card1.
>>
>>Solution:
>>First, loop to search for the best-matching fd supporting DRM 1.4.
>>
>>Signed-off-by: Emily Deng 
>>---
>> xf86drm.c | 23 +++
>> 1 file changed, 23 insertions(+)
>>
>>diff --git a/xf86drm.c b/xf86drm.c
>>index 336d64d..b60e029 100644
>>--- a/xf86drm.c
>>+++ b/xf86drm.c
>>@@ -584,11 +584,34 @@ static int drmOpenByBusid(const char *busid, int type)
>> if (base < 0)
>> return -1;
>>
>>+/* We need to try for 1.4 first for proper PCI domain support */
>> drmMsg("drmOpenByBusid: Searching for BusID %s\n", busid);
>> for (i = base; i < base + DRM_MAX_MINOR; i++) {
>> fd = drmOpenMinor(i, 1, type);
>> drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>> if (fd >= 0) {
>>+sv.drm_di_major = 1;
>>+sv.drm_di_minor = 4;
>>+sv.drm_dd_major = -1;/* Don't care */
>>+sv.drm_dd_minor = -1;/* Don't care */
>>+if (!drmSetInterfaceVersion(fd, &sv)) {
>>+buf = drmGetBusid(fd);
>>+drmMsg("drmOpenByBusid: drmGetBusid reports %s\n", buf);
>>+if (buf && drmMatchBusID(buf, busid, 1)) {
>>+drmFreeBusid(buf);
>>+return fd;
>>+}
>>+if (buf)
>>+drmFreeBusid(buf);
>>+}
>>+close(fd);
>>+}
>>+}
>>+
>>+   for (i = base; i < base + DRM_MAX_MINOR; i++) {
>>+fd = drmOpenMinor(i, 1, type);
>>+drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>>+if (fd >= 0) {
>> /* We need to try for 1.4 first for proper PCI domain support
>>  * and if that fails, we know the kernel is busted
>>  */
>>--
>>2.7.4
>>
>>___
>>amd-gfx mailing list
>>amd-gfx@lists.freedesktop.org
>>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: remove some old unused dpm helpers

2019-02-15 Thread Michel Dänzer
On 2019-02-14 9:58 p.m., Alex Deucher via amd-gfx wrote:
> Carried over from radeon, but no longer used.
> 
> Signed-off-by: Alex Deucher 

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: remove some old unused dpm helpers

2019-02-15 Thread Christian König via amd-gfx

On 14.02.19 at 21:58, Alex Deucher via amd-gfx wrote:

Carried over from radeon, but no longer used.

Signed-off-by: Alex Deucher 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c | 88 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h |  9 ---
  2 files changed, 97 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
index 1c4595562f8f..344967df3137 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
@@ -184,61 +184,6 @@ u32 amdgpu_dpm_get_vrefresh(struct amdgpu_device *adev)
return vrefresh;
  }
  
-void amdgpu_calculate_u_and_p(u32 i, u32 r_c, u32 p_b,
-			      u32 *p, u32 *u)
-{
-   u32 b_c = 0;
-   u32 i_c;
-   u32 tmp;
-
-   i_c = (i * r_c) / 100;
-   tmp = i_c >> p_b;
-
-   while (tmp) {
-   b_c++;
-   tmp >>= 1;
-   }
-
-   *u = (b_c + 1) / 2;
-   *p = i_c / (1 << (2 * (*u)));
-}
-
-int amdgpu_calculate_at(u32 t, u32 h, u32 fh, u32 fl, u32 *tl, u32 *th)
-{
-   u32 k, a, ah, al;
-   u32 t1;
-
-   if ((fl == 0) || (fh == 0) || (fl > fh))
-   return -EINVAL;
-
-   k = (100 * fh) / fl;
-   t1 = (t * (k - 100));
-   a = (1000 * (100 * h + t1)) / (1 + (t1 / 100));
-   a = (a + 5) / 10;
-   ah = ((a * t) + 5000) / 1;
-   al = a - ah;
-
-   *th = t - ah;
-   *tl = t + al;
-
-   return 0;
-}
-
-bool amdgpu_is_uvd_state(u32 class, u32 class2)
-{
-   if (class & ATOM_PPLIB_CLASSIFICATION_UVDSTATE)
-   return true;
-   if (class & ATOM_PPLIB_CLASSIFICATION_HD2STATE)
-   return true;
-   if (class & ATOM_PPLIB_CLASSIFICATION_HDSTATE)
-   return true;
-   if (class & ATOM_PPLIB_CLASSIFICATION_SDSTATE)
-   return true;
-   if (class2 & ATOM_PPLIB_CLASSIFICATION2_MVC)
-   return true;
-   return false;
-}
-
  bool amdgpu_is_internal_thermal_sensor(enum amdgpu_int_thermal_type sensor)
  {
switch (sensor) {
@@ -949,39 +894,6 @@ enum amdgpu_pcie_gen amdgpu_get_pcie_gen_support(struct amdgpu_device *adev,
return AMDGPU_PCIE_GEN1;
  }
  
-u16 amdgpu_get_pcie_lane_support(struct amdgpu_device *adev,
-				 u16 asic_lanes,
-				 u16 default_lanes)
-{
-   switch (asic_lanes) {
-   case 0:
-   default:
-   return default_lanes;
-   case 1:
-   return 1;
-   case 2:
-   return 2;
-   case 4:
-   return 4;
-   case 8:
-   return 8;
-   case 12:
-   return 12;
-   case 16:
-   return 16;
-   }
-}
-
-u8 amdgpu_encode_pci_lane_width(u32 lanes)
-{
-	u8 encoded_lanes[] = { 0, 1, 2, 0, 3, 0, 0, 0, 4, 0, 0, 0, 5, 0, 0, 0, 6 };
-
-   if (lanes > 16)
-   return 0;
-
-   return encoded_lanes[lanes];
-}
-
  struct amd_vce_state*
  amdgpu_get_vce_clock_state(void *handle, u32 idx)
  {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
index 2f61e9edb1c1..e871e022c129 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
@@ -486,10 +486,6 @@ void amdgpu_dpm_print_ps_status(struct amdgpu_device *adev,
  u32 amdgpu_dpm_get_vblank_time(struct amdgpu_device *adev);
  u32 amdgpu_dpm_get_vrefresh(struct amdgpu_device *adev);
  void amdgpu_dpm_get_active_displays(struct amdgpu_device *adev);
-bool amdgpu_is_uvd_state(u32 class, u32 class2);
-void amdgpu_calculate_u_and_p(u32 i, u32 r_c, u32 p_b,
- u32 *p, u32 *u);
-int amdgpu_calculate_at(u32 t, u32 h, u32 fh, u32 fl, u32 *tl, u32 *th);
  
  bool amdgpu_is_internal_thermal_sensor(enum amdgpu_int_thermal_type sensor);
  
@@ -505,11 +501,6 @@ enum amdgpu_pcie_gen amdgpu_get_pcie_gen_support(struct amdgpu_device *adev,
 					 enum amdgpu_pcie_gen asic_gen,
 					 enum amdgpu_pcie_gen default_gen);
  
-u16 amdgpu_get_pcie_lane_support(struct amdgpu_device *adev,
-				 u16 asic_lanes,
-				 u16 default_lanes);
-u8 amdgpu_encode_pci_lane_width(u32 lanes);
-
  struct amd_vce_state*
  amdgpu_get_vce_clock_state(void *handle, u32 idx);
  


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH AUTOSEL 4.14 23/40] drm/amd/powerplay: OD setting fix on Vega10

2019-02-15 Thread Sasha Levin via amd-gfx
From: Kenneth Feng 

[ Upstream commit 6d87dc97eb3341de3f7b1efa3156cb0e014f4a96 ]

gfxclk for the OD setting is limited to 1980M for non-ACG
ASICs of Vega10.
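
If the clock fields in the powerplay table are in the usual 10 kHz units
(an assumption), the VEGA10_ENGINECLOCK_HARDMAX value of 198000 added below
works out to 198000 * 10 kHz = 1980 MHz, i.e. exactly the 1980M limit above.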

Signed-off-by: Kenneth Feng 
Reviewed-by: Evan Quan 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 .../powerplay/hwmgr/vega10_processpptables.c  | 22 ++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
index e343df190375..05bb87a54e90 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
@@ -32,6 +32,7 @@
 #include "vega10_pptable.h"
 
 #define NUM_DSPCLK_LEVELS 8
+#define VEGA10_ENGINECLOCK_HARDMAX 198000
 
 static void set_hw_cap(struct pp_hwmgr *hwmgr, bool enable,
enum phm_platform_caps cap)
@@ -258,7 +259,26 @@ static int init_over_drive_limits(
struct pp_hwmgr *hwmgr,
const ATOM_Vega10_POWERPLAYTABLE *powerplay_table)
 {
-   hwmgr->platform_descriptor.overdriveLimit.engineClock =
+   const ATOM_Vega10_GFXCLK_Dependency_Table *gfxclk_dep_table =
+   (const ATOM_Vega10_GFXCLK_Dependency_Table *)
+   (((unsigned long) powerplay_table) +
+			le16_to_cpu(powerplay_table->usGfxclkDependencyTableOffset));
+   bool is_acg_enabled = false;
+   ATOM_Vega10_GFXCLK_Dependency_Record_V2 *patom_record_v2;
+
+   if (gfxclk_dep_table->ucRevId == 1) {
+   patom_record_v2 =
+			(ATOM_Vega10_GFXCLK_Dependency_Record_V2 *)gfxclk_dep_table->entries;
+   is_acg_enabled =
+			(bool)patom_record_v2[gfxclk_dep_table->ucNumEntries-1].ucACGEnable;
+   }
+
+   if (powerplay_table->ulMaxODEngineClock > VEGA10_ENGINECLOCK_HARDMAX &&
+   !is_acg_enabled)
+   hwmgr->platform_descriptor.overdriveLimit.engineClock =
+   VEGA10_ENGINECLOCK_HARDMAX;
+   else
+   hwmgr->platform_descriptor.overdriveLimit.engineClock =
le32_to_cpu(powerplay_table->ulMaxODEngineClock);
hwmgr->platform_descriptor.overdriveLimit.memoryClock =
le32_to_cpu(powerplay_table->ulMaxODMemoryClock);
-- 
2.19.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-15 Thread Mikhail Gavrilov via amd-gfx
On Thu, 14 Feb 2019 at 20:51, Grodzovsky, Andrey
 wrote:
>
> Got it.
>
> Andrey
>

Cool, please don't forget to give me the patch for testing.


--
Best Regards,
Mike Gavrilov.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH libdrm] libdrm: Fix issue about different domainID but same BDF

2019-02-15 Thread Deng, Emily
Ping ..

>-Original Message-
>From: amd-gfx  On Behalf Of Emily
>Deng
>Sent: Thursday, February 14, 2019 3:54 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH libdrm] libdrm: Fix issue about different domainID but same BDF
>
>For multiple GPUs which have the same BDF but different domain IDs,
>drmOpenByBusid will return the wrong fd when starting X.
>
>The reproduction sequence is as below:
>1. Call drmOpenByBusid to open Card0; it will return the right fd0, and
>fd0 has master privilege.
>2. Call drmOpenByBusid to open Card1. In drmOpenByBusid, Card0 is opened
>first; this time the fd1 returned for Card0 does not have master
>privilege. drmSetInterfaceVersion is then called to identify the domain
>ID feature; as fd1 lacks master privilege, drmSetInterfaceVersion fails,
>the domain IDs are never compared, and the wrong fd is returned for
>Card1.
>
>Solution:
>First, loop to search for the best-matching fd supporting DRM 1.4.
>
>Signed-off-by: Emily Deng 
>---
> xf86drm.c | 23 +++
> 1 file changed, 23 insertions(+)
>
>diff --git a/xf86drm.c b/xf86drm.c
>index 336d64d..b60e029 100644
>--- a/xf86drm.c
>+++ b/xf86drm.c
>@@ -584,11 +584,34 @@ static int drmOpenByBusid(const char *busid, int type)
> if (base < 0)
> return -1;
>
>+/* We need to try for 1.4 first for proper PCI domain support */
> drmMsg("drmOpenByBusid: Searching for BusID %s\n", busid);
> for (i = base; i < base + DRM_MAX_MINOR; i++) {
> fd = drmOpenMinor(i, 1, type);
> drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
> if (fd >= 0) {
>+sv.drm_di_major = 1;
>+sv.drm_di_minor = 4;
>+sv.drm_dd_major = -1;/* Don't care */
>+sv.drm_dd_minor = -1;/* Don't care */
>+if (!drmSetInterfaceVersion(fd, &sv)) {
>+buf = drmGetBusid(fd);
>+drmMsg("drmOpenByBusid: drmGetBusid reports %s\n", buf);
>+if (buf && drmMatchBusID(buf, busid, 1)) {
>+drmFreeBusid(buf);
>+return fd;
>+}
>+if (buf)
>+drmFreeBusid(buf);
>+}
>+close(fd);
>+}
>+}
>+
>+   for (i = base; i < base + DRM_MAX_MINOR; i++) {
>+fd = drmOpenMinor(i, 1, type);
>+drmMsg("drmOpenByBusid: drmOpenMinor returns %d\n", fd);
>+if (fd >= 0) {
> /* We need to try for 1.4 first for proper PCI domain support
>  * and if that fails, we know the kernel is busted
>  */
>--
>2.7.4
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH AUTOSEL 4.19 35/65] drm/amd/powerplay: OD setting fix on Vega10

2019-02-15 Thread Sasha Levin via amd-gfx
From: Kenneth Feng 

[ Upstream commit 6d87dc97eb3341de3f7b1efa3156cb0e014f4a96 ]

gfxclk for the OD setting is limited to 1980M for non-ACG
ASICs of Vega10.

Signed-off-by: Kenneth Feng 
Reviewed-by: Evan Quan 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 .../powerplay/hwmgr/vega10_processpptables.c  | 22 ++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
index 16b1a9cf6cf0..743d3c983082 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
@@ -32,6 +32,7 @@
 #include "vega10_pptable.h"
 
 #define NUM_DSPCLK_LEVELS 8
+#define VEGA10_ENGINECLOCK_HARDMAX 198000
 
 static void set_hw_cap(struct pp_hwmgr *hwmgr, bool enable,
enum phm_platform_caps cap)
@@ -258,7 +259,26 @@ static int init_over_drive_limits(
struct pp_hwmgr *hwmgr,
const ATOM_Vega10_POWERPLAYTABLE *powerplay_table)
 {
-   hwmgr->platform_descriptor.overdriveLimit.engineClock =
+   const ATOM_Vega10_GFXCLK_Dependency_Table *gfxclk_dep_table =
+   (const ATOM_Vega10_GFXCLK_Dependency_Table *)
+   (((unsigned long) powerplay_table) +
+			le16_to_cpu(powerplay_table->usGfxclkDependencyTableOffset));
+   bool is_acg_enabled = false;
+   ATOM_Vega10_GFXCLK_Dependency_Record_V2 *patom_record_v2;
+
+   if (gfxclk_dep_table->ucRevId == 1) {
+   patom_record_v2 =
+			(ATOM_Vega10_GFXCLK_Dependency_Record_V2 *)gfxclk_dep_table->entries;
+   is_acg_enabled =
+			(bool)patom_record_v2[gfxclk_dep_table->ucNumEntries-1].ucACGEnable;
+   }
+
+   if (powerplay_table->ulMaxODEngineClock > VEGA10_ENGINECLOCK_HARDMAX &&
+   !is_acg_enabled)
+   hwmgr->platform_descriptor.overdriveLimit.engineClock =
+   VEGA10_ENGINECLOCK_HARDMAX;
+   else
+   hwmgr->platform_descriptor.overdriveLimit.engineClock =
le32_to_cpu(powerplay_table->ulMaxODEngineClock);
hwmgr->platform_descriptor.overdriveLimit.memoryClock =
le32_to_cpu(powerplay_table->ulMaxODMemoryClock);
-- 
2.19.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 4/4] drm/amdkfd: Optimize out sdma doorbell array in kgd2kfd_shared_resources

2019-02-15 Thread Zhao, Yong
We can directly calculate SDMA doorbell indexes in the process doorbell
pages through the doorbell_index structure in amdgpu_device, so there is
no need to cache them in kgd2kfd_shared_resources any more. This reduces
the adaptation needed when new SDMA configurations are introduced.
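
For instance, KFD can derive the same assignments the removed per-engine
table produced (a sketch assuming SOC15 8-byte doorbells, where the odd
queue of a pair lives one page, 0x200 dwords, away; sdma_doorbell_index()
is a made-up helper):

/* Doorbell index for SDMA queue q on engine e. */
static uint32_t sdma_doorbell_index(struct amdgpu_device *adev,
                                    unsigned int e, unsigned int q)
{
    uint32_t base = adev->doorbell_index.sdma_engine[e];

    /* even queues: base + q/2; odd queues: one page (0x200) away */
    return base + (q >> 1) + ((q & 1) ? 0x200 : 0);
}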

Change-Id: Ic657799856ed0256f36b01e502ef0cab263b1f49
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 41 +--
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 16 +---
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |  4 +-
 3 files changed, 23 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 30e2b371578e..fe1d7368c1e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -131,7 +131,7 @@ static void amdgpu_doorbell_get_kfd_info(struct amdgpu_device *adev,
 
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 {
-   int i, n;
+   int i;
int last_valid_bit;
 
if (adev->kfd.dev) {
@@ -142,7 +142,9 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
.gpuvm_size = min(adev->vm_manager.max_pfn
  << AMDGPU_GPU_PAGE_SHIFT,
  AMDGPU_GMC_HOLE_START),
-   .drm_render_minor = adev->ddev->render->index
+   .drm_render_minor = adev->ddev->render->index,
+   .sdma_doorbell_idx = adev->doorbell_index.sdma_engine,
+
};
 
/* this is going to have a few of the MSBs set that we need to
@@ -172,31 +174,6 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
&gpu_resources.doorbell_aperture_size,
&gpu_resources.doorbell_start_offset);
 
-   if (adev->asic_type < CHIP_VEGA10) {
-   kgd2kfd_device_init(adev->kfd.dev, &gpu_resources);
-   return;
-   }
-
-   n = (adev->asic_type < CHIP_VEGA20) ? 2 : 8;
-
-   for (i = 0; i < n; i += 2) {
-   /* On SOC15 the BIF is involved in routing
-* doorbells using the low 12 bits of the
-* address. Communicate the assignments to
-* KFD. KFD uses two doorbell pages per
-* process in case of 64-bit doorbells so we
-* can use each doorbell assignment twice.
-*/
-   gpu_resources.sdma_doorbell[0][i] =
-   adev->doorbell_index.sdma_engine[0] + (i >> 1);
-   gpu_resources.sdma_doorbell[0][i+1] =
-			adev->doorbell_index.sdma_engine[0] + 0x200 + (i >> 1);
-   gpu_resources.sdma_doorbell[1][i] =
-   adev->doorbell_index.sdma_engine[1] + (i >> 1);
-   gpu_resources.sdma_doorbell[1][i+1] =
-			adev->doorbell_index.sdma_engine[1] + 0x200 + (i >> 1);
-   }
-
/* Since SOC15, BIF starts to statically use the
 * lower 12 bits of doorbell addresses for routing
 * based on settings in registers like
@@ -205,10 +182,12 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 * 12 bits of its address has to be outside the range
 * set for SDMA, VCN, and IH blocks.
 */
-   gpu_resources.non_cp_doorbells_start =
-   adev->doorbell_index.first_non_cp;
-   gpu_resources.non_cp_doorbells_end =
-   adev->doorbell_index.last_non_cp;
+   if (adev->asic_type >= CHIP_VEGA10) {
+   gpu_resources.non_cp_doorbells_start =
+   adev->doorbell_index.first_non_cp;
+   gpu_resources.non_cp_doorbells_end =
+   adev->doorbell_index.last_non_cp;
+   }
 
kgd2kfd_device_init(adev->kfd.dev, &gpu_resources);
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 8372556b52eb..c6c9530e704e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -134,12 +134,18 @@ static int allocate_doorbell(struct qcm_process_device *qpd, struct queue *q)
 */
q->doorbell_id = q->properties.queue_id;
} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
-   /* For SDMA queues on SOC15, use static doorbell
-* assignments based on the engine and queue.
+   /* For SDMA queues

[PATCH AUTOSEL 4.20 43/77] drm/amd/powerplay: OD setting fix on Vega10

2019-02-15 Thread Sasha Levin via amd-gfx
From: Kenneth Feng 

[ Upstream commit 6d87dc97eb3341de3f7b1efa3156cb0e014f4a96 ]

gfxclk for the OD setting is limited to 1980M for non-ACG
ASICs of Vega10.

Signed-off-by: Kenneth Feng 
Reviewed-by: Evan Quan 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 .../powerplay/hwmgr/vega10_processpptables.c  | 22 ++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
index b8747a5c9204..99d596dc0e89 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
@@ -32,6 +32,7 @@
 #include "vega10_pptable.h"
 
 #define NUM_DSPCLK_LEVELS 8
+#define VEGA10_ENGINECLOCK_HARDMAX 198000
 
 static void set_hw_cap(struct pp_hwmgr *hwmgr, bool enable,
enum phm_platform_caps cap)
@@ -258,7 +259,26 @@ static int init_over_drive_limits(
struct pp_hwmgr *hwmgr,
const ATOM_Vega10_POWERPLAYTABLE *powerplay_table)
 {
-   hwmgr->platform_descriptor.overdriveLimit.engineClock =
+   const ATOM_Vega10_GFXCLK_Dependency_Table *gfxclk_dep_table =
+   (const ATOM_Vega10_GFXCLK_Dependency_Table *)
+   (((unsigned long) powerplay_table) +
+			le16_to_cpu(powerplay_table->usGfxclkDependencyTableOffset));
+   bool is_acg_enabled = false;
+   ATOM_Vega10_GFXCLK_Dependency_Record_V2 *patom_record_v2;
+
+   if (gfxclk_dep_table->ucRevId == 1) {
+   patom_record_v2 =
+			(ATOM_Vega10_GFXCLK_Dependency_Record_V2 *)gfxclk_dep_table->entries;
+   is_acg_enabled =
+			(bool)patom_record_v2[gfxclk_dep_table->ucNumEntries-1].ucACGEnable;
+   }
+
+   if (powerplay_table->ulMaxODEngineClock > VEGA10_ENGINECLOCK_HARDMAX &&
+   !is_acg_enabled)
+   hwmgr->platform_descriptor.overdriveLimit.engineClock =
+   VEGA10_ENGINECLOCK_HARDMAX;
+   else
+   hwmgr->platform_descriptor.overdriveLimit.engineClock =
le32_to_cpu(powerplay_table->ulMaxODEngineClock);
hwmgr->platform_descriptor.overdriveLimit.memoryClock =
le32_to_cpu(powerplay_table->ulMaxODMemoryClock);
-- 
2.19.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix reference counting for struct dc_sink.

2019-02-15 Thread Mathias Fröhlich
Alex,

On Wednesday, 13 February 2019 21:38:03 CET Alex Deucher wrote:
> Add amd-gfx and some DC people.

Thanks!!
When I sent it, I did not remember that there is another list for amd!
Up to now I have been much more on the Mesa side ...

Mathias



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 3/4] drm/amdkfd: Fix bugs regarding CP queue doorbell mask on SOC15

2019-02-15 Thread Zhao, Yong
Reserved doorbells for SDMA, IH, and VCN were not properly masked out
when allocating doorbells for CP user queues. This patch fixes that.

Change-Id: I670adfc3fd7725d2ed0bd9665cb7f69f8b9023c2
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c  | 16 
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   | 11 +++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c| 14 +-
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 15 ++-
 4 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index e957e42c539a..30e2b371578e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -196,11 +196,19 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
gpu_resources.sdma_doorbell[1][i+1] =
			adev->doorbell_index.sdma_engine[1] + 0x200 + (i >> 1);
}
-   /* Doorbells 0x0e0-0ff and 0x2e0-2ff are reserved for
-* SDMA, IH and VCN. So don't use them for the CP.
+
+   /* Since SOC15, BIF starts to statically use the
+* lower 12 bits of doorbell addresses for routing
+* based on settings in registers like
+* SDMA0_DOORBELL_RANGE etc..
+* In order to route a doorbell to CP engine, the lower
+* 12 bits of its address has to be outside the range
+* set for SDMA, VCN, and IH blocks.
 */
-   gpu_resources.reserved_doorbell_mask = 0x1e0;
-   gpu_resources.reserved_doorbell_val  = 0x0e0;
+   gpu_resources.non_cp_doorbells_start =
+   adev->doorbell_index.first_non_cp;
+   gpu_resources.non_cp_doorbells_end =
+   adev->doorbell_index.last_non_cp;
 
kgd2kfd_device_init(adev->kfd.dev, &gpu_resources);
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index e5ebcca7f031..03c6d6dc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -103,6 +103,17 @@
 
 #define KFD_KERNEL_QUEUE_SIZE 2048
 
+/*
+ * 512 = 0x200
+ * The doorbell index distance between SDMA RLC (2*i) and (2*i+1) in the
+ * same SDMA engine on SOC15, which has 8-byte doorbells for SDMA.
+ * 512 8-byte doorbell distance (i.e. one page away) ensures that SDMA RLC
+ * (2*i+1) doorbells (in terms of the lower 12 bit address) lie exactly in
+ * the OFFSET and SIZE set in registers like BIF_SDMA0_DOORBELL_RANGE.
+ */
+#define KFD_QUEUE_DOORBELL_MIRROR_OFFSET 512
+
+
 /*
  * Kernel module parameter to specify maximum number of supported queues per
  * device
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 80b36e860a0a..4bdae78bab8e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -607,13 +607,17 @@ static int init_doorbell_bitmap(struct qcm_process_device *qpd,
if (!qpd->doorbell_bitmap)
return -ENOMEM;
 
-   /* Mask out any reserved doorbells */
-   for (i = 0; i < KFD_MAX_NUM_OF_QUEUES_PER_PROCESS; i++)
-   if ((dev->shared_resources.reserved_doorbell_mask & i) ==
-   dev->shared_resources.reserved_doorbell_val) {
+   /* Mask out doorbells reserved for SDMA, IH, and VCN on SOC15. */
+   for (i = 0; i < KFD_MAX_NUM_OF_QUEUES_PER_PROCESS / 2; i++) {
+   if (i >= dev->shared_resources.non_cp_doorbells_start
+   && i <= dev->shared_resources.non_cp_doorbells_end) {
set_bit(i, qpd->doorbell_bitmap);
-   pr_debug("reserved doorbell 0x%03x\n", i);
+   set_bit(i + KFD_QUEUE_DOORBELL_MIRROR_OFFSET,
+   qpd->doorbell_bitmap);
+   pr_debug("reserved doorbell 0x%03x and 0x%03x\n", i,
+   i + KFD_QUEUE_DOORBELL_MIRROR_OFFSET);
}
+   }
 
return 0;
 }
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 83d960110d23..0b6b34f4e5a1 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -140,17 +140,14 @@ struct kgd2kfd_shared_resources {
/* Doorbell assignments (SOC15 and later chips only). Only
 * specific doorbells are routed to each SDMA engine. Others
 * are routed to IH and VCN. They are not usable by the CP.
-*
-* Any doorbell number D that satisfies the following condition
-* is reserved: (D & reserved_doorbell_mask) == reserved_doorbell_val
-*
-* KFD currently uses 1024 (= 0x

[PATCH 2/4] drm/amdgpu: Add first_non_cp and last_non_cp in amdgpu_doorbell_index

2019-02-15 Thread Zhao, Yong
They will be used to inform KFD of the doorbell range that is not usable by the CP.

Change-Id: Icc9167771ad9539d8e31b40058e3b22be825a585
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h | 9 +
 drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c | 4 
 drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c | 4 
 3 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
index 43546500ec26..5587fac671bb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
@@ -69,6 +69,8 @@ struct amdgpu_doorbell_index {
uint32_t vce_ring6_7;
} uvd_vce;
};
+   uint32_t first_non_cp;
+   uint32_t last_non_cp;
uint32_t max_assignment;
/* Per engine SDMA doorbell size in dword */
uint32_t sdma_doorbell_range;
@@ -139,6 +141,10 @@ typedef enum _AMDGPU_VEGA20_DOORBELL_ASSIGNMENT
AMDGPU_VEGA20_DOORBELL64_VCE_RING2_3 = 0x18D,
AMDGPU_VEGA20_DOORBELL64_VCE_RING4_5 = 0x18E,
AMDGPU_VEGA20_DOORBELL64_VCE_RING6_7 = 0x18F,
+
+	AMDGPU_VEGA20_DOORBELL64_FIRST_NON_CP		= AMDGPU_VEGA20_DOORBELL_sDMA_ENGINE0,
+	AMDGPU_VEGA20_DOORBELL64_LAST_NON_CP		= AMDGPU_VEGA20_DOORBELL64_VCE_RING6_7,
+
AMDGPU_VEGA20_DOORBELL_MAX_ASSIGNMENT= 0x18F,
AMDGPU_VEGA20_DOORBELL_INVALID   = 0x
 } AMDGPU_VEGA20_DOORBELL_ASSIGNMENT;
@@ -214,6 +220,9 @@ typedef enum _AMDGPU_DOORBELL64_ASSIGNMENT
AMDGPU_DOORBELL64_VCE_RING4_5 = 0xFE,
AMDGPU_DOORBELL64_VCE_RING6_7 = 0xFF,
 
+	AMDGPU_DOORBELL64_FIRST_NON_CP		= AMDGPU_DOORBELL64_sDMA_ENGINE0,
+	AMDGPU_DOORBELL64_LAST_NON_CP		= AMDGPU_DOORBELL64_VCE_RING6_7,
+
AMDGPU_DOORBELL64_MAX_ASSIGNMENT  = 0xFF,
AMDGPU_DOORBELL64_INVALID = 0x
 } AMDGPU_DOORBELL64_ASSIGNMENT;
diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c b/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c
index 62f49c895314..5e9e53143a8e 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c
@@ -79,6 +79,10 @@ void vega10_doorbell_index_init(struct amdgpu_device *adev)
	adev->doorbell_index.uvd_vce.vce_ring2_3 = AMDGPU_DOORBELL64_VCE_RING2_3;
	adev->doorbell_index.uvd_vce.vce_ring4_5 = AMDGPU_DOORBELL64_VCE_RING4_5;
	adev->doorbell_index.uvd_vce.vce_ring6_7 = AMDGPU_DOORBELL64_VCE_RING6_7;
+
+   adev->doorbell_index.first_non_cp = AMDGPU_DOORBELL64_FIRST_NON_CP;
+   adev->doorbell_index.last_non_cp = AMDGPU_DOORBELL64_LAST_NON_CP;
+
/* In unit of dword doorbell */
	adev->doorbell_index.max_assignment = AMDGPU_DOORBELL64_MAX_ASSIGNMENT << 1;
adev->doorbell_index.sdma_doorbell_range = 4;
diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c b/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
index 1271e1702ad4..fb6398e38be9 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
@@ -83,6 +83,10 @@ void vega20_doorbell_index_init(struct amdgpu_device *adev)
	adev->doorbell_index.uvd_vce.vce_ring2_3 = AMDGPU_VEGA20_DOORBELL64_VCE_RING2_3;
	adev->doorbell_index.uvd_vce.vce_ring4_5 = AMDGPU_VEGA20_DOORBELL64_VCE_RING4_5;
	adev->doorbell_index.uvd_vce.vce_ring6_7 = AMDGPU_VEGA20_DOORBELL64_VCE_RING6_7;
+
+	adev->doorbell_index.first_non_cp = AMDGPU_VEGA20_DOORBELL64_FIRST_NON_CP;
+   adev->doorbell_index.last_non_cp = AMDGPU_VEGA20_DOORBELL64_LAST_NON_CP;
+
	adev->doorbell_index.max_assignment = AMDGPU_VEGA20_DOORBELL_MAX_ASSIGNMENT << 1;
adev->doorbell_index.sdma_doorbell_range = 20;
 }
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 1/4] drm/amdkfd: Move a constant definition around

2019-02-15 Thread Zhao, Yong
Similar definitions should be consecutive.

Change-Id: I936cf076363e641c60e0704d8405ae9493718e18
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 12b66330fc6d..e5ebcca7f031 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -97,17 +97,18 @@
 #define KFD_CWSR_TBA_TMA_SIZE (PAGE_SIZE * 2)
 #define KFD_CWSR_TMA_OFFSET PAGE_SIZE
 
+#define KFD_MAX_NUM_OF_QUEUES_PER_DEVICE   \
+   (KFD_MAX_NUM_OF_PROCESSES * \
+   KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
+
+#define KFD_KERNEL_QUEUE_SIZE 2048
+
 /*
  * Kernel module parameter to specify maximum number of supported queues per
  * device
  */
 extern int max_num_of_queues_per_device;
 
-#define KFD_MAX_NUM_OF_QUEUES_PER_DEVICE   \
-   (KFD_MAX_NUM_OF_PROCESSES * \
-   KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
-
-#define KFD_KERNEL_QUEUE_SIZE 2048
 
 /* Kernel module parameter to specify the scheduling policy */
 extern int sched_policy;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx