Re: [PATCH v2] drm/amdgpu: fix scheduler timeout calc

2019-06-27 Thread Cui, Flora

在 6/27/2019 6:17 PM, Christian König 写道:
> Am 27.06.19 um 12:03 schrieb Cui, Flora:
>> scheduler timeout is in jiffies
>> v2: move timeout check to amdgpu_device_get_job_timeout_settings after
>> parsing the value
>>
>> Change-Id: I26708c163db943ff8d930dd81bcab4b4b9d84eb2
>> Signed-off-by: Flora Cui 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index e74a175..cc29d70 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -1300,7 +1300,9 @@ int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>>    * By default timeout for non compute jobs is 10000.
>>    * And there is no timeout enforced on compute jobs.
>>    */
>> -    adev->gfx_timeout = adev->sdma_timeout = adev->video_timeout = 10000;
>> +    adev->gfx_timeout = \
>> +    adev->sdma_timeout = \
>> +    adev->video_timeout = msecs_to_jiffies(10000);
>
> Off hand that looks very odd to me. This is not a macro, so why the \ here?
will update in v3
>
>>   adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;
>>     if (strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENTH)) {
>> @@ -1314,7 +1316,8 @@ int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>>   if (timeout <= 0) {
>>   index++;
>>   continue;
>> -    }
>> +    } else
>> +    timeout = msecs_to_jiffies(timeout);
>
> You can actually remove the "if (timeout <= 0)" as well, 
> msecs_to_jiffies will do the right thing for negative values.
IMHO the check for timeout == 0 is still needed: msecs_to_jiffies(0) would
return 0, and that's not desired for the scheduler timer.
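For illustration, the conversion being discussed boils down to something
like this (a minimal sketch, not the posted patch -- the real code keeps
per-ring defaults rather than a single fallback):

static long timeout_ms_to_jiffies(long timeout_ms)
{
	/* msecs_to_jiffies(0) returns 0, which would make the scheduler
	 * time out immediately, so non-positive values must keep the
	 * "no timeout" default instead of being converted. */
	if (timeout_ms <= 0)
		return MAX_SCHEDULE_TIMEOUT;

	return msecs_to_jiffies(timeout_ms);
}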
>
> Christian.
>
>>     switch (index++) {
>>   case 0:
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: fix MGPU fan boost enablement for XGMI reset

2019-06-27 Thread Evan Quan
The MGPU fan boost feature should not be enabled until all the
devices from the same hive are back from reset.

Change-Id: I03a69434ff28f4eac209bd91320dde8a238a33cf
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  4 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 +++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |  4 ++--
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 7541e1b076b0..9efa0423c242 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1219,6 +1219,10 @@ int amdgpu_dm_display_resume(struct amdgpu_device *adev);
 static inline int amdgpu_dm_display_resume(struct amdgpu_device *adev) { return 0; }
 #endif
 
+
+void amdgpu_register_gpu_instance(struct amdgpu_device *adev);
+void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev);
+
 #include "amdgpu_object.h"
 
 /* used by df_v3_6.c and amdgpu_pmu.c */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a2d234c07fc4..f39eb7b37c8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3558,6 +3558,12 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
 			if (vram_lost)
 				amdgpu_device_fill_reset_magic(tmp_adev);
 
+	/*
+	 * Add this ASIC as tracked since the reset already
+	 * completed successfully.
+	 */
+   amdgpu_register_gpu_instance(tmp_adev);
+
r = amdgpu_device_ip_late_init(tmp_adev);
if (r)
goto out;
@@ -3692,6 +3698,13 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
		device_list_handle = &device_list;
}
 
+	/*
+	 * Mark these ASICs to be reset as untracked first,
+	 * and add them back after reset completes.
+	 */
+   list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head)
+   amdgpu_unregister_gpu_instance(tmp_adev);
+
/* block all schedulers and reset given job's ring */
list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index ed051fdb509f..e2c9d8d31ed8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -41,7 +41,7 @@
 #include "amdgpu_display.h"
 #include "amdgpu_ras.h"
 
-static void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
+void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
 {
struct amdgpu_gpu_instance *gpu_instance;
int i;
@@ -102,7 +102,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
dev->dev_private = NULL;
 }
 
-static void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
+void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
 {
struct amdgpu_gpu_instance *gpu_instance;
 
-- 
2.21.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amd/powerplay: use hardware fan control if no powerplay fan table

2019-06-27 Thread Evan Quan
Use the SMC default fan table if there is no external powerplay fan table.

Change-Id: Icd7467a7fc5287a92945ba0fcc19699192b1683a
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c | 4 +++-
 drivers/gpu/drm/amd/powerplay/inc/hwmgr.h                   | 1 +
 drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c     | 4 ++++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
index ae64ff7153d6..1cd5a8b5cdc1 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
@@ -916,8 +916,10 @@ static int init_thermal_controller(
PHM_PlatformCaps_ThermalController
  );
 
-   if (0 == powerplay_table->usFanTableOffset)
+   if (0 == powerplay_table->usFanTableOffset) {
+   hwmgr->thermal_controller.use_hw_fan_control = 1;
return 0;
+   }
 
	fan_table = (const PPTable_Generic_SubTable_Header *)
		(((unsigned long)powerplay_table) +
		 le16_to_cpu(powerplay_table->usFanTableOffset));
diff --git a/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h b/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h
index 2f186fcbdfc5..ec53bf24396e 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h
@@ -697,6 +697,7 @@ struct pp_thermal_controller_info {
uint8_t ucType;
uint8_t ucI2cLine;
uint8_t ucI2cAddress;
+   uint8_t use_hw_fan_control;
struct pp_fan_info fanInfo;
struct pp_advance_fan_control_parameters advanceFanControlParameters;
 };
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
index fbac2d3326b5..a1a9f6196009 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
@@ -2092,6 +2092,10 @@ static int polaris10_thermal_setup_fan_table(struct pp_hwmgr *hwmgr)
return 0;
}
 
+   /* use hardware fan control */
+   if (hwmgr->thermal_controller.use_hw_fan_control)
+   return 0;
+
	tmp64 = hwmgr->thermal_controller.advanceFanControlParameters.
			usPWMMin * duty100;
	do_div(tmp64, 10000);
-- 
2.21.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH v2 07/27] gpu: drm: remove memset after zalloc

2019-06-27 Thread Fuqian Huang
zalloc has already zeroed the memory,
so the memset is unneeded.

Signed-off-by: Fuqian Huang 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c   | 2 --
 drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c | 2 --
 drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c| 2 --
 drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c   | 2 --
 drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c | 2 --
 5 files changed, 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
index fd22b4474dbf..4e6da61d1a93 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
@@ -279,8 +279,6 @@ void *amdgpu_dm_irq_register_interrupt(struct amdgpu_device *adev,
return DAL_INVALID_IRQ_HANDLER_IDX;
}
 
-   memset(handler_data, 0, sizeof(*handler_data));
-
	init_handler_common_data(handler_data, ih, handler_args, &adev->dm);
 
irq_source = int_params->irq_source;
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
index ae64ff7153d6..eb7757443bdd 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
@@ -1065,8 +1065,6 @@ static int pp_tables_v1_0_initialize(struct pp_hwmgr *hwmgr)
 	PP_ASSERT_WITH_CODE((NULL != hwmgr->pptable),
 			"Failed to allocate hwmgr->pptable!", return -ENOMEM);
 
-   memset(hwmgr->pptable, 0x00, sizeof(struct phm_ppt_v1_information));
-
powerplay_table = get_powerplay_table(hwmgr);
 
PP_ASSERT_WITH_CODE((NULL != powerplay_table),
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c
index 669bd0c2a16c..d55e264c5df5 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c
@@ -2702,8 +2702,6 @@ static int ci_initialize_mc_reg_table(struct pp_hwmgr *hwmgr)
 	cgs_write_register(hwmgr->device, mmMC_SEQ_PMG_CMD_MRS2_LP,
 			cgs_read_register(hwmgr->device, mmMC_PMG_CMD_MRS2));
 	cgs_write_register(hwmgr->device, mmMC_SEQ_WR_CTL_2_LP,
 			cgs_read_register(hwmgr->device, mmMC_SEQ_WR_CTL_2));
 
-   memset(table, 0x00, sizeof(pp_atomctrl_mc_reg_table));
-
result = atomctrl_initialize_mc_reg_table(hwmgr, module_index, table);
 
if (0 == result)
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c
index 375ccf6ff5f2..c123b4d9c621 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c
@@ -2631,8 +2631,6 @@ static int iceland_initialize_mc_reg_table(struct pp_hwmgr *hwmgr)
 	cgs_write_register(hwmgr->device, mmMC_SEQ_PMG_CMD_MRS2_LP,
 			cgs_read_register(hwmgr->device, mmMC_PMG_CMD_MRS2));
 	cgs_write_register(hwmgr->device, mmMC_SEQ_WR_CTL_2_LP,
 			cgs_read_register(hwmgr->device, mmMC_SEQ_WR_CTL_2));
 
-   memset(table, 0x00, sizeof(pp_atomctrl_mc_reg_table));
-
result = atomctrl_initialize_mc_reg_table(hwmgr, module_index, table);
 
if (0 == result)
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c
index 3ed6c5f1e5cf..60462c7211e3 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c
@@ -3114,8 +3114,6 @@ static int tonga_initialize_mc_reg_table(struct pp_hwmgr *hwmgr)
 	cgs_write_register(hwmgr->device, mmMC_SEQ_WR_CTL_2_LP,
 			cgs_read_register(hwmgr->device, mmMC_SEQ_WR_CTL_2));
 
-   memset(table, 0x00, sizeof(pp_atomctrl_mc_reg_table));
-
result = atomctrl_initialize_mc_reg_table(hwmgr, module_index, table);
 
if (!result)
-- 
2.11.0



[pull] amdgpu, ttm drm-next-5.3

2019-06-27 Thread Alex Deucher
Hi Dave, Daniel,

Arm fix as requested by Dave, plus a few navi fixes.

The following changes since commit 14808a12bdbdc21143eba70ea07830197b3a04ff:

  Merge tag 'drm-next-5.3-2019-06-25' of 
git://people.freedesktop.org/~agd5f/linux into drm-next (2019-06-27 12:33:57 
+1000)

are available in the Git repository at:

  git://people.freedesktop.org/~agd5f/linux tags/drm-next-5.3-2019-06-27

for you to fetch changes up to 440e80ce02cde7b810e4eb555768c2d77e7a27c8:

  drm/amd/display: fix a couple of spelling mistakes (2019-06-27 11:22:57 -0500)


drm-next-5.3-2019-06-27:

amdgpu:
- Fix warning on 32 bit ARM
- Fix compilation on big endian
- Misc bug fixes

ttm:
- Live lock fix


Alex Deucher (2):
  drm/amdgpu: fix warning on 32 bit
  drm/amdgpu: drop copy/paste leftover to fix big endian

Colin Ian King (1):
  drm/amd/display: fix a couple of spelling mistakes

Evan Quan (4):
  drm/amd/powerplay: check prerequisite for VCN power gating
  drm/amd/powerplay: support runtime ppfeatures setting on Navi10
  drm/amd/powerplay: add missing smu_get_clk_info_from_vbios() call
  drm/amd/powerplay: no memory activity support on Vega10

Felix Kuehling (1):
  drm/ttm: return -EBUSY if waiting for busy BO fails

Oak Zeng (1):
  drm/amdgpu: Set queue_preemption_timeout_ms default value

shaoyunl (1):
  drm/amdkfd: remove unnecessary warning message on gpu reset

 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   |   3 -
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c|   4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c  |   1 -
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dsc.c |   2 +-
 drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c  |   8 +-
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c   |   4 +
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c   | 181 ++-
 drivers/gpu/drm/ttm/ttm_bo.c |   2 +-
 10 files changed, 192 insertions(+), 21 deletions(-)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats

2019-06-27 Thread Welty, Brian

On 6/26/2019 11:01 PM, Daniel Vetter wrote:
> On Thu, Jun 27, 2019 at 12:06:13AM -0400, Kenny Ho wrote:
>> On Wed, Jun 26, 2019 at 12:12 PM Daniel Vetter  wrote:
>>>
>>> I think with all the ttm refactoring going on I think we need to de-ttm
>>> the interface functions here a bit. With Gerd Hoffmans series you can just
>>> use a gem_bo pointer here, so what's left to do is have some extracted
>>> structure for tracking memory types. I think Brian Welty has some ideas
>>> for this, even in patch form. Would be good to keep him on cc at least for
>>> the next version. We'd need to explicitly hand in the ttm_mem_reg (or
>>> whatever the specific thing is going to be).
>>
>> I assume Gerd Hoffman's series you are referring to is this one?
>> https://www.spinics.net/lists/dri-devel/msg215056.html
> 
> There's a newer one, much more complete, but yes that's the work.
> 
>> I can certainly keep an eye out for Gerd's refactoring while
>> refactoring other parts of this RFC.
>>
>> I have added Brian and Gerd to the thread for awareness.
> 
> btw just realized that maybe building the interfaces on top of ttm_mem_reg
> is maybe not the best. That's what you're using right now, but in a way
> that's just the ttm internal detail of how the backing storage is
> allocated. I think the structure we need to abstract away is
> ttm_mem_type_manager, without any of the actual management details.
> 

Any de-ttm refactoring should probably not spam all the cgroups folks.
So I removed cgroups list.

As Daniel mentioned, some of us are looking at possible refactoring of TTM
for reuse in i915 driver.
Here is a brief summary of some ideas to be considered:

 1) refactor part of ttm_mem_type_manager into a new drm_mem_type_region.
Really, we should then move the array from ttm_bo_device.man[] into drm_device.

Relevant to drm_cgroup, you could then perhaps access these stats through
drm_device and don't need the mem_stats array in drmcgrp_device_resource.

  1a)  doing this right means replacing TTM_PL_XXX memory types with new DRM
 defines.  But could keep the TTM ones as redefinition of (new) DRM ones.
 Probably those private ones (TTM_PL_PRIV) make this difficult.

  All of the above could be eventually leveraged by the vram support being
  implemented now in i915 driver.
  
  2) refactor ttm_mem_reg + ttm_bus_placement into something generic for
 any GEM object,  maybe call it drm_gem_object_placement.
 ttm_mem_reg could remain as a wrapper for TTM drivers.
 This hasn't been broadly discussed with intel-gfx folks, so not sure
 this fits well into i915 or not.

 Relevant to drm_cgroup, maybe this function:
drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
struct ttm_mem_reg *new_mem)
 could potentially become:
drmcgrp_mem_track_move(struct drm_gem_object *old_bo, bool evict,
struct drm_gem_object_placement *new_place)

 Though from ttm_mem_reg, you look to only be using mem_type and size.
 I think Daniel is noting that ttm_mem_reg wasn't truly needed here, so
 you could just pass in the mem_type and size instead.
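For concreteness, the slimmed-down hook could look something like this
(a sketch only; the name and signature are hypothetical, not an agreed
API):

/* Hypothetical shape for (2): charge a move using just the destination
 * memory type and size instead of a full ttm_mem_reg. */
void drmcgrp_mem_track_move(struct drm_gem_object *old_bo, bool evict,
			    unsigned int new_mem_type, u64 new_size);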

Would appreciate any feedback (positive or negative) on the above.
Perhaps this should move to a new thread?  I could send out basic RFC
patches for (1) if helpful, but as it touches all the TTM drivers, it would
be nice to hear some feedback first.
Anyway, this doesn't necessarily need to block forward progress on drm_cgroup,
as refactoring into common base structures could happen incrementally.

Thanks,
-Brian
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 2/7] drm/dp_mst: Register AUX devices for MST ports

2019-06-27 Thread Li, Sun peng (Leo)

Sorry for the late response! Just jumping back on this now.

On 2019-05-16 5:40 p.m., Lyude Paul wrote:
> 
> So a couple of things:
> 
> On Thu, 2019-05-16 at 11:17 -0400, sunpeng...@amd.com wrote:
>> From: Ville Syrjälä 
>>
>> All available downstream ports - physical and logical - are exposed for
>> each MST device. They are listed in /dev/, following the same naming
>> scheme as SST devices by appending an incremental ID.
>>
>> Although all downstream ports are exposed, only some will work as
>> expected. Consider the following topology:
>>
>>              +---------+
>>              |  ASIC   |
>>              +---------+
>>             Conn-0|
>>                   |
>>              +----v----+
>>         +----| MST HUB |----+
>>         |    +---------+    |
>>         |                   |
>>   Port-1|                   |Port-2
>>   +-----v-----+       +-----v-----+
>>   |    MST    |       |    SST    |
>>   |  Display  |       |  Display  |
>>   +-----------+       +-----------+
>>   Port-1|
>>         x
>>
>>   MST Path  | MST Device
>>   ----------+----------------------
>>   sst:0 | MST Hub
>>   mst:0-1   | MST Display
>>   mst:0-1-1 | MST Display's disconnected DP out
>>   mst:0-1-8 | MST Display's internal sink
>>   mst:0-2   | SST Display
>>
>> On certain MST displays, the upstream physical port will ACK DPCD reads.
>> However, reads on the local logical port to the internal sink will
>> *NAK*. i.e. reading mst:0-1 ACKs, but mst:0-1-8 NAKs.
>>
>> There may also be duplicates. Some displays will return the same GUID
>> when reading DPCD from both mst:0-1 and mst:0-1-8.
>>
>> There are some device-dependent behavior as well. The MST hub used
>> during testing will actually *ACK* read requests on a disconnected
>> physical port, whereas the MST displays will NAK.
>>
>> In light of these discrepancies, it's simpler to expose all downstream
>> ports - both physical and logical - and let the user decide what to use.
>>
>> Cc: Lyude Paul 
>> Signed-off-by: Ville Syrjälä 
>> Signed-off-by: Leo Li 
>> ---
>>   drivers/gpu/drm/drm_dp_aux_dev.c  |  14 -
>>   drivers/gpu/drm/drm_dp_mst_topology.c | 103 ++++++++++++++++++++++++++++++++-
>>   include/drm/drm_dp_helper.h   |   4 ++
>>   include/drm/drm_dp_mst_helper.h   |   6 ++
>>   4 files changed, 112 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_dp_aux_dev.c b/drivers/gpu/drm/drm_dp_aux_dev.c
>> index 6d84611..01c02b9 100644
>> --- a/drivers/gpu/drm/drm_dp_aux_dev.c
>> +++ b/drivers/gpu/drm/drm_dp_aux_dev.c
>> @@ -34,6 +34,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include <drm/drm_dp_mst_helper.h>
>>   #include 
>>   #include 
>>
>> @@ -114,6 +115,7 @@ static ssize_t name_show(struct device *dev,
>>
>>return res;
>>   }
>> +
>>   static DEVICE_ATTR_RO(name);
>>
>>   static struct attribute *drm_dp_aux_attrs[] = {
>> @@ -160,7 +162,11 @@ static ssize_t auxdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>break;
>>}
>>
>> - res = drm_dp_dpcd_read(aux_dev->aux, pos, buf, todo);
>> + if (aux_dev->aux->is_remote)
>> + res = drm_dp_mst_dpcd_read(aux_dev->aux, pos, buf, todo);
>> + else
>> + res = drm_dp_dpcd_read(aux_dev->aux, pos, buf, todo);
>> +
> 
> It's still not at all clear to me why we're trying to avoid specifying actual
> callbacks for the aux device. We should remove this part entirely, remove the
> is_remote entry from struct drm_dp_aux, and then just specify our own hook in
> struct drm_dp_aux->transfer().
> 

I'm not sure if this would work well. The existing policy does retries
around the ->transfer() call. Using the same hook will cause a nested
retry - once when calling remote_aux->transfer, and another when calling
real_aux->transfer. The difference is the scope of the retry. The former
replays the entire aux transaction, while the latter replays just the
failed sideband packet. I think having the retry at the packet level
makes more sense. Either way, it shouldn't happen in a nested manner.

In general, we need a way to determine whether a message needs to be
sent via sideband. I'm not sure if there's a better way than setting an
'is_remote' flag?
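For reference, a sketch of what the read-side dispatch in the patch
amounts to (mirroring the hunk quoted above; the retry policy stays in
the caller either way):

static ssize_t aux_dev_read(struct drm_dp_aux *aux, unsigned int offset,
			    void *buffer, size_t size)
{
	/* Remote (MST) aux channels go through sideband messaging, which
	 * retries per sideband packet; native channels go through a plain
	 * DPCD transaction. */
	if (aux->is_remote)
		return drm_dp_mst_dpcd_read(aux, offset, buffer, size);

	return drm_dp_dpcd_read(aux, offset, buffer, size);
}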

Leo

>>if (res <= 0)
>>break;
>>
>> @@ -207,7 +213,11 @@ static ssize_t auxdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
>>break;
>>}
>>
>> - res = drm_dp_dpcd_write(aux_dev->aux, pos, buf, todo);
>> + if (aux_dev->aux->is_remote)
>> + res = drm_dp_mst_dpcd_write(aux_dev->aux, pos, buf, todo);
>> + else
>> + res = drm_dp_dpcd_write(aux_dev->aux, pos, buf, todo);
>> +
>>if (res <= 0)
>>break;
>>
>> diff --git 

Re: [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats

2019-06-27 Thread Daniel Vetter
On Thu, Jun 27, 2019 at 04:17:09PM -0400, Kenny Ho wrote:
> On Thu, Jun 27, 2019 at 2:01 AM Daniel Vetter  wrote:
> > btw reminds me: I guess it would be good to have a per-type .total
> > read-only exposed, so that userspace has an idea of how much there is?
> > ttm is trying to be agnostic to the allocator that's used to manage a
> > memory type/resource, so doesn't even know that. But I think something we
> > need to expose to admins, otherwise they can't meaningfully set limits.
> 
> I don't think I understand this bit, do you mean total across multiple
> GPU of the same mem type?  Or do you mean the total available per GPU
> (or something else?)

Total for a given type on a given gpu. E.g. maybe you want to give 50% of
your vram to one cgroup, and the other 50% to the other cgroup. For that
you need to know how much vram you have. And expecting people to lspci and
then look at wikipedia for how much vram that chip should have (or
something like that) isn't great. Hence 0.vram.total, 0.tt.total, and so
on (also for all the other gpu minors ofc).  For system memory we probably
don't want to provide a total, since that's already a value that's easy to
obtain from various sources.
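(For drivers that do fill in TTM's per-type manager, a sketch of where
such a total could come from -- hypothetical helper name, and as said
above ttm doesn't reliably know this for every allocator:)

static u64 drmcg_mem_type_total(struct ttm_bo_device *bdev,
				unsigned int mem_type)
{
	struct ttm_mem_type_manager *man = &bdev->man[mem_type];

	/* man->size is kept in pages for range-managed types */
	return (u64)man->size << PAGE_SHIFT;
}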
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit

2019-06-27 Thread Daniel Vetter
On Thu, Jun 27, 2019 at 02:42:43PM -0400, Kenny Ho wrote:
> On Thu, Jun 27, 2019 at 1:43 AM Daniel Vetter  wrote:
> >
> > On Wed, Jun 26, 2019 at 06:41:32PM -0400, Kenny Ho wrote:
> > > So without the sharing restriction and some kind of ownership
> > > structure, we will have to migrate/change the owner of the buffer when
> > > the cgroup that created the buffer die before the receiving cgroup(s)
> > > and I am not sure how to do that properly at the moment.  1) Should
> > > each cgroup keep track of all the buffers that belongs to it and
> > > migrate?  (Is that efficient?)  2) which cgroup should be the new
> > > owner (and therefore have the limit applied?)  Having the creator
> > > being the owner is kind of natural, but let say the buffer is shared
> > > with 5 other cgroups, which of these 5 cgroups should be the new owner
> > > of the buffer?
> >
> > Different answers:
> >
> > - Do we care if we leak bos like this in a cgroup, if the cgroup
> >   disappears before all the bo are cleaned up?
> >
> > - Just charge the bo to each cgroup it's in? Will be quite a bit more
> >   tracking needed to get that done ...
> That seems to be the approach memcg takes, but as shown by the lwn
> link you sent me from the last rfc (talk from Roman Gushchin), that
> approach is not problem free either.  And wouldn't this approach
> disconnect resource management from the underlying resource one would
> like to control?  For example, if you have 5 MB of memory, you can
> have 5 users using 1 MB each.  But in the charge-everybody approach, a
> 1 MB usage shared 4 times will make it look like 5 MB is used.  So the
> resource being controlled is no longer 'real' since the amount of
> resource you have is now dynamic and depends on the amount of sharing
> one does.

The problem with memcg is that it's not just the allocation, but a ton of
memory allocated to track these allocations. At least that's my
understanding of the nature of the memcg leak. Made a lot worse by pages
being small and plentiful and shared extremely widely (e.g. it's really
hard to control who gets charged for pagecache allocations, so those
pagecache entries might outlive the memcg forever if you're unlucky).

For us it's just a counter, plus bo sharing is a lot more controlled: On
any reasonable system if you do kill the compositor, then all the clients
go down. And when you do kill a client, the compositor will release all
the shared buffers (and any other resources).

So I think for drmcg we won't have anything near the same resource leak
problem even in theory, and in practice I think the issue is none.

> > - Also, there's the legacy way of sharing a bo, with the FLINK and
> >   GEM_OPEN ioctls. We need to plug these holes too.
> >
> > Just feels like your current solution is technically well-justified, but
> > it completely defeats the point of cgroups/containers and buffer sharing
> > ...
> Um... I am going to get a bit philosophical here and suggest that the
> idea of sharing (especially uncontrolled sharing) is inherently at odds
> with containment.  It's like, if everybody is special, no one is
> special.  Perhaps an alternative is to make this configurable so that
> people can allow sharing knowing the caveat?  And just to be clear,
> the current solution allows for sharing, even between cgroups.

The thing is, why shouldn't we just allow it (with some documented
caveat)?

I mean if all people do is share it as your current patches allow, then
there's nothing funny going on (at least if we go with just leaking the
allocations). If we allow additional sharing, then that's a plus.

And if you want additional containment, that's a different thing: The
entire linux architecture for containers is that a container doesn't
exist. Instead you get a pile of building blocks that all solve different
aspects of what a container needs to do:
- cgroups for resource limits
- namespaces for resource visibility
- selinux/secomp/lsm for resource isolation and access rights

Let's not try to build a drm cgroup control that tries to do more than
what cgroups are meant to solve. If you have a need to restrict the
sharing, imo that should be done with an lsm security hook.

btw for bo sharing, I've found a 3rd sharing path (besides dma-buf and gem
flink): GETCRTC ioctl can also be used (it's the intended goal actually) to
share buffers across processes.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats

2019-06-27 Thread Kenny Ho
On Thu, Jun 27, 2019 at 2:01 AM Daniel Vetter  wrote:
>
> btw reminds me: I guess it would be good to have a per-type .total
> read-only exposed, so that userspace has an idea of how much there is?
> ttm is trying to be agnostic to the allocator that's used to manage a
> memory type/resource, so doesn't even know that. But I think something we
> need to expose to admins, otherwise they can't meaningfully set limits.

I don't think I understand this bit, do you mean total across multiple
GPU of the same mem type?  Or do you mean the total available per GPU
(or something else?)

Regards,
Kenny
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 17/87] gpu: drm: Remove call to memset after kzalloc in iceland_smumgr.c

2019-06-27 Thread Fuqian Huang
kzalloc already zeroes the memory,
so the memset is unneeded.

Signed-off-by: Fuqian Huang 
---
 drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c 
b/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c
index 375ccf6ff5f2..c123b4d9c621 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c
@@ -2631,8 +2631,6 @@ static int iceland_initialize_mc_reg_table(struct pp_hwmgr *hwmgr)
 	cgs_write_register(hwmgr->device, mmMC_SEQ_PMG_CMD_MRS2_LP,
 			cgs_read_register(hwmgr->device, mmMC_PMG_CMD_MRS2));
 	cgs_write_register(hwmgr->device, mmMC_SEQ_WR_CTL_2_LP,
 			cgs_read_register(hwmgr->device, mmMC_SEQ_WR_CTL_2));
 
-   memset(table, 0x00, sizeof(pp_atomctrl_mc_reg_table));
-
result = atomctrl_initialize_mc_reg_table(hwmgr, module_index, table);
 
if (0 == result)
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 14/87] gpu: amd: Remove call to memset after kzalloc

2019-06-27 Thread Fuqian Huang
kzalloc already zeroes the memory,
so the memset is unneeded.

Signed-off-by: Fuqian Huang 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
index fd22b4474dbf..4e6da61d1a93 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
@@ -279,8 +279,6 @@ void *amdgpu_dm_irq_register_interrupt(struct amdgpu_device *adev,
return DAL_INVALID_IRQ_HANDLER_IDX;
}
 
-   memset(handler_data, 0, sizeof(*handler_data));
-
	init_handler_common_data(handler_data, ih, handler_args, &adev->dm);
 
irq_source = int_params->irq_source;
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 15/87] gpu: drm: Remove call to memset after kzalloc in process_pptable_v1_0.c

2019-06-27 Thread Fuqian Huang
kzalloc already zeroes the memory,
so the memset is not needed.

Signed-off-by: Fuqian Huang 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
index ae64ff7153d6..eb7757443bdd 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
@@ -1065,8 +1065,6 @@ static int pp_tables_v1_0_initialize(struct pp_hwmgr *hwmgr)
 	PP_ASSERT_WITH_CODE((NULL != hwmgr->pptable),
 			"Failed to allocate hwmgr->pptable!", return -ENOMEM);
 
-   memset(hwmgr->pptable, 0x00, sizeof(struct phm_ppt_v1_information));
-
powerplay_table = get_powerplay_table(hwmgr);
 
PP_ASSERT_WITH_CODE((NULL != powerplay_table),
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 16/87] gpu: drm: Remove call to memset after kzalloc

2019-06-27 Thread Fuqian Huang
kzalloc already zeroes the memory,
so the memset is unneeded.

Signed-off-by: Fuqian Huang 
---
 drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c
index 669bd0c2a16c..d55e264c5df5 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c
@@ -2702,8 +2702,6 @@ static int ci_initialize_mc_reg_table(struct pp_hwmgr *hwmgr)
 	cgs_write_register(hwmgr->device, mmMC_SEQ_PMG_CMD_MRS2_LP,
 			cgs_read_register(hwmgr->device, mmMC_PMG_CMD_MRS2));
 	cgs_write_register(hwmgr->device, mmMC_SEQ_WR_CTL_2_LP,
 			cgs_read_register(hwmgr->device, mmMC_SEQ_WR_CTL_2));
 
-   memset(table, 0x00, sizeof(pp_atomctrl_mc_reg_table));
-
result = atomctrl_initialize_mc_reg_table(hwmgr, module_index, table);
 
if (0 == result)
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 18/87] gpu: drm: Remove call to memset after kzalloc in tonga_smumgr.c

2019-06-27 Thread Fuqian Huang
kzalloc already zeroes the memory,
so the memset is unneeded.

Signed-off-by: Fuqian Huang 
---
 drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c
index 3ed6c5f1e5cf..60462c7211e3 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c
@@ -3114,8 +3114,6 @@ static int tonga_initialize_mc_reg_table(struct pp_hwmgr *hwmgr)
 	cgs_write_register(hwmgr->device, mmMC_SEQ_WR_CTL_2_LP,
 			cgs_read_register(hwmgr->device, mmMC_SEQ_WR_CTL_2));
 
-   memset(table, 0x00, sizeof(pp_atomctrl_mc_reg_table));
-
result = atomctrl_initialize_mc_reg_table(hwmgr, module_index, table);
 
if (!result)
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 2/2] dma-buf: cleanup shared fence removal

2019-06-27 Thread Daniel Vetter
On Thu, Jun 27, 2019 at 7:20 PM Koenig, Christian
 wrote:
>
> Am 27.06.19 um 19:10 schrieb Daniel Vetter:
> > On Thu, Jun 27, 2019 at 03:48:06PM +, Koenig, Christian wrote:
> >> Am 27.06.19 um 17:34 schrieb Daniel Vetter:
> >>> On Thu, Jun 27, 2019 at 3:19 PM Christian König
> >>>  wrote:
>  Am 27.06.19 um 12:43 schrieb Daniel Vetter:
> > On Thu, Jun 27, 2019 at 12:18 PM Christian König
> >  wrote:
> >> While freeing up memory it is easier to remove a fence from a reservation
> >> object instead of signaling it and installing a new one in all other 
> >> objects.
> >>
> >> Clean this up by adding the removal function to the core reservation object
> >> code instead of messing with the internals in amdgpu.
> >>
> >> No functional change.
> >>
> >> Signed-off-by: Christian König 
> >> ---
> >> drivers/dma-buf/reservation.c | 62 
> >> +++
> >> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 45 +-
> >> include/linux/reservation.h   |  3 +-
> >> 3 files changed, 65 insertions(+), 45 deletions(-)
> >>
> >> diff --git a/drivers/dma-buf/reservation.c 
> >> b/drivers/dma-buf/reservation.c
> >> index ef710effedfa..e43a316a005d 100644
> >> --- a/drivers/dma-buf/reservation.c
> >> +++ b/drivers/dma-buf/reservation.c
> >> @@ -231,6 +231,68 @@ void reservation_object_add_shared_fence(struct 
> >> reservation_object *obj,
> >> }
> >> EXPORT_SYMBOL(reservation_object_add_shared_fence);
> >>
> >> +/**
> >> + * reservation_object_remove_shared_fence - remove shared fences
> >> + * @obj: the reservation object
> >> + * @context: the context of the fences to remove
> >> + *
> >> + * Remove shared fences based on their fence context.
> >> + */
> > This needs a serious explanation of "why?". Removing fences without
> > guaranteeing they're all signaled is a good way to create havoc. Your
> > commit message has a few words about what you're doing here, but it
> > still doesn't explain why this is safe and when exactly it should be
> > used.
>  Yeah, I'm not very keen about this either.
> 
>  The key point is the memory is not accessible by the hardware any more
>  because it is freed and removed from the page tables.
> 
>  So further access is prevented and in this special case it is actually
>  valid to do this even if the operation represented by the fence is still
>  ongoing.
> >>> Hm, why do you have to remove these fences then? Can't you just let
> >>> them signal and get collected as usual? As soon as you share buffers
> >>> these fences can get anywhere, so you need to correctly unwind them no
> >>> matter what.
> >>>
> >>> You're kinda still just describing what you're doing, not why.
> >> It is simply more efficient to remove the fence from one reservation
> >> object than to add a new fence to all other reservation objects in the
> >> same process.
> > Ok, you still talk in riddles and don't explain what the goal of this
> > entire dance is, so I went and read the code. Assuming I didn't misread
> > too much:
> >
> > 1. you create a fake fence on a per-process timeline.
> > 2. you attach this liberally to all the bo you're creating on that
> > process
> > 3. the fence never signals on its own, but it has a very magic
> > ->enable_signaling callback which is the only thing that makes this fence
> > switch to signalled in a finite time. Before that it's stuck forever. So
> > quite a bit of a schrödinger fence: It's not a real fence (because it fails
> > to signal in bounded time) except when you start to look at it.
> > 4. Looking at the fence triggers eviction, at that point we replace this
> > magic eviction fence with the next set, reacquire buffers and then unblock
> > the kfd process once everything is in shape again.
> >
> > This is soo magic that I really don't think we should
> > encourage people without clue to maybe use this and totally break all
> > fences guarantees.
>
> Yeah, that is correct. But this is completely unrelated to why we want
> to remove the fence.
>
> > If you do want to make sure an optimized version within
> > reservation_object.c, then it should be code which replaces fences iff:
> > - they're the same context
> > - later in the ordering within that context
> > - of the same type (i.e. safe vs shared)
> >
> > That would actually be safe thing to do.
>
> No, that won't work because there is no replacement for the fence in
> question.
>
> See we want to remove the fence because the memory is freed up.
>
> > Also, the above is what I expected when asking "why do you need this", not
> > "we replace fences, its more efficient" I kinda got that from the code :-)
>
> Well I explained the short version why we do this. What you dug up here
is correct as well, but completely unrelated to removing the fence.

Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit

2019-06-27 Thread Kenny Ho
On Thu, Jun 27, 2019 at 1:43 AM Daniel Vetter  wrote:
>
> On Wed, Jun 26, 2019 at 06:41:32PM -0400, Kenny Ho wrote:
> > So without the sharing restriction and some kind of ownership
> > structure, we will have to migrate/change the owner of the buffer when
> > the cgroup that created the buffer die before the receiving cgroup(s)
> > and I am not sure how to do that properly at the moment.  1) Should
> > each cgroup keep track of all the buffers that belongs to it and
> > migrate?  (Is that efficient?)  2) which cgroup should be the new
> > owner (and therefore have the limit applied?)  Having the creator
> > being the owner is kind of natural, but let say the buffer is shared
> > with 5 other cgroups, which of these 5 cgroups should be the new owner
> > of the buffer?
>
> Different answers:
>
> - Do we care if we leak bos like this in a cgroup, if the cgroup
>   disappears before all the bo are cleaned up?
>
> - Just charge the bo to each cgroup it's in? Will be quite a bit more
>   tracking needed to get that done ...
That seems to be the approach memcg takes, but as shown by the lwn
link you sent me from the last rfc (talk from Roman Gushchin), that
approach is not problem free either.  And wouldn't this approach
disconnect resource management from the underlying resource one would
like to control?  For example, if you have 5 MB of memory, you can
have 5 users using 1 MB each.  But in the charge-everybody approach, a
1 MB usage shared 4 times will make it look like 5 MB is used.  So the
resource being controlled is no longer 'real' since the amount of
resource you have is now dynamic and depends on the amount of sharing
one does.

> - Also, there's the legacy way of sharing a bo, with the FLINK and
>   GEM_OPEN ioctls. We need to plug these holes too.
>
> Just feels like your current solution is technically well-justified, but
> it completely defeats the point of cgroups/containers and buffer sharing
> ...
Um... I am going to get a bit philosophical here and suggest that the
idea of sharing (especially uncontrolled sharing) is inherently at odds
with containment.  It's like, if everybody is special, no one is
special.  Perhaps an alternative is to make this configurable so that
people can allow sharing knowing the caveat?  And just to be clear,
the current solution allows for sharing, even between cgroups.

Regards,
Kenny
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 2/2] dma-buf: cleanup shared fence removal

2019-06-27 Thread Koenig, Christian
Am 27.06.19 um 19:10 schrieb Daniel Vetter:
> On Thu, Jun 27, 2019 at 03:48:06PM +, Koenig, Christian wrote:
>> Am 27.06.19 um 17:34 schrieb Daniel Vetter:
>>> On Thu, Jun 27, 2019 at 3:19 PM Christian König
>>>  wrote:
 Am 27.06.19 um 12:43 schrieb Daniel Vetter:
> On Thu, Jun 27, 2019 at 12:18 PM Christian König
>  wrote:
>> While freeing up memory it is easier to remove a fence from a reservation
>> object instead of signaling it and installing a new one in all other 
>> objects.
>>
>> Clean this up by adding the removal function to the core reservation object
>> code instead of messing with the internals in amdgpu.
>>
>> No functional change.
>>
>> Signed-off-by: Christian König 
>> ---
>> drivers/dma-buf/reservation.c | 62 
>> +++
>> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 45 +-
>> include/linux/reservation.h   |  3 +-
>> 3 files changed, 65 insertions(+), 45 deletions(-)
>>
>> diff --git a/drivers/dma-buf/reservation.c 
>> b/drivers/dma-buf/reservation.c
>> index ef710effedfa..e43a316a005d 100644
>> --- a/drivers/dma-buf/reservation.c
>> +++ b/drivers/dma-buf/reservation.c
>> @@ -231,6 +231,68 @@ void reservation_object_add_shared_fence(struct 
>> reservation_object *obj,
>> }
>> EXPORT_SYMBOL(reservation_object_add_shared_fence);
>>
>> +/**
>> + * reservation_object_remove_shared_fence - remove shared fences
>> + * @obj: the reservation object
>> + * @context: the context of the fences to remove
>> + *
>> + * Remove shared fences based on their fence context.
>> + */
> This needs a serious explanation of "why?". Removing fences without
> guaranteeing they're all signaled is a good way to create havoc. Your
> commit message has a few words about what you're doing here, but it
> still doesn't explain why this is safe and when exactly it should be
> used.
 Yeah, I'm not very keen about this either.

 The key point is the memory is not accessible by the hardware any more
 because it is freed and removed from the page tables.

 So further access is prevented and in this special case it is actually
 valid to do this even if the operation represented by the fence is still
 ongoing.
>>> Hm, why do you have to remove these fences then? Can't you just let
>>> them signal and get collected as usual? As soon as you share buffers
>>> these fences can get anywhere, so you need to correctly unwind them no
>>> matter what.
>>>
>>> You're kinda still just describing what you're doing, not why.
>> It is simply more efficient to remove the fence from one reservation
>> object than to add a new fence to all other reservation objects in the
>> same process.
> Ok, you still talk in riddles and don't explain what the goal of this
> entire dance is, so I went and read the code. Assuming I didn't misread
> too much:
>
> 1. you create a fake fence on a per-process timeline.
> 2. you attach this liberally to all the bo you're creating on that
> process
> 3. the fence never signals on its own, but it has a very magic
> ->enable_signaling callback which is the only thing that makes this fence
> switch to signalled in a finite time. Before that it's stuck forever. So
> quite a bit of a schrödinger fence: It's not a real fence (because it fails
> to signal in bounded time) except when you start to look at it.
> 4. Looking at the fence triggers eviction, at that point we replace this
> magic eviction fence with the next set, reacquire buffers and then unblock
> the kfd process once everything is in shape again.
>
> This is soo magic that I really don't think we should
> encourage people without clue to maybe use this and totally break all
> fences guarantees.

Yeah, that is correct. But this is completely unrelated to why we want 
to remove the fence.

> If you do want to make sure an optimized version within
> reservation_object.c, then it should be code which replaces fences iff:
> - they're the same context
> - later in the ordering within that context
> - of the same type (i.e. safe vs shared)
>
> That would actually be safe thing to do.

No, that won't work because there is no replacement for the fence in 
question.

See we want to remove the fence because the memory is freed up.

> Also, the above is what I expected when asking "why do you need this", not
> "we replace fences, its more efficient" I kinda got that from the code :-)

Well I explained the short version why we do this. What you dug up here 
is correct as well, but completely unrelated to removing the fence.

Again, the reason to remove the fence from one reservation object is 
simply that it is faster to remove it from one object than to attach a 
new fence to all other objects.

It's just an optimization,
Christian.

>
> Plus reading this now with 

Re: [PATCH 2/2] dma-buf: cleanup shared fence removal

2019-06-27 Thread Daniel Vetter
On Thu, Jun 27, 2019 at 03:48:06PM +, Koenig, Christian wrote:
> Am 27.06.19 um 17:34 schrieb Daniel Vetter:
> > On Thu, Jun 27, 2019 at 3:19 PM Christian König
> >  wrote:
> >> Am 27.06.19 um 12:43 schrieb Daniel Vetter:
> >>> On Thu, Jun 27, 2019 at 12:18 PM Christian König
> >>>  wrote:
>  While freeing up memory it is easier to remove a fence from a reservation
>  object instead of signaling it and installing a new one in all other 
>  objects.
> 
>  Clean this up by adding the removal function to the core reservation object
>  code instead of messing with the internals in amdgpu.
> 
>  No functional change.
> 
>  Signed-off-by: Christian König 
>  ---
> drivers/dma-buf/reservation.c | 62 +++
> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 45 +-
> include/linux/reservation.h   |  3 +-
> 3 files changed, 65 insertions(+), 45 deletions(-)
> 
>  diff --git a/drivers/dma-buf/reservation.c 
>  b/drivers/dma-buf/reservation.c
>  index ef710effedfa..e43a316a005d 100644
>  --- a/drivers/dma-buf/reservation.c
>  +++ b/drivers/dma-buf/reservation.c
>  @@ -231,6 +231,68 @@ void reservation_object_add_shared_fence(struct 
>  reservation_object *obj,
> }
> EXPORT_SYMBOL(reservation_object_add_shared_fence);
> 
>  +/**
>  + * reservation_object_remove_shared_fence - remove shared fences
>  + * @obj: the reservation object
>  + * @context: the context of the fences to remove
>  + *
>  + * Remove shared fences based on their fence context.
>  + */
> >>> This needs a serious explanation of "why?". Removing fences without
> >>> guaranteeing they're all signaled is a good way to create havoc. Your
> >>> commit message has a few words about what you're doing here, but it
> >>> still doesn't explain why this is safe and when exactly it should be
> >>> used.
> >> Yeah, I'm not very keen about this either.
> >>
> >> The key point is the memory is not accessible by the hardware any more
> >> because it is freed and removed from the page tables.
> >>
> >> So further access is prevented and in this special case it is actually
> >> valid to do this even if the operation represented by the fence is still
> >> ongoing.
> > Hm, why do you have to remove these fences then? Can't you just let
> > them signal and get collected as usual? As soon as you share buffers
> > these fences can get anywhere, so you need to correctly unwind them no
> > matter what.
> >
> > You're kinda still just describing what you're doing, not why.
> 
> It is simply more efficient to remove the fence from one reservation 
> object than to add a new fence to all other reservation objects in the 
> same process.

Ok, you still talk in riddles and don't explain what the goal of this
entire dance is, so I went and read the code. Assuming I didn't misread
too much:

1. you create a fake fence on a per-process timeline.
2. you attach this liberally to all the bo you're creating on that
process
3. the fence never signals on its own, but it has a very magic
->enable_signaling callback which is the only thing that makes this fence
switch to signalled in a finite time. Before that it's stuck forever. So
quite a bit of a schrödinger fence: It's not a real fence (because it fails
to signal in bounded time) except when you start to look at it.
4. Looking at the fence triggers eviction, at that point we replace this
magic eviction fence with the next set, reacquire buffers and then unblock
the kfd process once everything is in shape again.
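(Roughly, the "magic" in step 3 has this shape -- a simplified sketch
modeled on amdkfd's eviction fence; struct eviction_fence,
to_eviction_fence() and the work item are hypothetical stand-ins, not
the real implementation:)

static bool eviction_fence_enable_signaling(struct dma_fence *f)
{
	struct eviction_fence *ef = to_eviction_fence(f);

	/* Never signals on its own; the first wait kicks off the
	 * eviction work that eventually signals the fence. */
	schedule_work(&ef->eviction_work);
	return true;
}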

This is soo magic that I really don't think we should
encourage people without clue to maybe use this and totally break all
fences guarantees.

If you do want to make sure an optimized version within
reservation_object.c, then it should be code which replaces fences iff:
- they're the same context
- later in the ordering within that context
- of the same type (i.e. safe vs shared)

That would actually be safe thing to do.

Also, the above is what I expected when asking "why do you need this", not
"we replace fences, its more efficient" I kinda got that from the code :-)

Plus reading this now with (at least the belief of) understanding what
you're doing, replacing the fences and reattaching the next one of these
magic fences doesn't feel like it's actually making anything faster. Just
more obscure ...

Looking at reservation_object_add_shared_fence it seems to dtrt already.
-Daniel

> >> How should we word this? Something like:
> >>
> >>* Remove shared fences based on their fence context. Removing a fence
> >> before it is signaled is only valid if hardware access is prevented by
> >> some other means like IOMMU or similar virtual memory protection.
> > The part that freaks me out is whether we break the fence chain
> > anywhere by removing fences. But I guess if you're only removing the
> > shared fences that should be fine ...

Re: [PATCH v4 hmm 12/12] mm/hmm: Fix error flows in hmm_invalidate_range_start

2019-06-27 Thread Jason Gunthorpe
On Wed, Jun 26, 2019 at 11:18:23AM -0700, Ralph Campbell wrote:
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index b224ea635a7716..89549eac03d506 100644
> > +++ b/mm/hmm.c
> > @@ -64,7 +64,7 @@ static struct hmm *hmm_get_or_create(struct mm_struct *mm)
> > init_rwsem(&hmm->mirrors_sem);
> > hmm->mmu_notifier.ops = NULL;
> > INIT_LIST_HEAD(&hmm->ranges);
> > -   mutex_init(&hmm->lock);
> > +   spin_lock_init(&hmm->ranges_lock);
> > kref_init(&hmm->kref);
> > hmm->notifiers = 0;
> > hmm->mm = mm;
> > @@ -144,6 +144,23 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
> > hmm_put(hmm);
> >   }
> > +static void notifiers_decrement(struct hmm *hmm)
> > +{
> > +   lockdep_assert_held(&hmm->ranges_lock);
> > +
> 
> Why not acquire the lock here and release at the end instead
> of asserting the lock is held?
> It looks like every caller of notifiers_decrement() does that.

Yes, this is just some left over mistake, thanks

From aa371c720a9e3c632dcd9a6a2c73e325b9b2b98c Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe 
Date: Fri, 7 Jun 2019 12:10:33 -0300
Subject: [PATCH] mm/hmm: Fix error flows in hmm_invalidate_range_start

If the trylock on the hmm->mirrors_sem fails the function will return
without decrementing the notifiers that were previously incremented. Since
the caller will not call invalidate_range_end() on EAGAIN this will result
in notifiers becoming permanently incremented and deadlock.

If the sync_cpu_device_pagetables() required blocking the function will
not return EAGAIN even though the device continues to touch the
pages. This is a violation of the mmu notifier contract.

Switch, and rename, the ranges_lock to a spin lock so we can reliably
obtain it without blocking during error unwind.

The error unwind is necessary since the notifiers count must be held
incremented across the call to sync_cpu_device_pagetables() as we cannot
allow the range to become marked valid by a parallel
invalidate_start/end() pair while doing sync_cpu_device_pagetables().

Signed-off-by: Jason Gunthorpe 
Reviewed-by: Ralph Campbell 
Reviewed-by: Christoph Hellwig 
Tested-by: Philip Yang 
---
v4
 - Move lock into notifiers_decrement() (Ralph)
---
 include/linux/hmm.h |  2 +-
 mm/hmm.c            | 69 ++++++++++++++++++++++++++++++++++++++++-----------------------------
 2 files changed, 41 insertions(+), 30 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index bf013e96525771..0fa8ea34ccef6d 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -86,7 +86,7 @@
 struct hmm {
struct mm_struct*mm;
struct kref kref;
-   struct mutexlock;
+   spinlock_t  ranges_lock;
struct list_headranges;
struct list_headmirrors;
struct mmu_notifier mmu_notifier;
diff --git a/mm/hmm.c b/mm/hmm.c
index b224ea635a7716..de35289df20d43 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -64,7 +64,7 @@ static struct hmm *hmm_get_or_create(struct mm_struct *mm)
 	init_rwsem(&hmm->mirrors_sem);
 	hmm->mmu_notifier.ops = NULL;
 	INIT_LIST_HEAD(&hmm->ranges);
-	mutex_init(&hmm->lock);
+	spin_lock_init(&hmm->ranges_lock);
 	kref_init(&hmm->kref);
hmm->notifiers = 0;
hmm->mm = mm;
@@ -144,6 +144,25 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
hmm_put(hmm);
 }
 
+static void notifiers_decrement(struct hmm *hmm)
+{
+   unsigned long flags;
+
+	spin_lock_irqsave(&hmm->ranges_lock, flags);
+	hmm->notifiers--;
+	if (!hmm->notifiers) {
+		struct hmm_range *range;
+
+		list_for_each_entry(range, &hmm->ranges, list) {
+			if (range->valid)
+				continue;
+			range->valid = true;
+		}
+		wake_up_all(&hmm->wq);
+	}
+	spin_unlock_irqrestore(&hmm->ranges_lock, flags);
+}
+
 static int hmm_invalidate_range_start(struct mmu_notifier *mn,
const struct mmu_notifier_range *nrange)
 {
@@ -151,6 +170,7 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn,
struct hmm_mirror *mirror;
struct hmm_update update;
struct hmm_range *range;
+   unsigned long flags;
int ret = 0;
 
	if (!kref_get_unless_zero(&hmm->kref))
@@ -161,12 +181,7 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn,
update.event = HMM_UPDATE_INVALIDATE;
update.blockable = mmu_notifier_range_blockable(nrange);
 
-	if (mmu_notifier_range_blockable(nrange))
-		mutex_lock(&hmm->lock);
-	else if (!mutex_trylock(&hmm->lock)) {
-		ret = -EAGAIN;
-		goto out;
-	}
+	spin_lock_irqsave(&hmm->ranges_lock, flags);
hmm->notifiers++;
	list_for_each_entry(range, &hmm->ranges, list) {
if (update.end < range->start || update.start >= range->end)
@@ -174,7 +189,7 @@ static int hmm_invalidate_range_start(struct mmu_notifier 

Re: [PATCH][next[ drm/amd/display: fix a couple of spelling mistakes

2019-06-27 Thread Alex Deucher
On Wed, Jun 26, 2019 at 9:32 AM Colin Ian King  wrote:
>
> On 26/06/2019 14:25, Daniel Stone wrote:
> > Hi Colin,
> >
> > On Wed, 26 Jun 2019 at 14:24, Colin King  wrote:
> >> There are a couple of spelling mistakes in dm_error messages and
> >> a comment. Fix these.
> >
> > Whilst there, you might fix the '[next[' typo in the commit message.
>
> Ugh, fickle fingers. Maybe the upstream devs will fix that before
> applying...

Fixed up and applied.

Thanks,

Alex

>
>
> >
> > Cheers,
> > Daniel
> >
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 2/2] dma-buf: cleanup shared fence removal

2019-06-27 Thread Koenig, Christian
Am 27.06.19 um 17:34 schrieb Daniel Vetter:
> On Thu, Jun 27, 2019 at 3:19 PM Christian König
>  wrote:
>> Am 27.06.19 um 12:43 schrieb Daniel Vetter:
>>> On Thu, Jun 27, 2019 at 12:18 PM Christian König
>>>  wrote:
 While freeing up memory it is easier to remove a fence from a reservation
 object instead of signaling it and installing a new one in all other 
 objects.

 Clean this up by adding the removal function to the core reservation object
 code instead of messing with the internals in amdgpu.

 No functional change.

 Signed-off-by: Christian König 
 ---
drivers/dma-buf/reservation.c | 62 +++
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 45 +-
include/linux/reservation.h   |  3 +-
3 files changed, 65 insertions(+), 45 deletions(-)

 diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
 index ef710effedfa..e43a316a005d 100644
 --- a/drivers/dma-buf/reservation.c
 +++ b/drivers/dma-buf/reservation.c
 @@ -231,6 +231,68 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
}
EXPORT_SYMBOL(reservation_object_add_shared_fence);

 +/**
 + * reservation_object_remove_shared_fence - remove shared fences
 + * @obj: the reservation object
 + * @context: the context of the fences to remove
 + *
 + * Remove shared fences based on their fence context.
 + */
>>> This needs a serious explanation of "why?". Removing fences without
>>> guaranteeing they're all signaled is a good way to create havoc. Your
>>> commit message has a few words about what you're doing here, but it
>>> still doesn't explain why this is safe and when exactly it should be
>>> used.
>> Yeah, I'm not very keen about this either.
>>
>> The key point is the memory is not accessible by the hardware any more
>> because it is freed and removed from the page tables.
>>
>> So further access is prevented and in this special case it is actually
>> valid to do this even if the operation represented by the fence is still
>> ongoing.
> Hm, why do you have to remove these fences then? Can't you just let
> them signal and get collected as usual? As soon as you share buffers
> these fences can get anywhere, so you need to correctly unwind them no
> matter what.
>
> You're kinda still just describing what you're doing, not why.

It is simply more efficient to remove the fence from one reservation 
object than to add a new fence to all other reservation objects in the 
same process.

It's just an optimization,
Christian.
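
To make the intended use concrete, a minimal caller sketch (the helper name
is from this series; the wrapper function is illustrative, and the
preconditions are the ones described above -- lock held, memory already
unreachable by the hardware):

	int drop_fences_for_context(struct reservation_object *resv,
				    struct dma_fence *ef)
	{
		reservation_object_assert_held(resv);

		/* Drops every shared fence with ef's context, without
		 * waiting for the fences to signal. */
		return reservation_object_remove_shared_fence(resv, ef->context);
	}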

>
>> How should we word this? Something like:
>>
>>* Remove shared fences based on their fence context. Removing a fence
>> before it is signaled is only valid if hardware access is prevented by
>> some other means like IOMMU or similar virtual memory protection.
> The part that freaks me out is whether we break the fence chaing
> anywhere by removing fences. But I guess if you're only removing the
> shared fences that shoudl be fine ...
>
>>> Also I'm not sure (depending upon what you use this for) this is
>>> actually correct outside of amdgpu and it's ignorance of the exclusive
>>> fence.
>> No, that is completely unrelated. But I thought that I clean this up
>> before I start to tackle the exclusive fence issue.
> ... except amdgpu has internally a very strange idea of shared fences
> with auto-exclusive promotion. And I think removing exclusive fences
> will break the fence dependencies of other (earlier) dma drivers by
> other operations. Or is that why you filter on the context,
> essentially amd's way of saying "remove all shared fences for this
> thing, keep only the exclusive ones".
>
> I guess I need to read what this actually does in the code, since I
> still have no idea why you need to do this.
> -Daniel
>
>> Christian.
>>
>>> -Daniel
>>>
 +int reservation_object_remove_shared_fence(struct reservation_object *obj,
 +  uint64_t context)
 +{
 +   struct reservation_object_list *old, *new;
 +   unsigned int i, j, k;
 +
 +   reservation_object_assert_held(obj);
 +
 +   old = reservation_object_get_list(obj);
 +   if (!old)
 +   return 0;
 +
 +   new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
 + GFP_KERNEL);
 +   if (!new)
 +   return -ENOMEM;
 +
 +   /* Go through all the shared fences in the reservation object and sort
 +* the interesting ones to the end of the new list.
 +*/
 +   for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
 +   struct dma_fence *f;
 +
 +   f = rcu_dereference_protected(old->shared[i],
 + 
 

Re: [PATCH] drm/amd/powerplay: update smu11_driver_if_navi10.h

2019-06-27 Thread Deucher, Alexander
Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Tianci Yin 

Sent: Thursday, June 27, 2019 2:49 AM
To: amd-gfx@lists.freedesktop.org
Cc: Xiao, Jack; Yin, Tianci (Rico); Zhang, Hawking
Subject: [PATCH] drm/amd/powerplay: update smu11_driver_if_navi10.h

From: tiancyin 

update the smu11_driver_if_navi10.h since navi10 smu fw
update to 42.28

Signed-off-by: tiancyin 
---
 drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h | 6 +++---
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h 
b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h
index a8b31bc..adbbfeb 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h
@@ -26,7 +26,7 @@
 // *** IMPORTANT ***
 // SMU TEAM: Always increment the interface version if
 // any structure is changed in this file
-#define SMU11_DRIVER_IF_VERSION 0x32
+#define SMU11_DRIVER_IF_VERSION 0x33

 #define PPTABLE_NV10_SMU_VERSION 8

@@ -813,8 +813,8 @@ typedef struct {
   uint16_t UclkAverageLpfTau;
   uint16_t GfxActivityLpfTau;
   uint16_t UclkActivityLpfTau;
+  uint16_t SocketPowerLpfTau;

-  uint16_t Padding;
   // Padding - ignore
   uint32_t MmHubPadding[8]; // SMU internal use
 } DriverSmuConfig_t;
@@ -853,7 +853,7 @@ typedef struct {
   uint8_t  CurrGfxVoltageOffset  ;
   uint8_t  CurrMemVidOffset  ;
   uint8_t  Padding8  ;
-  uint16_t CurrSocketPower   ;
+  uint16_t AverageSocketPower;
   uint16_t TemperatureEdge   ;
   uint16_t TemperatureHotspot;
   uint16_t TemperatureMem;
diff --git a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c 
b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
index 99566de..373aeba 100644
--- a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
@@ -863,7 +863,7 @@ static int navi10_get_gpu_power(struct smu_context *smu, 
uint32_t *value)
 if (ret)
 return ret;

-   *value = metrics.CurrSocketPower << 8;
+   *value = metrics.AverageSocketPower << 8;

 return 0;
 }
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/2] dma-buf: cleanup reservation_object_init/fini

2019-06-27 Thread Daniel Vetter
On Thu, Jun 27, 2019 at 12:18:12PM +0200, Christian König wrote:
> They are not used that often and certainly not in a hot path.
> Make them normal functions instead of an inline.
> 
> Signed-off-by: Christian König 

Reviewed-by: Daniel Vetter 

> ---
>  drivers/dma-buf/reservation.c | 45 ++
>  include/linux/reservation.h   | 46 ++-
>  2 files changed, 47 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 4d32e2c67862..ef710effedfa 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -55,6 +55,51 @@ EXPORT_SYMBOL(reservation_seqcount_class);
>  const char reservation_seqcount_string[] = "reservation_seqcount";
>  EXPORT_SYMBOL(reservation_seqcount_string);
>  
> +/**
> + * reservation_object_init - initialize a reservation object
> + * @obj: the reservation object
> + */
> +void reservation_object_init(struct reservation_object *obj)
> +{
> + ww_mutex_init(&obj->lock, &reservation_ww_class);
> +
> + __seqcount_init(&obj->seq, reservation_seqcount_string,
> + &reservation_seqcount_class);
> + RCU_INIT_POINTER(obj->fence, NULL);
> + RCU_INIT_POINTER(obj->fence_excl, NULL);
> +}
> +EXPORT_SYMBOL(reservation_object_init);
> +
> +/**
> + * reservation_object_fini - destroys a reservation object
> + * @obj: the reservation object
> + */
> +void reservation_object_fini(struct reservation_object *obj)
> +{
> + int i;
> + struct reservation_object_list *fobj;
> + struct dma_fence *excl;
> +
> + /*
> +  * This object should be dead and all references must have
> +  * been released to it, so no need to be protected with rcu.
> +  */
> + excl = rcu_dereference_protected(obj->fence_excl, 1);
> + if (excl)
> + dma_fence_put(excl);
> +
> + fobj = rcu_dereference_protected(obj->fence, 1);
> + if (fobj) {
> + for (i = 0; i < fobj->shared_count; ++i)
> + 		dma_fence_put(rcu_dereference_protected(fobj->shared[i], 1));
> +
> + kfree(fobj);
> + }
> +
> + ww_mutex_destroy(&obj->lock);
> +}
> +EXPORT_SYMBOL(reservation_object_fini);
> +
>  /**
>   * reservation_object_reserve_shared - Reserve space to add shared fences to
>   * a reservation_object.
> diff --git a/include/linux/reservation.h b/include/linux/reservation.h
> index ee750765cc94..f47e8196d039 100644
> --- a/include/linux/reservation.h
> +++ b/include/linux/reservation.h
> @@ -81,50 +81,6 @@ struct reservation_object {
>  #define reservation_object_assert_held(obj) \
>   lockdep_assert_held(&(obj)->lock.base)
>  
> -/**
> - * reservation_object_init - initialize a reservation object
> - * @obj: the reservation object
> - */
> -static inline void
> -reservation_object_init(struct reservation_object *obj)
> -{
> - ww_mutex_init(&obj->lock, &reservation_ww_class);
> -
> - __seqcount_init(&obj->seq, reservation_seqcount_string, &reservation_seqcount_class);
> - RCU_INIT_POINTER(obj->fence, NULL);
> - RCU_INIT_POINTER(obj->fence_excl, NULL);
> -}
> -
> -/**
> - * reservation_object_fini - destroys a reservation object
> - * @obj: the reservation object
> - */
> -static inline void
> -reservation_object_fini(struct reservation_object *obj)
> -{
> - int i;
> - struct reservation_object_list *fobj;
> - struct dma_fence *excl;
> -
> - /*
> -  * This object should be dead and all references must have
> -  * been released to it, so no need to be protected with rcu.
> -  */
> - excl = rcu_dereference_protected(obj->fence_excl, 1);
> - if (excl)
> - dma_fence_put(excl);
> -
> - fobj = rcu_dereference_protected(obj->fence, 1);
> - if (fobj) {
> - for (i = 0; i < fobj->shared_count; ++i)
> - 		dma_fence_put(rcu_dereference_protected(fobj->shared[i], 1));
> -
> - kfree(fobj);
> - }
> -
> - ww_mutex_destroy(&obj->lock);
> -}
> -
>  /**
>   * reservation_object_get_list - get the reservation object's
>   * shared fence list, with update-side lock held
> @@ -267,6 +223,8 @@ reservation_object_get_excl_rcu(struct reservation_object 
> *obj)
>   return fence;
>  }
>  
> +void reservation_object_init(struct reservation_object *obj);
> +void reservation_object_fini(struct reservation_object *obj);
>  int reservation_object_reserve_shared(struct reservation_object *obj,
> unsigned int num_fences);
>  void reservation_object_add_shared_fence(struct reservation_object *obj,
> -- 
> 2.17.1
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 2/2] dma-buf: cleanup shared fence removal

2019-06-27 Thread Daniel Vetter
On Thu, Jun 27, 2019 at 3:19 PM Christian König
 wrote:
>
> Am 27.06.19 um 12:43 schrieb Daniel Vetter:
> > On Thu, Jun 27, 2019 at 12:18 PM Christian König
> >  wrote:
> >> While freeing up memory it is easier to remove a fence from a reservation
> >> object instead of signaling it and installing a new one in all other 
> >> objects.
> >>
> >> Clean this up by adding the removal function to the core reservation object
> >> code instead of messing with the internal in amdgpu.
> >>
> >> No functional change.
> >>
> >> Signed-off-by: Christian König 
> >> ---
> >>   drivers/dma-buf/reservation.c | 62 +++
> >>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 45 +-
> >>   include/linux/reservation.h   |  3 +-
> >>   3 files changed, 65 insertions(+), 45 deletions(-)
> >>
> >> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> >> index ef710effedfa..e43a316a005d 100644
> >> --- a/drivers/dma-buf/reservation.c
> >> +++ b/drivers/dma-buf/reservation.c
> >> @@ -231,6 +231,68 @@ void reservation_object_add_shared_fence(struct 
> >> reservation_object *obj,
> >>   }
> >>   EXPORT_SYMBOL(reservation_object_add_shared_fence);
> >>
> >> +/**
> >> + * reservation_object_remove_shared_fence - remove shared fences
> >> + * @obj: the reservation object
> >> + * @context: the context of the fences to remove
> >> + *
> >> + * Remove shared fences based on their fence context.
> >> + */
> > This needs a serious explanation of "why?". Removing fences without
> > guaranteeing they're all signaled is a good way to create havoc. Your
> > commit message has a few words about what you're doing here, but it
> > still doesn't explain why this is safe and when exactly it should be
> > used.
>
> Yeah, I'm not very keen about this either.
>
> The key point is the memory is not accessible by the hardware any more
> because it is freed and removed from the page tables.
>
> So further access is prevented and in this special case it is actually
> valid to do this even if the operation represented by the fence is still
> ongoing.

Hm, why do you have to remove these fences then? Can't you just let
them signal and get collected as usual? As soon as you share buffers
these fences can get anywhere, so you need to correctly unwind them no
matter what.

You're kinda still just describing what you're doing, not why.

> How should we word this? Something like:
>
>   * Remove shared fences based on their fence context. Removing a fence
> before it is signaled is only valid if hardware access is prevented by
> some other means like IOMMU or similar virtual memory protection.

The part that freaks me out is whether we break the fence chaing
anywhere by removing fences. But I guess if you're only removing the
shared fences that shoudl be fine ...

> > Also I'm not sure (depending upon what you use this for) this is
> > actually correct outside of amdgpu and it's ignorance of the exclusive
> > fence.
>
> No, that is completely unrelated. But I thought that I clean this up
> before I start to tackle the exclusive fence issue.

... except amdgpu has internally a very strange idea of shared fences
with auto-exclusive promotion. And I think removing exclusive fences
will break the fence dependencies of other (earlier) dma drivers by
other operations. Or is that why you filter on the context,
essentially amd's way of saying "remove all shared fences for this
thing, keep only the exclusive ones".

I guess I need to read what this actually does in the code, since I
still have no idea why you need to do this.
-Daniel

> Christian.
>
> > -Daniel
> >
> >> +int reservation_object_remove_shared_fence(struct reservation_object *obj,
> >> +  uint64_t context)
> >> +{
> >> +   struct reservation_object_list *old, *new;
> >> +   unsigned int i, j, k;
> >> +
> >> +   reservation_object_assert_held(obj);
> >> +
> >> +   old = reservation_object_get_list(obj);
> >> +   if (!old)
> >> +   return 0;
> >> +
> >> +   new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
> >> + GFP_KERNEL);
> >> +   if (!new)
> >> +   return -ENOMEM;
> >> +
> >> +   /* Go through all the shared fences in the reservation object and sort
> >> +* the interesting ones to the end of the new list.
> >> +*/
> >> +   for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
> >> +   struct dma_fence *f;
> >> +
> >> +   f = rcu_dereference_protected(old->shared[i],
> >> + reservation_object_held(obj));
> >> +
> >> +   if (f->context == context)
> >> +   RCU_INIT_POINTER(new->shared[--j], f);
> >> +   else
> >> +   RCU_INIT_POINTER(new->shared[k++], f);
> >> +   }
> >> +   new->shared_max = old->shared_max;

Re: [PATCH] drm/amdgpu: drop copy/paste leftover to fix big endian

2019-06-27 Thread Alex Deucher
On Thu, Jun 27, 2019 at 10:32 AM Michel Dänzer  wrote:
>
> On 2019-06-27 4:16 p.m., Alex Deucher wrote:
> > The buf swap field doesn't exist on RB1.
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 ---
> >  1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > index 0061a0e8ab78..2932ade7dbd0 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > @@ -2624,9 +2624,6 @@ static int gfx_v10_0_cp_gfx_resume(struct 
> > amdgpu_device *adev)
> >   rb_bufsz = order_base_2(ring->ring_size / 8);
> >   tmp = REG_SET_FIELD(0, CP_RB1_CNTL, RB_BUFSZ, rb_bufsz);
> >   tmp = REG_SET_FIELD(tmp, CP_RB1_CNTL, RB_BLKSZ, rb_bufsz - 2);
> > -#ifdef __BIG_ENDIAN
> > - tmp = REG_SET_FIELD(tmp, CP_RB1_CNTL, BUF_SWAP, 1);
> > -#endif
> >   WREG32_SOC15(GC, 0, mmCP_RB1_CNTL, tmp);
> >   /* Initialize the ring buffer's write pointers */
> >   ring->wptr = 0;
> >
>
> So the RB0 BUF_SWAP bit applies to RB1 as well? Might be nice to clarify
> that in the commit log.

Maybe?  I suspect there is no swapping on the other RB.  I'm not even
sure the swapping works on RB0, but the bits are still there.  Someone
would probably need to validate all of this on BE hardware.

Alex

>
> Anyway,
>
> Reviewed-by: Michel Dänzer 
>
>
> --
> Earthling Michel Dänzer   |  https://www.amd.com
> Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: drop copy/paste leftover to fix big endian

2019-06-27 Thread Michel Dänzer
On 2019-06-27 4:16 p.m., Alex Deucher wrote:
> The buf swap field doesn't exist on RB1.
> 
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 0061a0e8ab78..2932ade7dbd0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -2624,9 +2624,6 @@ static int gfx_v10_0_cp_gfx_resume(struct amdgpu_device 
> *adev)
>   rb_bufsz = order_base_2(ring->ring_size / 8);
>   tmp = REG_SET_FIELD(0, CP_RB1_CNTL, RB_BUFSZ, rb_bufsz);
>   tmp = REG_SET_FIELD(tmp, CP_RB1_CNTL, RB_BLKSZ, rb_bufsz - 2);
> -#ifdef __BIG_ENDIAN
> - tmp = REG_SET_FIELD(tmp, CP_RB1_CNTL, BUF_SWAP, 1);
> -#endif
>   WREG32_SOC15(GC, 0, mmCP_RB1_CNTL, tmp);
>   /* Initialize the ring buffer's write pointers */
>   ring->wptr = 0;
> 

So the RB0 BUF_SWAP bit applies to RB1 as well? Might be nice to clarify
that in the commit log.

Anyway,

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |  https://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: drop copy/paste leftover to fix big endian

2019-06-27 Thread Zhang, Hawking
Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: June 27, 2019 22:16
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH] drm/amdgpu: drop copy/paste leftover to fix big endian

The buf swap field doesn't exist on RB1.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 0061a0e8ab78..2932ade7dbd0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -2624,9 +2624,6 @@ static int gfx_v10_0_cp_gfx_resume(struct amdgpu_device 
*adev)
rb_bufsz = order_base_2(ring->ring_size / 8);
tmp = REG_SET_FIELD(0, CP_RB1_CNTL, RB_BUFSZ, rb_bufsz);
	tmp = REG_SET_FIELD(tmp, CP_RB1_CNTL, RB_BLKSZ, rb_bufsz - 2);
-#ifdef __BIG_ENDIAN
-   tmp = REG_SET_FIELD(tmp, CP_RB1_CNTL, BUF_SWAP, 1);
-#endif
WREG32_SOC15(GC, 0, mmCP_RB1_CNTL, tmp);
/* Initialize the ring buffer's write pointers */
ring->wptr = 0;
--
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH -next] drm/amdgpu: remove set but not used variable 'psp_enabled'

2019-06-27 Thread YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/gpu/drm/amd/amdgpu/nv.c: In function 'nv_common_early_init':
drivers/gpu/drm/amd/amdgpu/nv.c:471:7: warning:
 variable 'psp_enabled' set but not used [-Wunused-but-set-variable]

It's not used since its introduction in
commit c6b6a42175f5 ("drm/amdgpu: add navi10 common ip block (v3)")

Signed-off-by: YueHaibing 
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index af20ffb55c54..8b9fa3db8daa 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -468,7 +468,6 @@ static const struct amdgpu_asic_funcs nv_asic_funcs =
 
 static int nv_common_early_init(void *handle)
 {
-   bool psp_enabled = false;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
adev->smc_rreg = NULL;
@@ -485,10 +484,6 @@ static int nv_common_early_init(void *handle)
 
	adev->asic_funcs = &nv_asic_funcs;
 
-   if (amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_PSP) &&
-   (amdgpu_ip_block_mask & (1 << AMD_IP_BLOCK_TYPE_PSP)))
-   psp_enabled = true;
-
adev->rev_id = nv_get_rev_id(adev);
adev->external_rev_id = 0xff;
switch (adev->asic_type) {



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: drop copy/paste leftover to fix big endian

2019-06-27 Thread Alex Deucher
The buf swap field doesn't exist on RB1.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 0061a0e8ab78..2932ade7dbd0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -2624,9 +2624,6 @@ static int gfx_v10_0_cp_gfx_resume(struct amdgpu_device 
*adev)
rb_bufsz = order_base_2(ring->ring_size / 8);
tmp = REG_SET_FIELD(0, CP_RB1_CNTL, RB_BUFSZ, rb_bufsz);
tmp = REG_SET_FIELD(tmp, CP_RB1_CNTL, RB_BLKSZ, rb_bufsz - 2);
-#ifdef __BIG_ENDIAN
-   tmp = REG_SET_FIELD(tmp, CP_RB1_CNTL, BUF_SWAP, 1);
-#endif
WREG32_SOC15(GC, 0, mmCP_RB1_CNTL, tmp);
/* Initialize the ring buffer's write pointers */
ring->wptr = 0;
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
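
For readers unfamiliar with the macro being deleted above: REG_SET_FIELD()
read-modify-writes one named field of a 32-bit register value, with the mask
and shift coming from generated register headers. A simplified, illustrative
model only -- the mask and shift below are hypothetical, not the real
BUF_SWAP encoding:

	#include <stdint.h>

	#define FIELD_MASK  0x00020000u  /* hypothetical single-bit mask */
	#define FIELD_SHIFT 17           /* hypothetical bit position */

	static inline uint32_t set_field(uint32_t reg, uint32_t val)
	{
		return (reg & ~FIELD_MASK) | ((val << FIELD_SHIFT) & FIELD_MASK);
	}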

Re: [PATCH] drm/amdgpu: fix warning on 32 bit

2019-06-27 Thread Christian König

Am 27.06.19 um 15:50 schrieb Alex Deucher:

Properly cast pointer to int.

Signed-off-by: Alex Deucher 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
index c441e6ce95ec..988c0adaca91 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
@@ -1010,8 +1010,8 @@ static int vcn_v2_0_start_dpg_mode(struct amdgpu_device 
*adev, bool indirect)
  
  	if (indirect)

psp_update_vcn_sram(adev, 0, adev->vcn.dpg_sram_gpu_addr,
-   (uint32_t)((uint64_t)adev->vcn.dpg_sram_curr_addr -
-   (uint64_t)adev->vcn.dpg_sram_cpu_addr));
+		(uint32_t)((uintptr_t)adev->vcn.dpg_sram_curr_addr -
+		(uintptr_t)adev->vcn.dpg_sram_cpu_addr));
  
  	/* force RBC into idle state */

rb_bufsz = order_base_2(ring->ring_size);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: fix warning on 32 bit

2019-06-27 Thread Alex Deucher
Properly cast pointer to int.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
index c441e6ce95ec..988c0adaca91 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
@@ -1010,8 +1010,8 @@ static int vcn_v2_0_start_dpg_mode(struct amdgpu_device 
*adev, bool indirect)
 
if (indirect)
psp_update_vcn_sram(adev, 0, adev->vcn.dpg_sram_gpu_addr,
-   (uint32_t)((uint64_t)adev->vcn.dpg_sram_curr_addr -
-   (uint64_t)adev->vcn.dpg_sram_cpu_addr));
+		(uint32_t)((uintptr_t)adev->vcn.dpg_sram_curr_addr -
+		(uintptr_t)adev->vcn.dpg_sram_cpu_addr));
 
/* force RBC into idle state */
rb_bufsz = order_base_2(ring->ring_size);
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
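
The warning in question is the usual pointer-width mismatch: on a 32-bit
build a pointer is 32 bits wide, so casting it straight to a 64-bit integer
type triggers gcc's -Wpointer-to-int-cast. A standalone illustration, not
taken from the driver:

	#include <stdint.h>

	uint64_t ptr_as_u64(void *p)
	{
		/* (uint64_t)p alone draws "cast from pointer to integer of
		 * different size" on 32-bit; uintptr_t always matches the
		 * pointer width, so widening through it is exact and quiet. */
		return (uint64_t)(uintptr_t)p;
	}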

Re: [PATCH 2/2] dma-buf: cleanup shared fence removal

2019-06-27 Thread Christian König

Am 27.06.19 um 12:43 schrieb Daniel Vetter:

On Thu, Jun 27, 2019 at 12:18 PM Christian König
 wrote:

While freeing up memory it is easier to remove a fence from a reservation
object instead of signaling it and installing a new one in all other objects.

Clean this up by adding the removal function to the core reservation object
code instead of messing with the internal in amdgpu.

No functional change.

Signed-off-by: Christian König 
---
  drivers/dma-buf/reservation.c | 62 +++
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 45 +-
  include/linux/reservation.h   |  3 +-
  3 files changed, 65 insertions(+), 45 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index ef710effedfa..e43a316a005d 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -231,6 +231,68 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
  }
  EXPORT_SYMBOL(reservation_object_add_shared_fence);

+/**
+ * reservation_object_remove_shared_fence - remove shared fences
+ * @obj: the reservation object
+ * @context: the context of the fences to remove
+ *
+ * Remove shared fences based on their fence context.
+ */

This needs a serious explanation of "why?". Removing fences without
guaranteeing they're all signaled is a good way to create havoc. Your
commit message has a few words about what you're doing here, but it
still doesn't explain why this is safe and when exactly it should be
used.


Yeah, I'm not very keen about this either.

The key point is the memory is not accessible by the hardware any more 
because it is freed and removed from the page tables.


So further access is prevented and in this special case it is actually 
valid to do this even if the operation represented by the fence is still 
ongoing.


How should we word this? Something like:

 * Remove shared fences based on their fence context. Removing a fence 
before it is signaled is only valid if hardware access is prevented by 
some other means like IOMMU or similar virtual memory protection.



Also I'm not sure (depending upon what you use this for) this is
actually correct outside of amdgpu and it's ignorance of the exclusive
fence.


No, that is completely unrelated. But I thought that I clean this up 
before I start to tackle the exclusive fence issue.


Christian.


-Daniel


+int reservation_object_remove_shared_fence(struct reservation_object *obj,
+  uint64_t context)
+{
+   struct reservation_object_list *old, *new;
+   unsigned int i, j, k;
+
+   reservation_object_assert_held(obj);
+
+   old = reservation_object_get_list(obj);
+   if (!old)
+   return 0;
+
+   new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
+ GFP_KERNEL);
+   if (!new)
+   return -ENOMEM;
+
+   /* Go through all the shared fences in the reservation object and sort
+* the interesting ones to the end of the new list.
+*/
+   for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
+   struct dma_fence *f;
+
+   f = rcu_dereference_protected(old->shared[i],
+ reservation_object_held(obj));
+
+   if (f->context == context)
+   RCU_INIT_POINTER(new->shared[--j], f);
+   else
+   RCU_INIT_POINTER(new->shared[k++], f);
+   }
+   new->shared_max = old->shared_max;
+   new->shared_count = k;
+
+   /* Install the new fence list, seqcount provides the barriers */
+   preempt_disable();
+   write_seqcount_begin(&obj->seq);
+   RCU_INIT_POINTER(obj->fence, new);
+   write_seqcount_end(&obj->seq);
+   preempt_enable();
+
+   /* Drop the references to the removed fences */
+   for (i = j, k = 0; i < old->shared_count; ++i) {
+   struct dma_fence *f;
+
+   f = rcu_dereference_protected(new->shared[i],
+ reservation_object_held(obj));
+   dma_fence_put(f);
+   }
+   kfree_rcu(old, rcu);
+
+   return 0;
+}
+EXPORT_SYMBOL(reservation_object_remove_shared_fence);
+
  /**
   * reservation_object_add_excl_fence - Add an exclusive fence.
   * @obj: the reservation object
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 10abae398e51..9b25ad3d003e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -224,50 +224,7 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct 
amdgpu_bo *bo,
 if (!ef)
 return -EINVAL;

-   old = reservation_object_get_list(resv);
-   if (!old)
-   return 0;
-
-   new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
-  

Re: [PATCH 2/2] dma-buf: cleanup shared fence removal

2019-06-27 Thread Daniel Vetter
On Thu, Jun 27, 2019 at 12:18 PM Christian König
 wrote:
>
> While freeing up memory it is easier to remove a fence from a reservation
> object instead of signaling it and installing a new one in all other objects.
>
> Clean this up by adding the removal function to the core reservation object
> code instead of messing with the internal in amdgpu.
>
> No functional change.
>
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/reservation.c | 62 +++
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 45 +-
>  include/linux/reservation.h   |  3 +-
>  3 files changed, 65 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index ef710effedfa..e43a316a005d 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -231,6 +231,68 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
>  }
>  EXPORT_SYMBOL(reservation_object_add_shared_fence);
>
> +/**
> + * reservation_object_remove_shared_fence - remove shared fences
> + * @obj: the reservation object
> + * @context: the context of the fences to remove
> + *
> + * Remove shared fences based on their fence context.
> + */

This needs a serious explanation of "why?". Removing fences without
guaranteeing they're all signaled is a good way to create havoc. Your
commit message has a few words about what you're doing here, but it
still doesn't explain why this is safe and when exactly it should be
used.

Also I'm not sure (depending upon what you use this for) this is
actually correct outside of amdgpu and it's ignorance of the exclusive
fence.
-Daniel

> +int reservation_object_remove_shared_fence(struct reservation_object *obj,
> +  uint64_t context)
> +{
> +   struct reservation_object_list *old, *new;
> +   unsigned int i, j, k;
> +
> +   reservation_object_assert_held(obj);
> +
> +   old = reservation_object_get_list(obj);
> +   if (!old)
> +   return 0;
> +
> +   new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
> + GFP_KERNEL);
> +   if (!new)
> +   return -ENOMEM;
> +
> +   /* Go through all the shared fences in the reservation object and sort
> +* the interesting ones to the end of the new list.
> +*/
> +   for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
> +   struct dma_fence *f;
> +
> +   f = rcu_dereference_protected(old->shared[i],
> + reservation_object_held(obj));
> +
> +   if (f->context == context)
> +   RCU_INIT_POINTER(new->shared[--j], f);
> +   else
> +   RCU_INIT_POINTER(new->shared[k++], f);
> +   }
> +   new->shared_max = old->shared_max;
> +   new->shared_count = k;
> +
> +   /* Install the new fence list, seqcount provides the barriers */
> +   preempt_disable();
> +   write_seqcount_begin(&obj->seq);
> +   RCU_INIT_POINTER(obj->fence, new);
> +   write_seqcount_end(&obj->seq);
> +   preempt_enable();
> +
> +   /* Drop the references to the removed fences */
> +   for (i = j, k = 0; i < old->shared_count; ++i) {
> +   struct dma_fence *f;
> +
> +   f = rcu_dereference_protected(new->shared[i],
> + reservation_object_held(obj));
> +   dma_fence_put(f);
> +   }
> +   kfree_rcu(old, rcu);
> +
> +   return 0;
> +}
> +EXPORT_SYMBOL(reservation_object_remove_shared_fence);
> +
>  /**
>   * reservation_object_add_excl_fence - Add an exclusive fence.
>   * @obj: the reservation object
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 10abae398e51..9b25ad3d003e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -224,50 +224,7 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct 
> amdgpu_bo *bo,
> if (!ef)
> return -EINVAL;
>
> -   old = reservation_object_get_list(resv);
> -   if (!old)
> -   return 0;
> -
> -   new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
> - GFP_KERNEL);
> -   if (!new)
> -   return -ENOMEM;
> -
> -   /* Go through all the shared fences in the reservation object and sort
> -* the interesting ones to the end of the list.
> -*/
> -   for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
> -   struct dma_fence *f;
> -
> -   f = rcu_dereference_protected(old->shared[i],
> - reservation_object_held(resv));
> -
> -   if (f->context == ef->base.context)
> -   

[PATCH 1/2] dma-buf: cleanup reservation_object_init/fini

2019-06-27 Thread Christian König
They are not used that often and certainly not in a hot path.
Make them normal functions instead of an inline.

Signed-off-by: Christian König 
---
 drivers/dma-buf/reservation.c | 45 ++
 include/linux/reservation.h   | 46 ++-
 2 files changed, 47 insertions(+), 44 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 4d32e2c67862..ef710effedfa 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -55,6 +55,51 @@ EXPORT_SYMBOL(reservation_seqcount_class);
 const char reservation_seqcount_string[] = "reservation_seqcount";
 EXPORT_SYMBOL(reservation_seqcount_string);
 
+/**
+ * reservation_object_init - initialize a reservation object
+ * @obj: the reservation object
+ */
+void reservation_object_init(struct reservation_object *obj)
+{
+   ww_mutex_init(&obj->lock, &reservation_ww_class);
+
+   __seqcount_init(&obj->seq, reservation_seqcount_string,
+   &reservation_seqcount_class);
+   RCU_INIT_POINTER(obj->fence, NULL);
+   RCU_INIT_POINTER(obj->fence_excl, NULL);
+}
+EXPORT_SYMBOL(reservation_object_init);
+
+/**
+ * reservation_object_fini - destroys a reservation object
+ * @obj: the reservation object
+ */
+void reservation_object_fini(struct reservation_object *obj)
+{
+   int i;
+   struct reservation_object_list *fobj;
+   struct dma_fence *excl;
+
+   /*
+* This object should be dead and all references must have
+* been released to it, so no need to be protected with rcu.
+*/
+   excl = rcu_dereference_protected(obj->fence_excl, 1);
+   if (excl)
+   dma_fence_put(excl);
+
+   fobj = rcu_dereference_protected(obj->fence, 1);
+   if (fobj) {
+   for (i = 0; i < fobj->shared_count; ++i)
+			dma_fence_put(rcu_dereference_protected(fobj->shared[i], 1));
+
+   kfree(fobj);
+   }
+
+   ww_mutex_destroy(&obj->lock);
+}
+EXPORT_SYMBOL(reservation_object_fini);
+
 /**
  * reservation_object_reserve_shared - Reserve space to add shared fences to
  * a reservation_object.
diff --git a/include/linux/reservation.h b/include/linux/reservation.h
index ee750765cc94..f47e8196d039 100644
--- a/include/linux/reservation.h
+++ b/include/linux/reservation.h
@@ -81,50 +81,6 @@ struct reservation_object {
 #define reservation_object_assert_held(obj) \
lockdep_assert_held(&(obj)->lock.base)
 
-/**
- * reservation_object_init - initialize a reservation object
- * @obj: the reservation object
- */
-static inline void
-reservation_object_init(struct reservation_object *obj)
-{
-   ww_mutex_init(&obj->lock, &reservation_ww_class);
-
-   __seqcount_init(&obj->seq, reservation_seqcount_string, &reservation_seqcount_class);
-   RCU_INIT_POINTER(obj->fence, NULL);
-   RCU_INIT_POINTER(obj->fence_excl, NULL);
-}
-
-/**
- * reservation_object_fini - destroys a reservation object
- * @obj: the reservation object
- */
-static inline void
-reservation_object_fini(struct reservation_object *obj)
-{
-   int i;
-   struct reservation_object_list *fobj;
-   struct dma_fence *excl;
-
-   /*
-* This object should be dead and all references must have
-* been released to it, so no need to be protected with rcu.
-*/
-   excl = rcu_dereference_protected(obj->fence_excl, 1);
-   if (excl)
-   dma_fence_put(excl);
-
-   fobj = rcu_dereference_protected(obj->fence, 1);
-   if (fobj) {
-   for (i = 0; i < fobj->shared_count; ++i)
-			dma_fence_put(rcu_dereference_protected(fobj->shared[i], 1));
-
-   kfree(fobj);
-   }
-
-   ww_mutex_destroy(&obj->lock);
-}
-
 /**
  * reservation_object_get_list - get the reservation object's
  * shared fence list, with update-side lock held
@@ -267,6 +223,8 @@ reservation_object_get_excl_rcu(struct reservation_object 
*obj)
return fence;
 }
 
+void reservation_object_init(struct reservation_object *obj);
+void reservation_object_fini(struct reservation_object *obj);
 int reservation_object_reserve_shared(struct reservation_object *obj,
  unsigned int num_fences);
 void reservation_object_add_shared_fence(struct reservation_object *obj,
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 2/2] dma-buf: cleanup shared fence removal

2019-06-27 Thread Christian König
While freeing up memory it is easier to remove a fence from a reservation
object instead of signaling it and installing a new one in all other objects.

Clean this up by adding the removal function to the core reservation object
code instead of messing with the internal in amdgpu.

No functional change.

Signed-off-by: Christian König 
---
 drivers/dma-buf/reservation.c | 62 +++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 45 +-
 include/linux/reservation.h   |  3 +-
 3 files changed, 65 insertions(+), 45 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index ef710effedfa..e43a316a005d 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -231,6 +231,68 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
 }
 EXPORT_SYMBOL(reservation_object_add_shared_fence);
 
+/**
+ * reservation_object_remove_shared_fence - remove shared fences
+ * @obj: the reservation object
+ * @context: the context of the fences to remove
+ *
+ * Remove shared fences based on their fence context.
+ */
+int reservation_object_remove_shared_fence(struct reservation_object *obj,
+  uint64_t context)
+{
+   struct reservation_object_list *old, *new;
+   unsigned int i, j, k;
+
+   reservation_object_assert_held(obj);
+
+   old = reservation_object_get_list(obj);
+   if (!old)
+   return 0;
+
+   new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
+ GFP_KERNEL);
+   if (!new)
+   return -ENOMEM;
+
+   /* Go through all the shared fences in the reservation object and sort
+* the interesting ones to the end of the new list.
+*/
+   for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
+   struct dma_fence *f;
+
+   f = rcu_dereference_protected(old->shared[i],
+ reservation_object_held(obj));
+
+   if (f->context == context)
+   RCU_INIT_POINTER(new->shared[--j], f);
+   else
+   RCU_INIT_POINTER(new->shared[k++], f);
+   }
+   new->shared_max = old->shared_max;
+   new->shared_count = k;
+
+   /* Install the new fence list, seqcount provides the barriers */
+   preempt_disable();
+   write_seqcount_begin(&obj->seq);
+   RCU_INIT_POINTER(obj->fence, new);
+   write_seqcount_end(&obj->seq);
+   preempt_enable();
+
+   /* Drop the references to the removed fences */
+   for (i = j, k = 0; i < old->shared_count; ++i) {
+   struct dma_fence *f;
+
+   f = rcu_dereference_protected(new->shared[i],
+ reservation_object_held(obj));
+   dma_fence_put(f);
+   }
+   kfree_rcu(old, rcu);
+
+   return 0;
+}
+EXPORT_SYMBOL(reservation_object_remove_shared_fence);
+
 /**
  * reservation_object_add_excl_fence - Add an exclusive fence.
  * @obj: the reservation object
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 10abae398e51..9b25ad3d003e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -224,50 +224,7 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
if (!ef)
return -EINVAL;
 
-   old = reservation_object_get_list(resv);
-   if (!old)
-   return 0;
-
-   new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
- GFP_KERNEL);
-   if (!new)
-   return -ENOMEM;
-
-   /* Go through all the shared fences in the reservation object and sort
-* the interesting ones to the end of the list.
-*/
-   for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
-   struct dma_fence *f;
-
-   f = rcu_dereference_protected(old->shared[i],
- reservation_object_held(resv));
-
-   if (f->context == ef->base.context)
-   RCU_INIT_POINTER(new->shared[--j], f);
-   else
-   RCU_INIT_POINTER(new->shared[k++], f);
-   }
-   new->shared_max = old->shared_max;
-   new->shared_count = k;
-
-   /* Install the new fence list, seqcount provides the barriers */
-   preempt_disable();
-   write_seqcount_begin(&resv->seq);
-   RCU_INIT_POINTER(resv->fence, new);
-   write_seqcount_end(&resv->seq);
-   preempt_enable();
-
-   /* Drop the references to the removed fences or move them to ef_list */
-   for (i = j, k = 0; i < old->shared_count; ++i) {
-   struct dma_fence *f;
-
-   f = rcu_dereference_protected(new->shared[i],
-
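
For context on the install step above: the write_seqcount_begin()/end() pair
mirrors how the existing RCU readers of the shared fence list are written.
An illustrative reader-side sketch, modeled on the existing
reservation_object readers rather than code added by this patch:

	struct reservation_object_list *fobj;
	unsigned int seq;

	rcu_read_lock();
	do {
		seq = read_seqcount_begin(&obj->seq);
		fobj = rcu_dereference(obj->fence);
		/* ... inspect or copy fobj->shared[0..shared_count) ... */
	} while (read_seqcount_retry(&obj->seq, seq));
	rcu_read_unlock();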

Re: [PATCH v2] drm/amdgpu: fix scheduler timeout calc

2019-06-27 Thread Christian König

Am 27.06.19 um 12:03 schrieb Cui, Flora:

scheduler timeout is in jiffies
v2: move timeout check to amdgpu_device_get_job_timeout_settings after
parsing the value

Change-Id: I26708c163db943ff8d930dd81bcab4b4b9d84eb2
Signed-off-by: Flora Cui 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e74a175..cc29d70 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1300,7 +1300,9 @@ int amdgpu_device_get_job_timeout_settings(struct 
amdgpu_device *adev)
 * By default timeout for non compute jobs is 1.
 * And there is no timeout enforced on compute jobs.
 */
-   adev->gfx_timeout = adev->sdma_timeout = adev->video_timeout = 1;
+   adev->gfx_timeout = \
+   adev->sdma_timeout = \
+   adev->video_timeout = msecs_to_jiffies(1);


Off hand that looks very odd to me. This is not a macro so why the \ here?


adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;
  
  	if (strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENTH)) {

@@ -1314,7 +1316,8 @@ int amdgpu_device_get_job_timeout_settings(struct 
amdgpu_device *adev)
if (timeout <= 0) {
index++;
continue;
-   }
+   } else
+   timeout = msecs_to_jiffies(timeout);


You can actually remove the "if (timeout <= 0)" as well, 
msecs_to_jiffies will do the right thing for negative values.


Christian.

  
  			switch (index++) {

case 0:


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
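
To make the suggested conversion concrete, a minimal sketch of the parsing
step (simplified and illustrative; "timeout_setting" and the switch on ring
type stand in for the real function body). msecs_to_jiffies() maps negative
inputs to MAX_JIFFY_OFFSET, which effectively means "no timeout":

	long timeout;

	if (kstrtol(timeout_setting, 0, &timeout))
		return -EINVAL;

	if (timeout == 0) {
		index++;	/* keep the default for this ring type */
		continue;
	}
	timeout = msecs_to_jiffies(timeout);
	/* ... then assign per ring type in the switch (index++) ... */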

[PATCH v2] drm/amdgpu: fix scheduler timeout calc

2019-06-27 Thread Cui, Flora
scheduler timeout is in jiffies
v2: move timeout check to amdgpu_device_get_job_timeout_settings after
parsing the value

Change-Id: I26708c163db943ff8d930dd81bcab4b4b9d84eb2
Signed-off-by: Flora Cui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e74a175..cc29d70 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1300,7 +1300,9 @@ int amdgpu_device_get_job_timeout_settings(struct 
amdgpu_device *adev)
 * By default timeout for non compute jobs is 1.
 * And there is no timeout enforced on compute jobs.
 */
-   adev->gfx_timeout = adev->sdma_timeout = adev->video_timeout = 1;
+   adev->gfx_timeout = \
+   adev->sdma_timeout = \
+   adev->video_timeout = msecs_to_jiffies(1);
adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;
 
if (strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENTH)) {
@@ -1314,7 +1316,8 @@ int amdgpu_device_get_job_timeout_settings(struct 
amdgpu_device *adev)
if (timeout <= 0) {
index++;
continue;
-   }
+   } else
+   timeout = msecs_to_jiffies(timeout);
 
switch (index++) {
case 0:
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/powerplay: warn on smu interface version mismatch

2019-06-27 Thread Yuan, Xiaojie
Thanks Evan, got your point. Then I'll keep the patch only in our bring up 
branch.

BR,
Xiaojie


From: Quan, Evan
Sent: Thursday, June 27, 2019 9:51:22 AM
To: Yuan, Xiaojie; amd-gfx@lists.freedesktop.org
Cc: Wang, Kevin(Yang)
Subject: RE: [PATCH] drm/amd/powerplay: warn on smu interface version mismatch

I do not think this is a good idea.
As there are still some cases where a version mismatch will cause unexpected issues, and they will be hard to debug.
If this is for debug purpose only, I would suggest to keep this in your custom 
branch only.

Regards,
Evan
> -Original Message-
> From: amd-gfx  On Behalf Of
> Yuan, Xiaojie
> Sent: Wednesday, June 26, 2019 2:34 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Wang, Kevin(Yang) 
> Subject: Re: [PATCH] drm/amd/powerplay: warn on smu interface version
> mismatch
>
> Current SMU IF version check is too strict, driver with old smu11_driver_if.h
> sometimes works fine with new SMU firmware. We prefer to see a warning
> instead a error for debug purposes.
>
> BR,
> Xiaojie
>
> 
> From: Yuan, Xiaojie
> Sent: Wednesday, June 26, 2019 2:24:19 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Wang, Kevin(Yang); Yuan, Xiaojie
> Subject: [PATCH] drm/amd/powerplay: warn on smu interface version
> mismatch
>
> Signed-off-by: Xiaojie Yuan 
> ---
>  drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
> b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
> index c3f48fae6f32..339d063e24ff 100644
> --- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
> +++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
> @@ -272,8 +272,7 @@ static int smu_v11_0_check_fw_version(struct
> smu_context *smu)
> "smu fw version = 0x%08x (%d.%d.%d)\n",
> smu->smc_if_version, if_version,
> smu_version, smu_major, smu_minor, smu_debug);
> -   pr_err("SMU driver if version not matched\n");
> -   ret = -EINVAL;
> +   pr_warn("SMU driver if version not matched\n");
> }
>
> return ret;
> --
> 2.20.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 2/2] drm/amdgpu: handle AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID on gfx10

2019-06-27 Thread zhoucm1

any reason for not caring about .emit_ib_size in this one?

-David


On 2019-06-27 06:35, Marek Olšák wrote:

From: Marek Olšák 

Signed-off-by: Marek Olšák 
---
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 6baaa65a1daa..5b807a19bbbf 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4257,20 +4257,36 @@ static void gfx_v10_0_ring_emit_ib_gfx(struct 
amdgpu_ring *ring,
  }
  
  static void gfx_v10_0_ring_emit_ib_compute(struct amdgpu_ring *ring,

   struct amdgpu_job *job,
   struct amdgpu_ib *ib,
   uint32_t flags)
  {
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
u32 control = INDIRECT_BUFFER_VALID | ib->length_dw | (vmid << 24);
  
+	/* Currently, there is a high possibility to get wave ID mismatch

+* between ME and GDS, leading to a hw deadlock, because ME generates
+* different wave IDs than the GDS expects. This situation happens
+* randomly when at least 5 compute pipes use GDS ordered append.
+* The wave IDs generated by ME are also wrong after suspend/resume.
+* Those are probably bugs somewhere else in the kernel driver.
+*
+* Writing GDS_COMPUTE_MAX_WAVE_ID resets wave ID counters in ME and
+* GDS to 0 for this ring (me/pipe).
+*/
+   if (ib->flags & AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID) {
+   amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
+   amdgpu_ring_write(ring, mmGDS_COMPUTE_MAX_WAVE_ID);
+   amdgpu_ring_write(ring, 
ring->adev->gds.gds_compute_max_wave_id);
+   }
+
amdgpu_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
BUG_ON(ib->gpu_addr & 0x3); /* Dword align */
amdgpu_ring_write(ring,
  #ifdef __BIG_ENDIAN
(2 << 0) |
  #endif
lower_32_bits(ib->gpu_addr));
amdgpu_ring_write(ring, upper_32_bits(ib->gpu_addr));
amdgpu_ring_write(ring, control);
  }
@@ -5103,20 +5119,21 @@ static void gfx_v10_0_set_rlc_funcs(struct 
amdgpu_device *adev)
}
  }
  
  static void gfx_v10_0_set_gds_init(struct amdgpu_device *adev)

  {
/* init asic gds info */
switch (adev->asic_type) {
case CHIP_NAVI10:
default:
adev->gds.gds_size = 0x1;
+   adev->gds.gds_compute_max_wave_id = 0x4ff;
adev->gds.vgt_gs_max_wave_id = 0x3ff;
break;
}
  
  	adev->gds.gws_size = 64;

adev->gds.oa_size = 16;
  }
  
  static void gfx_v10_0_set_user_wgp_inactive_bitmap_per_sh(struct amdgpu_device *adev,

  u32 bitmap)


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
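
On the .emit_ib_size question: the optional SET_CONFIG_REG write above is 3
dwords (header, register offset, value) on top of the 4-dword
INDIRECT_BUFFER packet, so the compute ring's declared worst case would grow
by 3. A hedged sketch of the adjustment being asked about -- the numbers are
inferred from the packets in the patch, not taken from a posted fix:

	static const struct amdgpu_ring_funcs gfx_v10_0_ring_funcs_compute = {
		/* ... other callbacks unchanged ... */
		.emit_ib_size = 7,	/* 4 (INDIRECT_BUFFER) + 3 (SET_CONFIG_REG) */
		/* ... */
	};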

[PATCH -next] drm/amdgpu: fix debugfs_simple_attr.cocci warnings

2019-06-27 Thread YueHaibing
Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE
for debugfs files.

Semantic patch information:
Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file()
imposes some significant overhead as compared to
DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe().

Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci

Signed-off-by: YueHaibing 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 20ce158490db..9d9f4cbbc4bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1082,15 +1082,15 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
return 0;
 }
 
-DEFINE_SIMPLE_ATTRIBUTE(fops_ib_preempt, NULL,
-   amdgpu_debugfs_ib_preempt, "%llu\n");
+DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL, amdgpu_debugfs_ib_preempt,
+"%llu\n");
 
 int amdgpu_debugfs_init(struct amdgpu_device *adev)
 {
adev->debugfs_preempt =
-   debugfs_create_file("amdgpu_preempt_ib", 0600,
-   adev->ddev->primary->debugfs_root,
-   (void *)adev, &fops_ib_preempt);
+   debugfs_create_file_unsafe("amdgpu_preempt_ib", 0600,
+  adev->ddev->primary->debugfs_root,
+  (void *)adev, &fops_ib_preempt);
if (!(adev->debugfs_preempt)) {
DRM_ERROR("unable to create amdgpu_preempt_ib debugsfs file\n");
return -EIO;
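
Background on the pairing: fops generated by DEFINE_DEBUGFS_ATTRIBUTE guard
themselves with debugfs_file_get()/debugfs_file_put() inside their read and
write helpers, so the file can be created without the heavier full-proxy
wrapper that plain debugfs_create_file() installs. A generic, illustrative
pairing -- the names are placeholders, not from this patch:

	static int example_get(void *data, u64 *val)
	{
		*val = 42;	/* placeholder read-back */
		return 0;
	}
	DEFINE_DEBUGFS_ATTRIBUTE(example_fops, example_get, NULL, "%llu\n");

	/* created with the _unsafe variant, as the cocci rule recommends */
	debugfs_create_file_unsafe("example", 0400, parent, NULL, &example_fops);
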
Re: [PATCH 4/7] drm/radeon: Fill out gem_object->resv

2019-06-27 Thread Daniel Vetter
On Wed, Jun 26, 2019 at 10:23:12AM +0200, Daniel Vetter wrote:
> On Wed, Jun 26, 2019 at 07:10:21AM +, Koenig, Christian wrote:
> > Those patches would become superfluous when merging Gerd's work.
> 
> Not entirely, they still remove the gem_prime_res_obj. Setting up
> gem_bo.resv is only one half of what these do here. And yeah I think that
> single addition can be removed again when Gerd's stuff lands.
>  
> > But I'm not sure if that is going to fly soon or not.
> 
> I think r-b from Thomas Zimmermann (or some other ttm+gem stakeholder) and
> we're good to land them. Thomas Hellstrom mellowed down his "nack" to
> "I'll look at this in August again and course-correct if necessary".

Just pinged Gerd on this, so we can start coordination.

btw planning to review this from amd side, I'd like to ditch
gem_prime_res_obj especially with Gerd series it's pointless.
-Daniel

> -Daniel
> 
> > 
> > Christian.
> > 
> > Am 25.06.19 um 22:42 schrieb Daniel Vetter:
> > > That way we can ditch our gem_prime_res_obj implementation. Since ttm
> > > absolutely needs the right reservation object all the boilerplate is
> > > already there and we just have to wire it up correctly.
> > >
> > > Note that gem/prime doesn't care when we do this, as long as we do it
> > > before the bo is registered and someone can call the handle2fd ioctl
> > > on it.
> > >
> > > Aside: ttm_buffer_object.ttm_resv could probably be ditched in favour
> > > of always passing a non-NULL resv to ttm_bo_init(). At least for gem
> > > drivers that would avoid having two of these, on in ttm_buffer_object
> > > and the other in drm_gem_object, one just there for confusion.
> > >
> > > Reviewed-by: Emil Velikov 
> > > Signed-off-by: Daniel Vetter 
> > > Cc: Alex Deucher 
> > > Cc: "Christian König" 
> > > Cc: "David (ChunMing) Zhou" 
> > > Cc: amd-gfx@lists.freedesktop.org
> > > ---
> > >   drivers/gpu/drm/radeon/radeon_drv.c| 2 --
> > >   drivers/gpu/drm/radeon/radeon_object.c | 1 +
> > >   drivers/gpu/drm/radeon/radeon_prime.c  | 7 ---
> > >   3 files changed, 1 insertion(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/radeon/radeon_drv.c 
> > > b/drivers/gpu/drm/radeon/radeon_drv.c
> > > index 4403e76e1ae0..a4a78dfdef37 100644
> > > --- a/drivers/gpu/drm/radeon/radeon_drv.c
> > > +++ b/drivers/gpu/drm/radeon/radeon_drv.c
> > > @@ -152,7 +152,6 @@ struct drm_gem_object 
> > > *radeon_gem_prime_import_sg_table(struct drm_device *dev,
> > >   struct sg_table 
> > > *sg);
> > >   int radeon_gem_prime_pin(struct drm_gem_object *obj);
> > >   void radeon_gem_prime_unpin(struct drm_gem_object *obj);
> > > -struct reservation_object *radeon_gem_prime_res_obj(struct 
> > > drm_gem_object *);
> > >   void *radeon_gem_prime_vmap(struct drm_gem_object *obj);
> > >   void radeon_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr);
> > >   
> > > @@ -566,7 +565,6 @@ static struct drm_driver kms_driver = {
> > >   .gem_prime_export = radeon_gem_prime_export,
> > >   .gem_prime_pin = radeon_gem_prime_pin,
> > >   .gem_prime_unpin = radeon_gem_prime_unpin,
> > > - .gem_prime_res_obj = radeon_gem_prime_res_obj,
> > >   .gem_prime_get_sg_table = radeon_gem_prime_get_sg_table,
> > >   .gem_prime_import_sg_table = radeon_gem_prime_import_sg_table,
> > >   .gem_prime_vmap = radeon_gem_prime_vmap,
> > > diff --git a/drivers/gpu/drm/radeon/radeon_object.c 
> > > b/drivers/gpu/drm/radeon/radeon_object.c
> > > index 21f73fc86f38..7a2bad843f8a 100644
> > > --- a/drivers/gpu/drm/radeon/radeon_object.c
> > > +++ b/drivers/gpu/drm/radeon/radeon_object.c
> > > @@ -262,6 +262,7 @@ int radeon_bo_create(struct radeon_device *rdev,
> > >   r = ttm_bo_init(>mman.bdev, >tbo, size, type,
> > >   >placement, page_align, !kernel, acc_size,
> > >   sg, resv, _ttm_bo_destroy);
> > > + bo->gem_base.resv = bo->tbo.resv;
> > >   up_read(>pm.mclk_lock);
> > >   if (unlikely(r != 0)) {
> > >   return r;
> > > diff --git a/drivers/gpu/drm/radeon/radeon_prime.c 
> > > b/drivers/gpu/drm/radeon/radeon_prime.c
> > > index deaffce50a2e..8ce3e8045d42 100644
> > > --- a/drivers/gpu/drm/radeon/radeon_prime.c
> > > +++ b/drivers/gpu/drm/radeon/radeon_prime.c
> > > @@ -117,13 +117,6 @@ void radeon_gem_prime_unpin(struct drm_gem_object 
> > > *obj)
> > >   }
> > >   
> > >   
> > > -struct reservation_object *radeon_gem_prime_res_obj(struct 
> > > drm_gem_object *obj)
> > > -{
> > > - struct radeon_bo *bo = gem_to_radeon_bo(obj);
> > > -
> > > - return bo->tbo.resv;
> > > -}
> > > -
> > >   struct dma_buf *radeon_gem_prime_export(struct drm_gem_object *gobj,
> > >   int flags)
> > >   {
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation

Re: [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem

2019-06-27 Thread Daniel Vetter
On Wed, Jun 26, 2019 at 5:05 PM Kenny Ho  wrote:
> This is a follow up to the RFC I made previously to introduce a cgroup
> controller for the GPU/DRM subsystem [v1,v2].  The goal is to be able to
> provide resource management to GPU resources using things like containers.
> The cover letter from v1 is copied below for reference.
>
> [v1]: 
> https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
> [v2]: https://www.spinics.net/lists/cgroups/msg22074.html
>
> v3:
> Based on feedback on v2:
> * removed .help type file from v2
> * conform to cgroup convention for default and max handling
> * conform to cgroup convention for addressing device specific limits (with 
> major:minor)
> New function:
> * adopted memparse for memory size related attributes (see sketch below)
> * added macro to marshal drmcgrp cftype private  (DRMCG_CTF_PRIV, etc.)
> * added ttm buffer usage stats (per cgroup, for system, tt, vram.)
> * added ttm buffer usage limit (per cgroup, for vram.)
> * added per cgroup bandwidth stats and limiting (burst and average bandwidth)
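For illustration, memparse() (mentioned in the list above) turns
human-readable sizes into bytes; a minimal sketch of how such an
attribute write could be parsed (the drmcg_parse_limit name is made up):

	#include <linux/kernel.h>	/* memparse() */

	static int drmcg_parse_limit(const char *buf, u64 *out)
	{
		char *end;
		u64 val = memparse(buf, &end);	/* accepts "256M", "4G", ... */

		if (end == buf)		/* no digits consumed */
			return -EINVAL;
		*out = val;
		return 0;
	}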
>
> v2:
> * Removed the vendoring concepts
> * Add limit to total buffer allocation
> * Add limit to the maximum size of a buffer allocation
>
> v1: cover letter
>
> The purpose of this patch series is to start a discussion for a generic cgroup
> controller for the drm subsystem.  The design proposed here is a very early 
> one.
> We are hoping to engage the community as we develop the idea.
>
>
> Backgrounds
> ===========
> Control Groups/cgroup provide a mechanism for aggregating/partitioning sets of
> tasks, and all their future children, into hierarchical groups with 
> specialized
> behaviour, such as accounting/limiting the resources which processes in a 
> cgroup
> can access[1].  Weights, limits, protections, allocations are the main 
> resource
> distribution models.  Existing cgroup controllers includes cpu, memory, io,
> rdma, and more.  cgroup is one of the foundational technologies that enables 
> the
> popular container application deployment and management method.
>
> Direct Rendering Manager/drm contains code intended to support the needs of
> complex graphics devices. Graphics drivers in the kernel may make use of DRM
> functions to make tasks like memory management, interrupt handling and DMA
> easier, and provide a uniform interface to applications.  The DRM has also
> developed beyond traditional graphics applications to support compute/GPGPU
> applications.
>
>
> Motivations
> ===========
> As GPUs grow beyond the realm of desktop/workstation graphics into areas like
> data center clusters and IoT, there are increasing needs to monitor and 
> regulate
> GPU as a resource like cpu, memory and io.
>
> Matt Roper from Intel began working on a similar idea in early 2018 [2] for the
> purpose of managing GPU priority using the cgroup hierarchy.  While that
> particular use case may not warrant a standalone drm cgroup controller, there
> are other use cases where having one can be useful [3].  Monitoring GPU
> resources such as VRAM and buffers, CU (compute unit [AMD's nomenclature])/EU
> (execution unit [Intel's nomenclature]), GPU job scheduling [4] can help
> sysadmins get a better understanding of the applications' usage profiles.  
> Further
> usage regulations of the aforementioned resources can also help sysadmins
> optimize workload deployment on limited GPU resources.
>
> With the increased importance of machine learning, data science and other
> cloud-based applications, GPUs are already in production use in data centers
> today [5,6,7].  Existing GPU resource management is very coarse-grained,
> however, as sysadmins are only able to distribute workload on a per-GPU
> basis [8].  An
> alternative is to use GPU virtualization (with or without SRIOV) but it
> generally acts on the entire GPU instead of the specific resources in a GPU.
> With a drm cgroup controller, we can enable alternate, fine-grain, sub-GPU
> resource management (in addition to what may be available via GPU
> virtualization.)
>
> In addition to production use, the DRM cgroup can also help with testing
> graphics application robustness by providing a means to artificially limit DRM
> resources available to the applications.
>
>
> Challenges
> ==========
> While there is common infrastructure in DRM that is shared across many 
> vendors
> (the scheduler [4] for example), there are also aspects of DRM that are vendor
> specific.  To accommodate this, we borrowed the mechanism used by the cgroup
> subsystem to handle different kinds of cgroup controllers.
>
> Resources for DRM are also often device (GPU) specific instead of system
> specific and a system may contain more than one GPU.  For this, we borrowed 
> some
> of the ideas from RDMA cgroup controller.

Another question I have: What about HMM? With the device memory zone
the core mm will be a lot more involved in managing that, but I also
expect that we'll have classic buffer-based management for a long time
still. So these need to work 

Re: [PATCH v4 hmm 12/12] mm/hmm: Fix error flows in hmm_invalidate_range_start

2019-06-27 Thread Ralph Campbell


On 6/24/19 2:01 PM, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

If the trylock on the hmm->mirrors_sem fails the function will return
without decrementing the notifiers that were previously incremented. Since
the caller will not call invalidate_range_end() on EAGAIN this will result
in notifiers becoming permanently incremented and deadlock.

If the sync_cpu_device_pagetables() required blocking the function will
not return EAGAIN even though the device continues to touch the
pages. This is a violation of the mmu notifier contract.

Switch, and rename, the ranges_lock to a spin lock so we can reliably
obtain it without blocking during error unwind.

The error unwind is necessary since the notifiers count must be held
incremented across the call to sync_cpu_device_pagetables() as we cannot
allow the range to become marked valid by a parallel
invalidate_start/end() pair while doing sync_cpu_device_pagetables().

Signed-off-by: Jason Gunthorpe 
Reviewed-by: Ralph Campbell 
Reviewed-by: Christoph Hellwig 
Tested-by: Philip Yang 
---
  include/linux/hmm.h |  2 +-
  mm/hmm.c| 72 +++--
  2 files changed, 45 insertions(+), 29 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index bf013e96525771..0fa8ea34ccef6d 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -86,7 +86,7 @@
  struct hmm {
	struct mm_struct	*mm;
	struct kref		kref;
-	struct mutex		lock;
+	spinlock_t		ranges_lock;
	struct list_head	ranges;
	struct list_head	mirrors;
struct mmu_notifier mmu_notifier;
diff --git a/mm/hmm.c b/mm/hmm.c
index b224ea635a7716..89549eac03d506 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -64,7 +64,7 @@ static struct hmm *hmm_get_or_create(struct mm_struct *mm)
	init_rwsem(&hmm->mirrors_sem);
	hmm->mmu_notifier.ops = NULL;
	INIT_LIST_HEAD(&hmm->ranges);
-	mutex_init(&hmm->lock);
+	spin_lock_init(&hmm->ranges_lock);
	kref_init(&hmm->kref);
hmm->notifiers = 0;
hmm->mm = mm;
@@ -144,6 +144,23 @@ static void hmm_release(struct mmu_notifier *mn, struct 
mm_struct *mm)
hmm_put(hmm);
  }
  
+static void notifiers_decrement(struct hmm *hmm)

+{
+   lockdep_assert_held(&hmm->ranges_lock);
+


Why not acquire the lock here and release it at the end instead
of asserting that the lock is held?
It looks like every caller of notifiers_decrement() already does
exactly that.
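For illustration, a variant where notifiers_decrement() acquires and
releases the lock itself might look like this (untested sketch derived
from the hunks above):

	static void notifiers_decrement(struct hmm *hmm)
	{
		unsigned long flags;

		spin_lock_irqsave(&hmm->ranges_lock, flags);
		hmm->notifiers--;
		if (!hmm->notifiers) {
			struct hmm_range *range;

			/* all invalidations done, ranges become valid again */
			list_for_each_entry(range, &hmm->ranges, list)
				range->valid = true;
			wake_up_all(&hmm->wq);
		}
		spin_unlock_irqrestore(&hmm->ranges_lock, flags);
	}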


+   hmm->notifiers--;
+   if (!hmm->notifiers) {
+   struct hmm_range *range;
+
+   list_for_each_entry(range, &hmm->ranges, list) {
+   if (range->valid)
+   continue;
+   range->valid = true;
+   }
+   wake_up_all(&hmm->wq);
+   }
+}
+
  static int hmm_invalidate_range_start(struct mmu_notifier *mn,
const struct mmu_notifier_range *nrange)
  {
@@ -151,6 +168,7 @@ static int hmm_invalidate_range_start(struct mmu_notifier 
*mn,
struct hmm_mirror *mirror;
struct hmm_update update;
struct hmm_range *range;
+   unsigned long flags;
int ret = 0;
  
	if (!kref_get_unless_zero(&hmm->kref))

@@ -161,12 +179,7 @@ static int hmm_invalidate_range_start(struct mmu_notifier 
*mn,
update.event = HMM_UPDATE_INVALIDATE;
update.blockable = mmu_notifier_range_blockable(nrange);
  
-	if (mmu_notifier_range_blockable(nrange))

-   mutex_lock(&hmm->lock);
-   else if (!mutex_trylock(&hmm->lock)) {
-   ret = -EAGAIN;
-   goto out;
-   }
+   spin_lock_irqsave(&hmm->ranges_lock, flags);
	hmm->notifiers++;
	list_for_each_entry(range, &hmm->ranges, list) {
if (update.end < range->start || update.start >= range->end)
@@ -174,7 +187,7 @@ static int hmm_invalidate_range_start(struct mmu_notifier 
*mn,
  
  		range->valid = false;

}
-   mutex_unlock(&hmm->lock);
+   spin_unlock_irqrestore(&hmm->ranges_lock, flags);
  
  	if (mmu_notifier_range_blockable(nrange))

		down_read(&hmm->mirrors_sem);
@@ -182,16 +195,26 @@ static int hmm_invalidate_range_start(struct mmu_notifier 
*mn,
ret = -EAGAIN;
goto out;
}
+
	list_for_each_entry(mirror, &hmm->mirrors, list) {
-   int ret;
+   int rc;
  
-		ret = mirror->ops->sync_cpu_device_pagetables(mirror, &update);

-   if (!update.blockable && ret == -EAGAIN)
+   rc = mirror->ops->sync_cpu_device_pagetables(mirror, &update);
+   if (rc) {
+   if (WARN_ON(update.blockable || rc != -EAGAIN))
+   continue;
+   ret = -EAGAIN;
break;
+   }
}
	up_read(&hmm->mirrors_sem);
  
  out:

+   if (ret) {
+   spin_lock_irqsave(&hmm->ranges_lock, flags);
+   notifiers_decrement(hmm);
+

Re: [PATCH 1/2] drm/amdgpu: fix transform feedback GDS hang on gfx10 (v2)

2019-06-27 Thread Christian König

Am 27.06.19 um 00:35 schrieb Marek Olšák:

From: Marek Olšák 

v2: update emit_ib_size
(though it's still wrong because it was wrong before)

Signed-off-by: Marek Olšák 


Can't judge if this is really the right thing to do because I don't know 
the details of the hw bug.


But at least off hand I can't see any obvious problems with it, so feel 
free to add an Acked-by: Christian König  to 
the series.


Regards,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  3 ++-
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 14 +++---
  2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
index dad2186f4ed5..df8a23554831 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
@@ -24,21 +24,22 @@
  #ifndef __AMDGPU_GDS_H__
  #define __AMDGPU_GDS_H__
  
  struct amdgpu_ring;

  struct amdgpu_bo;
  
  struct amdgpu_gds {

uint32_t gds_size;
uint32_t gws_size;
uint32_t oa_size;
-   uint32_t	gds_compute_max_wave_id;
+   uint32_t gds_compute_max_wave_id;
+   uint32_t vgt_gs_max_wave_id;
  };
  
  struct amdgpu_gds_reg_offset {

	uint32_t	mem_base;
	uint32_t	mem_size;
	uint32_t	gws;
	uint32_t	oa;
  };
  
  #endif /* __AMDGPU_GDS_H__ */

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 16b2bcc590e7..6baaa65a1daa 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4211,20 +4211,29 @@ static void gfx_v10_0_ring_emit_hdp_flush(struct 
amdgpu_ring *ring)
  }
  
  static void gfx_v10_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,

   struct amdgpu_job *job,
   struct amdgpu_ib *ib,
   uint32_t flags)
  {
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
u32 header, control = 0;
  
+	/* Prevent a hw deadlock due to a wave ID mismatch between ME and GDS.

+	 * This resets the wave ID counters. (needed by transform feedback)
+	 * TODO: This might only be needed on a VMID switch when we change
+	 *       the GDS OA mapping, not sure.
+	 */
+   amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
+   amdgpu_ring_write(ring, mmVGT_GS_MAX_WAVE_ID);
+   amdgpu_ring_write(ring, ring->adev->gds.vgt_gs_max_wave_id);
+
if (ib->flags & AMDGPU_IB_FLAG_CE)
header = PACKET3(PACKET3_INDIRECT_BUFFER_CNST, 2);
else
header = PACKET3(PACKET3_INDIRECT_BUFFER, 2);
  
  	control |= ib->length_dw | (vmid << 24);
  
  	if (amdgpu_mcbp && (ib->flags & AMDGPU_IB_FLAG_PREEMPT)) {

control |= INDIRECT_BUFFER_PRE_ENB(1);
  
@@ -4944,21 +4953,21 @@ static const struct amdgpu_ring_funcs gfx_v10_0_ring_funcs_gfx = {

 */
5 + /* COND_EXEC */
7 + /* HDP_flush */
4 + /* VGT_flush */
14 + /* CE_META */
31 + /* DE_META */
3 + /* CNTX_CTRL */
5 + /* HDP_INVL */
8 + 8 + /* FENCE x2 */
2, /* SWITCH_BUFFER */
-   .emit_ib_size = 4, /* gfx_v10_0_ring_emit_ib_gfx */
+   .emit_ib_size = 7, /* gfx_v10_0_ring_emit_ib_gfx */
.emit_ib = gfx_v10_0_ring_emit_ib_gfx,
.emit_fence = gfx_v10_0_ring_emit_fence,
.emit_pipeline_sync = gfx_v10_0_ring_emit_pipeline_sync,
.emit_vm_flush = gfx_v10_0_ring_emit_vm_flush,
.emit_gds_switch = gfx_v10_0_ring_emit_gds_switch,
.emit_hdp_flush = gfx_v10_0_ring_emit_hdp_flush,
.test_ring = gfx_v10_0_ring_test_ring,
.test_ib = gfx_v10_0_ring_test_ib,
.insert_nop = amdgpu_ring_insert_nop,
.pad_ib = amdgpu_ring_generic_pad_ib,
@@ -5092,24 +5101,23 @@ static void gfx_v10_0_set_rlc_funcs(struct 
amdgpu_device *adev)
default:
break;
}
  }
  
  static void gfx_v10_0_set_gds_init(struct amdgpu_device *adev)

  {
/* init asic gds info */
switch (adev->asic_type) {
case CHIP_NAVI10:
-   adev->gds.gds_size = 0x10000;
-   break;
	default:
	adev->gds.gds_size = 0x10000;
+   adev->gds.vgt_gs_max_wave_id = 0x3ff;
break;
}
  
  	adev->gds.gws_size = 64;

adev->gds.oa_size = 16;
  }
  
  static void gfx_v10_0_set_user_wgp_inactive_bitmap_per_sh(struct amdgpu_device *adev,

  u32 bitmap)
  {
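For reference, the emit_ib_size bump from 4 to 7 above matches the three
extra dwords of the new register write (accounting derived from the hunks
above, assuming the standard PACKET3 layout):

	/* PACKET3_SET_CONFIG_REG with one register occupies:
	 *   1 dw packet header   (PACKET3(PACKET3_SET_CONFIG_REG, 1))
	 *   1 dw register offset (mmVGT_GS_MAX_WAVE_ID)
	 *   1 dw register value  (adev->gds.vgt_gs_max_wave_id)
	 * so gfx_v10_0_ring_emit_ib_gfx grows from 4 to 4 + 3 = 7 dwords.
	 */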


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amd/powerplay: update smu11_driver_if_navi10.h

2019-06-27 Thread Tianci Yin
From: tiancyin 

Update smu11_driver_if_navi10.h since the navi10 SMU firmware
was updated to 42.28.

Signed-off-by: tiancyin 
---
 drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h | 6 +++---
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h 
b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h
index a8b31bc..adbbfeb 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h
@@ -26,7 +26,7 @@
 // *** IMPORTANT ***
 // SMU TEAM: Always increment the interface version if 
 // any structure is changed in this file
-#define SMU11_DRIVER_IF_VERSION 0x32
+#define SMU11_DRIVER_IF_VERSION 0x33
 
 #define PPTABLE_NV10_SMU_VERSION 8
 
@@ -813,8 +813,8 @@ typedef struct {
   uint16_t UclkAverageLpfTau;
   uint16_t GfxActivityLpfTau;
   uint16_t UclkActivityLpfTau;
+  uint16_t SocketPowerLpfTau;
 
-  uint16_t Padding;  
   // Padding - ignore
   uint32_t MmHubPadding[8]; // SMU internal use
 } DriverSmuConfig_t;
@@ -853,7 +853,7 @@ typedef struct {
   uint8_t  CurrGfxVoltageOffset  ;
   uint8_t  CurrMemVidOffset  ;
   uint8_t  Padding8  ;
-  uint16_t CurrSocketPower   ;
+  uint16_t AverageSocketPower;
   uint16_t TemperatureEdge   ;
   uint16_t TemperatureHotspot;
   uint16_t TemperatureMem;
diff --git a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c 
b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
index 99566de..373aeba 100644
--- a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
@@ -863,7 +863,7 @@ static int navi10_get_gpu_power(struct smu_context *smu, 
uint32_t *value)
if (ret)
return ret;
 
-   *value = metrics.CurrSocketPower << 8;
+   *value = metrics.AverageSocketPower << 8;
 
return 0;
 }
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC PATCH v3 11/11] drm, cgroup: Allow more aggressive memory reclaim

2019-06-27 Thread Daniel Vetter
On Wed, Jun 26, 2019 at 06:52:50PM -0400, Kenny Ho wrote:
> Ok.  I am not too familiar with shrinker but I will dig into it.  Just
> so that I am looking into the right things, you are referring to
> things like struct shrinker and struct shrink_control?

Yeah. The reason I'm asking is that this is how system memory is shrunk
right now, so at least having some conceptual similarity might be useful
here. And a lot of people have thought quite hard about system memory
shrinking and all that, so hopefully that gives us good design
inspiration.
-Daniel
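
For reference, the memcg-aware shrinker pattern in the core mm looks
roughly like this (minimal sketch of the existing API; the demo_* names
are made up):

	#include <linux/shrinker.h>

	static unsigned long demo_count(struct shrinker *s,
					struct shrink_control *sc)
	{
		/* with SHRINKER_MEMCG_AWARE set, sc->memcg identifies the
		 * cgroup under pressure; report the reclaimable count
		 */
		return 128;
	}

	static unsigned long demo_scan(struct shrinker *s,
				       struct shrink_control *sc)
	{
		/* free up to sc->nr_to_scan objects, return number freed */
		return sc->nr_to_scan;
	}

	static struct shrinker demo_shrinker = {
		.count_objects	= demo_count,
		.scan_objects	= demo_scan,
		.seeks		= DEFAULT_SEEKS,
		.flags		= SHRINKER_MEMCG_AWARE,
	};

	/* registered once at init time with register_shrinker(&demo_shrinker) */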

> 
> Regards,
> Kenny
> 
> On Wed, Jun 26, 2019 at 12:44 PM Daniel Vetter  wrote:
> >
> > On Wed, Jun 26, 2019 at 11:05:22AM -0400, Kenny Ho wrote:
> > > Allow the DRM TTM memory manager to register a work_struct, such that,
> > > when a drmcgrp is under memory pressure, memory reclaiming can be
> > > triggered immediately.
> > >
> > > Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
> > > Signed-off-by: Kenny Ho 
> > > ---
> > >  drivers/gpu/drm/ttm/ttm_bo.c| 47 +
> > >  include/drm/drm_cgroup.h| 14 ++
> > >  include/drm/ttm/ttm_bo_driver.h |  2 ++
> > >  kernel/cgroup/drm.c | 33 +++
> > >  4 files changed, 96 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > > index 79c530f4a198..5fc3bc5bd4c5 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > > @@ -1509,6 +1509,44 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, 
> > > unsigned mem_type)
> > >  }
> > >  EXPORT_SYMBOL(ttm_bo_evict_mm);
> > >
> > > +static void ttm_bo_reclaim_wq(struct work_struct *work)
> > > +{
> >
> > I think a design a bit more inspired by memcg aware core shrinkers would
> > be nice, i.e. explicitly passing:
> > - which drm_cgroup needs to be shrunk
> > - which ttm_mem_reg (well the fancy new abstracted out stuff for tracking
> >   special gpu memory resources like tt or vram or whatever)
> > - how much it needs to be shrunk
> >
> > I think with that a lot more the book-keeping could be pushed into the
> > drm_cgroup code, and the callback just needs to actually shrink enough as
> > requested.
> > -Daniel
> >
> > > + struct ttm_operation_ctx ctx = {
> > > + .interruptible = false,
> > > + .no_wait_gpu = false,
> > > + .flags = TTM_OPT_FLAG_FORCE_ALLOC
> > > + };
> > > + struct ttm_mem_type_manager *man =
> > > + container_of(work, struct ttm_mem_type_manager, reclaim_wq);
> > > + struct ttm_bo_device *bdev = man->bdev;
> > > + struct dma_fence *fence;
> > > + int mem_type;
> > > + int ret;
> > > +
> > > + for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
> > > + if (&bdev->man[mem_type] == man)
> > > + break;
> > > +
> > > + BUG_ON(mem_type >= TTM_NUM_MEM_TYPES);
> > > +
> > > + if (!drmcgrp_mem_pressure_scan(bdev, mem_type))
> > > + return;
> > > +
> > > + ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx);
> > > + if (ret)
> > > + return;
> > > +
> > > + spin_lock(&man->move_lock);
> > > + fence = dma_fence_get(man->move);
> > > + spin_unlock(&man->move_lock);
> > > +
> > > + if (fence) {
> > > + ret = dma_fence_wait(fence, false);
> > > + dma_fence_put(fence);
> > > + }
> > > +
> > > +}
> > > +
> > >  int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
> > >   unsigned long p_size)
> > >  {
> > > @@ -1543,6 +1581,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, 
> > > unsigned type,
> > >   INIT_LIST_HEAD(&man->lru[i]);
> > >   man->move = NULL;
> > >
> > > + pr_err("drmcgrp %p type %d\n", bdev->ddev, type);
> > > +
> > > + if (type <= TTM_PL_VRAM) {
> > > + INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
> > > + drmcgrp_register_device_mm(bdev->ddev, type, 
> > > &man->reclaim_wq);
> > > + }
> > > +
> > >   return 0;
> > >  }
> > >  EXPORT_SYMBOL(ttm_bo_init_mm);
> > > @@ -1620,6 +1665,8 @@ int ttm_bo_device_release(struct ttm_bo_device 
> > > *bdev)
> > >   man = &bdev->man[i];
> > >   if (man->has_type) {
> > >   man->use_type = false;
> > > + drmcgrp_unregister_device_mm(bdev->ddev, i);
> > > + cancel_work_sync(&man->reclaim_wq);
> > >   if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev, 
> > > i)) {
> > >   ret = -EBUSY;
> > >   pr_err("DRM memory manager type %d is not 
> > > clean\n",
> > > diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> > > index 360c1e6c809f..134d6e5475f3 100644
> > > --- a/include/drm/drm_cgroup.h
> > > +++ b/include/drm/drm_cgroup.h
> > > @@ -5,6 +5,7 @@
> > >  #define __DRM_CGROUP_H__
> > >
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 

Re: [RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control

2019-06-27 Thread Daniel Vetter
On Thu, Jun 27, 2019 at 12:34:05AM -0400, Kenny Ho wrote:
> On Wed, Jun 26, 2019 at 12:25 PM Daniel Vetter  wrote:
> >
> > On Wed, Jun 26, 2019 at 11:05:20AM -0400, Kenny Ho wrote:
> > > The bandwidth is measured by keeping track of the amount of bytes moved
> > > by ttm within a time period.  We defined two types of bandwidth: burst
> > > and average.  Average bandwidth is calculated by dividing the total
> > > amount of bytes moved within a cgroup by the lifetime of the cgroup.
> > > Burst bandwidth is similar except that the byte and time measurement is
> > > reset after a user configurable period.
> >
> > So I'm not too sure exposing this is a great idea, at least depending upon
> > what you're trying to do with it. There's a few concerns here:
> >
> > - I think bo movement stats might be useful, but they're not telling you
> >   everything. Applications can also copy data themselves and put buffers
> >   where they want them, especially with more explicit apis like vk.
> >
> > - which kind of moves are we talking about here? Eviction related bo moves
> >   seem not counted here, and if you have lots of gpus with funny
> >   interconnects you might also get other kinds of moves, not just system
> >   ram <-> vram.
> Eviction moves are counted, but I think I placed the delay in the wrong
> place (the tracking of bytes moved is in a previous patch in
> ttm_bo_handle_move_mem, which is common to all moves as far as I can
> tell.)
> 
> > - What happens if we slow down, but someone else needs to evict our
> >   buffers/move them (ttm is atm not great at this, but Christian König is
> >   working on patches). I think there's lots of priority inversion
> >   potential here.
> >
> > - If the goal is to avoid thrashing the interconnects, then this isn't the
> >   full picture by far - apps can use copy engines and explicit placement,
> >   again that's how vulkan at least is supposed to work.
> >
> > I guess these all boil down to: What do you want to achieve here? The
> > commit message doesn't explain the intended use-case of this.
> Thrashing prevention is the intent.  I am not familiar with Vulkan so
> I will have to get back to you on that.  I don't know how those
> explicit placement translate into the kernel.  At this stage, I think
> it's still worth while to have this as a resource even if some
> applications bypass the kernel.  I certainly welcome more feedback on
> this topic.

The trouble with thrashing prevention like this is that either you don't
limit all the bo moves, and then you don't count everything. Or you limit
them all, and then you create priority inversions in the ttm eviction
handler, essentially rate-limiting everyone who's thrashing. Or at least
you run the risk of that happening.

Not what you want I think :-)

I also think that the blkcg people are still trying to figure out how to
make this work fully reliably (it's the same problem really), and a
critical piece is knowing/estimating the overall bandwidth. Without that
the admin can't really do something meaningful. The problem with that is
you don't know, not just because of vk, but any userspace that has buffers
in the pci gart uses the same interconnect just as part of its rendering
job. So if your goal is to guarantee some minimal amount of bo move
bandwidth, then this won't work, because you have no idea how much
bandwidth there even is for bo moves.

Getting thrashing limited is very hard.

I feel like a better approach would be to add a cgroup for the various
engines on the gpu, and then also account all the sdma (or whatever the
name of the amd copy engines is again) usage by ttm_bo moves to the right
cgroup. I think that's a more meaningful limitation. For direct thrashing
control I think there's both not enough information available in the
kernel (you'd need some performance counters to watch how much bandwidth
userspace batches/CS are wasting), and I don't think the ttm eviction
logic is ready to step over all the priority inversion issues this will
bring up. Managing sdma usage otoh will be a lot more straightforward (but
still has all the priority inversion problems, but in the scheduler that
might be easier to fix perhaps with the explicit dependency graph - in the
i915 scheduler we already have priority boosting afaiui).
-Daniel
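
As a rough illustration of that direction (a purely hypothetical
interface, not part of this series): per-engine busy time charged to a
cgroup could be exposed as another nested-keyed file, with ttm_bo moves
accounted to the copy-engine entries of the cgroup that triggered them:

	$ cat /sys/fs/cgroup/mygroup/drm.engine.stats
	226:0 gfx=1234567 compute=89012 sdma0=345678 sdma1=0

(hypothetical values, in microseconds of engine time)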
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats

2019-06-27 Thread Daniel Vetter
On Thu, Jun 27, 2019 at 12:06:13AM -0400, Kenny Ho wrote:
> On Wed, Jun 26, 2019 at 12:12 PM Daniel Vetter  wrote:
> >
> > On Wed, Jun 26, 2019 at 11:05:18AM -0400, Kenny Ho wrote:
> > > drm.memory.stats
> > > A read-only nested-keyed file which exists on all cgroups.
> > > Each entry is keyed by the drm device's major:minor.  The
> > > following nested keys are defined.
> > >
> > >   ======  ==========================================
> > >   system  Host/system memory
> >
> > Shouldn't that be covered by gem bo stats already? Also, system memory is
> > definitely something a lot of non-ttm drivers want to be able to track, so
> > that needs to be separate from ttm.
> The gem bo stats cover all of these types.  I treat the gem stats
> as more of the front end and a hard limit, and this set of stats as the
> backing store, which can be of various types.  How do non-ttm drivers
> identify various memory types?

Not explicitly, they generally just have one. I think i915 currently has
two, system and carveout (with vram getting added).

> > >   tt      Host memory used by the drm device (GTT/GART)
> > >   vram    Video RAM used by the drm device
> > >   priv    Other drm device, vendor specific memory
> >
> > So what's "priv". In general I think we need some way to register the
> > different kinds of memory, e.g. stuff not in your list:
> >
> > - multiple kinds of vram (like numa-style gpus)
> > - cma (for all those non-ttm drivers that's a big one, it's like system
> >   memory but also totally different)
> > - any carveouts and stuff
> privs are vendor specific, which is why I have truncated it.  For
> example, AMD has AMDGPU_PL_GDS, GWS, OA
> https://elixir.bootlin.com/linux/v5.2-rc6/source/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h#L30
> 
> Since we are using a keyed file type, we should be able to support
> vendor specific memory types, but I am not sure if this is acceptable to
> cgroup upstream.  This is why I stuck to the 3 memory types that are
> common across all ttm drivers.

I think we'll need custom memory pools, not just priv, and I guess some
naming scheme for them. I think just exposing them as amd-gws, amd-oa,
amd-gds would make sense.

Another thing I wonder about is multi-gpu cards, with multiple gpus and
each their own vram and other device-specific resources. For those we'd
have node0.vram and node1.vram too (on top of maybe an overall vram node,
not sure).
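
To make the naming concrete, a nested-keyed readout under such a scheme
might look like this (hypothetical values; the amd-* pool names follow
the suggestion above, keyed by the drm device's major:minor):

	226:0 system=4194304 tt=8388608 vram=268435456 amd-gds=4096 amd-gws=0 amd-oa=0
	226:1 system=0 tt=0 vram=134217728 amd-gds=0 amd-gws=0 amd-oa=0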

> > I think with all the ttm refactoring going on I think we need to de-ttm
> > the interface functions here a bit. With Gerd Hoffmann's series you can just
> > use a gem_bo pointer here, so what's left to do is have some extracted
> > structure for tracking memory types. I think Brian Welty has some ideas
> > for this, even in patch form. Would be good to keep him on cc at least for
> > the next version. We'd need to explicitly hand in the ttm_mem_reg (or
> > whatever the specific thing is going to be).
> 
> I assume Gerd Hoffmann's series you are referring to is this one?
> https://www.spinics.net/lists/dri-devel/msg215056.html

There's a newer one, much more complete, but yes that's the work.

> I can certainly keep an eye out for Gerd's refactoring while
> refactoring other parts of this RFC.
> 
> I have added Brian and Gerd to the thread for awareness.

btw just realized that building the interfaces on top of ttm_mem_reg
is maybe not the best. That's what you're using right now, but in a way
that's just the ttm internal detail of how the backing storage is
allocated. I think the structure we need to abstract away is
ttm_mem_type_manager, without any of the actual management details.

btw reminds me: I guess it would be good to have a per-type .total
read-only exposed, so that userspace has an idea of how much there is?
ttm is trying to be agnostic to the allocator that's used to manage a
memory type/resource, so it doesn't even know that. But I think it's
something we need to expose to admins, otherwise they can't meaningfully
set limits.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx