Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence

2018-01-30 Thread Felix Kuehling
On 2018-01-30 10:35 AM, Christian König wrote:
> On 30.01.2018 at 16:28, Kasiviswanathan, Harish wrote:
 [+Harish, forgot to acknowledge him in the commit description, will
 fix
 that in v2]

 Harish, please see Christian's question below in amd_kfd_fence_signal.
 Did I understand this correctly?
>> [HK]: Yes, the lifetime of an eviction fence is tied to the lifetime of
>> the process associated with it. When the process terminates, the fence
>> is signaled and released. For all the BOs that belong to this process,
>> the eviction fence should be detached when the BO is released.
>> However, this eviction fence could still be attached to shared BOs,
>> so signaling it frees those BOs.
>>
>>
>> On 2018-01-29 08:43 AM, Christian König wrote:
>>> Hi Felix & Harish,
>>>
>>> maybe explain why I found that odd: dma_fence_add_callback() sets the
>>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT flag before adding the callback.
>>>
>>> So the flag should always be set when there are callbacks.
>>> Did I miss anything?
>> I don't think we add any callbacks to our eviction fences.
>>
>> [HK] Setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is not required. It was
>> my oversight. Since dma_fence_signal() invokes the cb_list callbacks
>> only if DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is set, I thought it was safe
>> to set it. However, the cb_list would be empty if no callbacks are
>> added, so setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is redundant.
>
> Ok in this case let's just remove that and also use the
> dma_fence_signal() function (not the _locked variant) for signaling
> the DMA fence.

Sure. Though it makes me wonder why we need to signal the fence at all.
This is when the reference count of the fence is 0. Doesn't that imply
that no one is left waiting for the fence?
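To make that concrete, here is a rough, untested sketch of the direction I
have in mind for v2 (my own illustration, not the actual patch): turn
fence->mm into a counted struct mm_struct pointer and drop the reference in
the release callback rather than in enable_signaling.

    /* In amdgpu_amdkfd_fence_create(), with mm passed as a real
     * struct mm_struct * instead of void *:
     */
    mmgrab(mm);         /* counted reference, see linux/sched/mm.h */
    fence->mm = mm;

    /* Matching release callback, wired up as .release in amd_kfd_fence_ops: */
    static void amd_kfd_fence_release(struct dma_fence *f)
    {
        struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);

        if (WARN_ON(!fence))
            return;

        mmdrop(fence->mm);      /* drop the reference taken in fence_create */
        dma_fence_free(f);      /* kfree_rcu() of the embedded dma_fence */
    }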

Regards,
  Felix

>
> Thanks,
> Christian.
>
>>
>> Best Regards,
>> Harish
>>
>>  
>> Regards,
>>    Felix
>>
>>> Regards,
>>> Christian.
>>>
>>> On 29.01.2018 at 00:55, Felix Kuehling wrote:
 [+Harish, forgot to acknowledge him in the commit description, will
 fix
 that in v2]

 Harish, please see Christian's question below in amd_kfd_fence_signal.
 Did I understand this correctly?

 Regards,
     Felix

 On 2018-01-28 06:42 PM, Felix Kuehling wrote:
> On 2018-01-27 04:16 AM, Christian König wrote:
>> On 27.01.2018 at 02:09, Felix Kuehling wrote:
> [snip]
>>> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64
>>> context,
>>> +   void *mm)
>>> +{
>>> +    struct amdgpu_amdkfd_fence *fence = NULL;
>>> +
>>> +    fence = kzalloc(sizeof(*fence), GFP_KERNEL);
>>> +    if (fence == NULL)
>>> +    return NULL;
>>> +
>>> +    /* mm_struct mm is used as void pointer to identify the parent
>>> + * KFD process. Don't dereference it. Fence and any threads
>>> using
>>> + * mm is guaranteed to be released before process termination.
>>> + */
>>> +    fence->mm = mm;
>> That won't work. Fences can live much longer than the process that
>> created them.
>>
>> I've already found a fence in a BO still living hours after the
>> process was killed and the pid long recycled.
>>
>> I suggest to make fence->mm a real mm_struct pointer with reference
>> counting and then set it to NULL and drop the reference in
>> enable_signaling.
> I agree. But enable_signaling may be too early to drop the reference.
> amd_kfd_fence_check_mm could still be called later from
> amdgpu_ttm_bo_eviction_valuable, as long as the fence hasn't
> signaled yet.
>
> The safe place is probably in amd_kfd_fence_release.
>
>>> +    get_task_comm(fence->timeline_name, current);
>>> +    spin_lock_init(&fence->lock);
>>> +
>>> +    dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
>>> +   context, atomic_inc_return(&fence_seq));
>>> +
>>> +    return fence;
>>> +}
>>> +
>>> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct
>>> dma_fence *f)
>>> +{
>>> +    struct amdgpu_amdkfd_fence *fence;
>>> +
>>> +    if (!f)
>>> +    return NULL;
>>> +
>>> +    fence = container_of(f, struct amdgpu_amdkfd_fence, base);
>>> +    if (fence && f->ops == &amd_kfd_fence_ops)
>>> +    return fence;
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +static const char *amd_kfd_fence_get_driver_name(struct dma_fence
>>> *f)
>>> +{
>>> +    return "amdgpu_amdkfd_fence";
>>> +}
>>> +
>>> +static const char *amd_kfd_fence_get_timeline_name(struct
>>> dma_fence *f)
>>> +{
>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>> +
>>> +    return fence->timeline_name;
>>> +}
>>> +
>>> +/**
>>> + * amd_kfd_fence_enable_signaling - This gets called when TTM
>>> wants
>>>

Re: [PATCH] drm/amdgpu: fix DW estimation on VI

2018-01-30 Thread Alex Deucher
On Tue, Jan 30, 2018 at 10:03 AM, Christian König
 wrote:
> Forgot to update that during recent changes.
>
> Signed-off-by: Christian König 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 2 +-
>  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  | 4 +++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> index 83dde3b4c3ae..5680ced69359 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> @@ -1628,7 +1628,7 @@ static const struct amdgpu_ring_funcs 
> sdma_v3_0_ring_funcs = {
> 6 + /* sdma_v3_0_ring_emit_hdp_flush */
> 3 + /* hdp invalidate */
> 6 + /* sdma_v3_0_ring_emit_pipeline_sync */
> -   12 + /* sdma_v3_0_ring_emit_vm_flush */
> +   VI_FLUSH_GPU_TLB_NUM_WREG * 3 + 6 + /* 
> sdma_v3_0_ring_emit_vm_flush */
> 10 + 10 + 10, /* sdma_v3_0_ring_emit_fence x3 for user fence, 
> vm fence */
> .emit_ib_size = 7 + 6, /* sdma_v3_0_ring_emit_ib */
> .emit_ib = sdma_v3_0_ring_emit_ib,
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c 
> b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> index e7546d5b301c..0f192ab71205 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> @@ -1530,6 +1530,7 @@ static const struct amdgpu_ring_funcs 
> uvd_v6_0_ring_phys_funcs = {
> .set_wptr = uvd_v6_0_ring_set_wptr,
> .parse_cs = amdgpu_uvd_ring_parse_cs,
> .emit_frame_size =
> +   6 + 6 + /* hdp flush / invalidate */
> 10 + /* uvd_v6_0_ring_emit_pipeline_sync */
> 14, /* uvd_v6_0_ring_emit_fence x1 no user fence */
> .emit_ib_size = 8, /* uvd_v6_0_ring_emit_ib */
> @@ -1541,6 +1542,7 @@ static const struct amdgpu_ring_funcs 
> uvd_v6_0_ring_phys_funcs = {
> .pad_ib = amdgpu_ring_generic_pad_ib,
> .begin_use = amdgpu_uvd_ring_begin_use,
> .end_use = amdgpu_uvd_ring_end_use,
> +   .emit_wreg = uvd_v6_0_ring_emit_wreg,
>  };
>
>  static const struct amdgpu_ring_funcs uvd_v6_0_ring_vm_funcs = {
> @@ -1554,7 +1556,7 @@ static const struct amdgpu_ring_funcs 
> uvd_v6_0_ring_vm_funcs = {
> .emit_frame_size =
> 6 + 6 + /* hdp flush / invalidate */
> 10 + /* uvd_v6_0_ring_emit_pipeline_sync */
> -   20 + /* uvd_v6_0_ring_emit_vm_flush */
> +   VI_FLUSH_GPU_TLB_NUM_WREG * 6 + 8 + /* 
> uvd_v6_0_ring_emit_vm_flush */
> 14 + 14, /* uvd_v6_0_ring_emit_fence x2 vm fence */
> .emit_ib_size = 8, /* uvd_v6_0_ring_emit_ib */
> .emit_ib = uvd_v6_0_ring_emit_ib,
> --
> 2.14.1
>


Re: All games made with the Unity engine in the Steam catalog crash on AMD GPU, but work on Intel GPU on the same machine.

2018-01-30 Thread Marek Olšák
On Tue, Jan 30, 2018 at 10:36 PM, Marek Olšák  wrote:
> On Tue, Jan 30, 2018 at 10:04 PM, mikhail  
> wrote:
>> On Tue, 2018-01-30 at 20:12 +0100, Marek Olšák wrote:
>>> Can you record an apitrace on a driver that is not radeonsi?
>>
>> All traces from the five listed games were recorded on the Intel GPU (not
>> radeonsi).
>> I also understood why traces were not recorded for some games yesterday.
>> It happens because those games are 32-bit, and apitrace needed the
>> apitrace-libs-7.1-7.fc27.i686 package to work correctly, but only
>> apitrace-libs-7.1-7.fc27.x86_64 was installed.
>>
>>> If yes, can you correctly replay the apitrace on a driver that is not
>>> radeonsi?
>>
>> All traces were correctly replayed on the Intel GPU (not radeonsi).
>>
>>> If yes, can you reproduce the crash if you replay the apitrace on
>>> radeonsi?
>>
>> All traces were also correctly replayed on the AMD Vega 56 GPU (radeonsi)
>> without crashes.
>>
>> What does this give us?
>>
>> Anyway, launching the listed games on the AMD Vega 56 GPU leads to new crashes.
>
> Thanks. It's possible that Unity contains its own version of LLVM or
> its own version of some standard libraries that LLVM uses, and
> radeonsi doesn't like it when games replace its dependencies.

It's also possible that some symbols exported by Unity clash with
those exported by LLVM.

Marek


Re: All games made with the Unity engine in the Steam catalog crash on AMD GPU, but work on Intel GPU on the same machine.

2018-01-30 Thread Marek Olšák
On Tue, Jan 30, 2018 at 10:04 PM, mikhail  wrote:
> On Tue, 2018-01-30 at 20:12 +0100, Marek Olšák wrote:
>> Can you record an apitrace on a driver that is not radeonsi?
>
> All traces from the five listed games were recorded on the Intel GPU (not
> radeonsi).
> I also understood why traces were not recorded for some games yesterday.
> It happens because those games are 32-bit, and apitrace needed the
> apitrace-libs-7.1-7.fc27.i686 package to work correctly, but only
> apitrace-libs-7.1-7.fc27.x86_64 was installed.
>
>> If yes, can you correctly replay the apitrace on a driver that is not
>> radeonsi?
>
> All traces were correctly replayed on the Intel GPU (not radeonsi).
>
>> If yes, can you reproduce the crash if you replay the apitrace on
>> radeonsi?
>
> All traces were also correctly replayed on the AMD Vega 56 GPU (radeonsi)
> without crashes.
>
> What does this give us?
>
> Anyway, launching the listed games on the AMD Vega 56 GPU leads to new crashes.

Thanks. It's possible that Unity contains its own version of LLVM or
its own version of some standard libraries that LLVM uses, and
radeonsi doesn't like it when games replace its dependencies.

Marek


Re: All games made with the Unity engine in the Steam catalog crash on AMD GPU, but work on Intel GPU on the same machine.

2018-01-30 Thread mikhail
On Tue, 2018-01-30 at 20:12 +0100, Marek Olšák wrote:
> Can you record an apitrace on a driver that is not radeonsi?

All traces from the five listed games were recorded on the Intel GPU (not
radeonsi).
I also understood why traces were not recorded for some games yesterday.
It happens because those games are 32-bit, and apitrace needed the
apitrace-libs-7.1-7.fc27.i686 package to work correctly, but only
apitrace-libs-7.1-7.fc27.x86_64 was installed.

> If yes, can you correctly replay the apitrace on a driver that is not
> radeonsi?

All traces were correctly replayed on the Intel GPU (not radeonsi).

> If yes, can you reproduce the crash if you replay the apitrace on
> radeonsi?

All traces were also correctly replayed on the AMD Vega 56 GPU (radeonsi)
without crashes.

What does this give us?

Anyway, launching the listed games on the AMD Vega 56 GPU leads to new crashes.

--
Best Regards,
Mikhail Gavrilov


Re: [PATCH 1/7] drm/amdgpu: add new emit_reg_wait callback

2018-01-30 Thread Felix Kuehling
On 2018-01-30 07:28 AM, Christian König wrote:
> Felix, Alex anybody brave enough to review this?
>
> Independent of the ATC work it seems like a nice-to-have cleanup.
Yes. You can add my R-B for the series.

Regards,
  Felix


>
> Regards,
> Christian.
>
> On 30.01.2018 at 13:09, Christian König wrote:
>> Allows us to wait for a register value/mask on a ring.
>>
>> Signed-off-by: Christian König 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h  | 1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++
>>   2 files changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index d7930f3ead33..787f79c80b6b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1796,6 +1796,7 @@ amdgpu_get_sdma_instance(struct amdgpu_ring *ring)
>>   #define amdgpu_ring_emit_cntxcntl(r, d)
>> (r)->funcs->emit_cntxcntl((r), (d))
>>   #define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d))
>>   #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r),
>> (d), (v))
>> +#define amdgpu_ring_emit_reg_wait(r, d, v, m)
>> (r)->funcs->emit_reg_wait((r), (d), (v), (m))
>>   #define amdgpu_ring_emit_tmz(r, b) (r)->funcs->emit_tmz((r), (b))
>>   #define amdgpu_ring_pad_ib(r, ib) ((r)->funcs->pad_ib((r), (ib)))
>>   #define amdgpu_ring_init_cond_exec(r) (r)->funcs->init_cond_exec((r))
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> index 70d05ec7bc07..867f53332305 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> @@ -150,6 +150,8 @@ struct amdgpu_ring_funcs {
>>   void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
>>   void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
>>   void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg,
>> uint32_t val);
>> +    void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg,
>> +  uint32_t val, uint32_t mask);
>>   void (*emit_tmz)(struct amdgpu_ring *ring, bool start);
>>   /* priority functions */
>>   void (*set_priority) (struct amdgpu_ring *ring,
>
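As a side note for anyone wiring the new callback into an engine backend, my
reading of the intended semantics, expressed as an illustrative CPU-side
equivalent of the packet the ring would emit (this is not from the patch):

    /* Illustration only: the packet emitted by emit_reg_wait() should behave
     * like this loop, i.e. stall the ring until (reg & mask) == val.
     */
    static void emit_reg_wait_cpu_equivalent(struct amdgpu_device *adev,
                                             uint32_t reg, uint32_t val,
                                             uint32_t mask)
    {
        while ((RREG32(reg) & mask) != val)
            cpu_relax();
    }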



Re: [PATCH 1/4] drm/amd/pp: Delete debug info when set power_profile_mode

2018-01-30 Thread Alex Deucher
On Tue, Jan 30, 2018 at 1:10 AM, Rex Zhu  wrote:
> Change-Id: I99c148903148bb7143177e023781e408a7ecffb2
> Signed-off-by: Rex Zhu 

Series is:
Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 1 -
>  drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 2 --
>  2 files changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index f12b9e7..39ef93a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -719,7 +719,6 @@ static ssize_t amdgpu_set_pp_power_profile_mode(struct 
> device *dev,
> count = -EINVAL;
> goto fail;
> }
> -   pr_info("value is %ld \n", parameter[parameter_size]);
> parameter_size++;
> while (isspace(*tmp_str))
> tmp_str++;
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c 
> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> index 87f0660..4c259cd 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> @@ -5095,8 +5095,6 @@ static int vega10_set_power_profile_mode(struct 
> pp_hwmgr *hwmgr, long *input, ui
> PPSMC_MSG_SetCustomGfxDpmParameters,
> busy_set_point | FPS<<8 |
> use_rlc_busy << 16 | 
> min_active_level<<24);
> -   pr_info("size is %d value is %x \n", size, 
> busy_set_point | FPS<<8 |
> -   use_rlc_busy << 16 | 
> min_active_level<<24);
> }
>
> return 0;
> --
> 1.9.1
>


Re: [PATCH v2] drm/amdgpu: Redefine the bit PP_OVERDRIVE_MASK for PP_FEATURES

2018-01-30 Thread Alex Deucher
On Tue, Jan 30, 2018 at 3:24 AM, Rex Zhu  wrote:
> For features enabled by default, define their bit masks from the low bits.
> For features disabled by default, define their bit masks from the high bits.
>
> Change-Id: Iaa9ea6306a57b73bd3481fa152f0f01794a88427
> Signed-off-by: Rex Zhu 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   | 2 +-
>  drivers/gpu/drm/amd/powerplay/inc/hwmgr.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index d1a6958..7f08777 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -121,7 +121,7 @@
>  uint amdgpu_sdma_phase_quantum = 32;
>  char *amdgpu_disable_cu = NULL;
>  char *amdgpu_virtual_display = NULL;
> -uint amdgpu_pp_feature_mask = 0x3fff;
> +uint amdgpu_pp_feature_mask = 0x7fff;
>  int amdgpu_ngg = 0;
>  int amdgpu_prim_buf_per_se = 0;
>  int amdgpu_pos_buf_per_se = 0;
> diff --git a/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h 
> b/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h
> index 5512dc2..3ad45ac 100644
> --- a/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h
> +++ b/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h
> @@ -84,7 +84,7 @@ enum PP_FEATURE_MASK {
> PP_OD_FUZZY_FAN_CONTROL_MASK = 0x800,
> PP_SOCCLK_DPM_MASK = 0x1000,
> PP_DCEFCLK_DPM_MASK = 0x2000,
> -   PP_OVERDRIVE_MASK = 0x4000,
> +   PP_OVERDRIVE_MASK = 0x8000, /* disabled by default */

If an option is eventually enabled by default, is the intention that
we move it down to the other side?  I don't really see much value in
this patch.  Better to just adjust the default flags as necessary as
we enable things so the flag bits stay consistent.
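For reference, my reading of the bit layout before and after (illustration,
not from the commit message):

    /* before: amdgpu_pp_feature_mask = 0x3fff -> bits 0-13 enabled by default
     *         PP_OVERDRIVE_MASK      = 0x4000 -> bit 14, disabled by default
     * after:  amdgpu_pp_feature_mask = 0x7fff -> bits 0-14 enabled by default
     *         PP_OVERDRIVE_MASK      = 0x8000 -> bit 15, still disabled
     * i.e. overdrive stays off by default and bit 14 is left free for the
     * next default-enabled feature.
     */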

Alex

>  };
>
>  enum PHM_BackEnd_Magic {
> --
> 1.9.1
>


Re: [PATCH 7/7] drm/amdgpu: move waiting for VM flush into gmc_v9_0_emit_flush_gpu_tlb

2018-01-30 Thread Alex Deucher
On Tue, Jan 30, 2018 at 7:09 AM, Christian König
 wrote:
> Keep that in a common place instead of spreading it over all engines.
>
> Signed-off-by: Christian König 

Series is:
Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 19 +--
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 18 +++---
>  drivers/gpu/drm/amd/amdgpu/soc15.h |  3 ++-
>  drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  | 20 ++--
>  drivers/gpu/drm/amd/amdgpu/vce_v4_0.c  |  9 +++--
>  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c  | 20 ++--
>  7 files changed, 33 insertions(+), 60 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 801d4a1dd7db..f7363f821cff 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3679,15 +3679,8 @@ static void gfx_v9_0_ring_emit_vm_flush(struct 
> amdgpu_ring *ring,
> unsigned vmid, unsigned pasid,
> uint64_t pd_addr)
>  {
> -   struct amdgpu_vmhub *hub = &ring->adev->vmhub[ring->funcs->vmhub];
> -   unsigned eng = ring->vm_inv_eng;
> -
> amdgpu_gmc_emit_flush_gpu_tlb(ring, vmid, pasid, pd_addr);
>
> -   /* wait for the invalidate to complete */
> -   gfx_v9_0_wait_reg_mem(ring, 0, 0, 0, hub->vm_inv_eng0_ack + eng,
> - 0, 1 << vmid, 1 << vmid, 0x20);
> -
> /* compute doesn't have PFP */
> if (ring->funcs->type == AMDGPU_RING_TYPE_GFX) {
> /* sync PFP to ME, otherwise we might get invalid PFP reads */
> @@ -4295,7 +4288,9 @@ static const struct amdgpu_ring_funcs 
> gfx_v9_0_ring_funcs_gfx = {
> .emit_frame_size = /* totally 242 maximum if 16 IBs */
> 5 +  /* COND_EXEC */
> 7 +  /* PIPELINE_SYNC */
> -   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + 9 + /* VM_FLUSH */
> +   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> +   SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> +   2 + /* VM_FLUSH */
> 8 +  /* FENCE for VM_FLUSH */
> 20 + /* GDS switch */
> 4 + /* double SWITCH_BUFFER,
> @@ -4344,7 +4339,9 @@ static const struct amdgpu_ring_funcs 
> gfx_v9_0_ring_funcs_compute = {
> 7 + /* gfx_v9_0_ring_emit_hdp_flush */
> 5 + /* hdp invalidate */
> 7 + /* gfx_v9_0_ring_emit_pipeline_sync */
> -   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + 9 + /* 
> gfx_v9_0_ring_emit_vm_flush */
> +   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> +   SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> +   2 + /* gfx_v9_0_ring_emit_vm_flush */
> 8 + 8 + 8, /* gfx_v9_0_ring_emit_fence x3 for user fence, vm 
> fence */
> .emit_ib_size = 4, /* gfx_v9_0_ring_emit_ib_compute */
> .emit_ib = gfx_v9_0_ring_emit_ib_compute,
> @@ -4376,7 +4373,9 @@ static const struct amdgpu_ring_funcs 
> gfx_v9_0_ring_funcs_kiq = {
> 7 + /* gfx_v9_0_ring_emit_hdp_flush */
> 5 + /* hdp invalidate */
> 7 + /* gfx_v9_0_ring_emit_pipeline_sync */
> -   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + 9 + /* 
> gfx_v9_0_ring_emit_vm_flush */
> +   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> +   SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> +   2 + /* gfx_v9_0_ring_emit_vm_flush */
> 8 + 8 + 8, /* gfx_v9_0_ring_emit_fence_kiq x3 for user fence, 
> vm fence */
> .emit_ib_size = 4, /* gfx_v9_0_ring_emit_ib_compute */
> .emit_ib = gfx_v9_0_ring_emit_ib_compute,
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 2b251df94684..2c60981d2eec 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -395,6 +395,10 @@ static uint64_t gmc_v9_0_emit_flush_gpu_tlb(struct 
> amdgpu_ring *ring,
>
> amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + eng, req);
>
> +   /* wait for the invalidate to complete */
> +   amdgpu_ring_emit_reg_wait(ring, hub->vm_inv_eng0_ack + eng,
> + 1 << vmid, 1 << vmid);
> +
> return pd_addr;
>  }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index e1ae39f86adf..ce599fd24412 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -1126,21 +1126,7 @@ static void sdma_v4_0_ring_emit_vm_flush(struct 
> amdgpu_ring *ring,
>  unsigned vmid, unsigned pasid,
>  uint64_t pd_addr)
>  {
> -   struct amdgpu_vmhub *hub = &ring->adev->vmhub[ring->funcs->vmhub];
> -   unsigned eng = ring->vm_inv_eng;
> -
> amdgpu_

Re: [PATCH] drm: amd: Fix trailing semicolons

2018-01-30 Thread Alex Deucher
On Wed, Jan 17, 2018 at 1:50 PM, Luis de Bethencourt  wrote:
> The trailing semicolon is an empty statement that does no operation.
> Removing the two instances of them since they don't do anything.
>
> Signed-off-by: Luis de Bethencourt 

Applied.  thanks!

Alex

> ---
>
> Hi,
>
> After fixing the same thing in drivers/staging/rtl8723bs/, Joe Perches
> suggested I fix it treewide [0].
>
> Best regards
> Luis
>
>
> [0] 
> http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-January/115410.html
> [1] 
> http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-January/115390.html
>
>  drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 +-
>  drivers/gpu/drm/amd/powerplay/amd_powerplay.c| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> index 61e8c3e02d16..33d91e4474ea 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> @@ -718,7 +718,7 @@ static enum link_training_result 
> perform_channel_equalization_sequence(
> uint32_t retries_ch_eq;
> enum dc_lane_count lane_count = lt_settings->link_settings.lane_count;
> union lane_align_status_updated dpcd_lane_status_updated = {{0}};
> -   union lane_status dpcd_lane_status[LANE_COUNT_DP_MAX] = {{{0}}};;
> +   union lane_status dpcd_lane_status[LANE_COUNT_DP_MAX] = {{{0}}};
>
> hw_tr_pattern = get_supported_tp(link);
>
> diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c 
> b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
> index fa9d1615a2cc..b98a7a8a22b5 100644
> --- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
> +++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
> @@ -162,7 +162,7 @@ static int pp_hw_init(void *handle)
> if(hwmgr->smumgr_funcs->start_smu(pp_handle->hwmgr)) {
> pr_err("smc start failed\n");
> hwmgr->smumgr_funcs->smu_fini(pp_handle->hwmgr);
> -   return -EINVAL;;
> +   return -EINVAL;
> }
> if (ret == PP_DPM_DISABLED)
> goto exit;
> --
> 2.15.1
>


Re: All games made with the Unity engine in the Steam catalog crash on AMD GPU, but work on Intel GPU on the same machine.

2018-01-30 Thread mikhail
Launching games with the latest llvm and mesa does not solve the problem. :(

Current versions:
mesa: 17.4.0-0.18.git32170d8
llvm: 7.0.0-0.1.r322903

/home/mikhail/.local/share/Steam/steamapps/common/Comedy Night/Comedy
Night_Data/Mono/x86/libmono.so(+0x8a0b8) [0xebe5c0b8]
linux-gate.so.1(__kernel_rt_sigreturn+0) [0xf7f6bde0]
linux-gate.so.1(__kernel_vsyscall+0x9) [0xf7f6bdb9]
/lib/libc.so.6(gsignal+0xc2) [0xf7aa3602]
/lib/libc.so.6(abort+0x127) [0xf7aa4ea7]
/lib/libc.so.6(+0x77b7b) [0xf7aeab7b]
/lib/libc.so.6(+0x7f51e) [0xf7af251e]
/lib/libc.so.6(cfree+0x65b) [0xf7af9dcb]
/lib/libLLVM-7.0svn.so(_ZdlPv+0x18) [0xf369a408]
/lib/libLLVM-7.0svn.so(_ZdlPvj+0x18) [0xf36a4b58]
/usr/lib/dri/radeonsi_dri.so(+0x10fd0e) [0xf4b21d0e]
/usr/lib/dri/radeonsi_dri.so(+0x307016) [0xf4d19016]
/usr/lib/dri/radeonsi_dri.so(+0x308c67) [0xf4d1ac67]
/usr/lib/dri/radeonsi_dri.so(+0x309e59) [0xf4d1be59]
/usr/lib/dri/radeonsi_dri.so(+0x572517) [0xf4f84517]
/usr/lib/dri/radeonsi_dri.so(+0x2e8f7c) [0xf4cfaf7c]
/usr/lib/dri/radeonsi_dri.so(+0x2e915d) [0xf4cfb15d]
./Comedy Night.x86() [0x8fe2b70]
./Comedy Night.x86() [0x8faf255]
./Comedy Night.x86() [0x8f890e8]
./Comedy Night.x86() [0x8f8e70e]
./Comedy Night.x86() [0x8f826af]
./Comedy Night.x86() [0x8557807]
/lib/libpthread.so.0(+0x6587) [0xf7f26587]
/lib/libc.so.6(clone+0x66) [0xf7b6fee6]


What else can I do to help?


On Mon, 2018-01-29 at 19:18 +0100, Marek Olšák wrote:
> On Mon, Jan 29, 2018 at 6:51 PM, mikhail  m> wrote:
> > 
> > Maybe the problem is in my LLVM and Mesa?
> > mesa: 17.4.0-0.16.git41c36c4
> > llvm: 7.0.0-0.1.r322132
> 
> I don't know.
> 



Re: LLVM 7.0 + Wayland = broken mouse control

2018-01-30 Thread mikhail
Big thanks, now it also works on the login screen.
And /etc/profile.d/gnome-workaround.sh is not needed.

On Tue, 2018-01-30 at 16:52 +, Mike Lothian wrote:
> I think you can set it by:
> 
> echo "allow_rgb10_configs=false" >> /etc/environment
> 
> On Tue, 30 Jan 2018 at 16:38 mikhail 
> wrote:
> > On Tue, 2018-01-30 at 11:55 +0100, Michel Dänzer wrote:
> > >
> > > No, it just means there's an error in the way you're trying to
> > set
> > > the
> > > workaround. Maybe it needs to be "export
> > allow_rgb10_configs=false"?
> > >
> > > You can also try the method I described in
> > > https://bugs.freedesktop.org/show_bug.cgi?id=104808#c2 , I
> > verified
> > > that
> > > to work.
> > >
> > 
> > You're right, my mistake, I forgot to add export.
> > Of course the right variant is:
> > echo "export allow_rgb10_configs=false" > /etc/profile.d/gnome-
> > workaround.sh
> > 
> > Mouse clicks now work again with the latest llvm and mesa.
> > It is unfortunate that I don't know how to apply this variable to the login
> > screen (gdm).
> > 
> > Anyway, thanks.


Re: All games made with the Unity engine in the Steam catalog crash on AMD GPU, but work on Intel GPU on the same machine.

2018-01-30 Thread Marek Olšák
On Tue, Jan 30, 2018 at 7:08 PM, mikhail  wrote:
> Launching games with the latest llvm and mesa does not solve the problem. :(
>
> Current versions:
> mesa: 17.4.0-0.18.git32170d8
> llvm: 7.0.0-0.1.r322903
>
> /home/mikhail/.local/share/Steam/steamapps/common/Comedy Night/Comedy
> Night_Data/Mono/x86/libmono.so(+0x8a0b8) [0xebe5c0b8]
> linux-gate.so.1(__kernel_rt_sigreturn+0) [0xf7f6bde0]
> linux-gate.so.1(__kernel_vsyscall+0x9) [0xf7f6bdb9]
> /lib/libc.so.6(gsignal+0xc2) [0xf7aa3602]
> /lib/libc.so.6(abort+0x127) [0xf7aa4ea7]
> /lib/libc.so.6(+0x77b7b) [0xf7aeab7b]
> /lib/libc.so.6(+0x7f51e) [0xf7af251e]
> /lib/libc.so.6(cfree+0x65b) [0xf7af9dcb]
> /lib/libLLVM-7.0svn.so(_ZdlPv+0x18) [0xf369a408]
> /lib/libLLVM-7.0svn.so(_ZdlPvj+0x18) [0xf36a4b58]
> /usr/lib/dri/radeonsi_dri.so(+0x10fd0e) [0xf4b21d0e]
> /usr/lib/dri/radeonsi_dri.so(+0x307016) [0xf4d19016]
> /usr/lib/dri/radeonsi_dri.so(+0x308c67) [0xf4d1ac67]
> /usr/lib/dri/radeonsi_dri.so(+0x309e59) [0xf4d1be59]
> /usr/lib/dri/radeonsi_dri.so(+0x572517) [0xf4f84517]
> /usr/lib/dri/radeonsi_dri.so(+0x2e8f7c) [0xf4cfaf7c]
> /usr/lib/dri/radeonsi_dri.so(+0x2e915d) [0xf4cfb15d]
> ./Comedy Night.x86() [0x8fe2b70]
> ./Comedy Night.x86() [0x8faf255]
> ./Comedy Night.x86() [0x8f890e8]
> ./Comedy Night.x86() [0x8f8e70e]
> ./Comedy Night.x86() [0x8f826af]
> ./Comedy Night.x86() [0x8557807]
> /lib/libpthread.so.0(+0x6587) [0xf7f26587]
> /lib/libc.so.6(clone+0x66) [0xf7b6fee6]
>
>
> What else can I do to help?

Can you record an apitrace on a driver that is not radeonsi?
If yes, can you correctly replay the apitrace on a driver that is not radeonsi?
If yes, can you reproduce the crash if you replay the apitrace on radeonsi?

The possible answers to those questions are either "yes" or "no".

Thanks,
Marek


Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2

2018-01-30 Thread Deucher, Alexander
Fixed with this patch:

https://lists.freedesktop.org/archives/amd-gfx/2018-January/018472.html


Alex


From: Luís Mendes 
Sent: Tuesday, January 30, 2018 1:30 PM
To: Michel Dänzer; Koenig, Christian
Cc: Deucher, Alexander; Zhou, David(ChunMing); amd-gfx@lists.freedesktop.org
Subject: Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - 
Update 2

Hi everyone,

I've tested the kernel from amd-drm-next-4.17-wip at commit
9ab2894122275a6d636bb2654a157e88a0f7b9e2 (
drm/amdgpu: set DRIVER_ATOMIC flag early) on ARMv7l, and the reported
issues now seem to be gone. I haven't checked which commit fixed this,
but it is fixed now! I also noticed a performance improvement in one of
the glmark2 tests.

There seem to be some other small issues, possibly unrelated: sometimes
the screen goes black and the sound stops for a second or less while
playing a video, and then normal playback recovers. This happens rarely,
at most once per power cycle, while using X and Kodi, even though I have
played many individual videos and power-cycled the machine several times.

I've also observed what was already reported, when watching non-VP9 videos:
[  591.729558] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.740255] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.750968] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.761628] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.772248] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.782672] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.793172] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.803681] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.814129] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.824560] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.835054] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.845437] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.855860] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.866415] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.876945] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.887454] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!

Regards,
Luís Mendes

On Wed, Jan 3, 2018 at 11:08 PM, Luís Mendes  wrote:
> Hi Michel, Christian,
>
> Michel, I have tested amd-staging-drm-next at commit "drm/amdgpu/gfx9:
> only init the apertures used by KGD (v2)" -
> 0e4946409d11913523d30bc4830d10b388438c7a and the issues remain, both
> on ARMv7 and on x86 amd64.
>
> Christian, in fact if I replay the apitraces obtained on the ARMv7
> platform on the AMD64 I am also able to reproduce the GPU hang! So it
> is not ARM platform specific. Should I send/upload the apitraces? I
> have two of them, typically when one doesn't hang the gpu the other
> hangs. One takes about 1GB of disk space while the other takes 2.3GB.
> ...
> [   69.019381] ISO 9660 Extensions: RRIP_1991A
> [  213.292094] DMAR: DRHD: handling fault status reg 2
> [  213.292102] DMAR: [INTR-REMAP] Request device [00:00.0] fault index
> 1c [fault reason 38] Blocked an interrupt request due to source-id
> verification failure
> [  223.406919] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> timeout, last signaled seq=25158, last emitted seq=25160
> [  223.406926] [drm] IP block:tonga_ih is hung!
> [  223.407167] [drm] GPU recovery disabled.
>
> Regards,
> Luís
>
>
> On Wed, Jan 3, 2018 at 5:47 PM, Luís Mendes  wrote:
>> Hi Michel, Christian,
>>
>> Christian, I have followed your suggestion and I have just submitted a
>> bug to fdo at https://bugs.freedesktop.org/show_bug.cgi?id=104481 -
>> GPU lockup Polaris 11 - AMD RX 460 and RX 550 on amd64 and on ARMv7
>> platforms while playing video.
>>
>> Michel, amdgpu.dc=0 seems to make no difference. I will try
>> amd-staging-drm-next and report back.
>>
>> Regards,
>> Luís
>>
>> On Wed, Jan 3, 2018 at 5:09 PM, Michel Dänzer  wrote:
>>> On 2018-01-03 12:02 PM, Luís Mendes wrote:

 What I believe it seems to be the case is that the GPU lock up only
 happens when doing a page flip, since the kernel locks with:
 [  243.

Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2

2018-01-30 Thread Luís Mendes
Hi everyone,

I've tested the kernel from amd-drm-next-4.17-wip at commit
9ab2894122275a6d636bb2654a157e88a0f7b9e2 (
drm/amdgpu: set DRIVER_ATOMIC flag early) on ARMv7l, and the reported
issues now seem to be gone. I haven't checked which commit fixed this,
but it is fixed now! I also noticed a performance improvement in one of
the glmark2 tests.

There seem to be some other small issues, possibly unrelated: sometimes
the screen goes black and the sound stops for a second or less while
playing a video, and then normal playback recovers. This happens rarely,
at most once per power cycle, while using X and Kodi, even though I have
played many individual videos and power-cycled the machine several times.

I've also observed what was already reported, when watching non-VP9 videos:
[  591.729558] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.740255] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.750968] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.761628] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.772248] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.782672] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.793172] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.803681] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.814129] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.824560] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.835054] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.845437] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.855860] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.866415] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.876945] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.887454] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!

Regards,
Luís Mendes

On Wed, Jan 3, 2018 at 11:08 PM, Luís Mendes  wrote:
> Hi Michel, Christian,
>
> Michel, I have tested amd-staging-drm-next at commit "drm/amdgpu/gfx9:
> only init the apertures used by KGD (v2)" -
> 0e4946409d11913523d30bc4830d10b388438c7a and the issues remain, both
> on ARMv7 and on x86 amd64.
>
> Christian, in fact if I replay the apitraces obtained on the ARMv7
> platform on the AMD64 I am also able to reproduce the GPU hang! So it
> is not ARM platform specific. Should I send/upload the apitraces? I
> have two of them, typically when one doesn't hang the gpu the other
> hangs. One takes about 1GB of disk space while the other takes 2.3GB.
> ...
> [   69.019381] ISO 9660 Extensions: RRIP_1991A
> [  213.292094] DMAR: DRHD: handling fault status reg 2
> [  213.292102] DMAR: [INTR-REMAP] Request device [00:00.0] fault index
> 1c [fault reason 38] Blocked an interrupt request due to source-id
> verification failure
> [  223.406919] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> timeout, last signaled seq=25158, last emitted seq=25160
> [  223.406926] [drm] IP block:tonga_ih is hung!
> [  223.407167] [drm] GPU recovery disabled.
>
> Regards,
> Luís
>
>
> On Wed, Jan 3, 2018 at 5:47 PM, Luís Mendes  wrote:
>> Hi Michel, Christian,
>>
>> Christian, I have followed your suggestion and I have just submitted a
>> bug to fdo at https://bugs.freedesktop.org/show_bug.cgi?id=104481 -
>> GPU lockup Polaris 11 - AMD RX 460 and RX 550 on amd64 and on ARMv7
>> platforms while playing video.
>>
>> Michel, amdgpu.dc=0 seems to make no difference. I will try
>> amd-staging-drm-next and report back.
>>
>> Regards,
>> Luís
>>
>> On Wed, Jan 3, 2018 at 5:09 PM, Michel Dänzer  wrote:
>>> On 2018-01-03 12:02 PM, Luís Mendes wrote:

 What I believe it seems to be the case is that the GPU lock up only
 happens when doing a page flip, since the kernel locks with:
 [  243.693200] kworker/u4:3D089  2 0x
 [  243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
 [  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] 
 (schedule+0x4c/0xac)
 [  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
 (schedule_timeout+0x228/0x444)
 [  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
 (dma

Re: [PATCH 11/13] drm/amdgpu: add DRM_AMDGPU_ATC config option

2018-01-30 Thread Felix Kuehling

On 2018-01-30 08:59 AM, Christian König wrote:
> On 29.01.2018 at 23:08, Felix Kuehling wrote:
>> On 2018-01-26 03:13 PM, Christian König wrote:
>>> [SNIP]
>>> +#ifdef CONFIG_DRM_AMDGPU_ATC
>>> +    r = amd_iommu_init_device(adev->pdev, 0x1);
>> KFD queries how many PASIDs the IOMMU can support with
>> amd_iommu_device_info. KFD only assigns PASIDs within that range. It can
>> be much smaller than the 16-bits supported by the GPU.
>>
>> For a VM that uses ATC, you need to make sure it gets a PASID in the
>> range supported by the IOMMU. The PASID manager already supports that
>> and keeps smaller PASIDs for users that really need them.
>
> Yeah, seen that and I'm not really keen about it.
>
> Especially since we need multiple types of PASIDs here:
> 1. For GPUVM debugging and HMM faults, where we can use the full 16bit
> range without worrying about what IOMMU can do.
> 2. For ATC use case where we need to keep the IOMMU in the picture.
>
> Are there any hardware limitations which block us from using a per-device
> PASID? That would simplify the whole handling quite a bit.

Conceptually PASIDs are device-specific process IDs. KFD uses the same
PASID on all devices, but that's not necessary.

Currently you allocate PASIDs per VM or per open device file. So you'll
end up with different PASIDs on each device. I think you can even end up
with multiple PASIDs in each process even on the same device, if you
create multiple VMs. That doesn't seem to be a problem for the IOMMU driver.
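To make the range issue concrete, here is a rough sketch of what I mean by
keeping an ATC PASID inside what the IOMMU reports. The allocator helper
example_pasid_alloc_below() is hypothetical; only amd_iommu_device_info() is
the real IOMMUv2 interface:

    #include <linux/amd-iommu.h>

    static int example_alloc_atc_pasid(struct pci_dev *pdev)
    {
        struct amd_iommu_device_info info = {};
        unsigned int pasid_limit = 1 << 16;  /* GPU-side 16-bit PASID space */

        if (!amd_iommu_device_info(pdev, &info) &&
            (info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP))
            pasid_limit = min(pasid_limit, (unsigned int)info.max_pasids);

        /* hypothetical allocator: hands out a PASID below pasid_limit */
        return example_pasid_alloc_below(pasid_limit);
    }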

>
> Additional to that we don't really want this direct relationship
> between amdgpu/amdkfd and the amd_iommu_v2 driver.

Not sure what you mean. Right now amdgpu and amdkfd don't coordinate
their PASIDs (other than using a common allocator to avoid using the
same PASID for different things).

>
> So what do you think about moving the PASID handling into the IOMMU
> driver? And abstracting which driver is in use through the iommu_ops?

What if both drivers want to use the IOMMU?

Regards,
  Felix

>
> Regards,
> Christian.
>
>>
>> Regards,
>>    Felix



Re: LLVM 7.0 + Wayland = broken mouse control

2018-01-30 Thread Mike Lothian
I think you can set it by:

echo "allow_rgb10_configs=false" >> /etc/environment

On Tue, 30 Jan 2018 at 16:38 mikhail  wrote:

> On Tue, 2018-01-30 at 11:55 +0100, Michel Dänzer wrote:
> >
> > No, it just means there's an error in the way you're trying to set
> > the
> > workaround. Maybe it needs to be "export allow_rgb10_configs=false"?
> >
> > You can also try the method I described in
> > https://bugs.freedesktop.org/show_bug.cgi?id=104808#c2 , I verified
> > that
> > to work.
> >
>
> You're right, my mistake, I forgot to add export.
> Of course the right variant is:
> echo "export allow_rgb10_configs=false" > /etc/profile.d/gnome-
> workaround.sh
>
> Mouse clicks now work again with the latest llvm and mesa.
> It is unfortunate that I don't know how to apply this variable to the login
> screen (gdm).
>
> Anyway, thanks.


Re: [PATCH 01/13] drm/amdgpu: fix vcn_v1_0_dec_ring_emit_wreg

2018-01-30 Thread Felix Kuehling
On 2018-01-30 08:51 AM, Christian König wrote:
> On 30.01.2018 at 00:01, Felix Kuehling wrote:
>> This is a very cool patch series. I made some specific comments on some
>> of the patches. But overall this is great.
>
> Thanks, going to comment on some of those on the patches.
>
>> I guess your plan is to start testing the SVM programming model with
>> ATC, and then enable the same programming model without the IOMMU using
>> HMM. That means there could be working and validated tests to run on
>> HMM.
>
> Yes, exactly. Well I think it would make implementing HMM much easier
> if we could rely on most of the mapping being done by the ATC instead
> of manually crafted GPUVM page tables.
>> I think 4.16 will get an IOMMUv2 driver that we worked on for Raven to
>> support PPR on newer IOMMU versions. Without that the GPU is not able to
>> make swapped or new pages resident or trigger a COW.
>
> Uff? So the ATC is actually not able to handle page faults?

It's supposed to. It worked on Carrizo. As I understand it, the new
IOMMUv2 on Raven needs the driver to enable that feature per-device.
That fix is coming in 4.16.

>
>> We're currently still working on one problem with Raven, related to the
>> way GFX9 retries memory accesses. Many PPR requests for the same virtual
>> address can be outstanding (in an IOMMU log buffer). After the first
>> request is handled, the GPU can continue, but the remaining requests are
>> still in the queue. This can result in the IOMMU driver trying to handle
>> a PPR for a page that's already freed by the application, which triggers
>> an invalid PPR callback.
>>
>> An invalid PPR is like the GPU-equivalent of a segfault, and KFD
>> implements it like that. With the above behaviour we end up segfaulting
>> applications that didn't do anything wrong. I guess for your
>> implementation it's not a problem because you don't implement that
>> callback yet.
>
> Yeah, that is exactly the same problem I'm currently running into with
> HMM.
>
> The interrupt handling is pipelined (even much more than the ATC
> path), so what can happen is that applications free up some memory but
> we have stale page faults for that page in the pipeline.
>
> The only valid workaround I can see is to make sure interrupts are
> processed before returning to HMM so that it can unmap pages, and that
> is a real show stopper for performance as far as I can see.

For ATC my idea is to use an invalidate_range_start MMU notifier to wait
for pending PPRs to get processed before letting the kernel unmap pages.
The underlying assumption is this: when an application frees memory, it
must be sure it's not using that memory any more. So we don't expect any
new faults for the address. We only need to wait for pending faults to
flush out of the pipe.
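As a very rough sketch of that idea (all example_* names are hypothetical,
only the mmu_notifier interface itself is real):

    #include <linux/mmu_notifier.h>

    struct example_process {
        struct mmu_notifier mmu_notifier;
        /* ... per-process fault/PPR bookkeeping ... */
    };

    static void example_invalidate_range_start(struct mmu_notifier *mn,
                                               struct mm_struct *mm,
                                               unsigned long start,
                                               unsigned long end)
    {
        struct example_process *p =
            container_of(mn, struct example_process, mmu_notifier);

        /* hypothetical: block until all PPRs already in flight for this
         * process have been handled, so the kernel can safely unmap.
         */
        example_drain_pending_pprs(p);
    }

    static const struct mmu_notifier_ops example_mmu_notifier_ops = {
        .invalidate_range_start = example_invalidate_range_start,
    };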

For HMM I think the prescreen interrupt handler stage I added should
help, so you only see one interrupt per faulting address. But you need
to figure out the right moment to clear the fault after updating the
page table.

Regards,
  Felix

>
> Regards,
> Christian.
>
>>
>> Regards,
>>    Felix
>>
>>
>> On 2018-01-26 03:13 PM, Christian König wrote:
>>> That got mixed up with the encode ring function.
>>>
>>> Signed-off-by: Christian König 
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 19 ++-
>>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>> index 44c041a1fe68..24ebc3e296a6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>> @@ -880,6 +880,22 @@ static void
>>> vcn_v1_0_dec_ring_emit_vm_flush(struct amdgpu_ring *ring,
>>>   vcn_v1_0_dec_vm_reg_wait(ring, data0, data1, mask);
>>>   }
>>>   +static void vcn_v1_0_dec_ring_emit_wreg(struct amdgpu_ring *ring,
>>> +    uint32_t reg, uint32_t val)
>>> +{
>>> +    struct amdgpu_device *adev = ring->adev;
>>> +
>>> +    amdgpu_ring_write(ring,
>>> +    PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GPCOM_VCPU_DATA0), 0));
>>> +    amdgpu_ring_write(ring, reg << 2);
>>> +    amdgpu_ring_write(ring,
>>> +    PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GPCOM_VCPU_DATA1), 0));
>>> +    amdgpu_ring_write(ring, val);
>>> +    amdgpu_ring_write(ring,
>>> +    PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GPCOM_VCPU_CMD), 0));
>>> +    amdgpu_ring_write(ring, VCN_DEC_CMD_WRITE_REG << 1);
>>> +}
>>> +
>>>   /**
>>>    * vcn_v1_0_enc_ring_get_rptr - get enc read pointer
>>>    *
>>> @@ -1097,7 +1113,7 @@ static const struct amdgpu_ring_funcs
>>> vcn_v1_0_dec_ring_vm_funcs = {
>>>   .pad_ib = amdgpu_ring_generic_pad_ib,
>>>   .begin_use = amdgpu_vcn_ring_begin_use,
>>>   .end_use = amdgpu_vcn_ring_end_use,
>>> -    .emit_wreg = vcn_v1_0_enc_ring_emit_wreg,
>>> +    .emit_wreg = vcn_v1_0_dec_ring_emit_wreg,
>>>   };
>>>     static const struct amdgpu_ring_funcs vcn_v1_0_enc_ring_vm_funcs
>>> = {
>>> @@ -1124,6 +1140,7 @@ static cons

Re: LLVM 7.0 + Wayland = broken mouse control

2018-01-30 Thread mikhail
On Tue, 2018-01-30 at 11:55 +0100, Michel Dänzer wrote:
> 
> No, it just means there's an error in the way you're trying to set
> the
> workaround. Maybe it needs to be "export allow_rgb10_configs=false"?
> 
> You can also try the method I described in
> https://bugs.freedesktop.org/show_bug.cgi?id=104808#c2 , I verified
> that
> to work.
> 

You're right, my mistake, I forgot to add export.
Of course the right variant is:
echo "export allow_rgb10_configs=false" > /etc/profile.d/gnome-
workaround.sh

Mouse clicks now work again with the latest llvm and mesa.
It is unfortunate that I don't know how to apply this variable to the login
screen (gdm).

Anyway, thanks.


RE: [PATCH v2 1/1] drm: add kernel doc for exported gem dmabuf_ops

2018-01-30 Thread Li, Samuel

Alex helped push this into drm-tip,
https://cgit.freedesktop.org/drm/drm-tip/commit/drivers/gpu/drm?id=f7a71b0cf9e36c1b0edbfe89ce028a01164b864d

Thanks,
Samuel Li


> -Original Message-
> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Tuesday, January 30, 2018 4:33 AM
> To: Koenig, Christian 
> Cc: Li, Samuel ; amd-gfx@lists.freedesktop.org; dri-
> de...@lists.freedesktop.org
> Subject: Re: [PATCH v2 1/1] drm: add kernel doc for exported gem
> dmabuf_ops
> 
> On Fri, Jan 19, 2018 at 09:51:20AM +0100, Christian König wrote:
> > On 18.01.2018 at 22:44, Samuel Li wrote:
> > > Signed-off-by: Samuel Li 
> > > Reviewed-by: Daniel Vetter 
> 
> Thanks for updating the docs.
> >
> > Reviewed-by: Christian König 
> 
> I'm assuming you'll also push this one.
> -Daniel
> 
> >
> > > ---
> > >   drivers/gpu/drm/drm_prime.c | 88
> +
> > >   1 file changed, 88 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/drm_prime.c
> > > b/drivers/gpu/drm/drm_prime.c index ca09ce7..e82a976 100644
> > > --- a/drivers/gpu/drm/drm_prime.c
> > > +++ b/drivers/gpu/drm/drm_prime.c
> > > @@ -73,6 +73,9 @@
> > >* Drivers should detect this situation and return back the gem object
> > >* from the dma-buf private.  Prime will do this automatically for 
> > > drivers
> that
> > >* use the drm_gem_prime_{import,export} helpers.
> > > + *
> > > + * GEM struct &dma_buf_ops symbols are now exported. They can be
> > > + reused by
> > > + * drivers which implement GEM interface.
> > >*/
> > >   struct drm_prime_member {
> > > @@ -180,6 +183,18 @@ static int drm_prime_lookup_buf_handle(struct
> drm_prime_file_private *prime_fpri
> > >   return -ENOENT;
> > >   }
> > > +/**
> > > + * drm_gem_map_attach - dma_buf attach implementation for GEM
> > > + * @dma_buf: buffer to attach device to
> > > + * @target_dev: not used
> > > + * @attach: buffer attachment data
> > > + *
> > > + * Allocates &drm_prime_attachment and calls
> > > +&drm_driver.gem_prime_pin for
> > > + * device specific attachment. This can be used as the
> > > +&dma_buf_ops.attach
> > > + * callback.
> > > + *
> > > + * Returns 0 on success, negative error code on failure.
> > > + */
> > >   int drm_gem_map_attach(struct dma_buf *dma_buf, struct device
> *target_dev,
> > >  struct dma_buf_attachment *attach)
> > >   {
> > > @@ -201,6 +216,14 @@ int drm_gem_map_attach(struct dma_buf
> *dma_buf, struct device *target_dev,
> > >   }
> > >   EXPORT_SYMBOL(drm_gem_map_attach);
> > > +/**
> > > + * drm_gem_map_detach - dma_buf detach implementation for GEM
> > > + * @dma_buf: buffer to detach from
> > > + * @attach: attachment to be detached
> > > + *
> > > + * Cleans up &dma_buf_attachment. This can be used as the
> > > +&dma_buf_ops.detach
> > > + * callback.
> > > + */
> > >   void drm_gem_map_detach(struct dma_buf *dma_buf,
> > >   struct dma_buf_attachment *attach)
> > >   {
> > > @@ -255,6 +278,18 @@ void
> drm_prime_remove_buf_handle_locked(struct drm_prime_file_private
> *prime_fpr
> > >   }
> > >   }
> > > +/**
> > > + * drm_gem_map_dma_buf - map_dma_buf implementation for GEM
> > > + * @attach: attachment whose scatterlist is to be returned
> > > + * @dir: direction of DMA transfer
> > > + *
> > > + * Calls &drm_driver.gem_prime_get_sg_table and then maps the
> > > +scatterlist. This
> > > + * can be used as the &dma_buf_ops.map_dma_buf callback.
> > > + *
> > > + * Returns sg_table containing the scatterlist to be returned;
> > > +returns ERR_PTR
> > > + * on error. May return -EINTR if it is interrupted by a signal.
> > > + */
> > > +
> > >   struct sg_table *drm_gem_map_dma_buf(struct dma_buf_attachment
> *attach,
> > >enum dma_data_direction dir)
> > >   {
> > > @@ -294,6 +329,12 @@ struct sg_table *drm_gem_map_dma_buf(struct
> dma_buf_attachment *attach,
> > >   }
> > >   EXPORT_SYMBOL(drm_gem_map_dma_buf);
> > > +/**
> > > + * drm_gem_unmap_dma_buf - unmap_dma_buf implementation for
> GEM
> > > + *
> > > + * Not implemented. The unmap is done at drm_gem_map_detach().
> > > +This can be
> > > + * used as the &dma_buf_ops.unmap_dma_buf callback.
> > > + */
> > >   void drm_gem_unmap_dma_buf(struct dma_buf_attachment *attach,
> > >  struct sg_table *sgt,
> > >  enum dma_data_direction dir) @@ -351,6 
> > > +392,15
> @@ void
> > > drm_gem_dmabuf_release(struct dma_buf *dma_buf)
> > >   }
> > >   EXPORT_SYMBOL(drm_gem_dmabuf_release);
> > > +/**
> > > + * drm_gem_dmabuf_vmap - dma_buf vmap implementation for GEM
> > > + * @dma_buf: buffer to be mapped
> > > + *
> > > + * Sets up a kernel virtual mapping. This can be used as the
> > > +&dma_buf_ops.vmap
> > > + * callback.
> > > + *
> > > + * Returns the kernel virtual address.
> > > + */
> > >   void *drm_gem_dmabuf_vmap(struct dma_buf *dma_buf)
> > >   {
> > > 
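For anyone wanting to pick these helpers up, an illustrative example (not
part of the patch) of how a GEM-based driver could wire the exported symbols
into its own table; the remaining callbacks are omitted for brevity:

    static const struct dma_buf_ops example_gem_dmabuf_ops = {
        .attach = drm_gem_map_attach,
        .detach = drm_gem_map_detach,
        .map_dma_buf = drm_gem_map_dma_buf,
        .unmap_dma_buf = drm_gem_unmap_dma_buf,
        .release = drm_gem_dmabuf_release,
        .vmap = drm_gem_dmabuf_vmap,
        /* .map, .mmap, .vunmap, ... still come from the driver or from the
         * other helpers exported by this series
         */
    };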

Re: [RFC] Per file OOM badness

2018-01-30 Thread Michel Dänzer
On 2018-01-30 12:56 PM, Christian König wrote:
> On 30.01.2018 at 12:42, Michel Dänzer wrote:
>> On 2018-01-30 12:36 PM, Nicolai Hähnle wrote:
>>> On 30.01.2018 12:34, Michel Dänzer wrote:
 On 2018-01-30 12:28 PM, Christian König wrote:
> On 30.01.2018 at 12:02, Michel Dänzer wrote:
>> On 2018-01-30 11:40 AM, Christian König wrote:
>>> On 30.01.2018 at 10:43, Michel Dänzer wrote:
 [SNIP]
> Would it be ok to hang onto potentially arbitrary mmget references
> essentially forever? If that's ok I think we can do your process
> based
> account (minus a few minor inaccuracies for shared stuff perhaps,
> but no
> one cares about that).
 Honestly, I think you and Christian are overthinking this. Let's
 try
 charging the memory to every process which shares a buffer, and go
 from
 there.
>>> My problem is that this needs to be bullet proof.
>>>
>>> For example imagine an application which allocates a lot of BOs,
>>> then
>>> calls fork() and let the parent process die. The file descriptor
>>> lives
>>> on in the child process, but the memory is not accounted against the
>>> child.
>> What exactly are you referring to by "the file descriptor" here?
> The file descriptor used to identify the connection to the driver. In
> other words our drm_file structure in the kernel.
>
>> What happens to BO handles in general in this case? If both parent
>> and
>> child process keep the same handle for the same BO, one of them
>> destroying the handle will result in the other one not being able to
>> use
>> it anymore either, won't it?
> Correct.
>
> That usage is actually not useful at all, but we already had
> applications which did exactly that by accident.
>
> Not to mention that somebody could do it on purpose.
 Can we just prevent child processes from using their parent's DRM file
 descriptors altogether? Allowing it seems like a bad idea all around.
>>> Existing protocols pass DRM fds between processes though, don't they?
>>>
>>> Not child processes perhaps, but special-casing that seems like awful
>>> design.
>> Fair enough.
>>
>> Can we disallow passing DRM file descriptors which have any buffers
>> allocated? :)
> 
> Hehe good point, but I'm sorry I have to ruin that.
> 
> The root VM page table is allocated when the DRM file descriptor is
> created and we want to account those to whoever uses the file descriptor
> as well.

Alternatively, since the file descriptor is closed in the sending
process in this case, maybe we can "uncharge" the buffer memory from the
sending process and charge it to the receiving one during the transfer?


> Looking into the fs layer there actually only seem to be two functions
> which are involved when a file descriptor is installed/removed from a
> process. So we just need to add some callbacks there.

That could work for file descriptor passing, but I'm not sure it really
helps for the fork case. Let's say we charge the buffer memory to the
child process as well. If either process later destroys a buffer handle,
the buffer becomes inaccessible to the other process as well, however
its memory remains charged to it (even though it may already be freed).

I think using a DRM file descriptor in both parent and child processes
is a pathological case that we really want to prevent rather than
worrying about how to make it work well. It doesn't seem to be working
well in general already anyway.


Maybe we could keep track of which process "owns" a DRM file descriptor,
and return an error from any relevant system calls for it from another
process. When passing an fd, its ownership would transfer to the
receiving process. When forking, the ownership would remain with the
parent process.
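
A minimal sketch of that ownership check, assuming a hypothetical owner
field in struct drm_file; neither the field nor the check exists today:

/* Called at the top of the relevant ioctls.  Ownership would transfer on
 * fd passing but stay with the parent on fork().
 */
static int drm_file_check_owner(struct drm_file *file_priv)
{
        if (READ_ONCE(file_priv->owner) != current->group_leader)
                return -EACCES;

        return 0;
}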


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH] drm/amdgpu: fix DW estimation on VI

2018-01-30 Thread Tom St Denis

Tested-by: Tom St Denis 

Works for my configurations.  Thanks!

Tom

On 30/01/18 10:03 AM, Christian König wrote:

Forgot to update that during recent changes.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 2 +-
  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  | 4 +++-
  2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
index 83dde3b4c3ae..5680ced69359 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
@@ -1628,7 +1628,7 @@ static const struct amdgpu_ring_funcs 
sdma_v3_0_ring_funcs = {
6 + /* sdma_v3_0_ring_emit_hdp_flush */
3 + /* hdp invalidate */
6 + /* sdma_v3_0_ring_emit_pipeline_sync */
-   12 + /* sdma_v3_0_ring_emit_vm_flush */
+   VI_FLUSH_GPU_TLB_NUM_WREG * 3 + 6 + /* 
sdma_v3_0_ring_emit_vm_flush */
10 + 10 + 10, /* sdma_v3_0_ring_emit_fence x3 for user fence, 
vm fence */
.emit_ib_size = 7 + 6, /* sdma_v3_0_ring_emit_ib */
.emit_ib = sdma_v3_0_ring_emit_ib,
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index e7546d5b301c..0f192ab71205 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -1530,6 +1530,7 @@ static const struct amdgpu_ring_funcs 
uvd_v6_0_ring_phys_funcs = {
.set_wptr = uvd_v6_0_ring_set_wptr,
.parse_cs = amdgpu_uvd_ring_parse_cs,
.emit_frame_size =
+   6 + 6 + /* hdp flush / invalidate */
10 + /* uvd_v6_0_ring_emit_pipeline_sync */
14, /* uvd_v6_0_ring_emit_fence x1 no user fence */
.emit_ib_size = 8, /* uvd_v6_0_ring_emit_ib */
@@ -1541,6 +1542,7 @@ static const struct amdgpu_ring_funcs 
uvd_v6_0_ring_phys_funcs = {
.pad_ib = amdgpu_ring_generic_pad_ib,
.begin_use = amdgpu_uvd_ring_begin_use,
.end_use = amdgpu_uvd_ring_end_use,
+   .emit_wreg = uvd_v6_0_ring_emit_wreg,
  };
  
  static const struct amdgpu_ring_funcs uvd_v6_0_ring_vm_funcs = {

@@ -1554,7 +1556,7 @@ static const struct amdgpu_ring_funcs 
uvd_v6_0_ring_vm_funcs = {
.emit_frame_size =
6 + 6 + /* hdp flush / invalidate */
10 + /* uvd_v6_0_ring_emit_pipeline_sync */
-   20 + /* uvd_v6_0_ring_emit_vm_flush */
+   VI_FLUSH_GPU_TLB_NUM_WREG * 6 + 8 + /* 
uvd_v6_0_ring_emit_vm_flush */
14 + 14, /* uvd_v6_0_ring_emit_fence x2 vm fence */
.emit_ib_size = 8, /* uvd_v6_0_ring_emit_ib */
.emit_ib = uvd_v6_0_ring_emit_ib,





Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence

2018-01-30 Thread Christian König

Am 30.01.2018 um 16:28 schrieb Kasiviswanathan, Harish:

[+Harish, forgot to acknowledge him in the commit description, will fix
that in v2]

Harish, please see Christian's question below in amd_kfd_fence_signal.
Did I understand this correctly?

[HK]: Yes, the lifetime of an eviction fence is tied to the lifetime of the
process associated with it. When the process terminates the fence is signaled
and released. For all the BOs that belong to this process, the eviction fence
should be detached when the BO is released. However, this eviction fence could
still be attached to shared BOs, so signaling it frees those BOs.


On 2018-01-29 08:43 AM, Christian König wrote:

Hi Felix & Harish,

maybe explain why I found that odd: dma_fence_add_callback() sets the
DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT flag before adding the callback.

So the flag should always be set when there are callbacks.
Did I miss anything?

I don't think we add any callbacks to our eviction fences.

[HK] Setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is not required. It was my
oversight. Since dma_fence_signal() calls the cb_list callbacks only if
DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is set, I thought it was safe to set it.
However, the cb_list would be empty if no callbacks are added, so setting
DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is redundant.


Ok in this case let's just remove that and also use the 
dma_fence_signal() function (not the _locked variant) for signaling the 
DMA fence.


Thanks,
Christian.
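
For illustration, the release path could then shrink to something like
this (a sketch of the discussed approach, not the actual v2 patch):

static void amd_kfd_fence_release(struct dma_fence *f)
{
        struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);

        if (WARN_ON(!fence))
                return;

        /* dma_fence_signal() takes the fence lock itself, so neither the
         * _locked variant nor setting ENABLE_SIGNAL_BIT by hand is needed.
         */
        dma_fence_signal(f);
        dma_fence_free(f);
}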



Best Regards,
Harish

  


Regards,
   Felix


Regards,
Christian.

Am 29.01.2018 um 00:55 schrieb Felix Kuehling:

[+Harish, forgot to acknowledge him in the commit description, will fix
that in v2]

Harish, please see Christian's question below in amd_kfd_fence_signal.
Did I understand this correctly?

Regards,
    Felix

On 2018-01-28 06:42 PM, Felix Kuehling wrote:

On 2018-01-27 04:16 AM, Christian König wrote:

Am 27.01.2018 um 02:09 schrieb Felix Kuehling:

[snip]

+struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
+   void *mm)
+{
+    struct amdgpu_amdkfd_fence *fence = NULL;
+
+    fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+    if (fence == NULL)
+    return NULL;
+
+    /* mm_struct mm is used as void pointer to identify the parent
+ * KFD process. Don't dereference it. Fence and any threads
using
+ * mm is guaranteed to be released before process termination.
+ */
+    fence->mm = mm;

That won't work. Fences can live much longer than any process who
created them.

I've already found a fence in a BO still living hours after the
process was killed and the pid long recycled.

I suggest to make fence->mm a real mm_struct pointer with reference
counting and then set it to NULL and drop the reference in
enable_signaling.

I agree. But enable_signaling may be too early to drop the reference.
amd_kfd_fence_check_mm could still be called later from
amdgpu_ttm_bo_eviction_valuable, as long as the fence hasn't
signaled yet.

The safe place is probably in amd_kfd_fence_release.


+    get_task_comm(fence->timeline_name, current);
+    spin_lock_init(&fence->lock);
+
+    dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
+   context, atomic_inc_return(&fence_seq));
+
+    return fence;
+}
+
+struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct
dma_fence *f)
+{
+    struct amdgpu_amdkfd_fence *fence;
+
+    if (!f)
+    return NULL;
+
+    fence = container_of(f, struct amdgpu_amdkfd_fence, base);
+    if (fence && f->ops == &amd_kfd_fence_ops)
+    return fence;
+
+    return NULL;
+}
+
+static const char *amd_kfd_fence_get_driver_name(struct dma_fence
*f)
+{
+    return "amdgpu_amdkfd_fence";
+}
+
+static const char *amd_kfd_fence_get_timeline_name(struct
dma_fence *f)
+{
+    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+    return fence->timeline_name;
+}
+
+/**
+ * amd_kfd_fence_enable_signaling - This gets called when TTM wants
to evict
+ *  a KFD BO and schedules a job to move the BO.
+ *  If fence is already signaled return true.
+ *  If fence is not signaled schedule an evict KFD process work item.
+ */
+static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
+{
+    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+    if (!fence)
+    return false;
+
+    if (dma_fence_is_signaled(f))
+    return true;
+
+    if (!kgd2kfd->schedule_evict_and_restore_process(
+    (struct mm_struct *)fence->mm, f))
+    return true;
+
+    return false;
+}
+
+static int amd_kfd_fence_signal(struct dma_fence *f)
+{
+    unsigned long flags;
+    int ret;
+
+    spin_lock_irqsave(f->lock, flags);
+    /* Set enabled bit so cb will be called */
+    set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &f->flags);

Mhm, why is that necessary?

This only gets called from fence_release below. I think this is to
avoid
needlessly scheduling an eviction/restore cycle when an eviction fence
gets destroyed that 

RE: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence

2018-01-30 Thread Kasiviswanathan, Harish
>> [+Harish, forgot to acknowledge him in the commit description, will fix
>> that in v2]
>>
>> Harish, please see Christian's question below in amd_kfd_fence_signal.
>> Did I understand this correctly?

[HK]: Yes, the lifetime of an eviction fence is tied to the lifetime of the
process associated with it. When the process terminates the fence is signaled
and released. For all the BOs that belong to this process, the eviction fence
should be detached when the BO is released. However, this eviction fence could
still be attached to shared BOs, so signaling it frees those BOs.


On 2018-01-29 08:43 AM, Christian König wrote:
> Hi Felix & Harish,
>
> maybe explain why I found that odd: dma_fence_add_callback() sets the
> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT flag before adding the callback.
>
> So the flag should always be set when there are callbacks.
> Did I miss anything?

I don't think we add any callbacks to our eviction fences.

[HK] Setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is not required. It was my
oversight. Since dma_fence_signal() calls the cb_list callbacks only if
DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is set, I thought it was safe to set it.
However, the cb_list would be empty if no callbacks are added, so setting
DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is redundant.

Best Regards,
Harish

 

Regards,
  Felix

>
> Regards,
> Christian.
>
> Am 29.01.2018 um 00:55 schrieb Felix Kuehling:
>> [+Harish, forgot to acknowledge him in the commit description, will fix
>> that in v2]
>>
>> Harish, please see Christian's question below in amd_kfd_fence_signal.
>> Did I understand this correctly?
>>
>> Regards,
>>    Felix
>>
>> On 2018-01-28 06:42 PM, Felix Kuehling wrote:
>>> On 2018-01-27 04:16 AM, Christian König wrote:
 Am 27.01.2018 um 02:09 schrieb Felix Kuehling:
>>> [snip]
> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
> +   void *mm)
> +{
> +    struct amdgpu_amdkfd_fence *fence = NULL;
> +
> +    fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> +    if (fence == NULL)
> +    return NULL;
> +
> +    /* mm_struct mm is used as void pointer to identify the parent
> + * KFD process. Don't dereference it. Fence and any threads
> using
> + * mm is guaranteed to be released before process termination.
> + */
> +    fence->mm = mm;
 That won't work. Fences can live much longer than any process who
 created them.

 I've already found a fence in a BO still living hours after the
 process was killed and the pid long recycled.

 I suggest to make fence->mm a real mm_struct pointer with reference
 counting and then set it to NULL and drop the reference in
 enable_signaling.
>>> I agree. But enable_signaling may be too early to drop the reference.
>>> amd_kfd_fence_check_mm could still be called later from
>>> amdgpu_ttm_bo_eviction_valuable, as long as the fence hasn't
>>> signaled yet.
>>>
>>> The safe place is probably in amd_kfd_fence_release.
>>>
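
A sketch of that lifetime handling, assuming fence->mm becomes a real
struct mm_struct pointer.  mmgrab()/mmdrop() only pin the mm_struct
itself, not the address space:

        /* at fence creation */
        mmgrab(mm);
        fence->mm = mm;

        /* in amd_kfd_fence_release(), once nobody can look at the fence
         * anymore, drop the reference again
         */
        if (fence->mm) {
                mmdrop(fence->mm);
                fence->mm = NULL;
        }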
> +    get_task_comm(fence->timeline_name, current);
> +    spin_lock_init(&fence->lock);
> +
> +    dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
> +   context, atomic_inc_return(&fence_seq));
> +
> +    return fence;
> +}
> +
> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct
> dma_fence *f)
> +{
> +    struct amdgpu_amdkfd_fence *fence;
> +
> +    if (!f)
> +    return NULL;
> +
> +    fence = container_of(f, struct amdgpu_amdkfd_fence, base);
> +    if (fence && f->ops == &amd_kfd_fence_ops)
> +    return fence;
> +
> +    return NULL;
> +}
> +
> +static const char *amd_kfd_fence_get_driver_name(struct dma_fence
> *f)
> +{
> +    return "amdgpu_amdkfd_fence";
> +}
> +
> +static const char *amd_kfd_fence_get_timeline_name(struct
> dma_fence *f)
> +{
> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +    return fence->timeline_name;
> +}
> +
> +/**
> + * amd_kfd_fence_enable_signaling - This gets called when TTM wants
> to evict
> + *  a KFD BO and schedules a job to move the BO.
> + *  If fence is already signaled return true.
> + *  If fence is not signaled schedule an evict KFD process work item.
> + */
> +static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
> +{
> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +    if (!fence)
> +    return false;
> +
> +    if (dma_fence_is_signaled(f))
> +    return true;
> +
> +    if (!kgd2kfd->schedule_evict_and_restore_process(
> +    (struct mm_struct *)fence->mm, f))
> +    return true;
> +
> +    return false;
> +}
> +
> +static int amd_kfd_fence_signal(struct dma_fence *f)
> +{

[PATCH] drm/amdgpu: fix DW estimation on VI

2018-01-30 Thread Christian König
Forgot to update that during recent changes.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
index 83dde3b4c3ae..5680ced69359 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
@@ -1628,7 +1628,7 @@ static const struct amdgpu_ring_funcs 
sdma_v3_0_ring_funcs = {
6 + /* sdma_v3_0_ring_emit_hdp_flush */
3 + /* hdp invalidate */
6 + /* sdma_v3_0_ring_emit_pipeline_sync */
-   12 + /* sdma_v3_0_ring_emit_vm_flush */
+   VI_FLUSH_GPU_TLB_NUM_WREG * 3 + 6 + /* 
sdma_v3_0_ring_emit_vm_flush */
10 + 10 + 10, /* sdma_v3_0_ring_emit_fence x3 for user fence, 
vm fence */
.emit_ib_size = 7 + 6, /* sdma_v3_0_ring_emit_ib */
.emit_ib = sdma_v3_0_ring_emit_ib,
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index e7546d5b301c..0f192ab71205 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -1530,6 +1530,7 @@ static const struct amdgpu_ring_funcs 
uvd_v6_0_ring_phys_funcs = {
.set_wptr = uvd_v6_0_ring_set_wptr,
.parse_cs = amdgpu_uvd_ring_parse_cs,
.emit_frame_size =
+   6 + 6 + /* hdp flush / invalidate */
10 + /* uvd_v6_0_ring_emit_pipeline_sync */
14, /* uvd_v6_0_ring_emit_fence x1 no user fence */
.emit_ib_size = 8, /* uvd_v6_0_ring_emit_ib */
@@ -1541,6 +1542,7 @@ static const struct amdgpu_ring_funcs 
uvd_v6_0_ring_phys_funcs = {
.pad_ib = amdgpu_ring_generic_pad_ib,
.begin_use = amdgpu_uvd_ring_begin_use,
.end_use = amdgpu_uvd_ring_end_use,
+   .emit_wreg = uvd_v6_0_ring_emit_wreg,
 };
 
 static const struct amdgpu_ring_funcs uvd_v6_0_ring_vm_funcs = {
@@ -1554,7 +1556,7 @@ static const struct amdgpu_ring_funcs 
uvd_v6_0_ring_vm_funcs = {
.emit_frame_size =
6 + 6 + /* hdp flush / invalidate */
10 + /* uvd_v6_0_ring_emit_pipeline_sync */
-   20 + /* uvd_v6_0_ring_emit_vm_flush */
+   VI_FLUSH_GPU_TLB_NUM_WREG * 6 + 8 + /* 
uvd_v6_0_ring_emit_vm_flush */
14 + 14, /* uvd_v6_0_ring_emit_fence x2 vm fence */
.emit_ib_size = 8, /* uvd_v6_0_ring_emit_ib */
.emit_ib = uvd_v6_0_ring_emit_ib,
-- 
2.14.1



Re: [PATCH] drm/amdgpu: remove DC special casing for KB/ML

2018-01-30 Thread Harry Wentland
On 2018-01-25 04:57 PM, Alex Deucher wrote:
> It seems to be working now.
> 
> Bug: https://bugs.freedesktop.org/show_bug.cgi?id=102372
> Signed-off-by: Alex Deucher 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 309977ef5b51..2ad9de42b65b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1704,6 +1704,8 @@ bool amdgpu_device_asic_has_dc_support(enum 
> amd_asic_type asic_type)
>   case CHIP_BONAIRE:
>   case CHIP_HAWAII:
>   case CHIP_KAVERI:
> + case CHIP_KABINI:
> + case CHIP_MULLINS:
>   case CHIP_CARRIZO:
>   case CHIP_STONEY:
>   case CHIP_POLARIS11:
> @@ -1714,9 +1716,6 @@ bool amdgpu_device_asic_has_dc_support(enum 
> amd_asic_type asic_type)
>  #if defined(CONFIG_DRM_AMD_DC_PRE_VEGA)
>   return amdgpu_dc != 0;
>  #endif
> - case CHIP_KABINI:
> - case CHIP_MULLINS:
> - return amdgpu_dc > 0;
>   case CHIP_VEGA10:
>  #if defined(CONFIG_DRM_AMD_DC_DCN1_0)
>   case CHIP_RAVEN:
> 


Re: kasan report while running piglit

2018-01-30 Thread Tom St Denis

On 22/01/18 01:19 PM, Tom St Denis wrote:

On 22/01/18 09:03 AM, Tom St Denis wrote:
I've rolled back quite a ways from the tip of drm-next (to 
196f74897ba79f6d586894519f09796447d95be5) which is ~1700 commits back 
from the tip and still get the attached KASAN report every time.


I'm running piglit with DRI_PRIME=1 with the max size memory tests 
disabled.


Tom



I should add that I don't get the KASAN report if I don't use DRI_PRIME=1.  As soon as
I run it with DRI_PRIME=1 I get a KASAN error in a matter of seconds
(within the first 100 or so tests).



I've found an interesting way of recreating this.  Apparently it is 
triggered when a GL application goes full screen.  For instance,


DRI_PRIME=1 glxgears

works fine but

DRI_PRIME=1 glxgears -fullscreen

will trigger the KASAN report.

Tom


Re: error messages from running uvd on polaris10

2018-01-30 Thread Tom St Denis

On 29/01/18 01:53 PM, Tom St Denis wrote:

On 29/01/18 01:50 PM, Christian König wrote:

Hi Tom,

well that is an interesting one. Looks like we use more dw on the UVD 
ring when prime is enabled.


Going to take a look tomorrow,
Christian


Hi Christian,

In this case prime=1 means polaris10 where VM is used right?

prime=0 means CZ (on my setup) which is physical I think.


Hi Christian,

Reverting 75965041675b1427a8d4cffc13dd070de9414619 eliminates the error 
messages.  I don't claim that that's the "fix" but it's certainly the 
commit that introduces the error :-)



Cheers,
Tom



Tom



Am 29.01.2018 um 19:45 schrieb Tom St Denis:

Hi all,

Getting these:

[  537.515830] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* 
amdgpu: writing more dwords to the ring than expected!
[  537.516057] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* 
amdgpu: writing more dwords to the ring than expected!
[  537.516267] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* 
amdgpu: writing more dwords to the ring than expected!
[  537.516485] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* 
amdgpu: writing more dwords to the ring than expected!
[  537.516823] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: 
writing more dwords to the ring than expected!
[  537.517080] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: 
writing more dwords to the ring than expected!
[  537.517323] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: 
writing more dwords to the ring than expected!
[  537.517605] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: 
writing more dwords to the ring than expected!
[  537.517848] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: 
writing more dwords to the ring than expected!
[  537.518090] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: 
writing more dwords to the ring than expected!
[  537.518330] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: 
writing more dwords to the ring than expected!
[  537.518581] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: 
writing more dwords to the ring than expected!
[  537.518807] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: 
writing more dwords to the ring than expected!
[  537.519034] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: 
writing more dwords to the ring than expected!
[  537.519260] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: 
writing more dwords to the ring than expected!
[  537.519486] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: 
writing more dwords to the ring than expected!



from running

DRI_PRIME=1 mpv --hwdec=auto *mkv

on my CZ+Polaris10 setup (with drm-next up to date).  I don't get the 
messages though if I run without DRI_PRIME=1 and they both have the 
same UVD6 right?


Is it because one is in physical vs VM mode?

Cheers,
Tom
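
For reference, the message itself comes from the dword accounting in the
ring writer, roughly like this (paraphrased from the amdgpu ring helpers;
details may differ between kernel versions):

static inline void amdgpu_ring_write(struct amdgpu_ring *ring, uint32_t v)
{
        if (ring->count_dw <= 0)
                DRM_ERROR("amdgpu: writing more dwords to the ring than expected!\n");

        ring->ring[ring->wptr++ & ring->buf_mask] = v;
        ring->wptr &= ring->ptr_mask;
        ring->count_dw--;
}

The count_dw budget for a submission is derived from the ring's
.emit_frame_size estimate, so an estimate that is too small makes the
counter underflow and triggers exactly these errors; that is what the
"fix DW estimation on VI" patch above corrects.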


Re: [PATCH 13/13] drm/amdgpu: add AMDGPU_VM_OP_ENABLE_SVM IOCTL

2018-01-30 Thread Christian König

Am 29.01.2018 um 23:27 schrieb Felix Kuehling:

Enabling SVM after the VM has been created and the PASID allocated is
problematic because the IOMMU can support a smaller range of PASIDs than
the GPU. Ideally SVM would be a flag during VM creation, but I see that
doesn't work as it's done in amdgpu_driver_open_kms, not in an ioctl.

Could the PASID be changed on an existing VM if necessary?


Yeah, that shouldn't be much of a problem.

Another issue is that the VM can potentially be created by the X server, 
but then used by the client with DRI3.


So we would always need a separate IOCTL to specify which process a VM
should bind to.


Regards,
Christian.
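
For reference, userspace would reach the new operation through the
existing VM ioctl, roughly along these lines (a sketch only; the op value
and the final uapi may well change, especially once a separate binding
IOCTL as discussed above exists):

#include <string.h>
#include <xf86drm.h>
#include <amdgpu_drm.h>

/* Enable SVM on the VM behind an open DRM render node fd. */
static int enable_svm(int fd)
{
        union drm_amdgpu_vm args;

        memset(&args, 0, sizeof(args));
        args.in.op = AMDGPU_VM_OP_ENABLE_SVM;   /* added by this patch */

        return drmCommandWriteRead(fd, DRM_AMDGPU_VM, &args, sizeof(args));
}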



One more comment inline ...

On 2018-01-26 03:13 PM, Christian König wrote:

Add an IOCTL to enable SVM for the current process.

One step further towards HMM support.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  4 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 94 +++--
  include/uapi/drm/amdgpu_drm.h   |  1 +
  3 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index b18920007624..2f424f8248a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -897,6 +897,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
struct amdgpu_bo_list *list;
struct amdgpu_bo *pd;
+   struct pci_dev *pdev;
unsigned int pasid;
int handle;
  
@@ -923,11 +924,12 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,

}
  
  	pasid = fpriv->vm.pasid;

+   pdev = fpriv->vm.pte_support_ats ? adev->pdev : NULL;
pd = amdgpu_bo_ref(fpriv->vm.root.base.bo);
  
  	amdgpu_vm_fini(adev, &fpriv->vm);

if (pasid)
-   amdgpu_pasid_free_delayed(pd->tbo.resv, NULL, pasid);
+   amdgpu_pasid_free_delayed(pd->tbo.resv, pdev, pasid);
amdgpu_bo_unref(&pd);
  
  	idr_for_each_entry(&fpriv->bo_list_handles, list, handle)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 5e53b7a2d4d5..84f41385677c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -257,6 +257,24 @@ bool amdgpu_vm_ready(struct amdgpu_vm *vm)
return ready;
  }
  
+/**

+ * amdgpu_vm_root_ats_entries - number of ATS entries in the root PD
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Returns number of entries in the root PD which should be initialized for ATS
+ * use.
+ */
+static unsigned amdgpu_vm_root_ats_entries(struct amdgpu_device *adev)
+{
+   unsigned level = adev->vm_manager.root_level;
+   unsigned shift;
+
+   shift = amdgpu_vm_level_shift(adev, level);
+   shift += AMDGPU_GPU_PAGE_SHIFT;
+   return AMDGPU_VA_HOLE_START >> shift;
+}
+
  /**
   * amdgpu_vm_clear_bo - initially clear the PDs/PTs
   *
@@ -283,9 +301,7 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
  
  	if (pte_support_ats) {

if (level == adev->vm_manager.root_level) {
-   ats_entries = amdgpu_vm_level_shift(adev, level);
-   ats_entries += AMDGPU_GPU_PAGE_SHIFT;
-   ats_entries = AMDGPU_VA_HOLE_START >> ats_entries;
+   ats_entries = amdgpu_vm_root_ats_entries(adev);
ats_entries = min(ats_entries, entries);
entries -= ats_entries;
} else {
@@ -329,6 +345,9 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
  
  	amdgpu_ring_pad_ib(ring, &job->ibs[0]);
  
+	amdgpu_sync_resv(adev, &job->sync, bo->tbo.resv,

+AMDGPU_FENCE_OWNER_VM, false);
+
WARN_ON(job->ibs[0].length_dw > 64);
r = amdgpu_job_submit(job, ring, &vm->entity,
  AMDGPU_FENCE_OWNER_UNDEFINED, &fence);
@@ -2557,6 +2576,71 @@ bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device 
*adev,
return true;
  }
  
+/**

+ * amdgpu_vm_enable_svm - enable SVM
+ *
+ * @adev: amdgpu_device pointer
+ * @vm: VM to enable SVM
+ *
+ * Initialize SVM.
+ */
+int amdgpu_vm_enable_svm(struct amdgpu_device *adev, struct amdgpu_vm *vm)
+{
+   int r;
+
+   if (!vm->pasid)
+   return -ENODEV;
+
+   r = amdgpu_bo_reserve(vm->root.base.bo, false);
+   if (r)
+   return r;
+
+   if (vm->pte_support_ats) {
+   r = -EALREADY;
+   goto error_unlock;
+   }
+
+   if (vm->root.entries) {
+   unsigned i, entries;
+
+   entries = amdgpu_vm_root_ats_entries(adev);
+   for (i = 0; i < entries; ++i) {
+   if (vm->root.entries[i].base.bo) {
+   r = -EEXIST;
+   goto error_unlock;
+  

Re: [PATCH 11/13] drm/amdgpu: add DRM_AMDGPU_ATC config option

2018-01-30 Thread Christian König

Am 29.01.2018 um 23:08 schrieb Felix Kuehling:

On 2018-01-26 03:13 PM, Christian König wrote:

[SNIP]
+#ifdef CONFIG_DRM_AMDGPU_ATC
+   r = amd_iommu_init_device(adev->pdev, 0x1);

KFD queries how many PASIDs the IOMMU can support with
amd_iommu_device_info. KFD only assigns PASIDs within that range. It can
be much smaller than the 16-bits supported by the GPU.

For a VM that uses ATC, you need to make sure it gets a PASID in the
range supported by the IOMMU. The PASID manager already supports that
and keeps smaller PASIDs for users that really need them.
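
For context, the range the IOMMUv2 can handle is discovered through the
existing amd_iommu_device_info() interface; the clamping itself is just a
sketch of what the allocator would have to do:

#include <linux/pci.h>
#include <linux/amd-iommu.h>

/* Limit PASID allocation to what the IOMMUv2 actually supports. */
static unsigned int svm_max_pasid(struct pci_dev *pdev, unsigned int gpu_max)
{
        struct amd_iommu_device_info info = {};

        if (amd_iommu_device_info(pdev, &info) ||
            !(info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP))
                return gpu_max;         /* no IOMMUv2 in the picture */

        return min_t(unsigned int, gpu_max, info.max_pasids);
}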


Yeah, seen that and I'm not really keen about it.

Especially since we need multiple types of PASIDs here:
1. For GPUVM debugging and HMM faults, where we can use the full 16bit 
range without worrying about what IOMMU can do.

2. For ATC use case where we need to keep the IOMMU in the picture.

Are there any hardware limitations which block us from using a per-device
PASID? That would simplify the whole handling quite a bit.


In addition to that, we don't really want this direct relationship between
amdgpu/amdkfd and the amd_iommu_v2 driver.


So what do you think about moving the PASID handling into the IOMMU 
driver? And abstracting which driver is in use through the iommu_ops?


Regards,
Christian.



Regards,
   Felix



Re: [PATCH 10/13] drm/amdgpu: enable VMID PASID mapping in the ATC

2018-01-30 Thread Christian König

Am 29.01.2018 um 23:05 schrieb Felix Kuehling:

Could we cache the previous VMID/PASID mapping somewhere, and only
update+wait if it changes?


Good point. And yes, that needs to be fixed, but this patch set was more
of a proof of concept anyway.


Regards,
Christian.



If this is not the right place to check it (because multiple rings can
be using the same VMID concurrently), maybe add a flag to
emit_flush_gpu_tlb to update the PASID mapping conditionally, and make
the decision higher up in the VM manager.

Regards,
   Felix
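
A sketch of the caching suggested above, assuming a hypothetical
cached_pasid field in the per-VMID state (struct amdgpu_vmid); the
decision could then be made in the VM manager and passed down to
emit_flush_gpu_tlb:

/* Returns true if the ATC VMID->PASID mapping needs to be (re)written. */
static bool amdgpu_vmid_pasid_changed(struct amdgpu_vmid *id, unsigned int pasid)
{
        if (id->cached_pasid == pasid)
                return false;

        id->cached_pasid = pasid;
        return true;
}

With that, gmc_v9_0_emit_flush_gpu_tlb() would only emit the
mmATC_VMID*_PASID_MAPPING write and the corresponding wait when the
mapping actually changed.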


On 2018-01-26 03:13 PM, Christian König wrote:

Update the PASID in the ATC as well and wait for the update to finish.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 21 -
  drivers/gpu/drm/amd/amdgpu/soc15.h|  4 ++--
  2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 2c60981d2eec..0077db0a451f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -33,6 +33,7 @@
  #include "vega10_enum.h"
  #include "mmhub/mmhub_1_0_offset.h"
  #include "athub/athub_1_0_offset.h"
+#include "athub/athub_1_0_sh_mask.h"
  #include "oss/osssys_4_0_offset.h"
  
  #include "soc15.h"

@@ -375,7 +376,7 @@ static uint64_t gmc_v9_0_emit_flush_gpu_tlb(struct 
amdgpu_ring *ring,
uint32_t req = gmc_v9_0_get_invalidate_req(vmid);
uint64_t flags = AMDGPU_PTE_VALID;
unsigned eng = ring->vm_inv_eng;
-   uint32_t reg;
+   uint32_t reg, val;
  
  	amdgpu_gmc_get_vm_pde(ring->adev, -1, &pd_addr, &flags);

pd_addr |= flags;
@@ -393,8 +394,26 @@ static uint64_t gmc_v9_0_emit_flush_gpu_tlb(struct 
amdgpu_ring *ring,
  
  	amdgpu_ring_emit_wreg(ring, reg, pasid);
  
+	if (ring->funcs->vmhub == AMDGPU_GFXHUB)

+   reg = SOC15_REG_OFFSET(ATHUB, 0, mmATC_VMID0_PASID_MAPPING) + 
vmid;
+   else
+   reg = SOC15_REG_OFFSET(ATHUB, 0, mmATC_VMID16_PASID_MAPPING) + 
vmid;
+
+   val = 0;
+   val = REG_SET_FIELD(val, ATC_VMID0_PASID_MAPPING, PASID, pasid);
+   val = REG_SET_FIELD(val, ATC_VMID0_PASID_MAPPING, VALID, 1);
+   amdgpu_ring_emit_wreg(ring, reg, val);
+
amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + eng, req);
  
+	/* wait for the ATC to complete */

+   reg = SOC15_REG_OFFSET(ATHUB, 0, 
mmATC_VMID_PASID_MAPPING_UPDATE_STATUS);
+   if (ring->funcs->vmhub == AMDGPU_GFXHUB)
+   val = 0x1 << vmid;
+   else
+   val = 0x1 << vmid;
+   amdgpu_ring_emit_reg_wait(ring, reg, val, val);
+
/* wait for the invalidate to complete */
amdgpu_ring_emit_reg_wait(ring, hub->vm_inv_eng0_ack + eng,
  1 << vmid, 1 << vmid);
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.h 
b/drivers/gpu/drm/amd/amdgpu/soc15.h
index f70da8a29f86..1b8833503f4c 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.h
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.h
@@ -27,8 +27,8 @@
  #include "nbio_v6_1.h"
  #include "nbio_v7_0.h"
  
-#define SOC15_FLUSH_GPU_TLB_NUM_WREG		4

-#define SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT   1
+#define SOC15_FLUSH_GPU_TLB_NUM_WREG   5
+#define SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT   2
  
  extern const struct amd_ip_funcs soc15_common_ip_funcs;
  




Re: [PATCH 01/13] drm/amdgpu: fix vcn_v1_0_dec_ring_emit_wreg

2018-01-30 Thread Christian König

Am 30.01.2018 um 00:01 schrieb Felix Kuehling:

This is a very cool patch series. I made some specific comments on some
of the patches. But overall this is great.


Thanks, going to comment on some of those on the patches.


I guess your plan is to start testing the SVM programming model with
ATC, and then enable the same programming model without the IOMMU using
HMM. That means there could be working and validated tests to run on HMM.


Yes, exactly. Well I think it would make implementing HMM much easier if 
we could rely on most of the mapping being done by the ATC instead of 
manually crafted GPUVM page tables.



I think 4.16 will get an IOMMUv2 driver that we worked on for Raven to
support PPR on newer IOMMU versions. Without that the GPU is not able to
make swapped or new pages resident or trigger a COW.


Uff? So the ATC is actually not able to handle page faults?


We're currently still working on one problem with Raven, related to the
way GFX9 retries memory accesses. Many PPR requests for the same virtual
address can be outstanding (in an IOMMU log buffer). After the first
request is handled, the GPU can continue, but the remaining requests are
still in the queue. This can result in the IOMMU driver trying to handle
a PPR for a page that's already freed by the application, which triggers
an invalid PPR callback.

An invalid PPR is like the GPU-equivalent of a segfault, and KFD
implements it like that. With the above behaviour we end up segfaulting
applications that didn't do anything wrong. I guess for your
implementation it's not a problem because you don't implement that
callback yet.


Yeah, that is exactly the same problem I'm currently running into with HMM.

The interrupt handling is pipelined (even much much more than the ATC 
path), so what can happen is that applications free up some memory but 
we have stale page faults for that page in the pipeline.


The only valid workaround I can see is to make sure interrupts are
processed before returning to HMM so it can unmap the pages, and that is a
real show stopper for performance as far as I can see.
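
A sketch of that workaround, based on the existing MMU notifier in
amdgpu_mn.c; calling the IH processing from this context is only meant to
illustrate the idea (and is exactly the performance problem described
above), not a proposed patch:

static void amdgpu_mn_invalidate_range_start(struct mmu_notifier *mn,
                                             struct mm_struct *mm,
                                             unsigned long start,
                                             unsigned long end)
{
        struct amdgpu_mn *rmn = container_of(mn, struct amdgpu_mn, mn);

        /* Drain pending page fault interrupts before the pages go away so
         * that no stale fault in the pipeline refers to freed memory.
         */
        amdgpu_ih_process(rmn->adev);

        /* ... existing invalidation of the affected BOs follows here ... */
}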


Regards,
Christian.



Regards,
   Felix


On 2018-01-26 03:13 PM, Christian König wrote:

That got mixed up with the encode ring function.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 19 ++-
  1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index 44c041a1fe68..24ebc3e296a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -880,6 +880,22 @@ static void vcn_v1_0_dec_ring_emit_vm_flush(struct 
amdgpu_ring *ring,
vcn_v1_0_dec_vm_reg_wait(ring, data0, data1, mask);
  }
  
+static void vcn_v1_0_dec_ring_emit_wreg(struct amdgpu_ring *ring,

+   uint32_t reg, uint32_t val)
+{
+   struct amdgpu_device *adev = ring->adev;
+
+   amdgpu_ring_write(ring,
+   PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GPCOM_VCPU_DATA0), 0));
+   amdgpu_ring_write(ring, reg << 2);
+   amdgpu_ring_write(ring,
+   PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GPCOM_VCPU_DATA1), 0));
+   amdgpu_ring_write(ring, val);
+   amdgpu_ring_write(ring,
+   PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GPCOM_VCPU_CMD), 0));
+   amdgpu_ring_write(ring, VCN_DEC_CMD_WRITE_REG << 1);
+}
+
  /**
   * vcn_v1_0_enc_ring_get_rptr - get enc read pointer
   *
@@ -1097,7 +1113,7 @@ static const struct amdgpu_ring_funcs 
vcn_v1_0_dec_ring_vm_funcs = {
.pad_ib = amdgpu_ring_generic_pad_ib,
.begin_use = amdgpu_vcn_ring_begin_use,
.end_use = amdgpu_vcn_ring_end_use,
-   .emit_wreg = vcn_v1_0_enc_ring_emit_wreg,
+   .emit_wreg = vcn_v1_0_dec_ring_emit_wreg,
  };
  
  static const struct amdgpu_ring_funcs vcn_v1_0_enc_ring_vm_funcs = {

@@ -1124,6 +1140,7 @@ static const struct amdgpu_ring_funcs 
vcn_v1_0_enc_ring_vm_funcs = {
.pad_ib = amdgpu_ring_generic_pad_ib,
.begin_use = amdgpu_vcn_ring_begin_use,
.end_use = amdgpu_vcn_ring_end_use,
+   .emit_wreg = vcn_v1_0_enc_ring_emit_wreg,
  };
  
  static void vcn_v1_0_set_dec_ring_funcs(struct amdgpu_device *adev)




Re: [PATCH 4/4] drm/amdgpu: Use drm_oom_badness for amdgpu.

2018-01-30 Thread Andrey Grodzovsky
That's definitely what I planned; I just didn't want to clutter the RFC with
multiple repeated changes.


Thanks,

Andrey



On 01/30/2018 04:24 AM, Daniel Vetter wrote:

On Thu, Jan 18, 2018 at 11:47:52AM -0500, Andrey Grodzovsky wrote:

Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 46a0c93..6a733cdc8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -828,6 +828,7 @@ static const struct file_operations amdgpu_driver_kms_fops 
= {
  #ifdef CONFIG_COMPAT
.compat_ioctl = amdgpu_kms_compat_ioctl,
  #endif
+   .oom_file_badness = drm_oom_badness,

Would be neat if we could roll this out for all gem drivers (once it's no
longer an RFC ofc).
-Daniel
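
The drm_oom_badness() helper this hooks up boils down to reporting how
many pages are reachable through the DRM file, roughly along these lines
(a sketch, not the actual RFC code):

#include <drm/drm_file.h>
#include <drm/drm_gem.h>

static long drm_oom_badness(struct file *filp)
{
        struct drm_file *file_priv = filp->private_data;
        struct drm_gem_object *obj;
        long pages = 0;
        int handle;

        /* sum up all GEM objects still reachable through this file */
        spin_lock(&file_priv->table_lock);
        idr_for_each_entry(&file_priv->object_idr, obj, handle)
                pages += obj->size >> PAGE_SHIFT;
        spin_unlock(&file_priv->table_lock);

        return pages;
}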




Re: [PATCH 1/7] drm/amdgpu: add new emit_reg_wait callback

2018-01-30 Thread Christian König

Felix, Alex, anybody brave enough to review this?

Independent of the ATC work it seems like a nice-to-have cleanup.

Regards,
Christian.

Am 30.01.2018 um 13:09 schrieb Christian König:

Allows us to wait for a register value/mask on a ring.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h  | 1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++
  2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d7930f3ead33..787f79c80b6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1796,6 +1796,7 @@ amdgpu_get_sdma_instance(struct amdgpu_ring *ring)
  #define amdgpu_ring_emit_cntxcntl(r, d) (r)->funcs->emit_cntxcntl((r), (d))
  #define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d))
  #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
+#define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), 
(d), (v), (m))
  #define amdgpu_ring_emit_tmz(r, b) (r)->funcs->emit_tmz((r), (b))
  #define amdgpu_ring_pad_ib(r, ib) ((r)->funcs->pad_ib((r), (ib)))
  #define amdgpu_ring_init_cond_exec(r) (r)->funcs->init_cond_exec((r))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 70d05ec7bc07..867f53332305 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -150,6 +150,8 @@ struct amdgpu_ring_funcs {
void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
+   void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg,
+ uint32_t val, uint32_t mask);
void (*emit_tmz)(struct amdgpu_ring *ring, bool start);
/* priority functions */
void (*set_priority) (struct amdgpu_ring *ring,




[PATCH 6/7] drm/amdgpu: implement vcn_v1_0_(dec|enc)_ring_emit_reg_wait v2

2018-01-30 Thread Christian König
Add emit_reg_wait implementation for VCN v1.

v2: cleanup the existing code as well

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 42 +--
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index 24ebc3e296a6..294a1bfb59df 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -838,17 +838,18 @@ static void vcn_v1_0_dec_ring_emit_ib(struct amdgpu_ring 
*ring,
amdgpu_ring_write(ring, ib->length_dw);
 }
 
-static void vcn_v1_0_dec_vm_reg_wait(struct amdgpu_ring *ring,
-   uint32_t data0, uint32_t data1, uint32_t mask)
+static void vcn_v1_0_dec_ring_emit_reg_wait(struct amdgpu_ring *ring,
+   uint32_t reg, uint32_t val,
+   uint32_t mask)
 {
struct amdgpu_device *adev = ring->adev;
 
amdgpu_ring_write(ring,
PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GPCOM_VCPU_DATA0), 0));
-   amdgpu_ring_write(ring, data0);
+   amdgpu_ring_write(ring, reg << 2);
amdgpu_ring_write(ring,
PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GPCOM_VCPU_DATA1), 0));
-   amdgpu_ring_write(ring, data1);
+   amdgpu_ring_write(ring, val);
amdgpu_ring_write(ring,
PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GP_SCRATCH8), 0));
amdgpu_ring_write(ring, mask);
@@ -868,16 +869,16 @@ static void vcn_v1_0_dec_ring_emit_vm_flush(struct 
amdgpu_ring *ring,
pd_addr = amdgpu_gmc_emit_flush_gpu_tlb(ring, vmid, pasid, pd_addr);
 
/* wait for register write */
-   data0 = (hub->ctx0_ptb_addr_lo32 + vmid * 2) << 2;
+   data0 = hub->ctx0_ptb_addr_lo32 + vmid * 2;
data1 = lower_32_bits(pd_addr);
mask = 0xffffffff;
-   vcn_v1_0_dec_vm_reg_wait(ring, data0, data1, mask);
+   vcn_v1_0_dec_ring_emit_reg_wait(ring, data0, data1, mask);
 
/* wait for flush */
-   data0 = (hub->vm_inv_eng0_ack + eng) << 2;
+   data0 = hub->vm_inv_eng0_ack + eng;
data1 = 1 << vmid;
mask =  1 << vmid;
-   vcn_v1_0_dec_vm_reg_wait(ring, data0, data1, mask);
+   vcn_v1_0_dec_ring_emit_reg_wait(ring, data0, data1, mask);
 }
 
 static void vcn_v1_0_dec_ring_emit_wreg(struct amdgpu_ring *ring,
@@ -992,6 +993,16 @@ static void vcn_v1_0_enc_ring_emit_ib(struct amdgpu_ring 
*ring,
amdgpu_ring_write(ring, ib->length_dw);
 }
 
+static void vcn_v1_0_enc_ring_emit_reg_wait(struct amdgpu_ring *ring,
+   uint32_t reg, uint32_t val,
+   uint32_t mask)
+{
+   amdgpu_ring_write(ring, VCN_ENC_CMD_REG_WAIT);
+   amdgpu_ring_write(ring, reg << 2);
+   amdgpu_ring_write(ring, mask);
+   amdgpu_ring_write(ring, val);
+}
+
 static void vcn_v1_0_enc_ring_emit_vm_flush(struct amdgpu_ring *ring,
unsigned int vmid, unsigned pasid,
uint64_t pd_addr)
@@ -1002,17 +1013,12 @@ static void vcn_v1_0_enc_ring_emit_vm_flush(struct 
amdgpu_ring *ring,
pd_addr = amdgpu_gmc_emit_flush_gpu_tlb(ring, vmid, pasid, pd_addr);
 
/* wait for reg writes */
-   amdgpu_ring_write(ring, VCN_ENC_CMD_REG_WAIT);
-   amdgpu_ring_write(ring,
- (hub->ctx0_ptb_addr_lo32 + vmid * 2) << 2);
-   amdgpu_ring_write(ring, 0xffffffff);
-   amdgpu_ring_write(ring, lower_32_bits(pd_addr));
+   vcn_v1_0_enc_ring_emit_reg_wait(ring, hub->ctx0_ptb_addr_lo32 + vmid * 
2,
+   lower_32_bits(pd_addr), 0xffffffff);
 
/* wait for flush */
-   amdgpu_ring_write(ring, VCN_ENC_CMD_REG_WAIT);
-   amdgpu_ring_write(ring, (hub->vm_inv_eng0_ack + eng) << 2);
-   amdgpu_ring_write(ring, 1 << vmid);
-   amdgpu_ring_write(ring, 1 << vmid);
+   vcn_v1_0_enc_ring_emit_reg_wait(ring, hub->vm_inv_eng0_ack + eng,
+   1 << vmid, 1 << vmid);
 }
 
 static void vcn_v1_0_enc_ring_emit_wreg(struct amdgpu_ring *ring,
@@ -1114,6 +1120,7 @@ static const struct amdgpu_ring_funcs 
vcn_v1_0_dec_ring_vm_funcs = {
.begin_use = amdgpu_vcn_ring_begin_use,
.end_use = amdgpu_vcn_ring_end_use,
.emit_wreg = vcn_v1_0_dec_ring_emit_wreg,
+   .emit_reg_wait = vcn_v1_0_dec_ring_emit_reg_wait,
 };
 
 static const struct amdgpu_ring_funcs vcn_v1_0_enc_ring_vm_funcs = {
@@ -1141,6 +1148,7 @@ static const struct amdgpu_ring_funcs 
vcn_v1_0_enc_ring_vm_funcs = {
.begin_use = amdgpu_vcn_ring_begin_use,
.end_use = amdgpu_vcn_ring_end_use,
.emit_wreg = vcn_v1_0_enc_ring_emit_wreg,
+   .emit_reg_wait = vcn_v1_0_enc_ring_emit_reg_wait,
 };
 
 static void vcn_v1_0_set_dec_ring_funcs(struct amdgpu_device *adev)
-- 
2.14.1


[PATCH 2/7] drm/amdgpu: implement gfx_v9_0_ring_emit_reg_wait

2018-01-30 Thread Christian König
Implement emit_reg_wait for gfx v9.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index e5d5341c459a..801d4a1dd7db 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -4003,6 +4003,12 @@ static void gfx_v9_0_ring_emit_wreg(struct amdgpu_ring 
*ring, uint32_t reg,
amdgpu_ring_write(ring, val);
 }
 
+static void gfx_v9_0_ring_emit_reg_wait(struct amdgpu_ring *ring, uint32_t reg,
+   uint32_t val, uint32_t mask)
+{
+   gfx_v9_0_wait_reg_mem(ring, 0, 0, 0, reg, 0, val, mask, 0x20);
+}
+
 static void gfx_v9_0_set_gfx_eop_interrupt_state(struct amdgpu_device *adev,
 enum amdgpu_interrupt_state 
state)
 {
@@ -4321,6 +4327,7 @@ static const struct amdgpu_ring_funcs 
gfx_v9_0_ring_funcs_gfx = {
.patch_cond_exec = gfx_v9_0_ring_emit_patch_cond_exec,
.emit_tmz = gfx_v9_0_ring_emit_tmz,
.emit_wreg = gfx_v9_0_ring_emit_wreg,
+   .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
 };
 
 static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
@@ -4352,6 +4359,7 @@ static const struct amdgpu_ring_funcs 
gfx_v9_0_ring_funcs_compute = {
.pad_ib = amdgpu_ring_generic_pad_ib,
.set_priority = gfx_v9_0_ring_set_priority_compute,
.emit_wreg = gfx_v9_0_ring_emit_wreg,
+   .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
 };
 
 static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = {
@@ -4379,6 +4387,7 @@ static const struct amdgpu_ring_funcs 
gfx_v9_0_ring_funcs_kiq = {
.pad_ib = amdgpu_ring_generic_pad_ib,
.emit_rreg = gfx_v9_0_ring_emit_rreg,
.emit_wreg = gfx_v9_0_ring_emit_wreg,
+   .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
 };
 
 static void gfx_v9_0_set_ring_funcs(struct amdgpu_device *adev)
-- 
2.14.1



[PATCH 5/7] drm/amdgpu: implement vce_v4_0_emit_reg_wait v2

2018-01-30 Thread Christian König
Add emit_reg_wait implementation for VCE v4.

v2: call new function directly from existing code

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
index e62a24b90aaf..2a4f73ddea97 100755
--- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
@@ -965,6 +965,15 @@ static void vce_v4_0_ring_insert_end(struct amdgpu_ring 
*ring)
amdgpu_ring_write(ring, VCE_CMD_END);
 }
 
+static void vce_v4_0_emit_reg_wait(struct amdgpu_ring *ring, uint32_t reg,
+  uint32_t val, uint32_t mask)
+{
+   amdgpu_ring_write(ring, VCE_CMD_REG_WAIT);
+   amdgpu_ring_write(ring, reg << 2);
+   amdgpu_ring_write(ring, mask);
+   amdgpu_ring_write(ring, val);
+}
+
 static void vce_v4_0_emit_vm_flush(struct amdgpu_ring *ring,
   unsigned int vmid, unsigned pasid,
   uint64_t pd_addr)
@@ -975,16 +984,12 @@ static void vce_v4_0_emit_vm_flush(struct amdgpu_ring 
*ring,
pd_addr = amdgpu_gmc_emit_flush_gpu_tlb(ring, vmid, pasid, pd_addr);
 
/* wait for reg writes */
-   amdgpu_ring_write(ring, VCE_CMD_REG_WAIT);
-   amdgpu_ring_write(ring, (hub->ctx0_ptb_addr_lo32 + vmid * 2) << 2);
-   amdgpu_ring_write(ring, 0xffffffff);
-   amdgpu_ring_write(ring, lower_32_bits(pd_addr));
+   vce_v4_0_emit_reg_wait(ring, hub->ctx0_ptb_addr_lo32 + vmid * 2,
+  lower_32_bits(pd_addr), 0xffffffff);
 
/* wait for flush */
-   amdgpu_ring_write(ring, VCE_CMD_REG_WAIT);
-   amdgpu_ring_write(ring, (hub->vm_inv_eng0_ack + eng) << 2);
-   amdgpu_ring_write(ring, 1 << vmid);
-   amdgpu_ring_write(ring, 1 << vmid);
+   vce_v4_0_emit_reg_wait(ring, hub->vm_inv_eng0_ack + eng,
+  1 << vmid, 1 << vmid);
 }
 
 static void vce_v4_0_emit_wreg(struct amdgpu_ring *ring,
@@ -1079,6 +1084,7 @@ static const struct amdgpu_ring_funcs 
vce_v4_0_ring_vm_funcs = {
.begin_use = amdgpu_vce_ring_begin_use,
.end_use = amdgpu_vce_ring_end_use,
.emit_wreg = vce_v4_0_emit_wreg,
+   .emit_reg_wait = vce_v4_0_emit_reg_wait,
 };
 
 static void vce_v4_0_set_ring_funcs(struct amdgpu_device *adev)
-- 
2.14.1



[PATCH 7/7] drm/amdgpu: move waiting for VM flush into gmc_v9_0_emit_flush_gpu_tlb

2018-01-30 Thread Christian König
Keep that at a common place instead of spread over all engines.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 19 +--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 18 +++---
 drivers/gpu/drm/amd/amdgpu/soc15.h |  3 ++-
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  | 20 ++--
 drivers/gpu/drm/amd/amdgpu/vce_v4_0.c  |  9 +++--
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c  | 20 ++--
 7 files changed, 33 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 801d4a1dd7db..f7363f821cff 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3679,15 +3679,8 @@ static void gfx_v9_0_ring_emit_vm_flush(struct 
amdgpu_ring *ring,
unsigned vmid, unsigned pasid,
uint64_t pd_addr)
 {
-   struct amdgpu_vmhub *hub = &ring->adev->vmhub[ring->funcs->vmhub];
-   unsigned eng = ring->vm_inv_eng;
-
amdgpu_gmc_emit_flush_gpu_tlb(ring, vmid, pasid, pd_addr);
 
-   /* wait for the invalidate to complete */
-   gfx_v9_0_wait_reg_mem(ring, 0, 0, 0, hub->vm_inv_eng0_ack + eng,
- 0, 1 << vmid, 1 << vmid, 0x20);
-
/* compute doesn't have PFP */
if (ring->funcs->type == AMDGPU_RING_TYPE_GFX) {
/* sync PFP to ME, otherwise we might get invalid PFP reads */
@@ -4295,7 +4288,9 @@ static const struct amdgpu_ring_funcs 
gfx_v9_0_ring_funcs_gfx = {
.emit_frame_size = /* totally 242 maximum if 16 IBs */
5 +  /* COND_EXEC */
7 +  /* PIPELINE_SYNC */
-   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + 9 + /* VM_FLUSH */
+   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
+   SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
+   2 + /* VM_FLUSH */
8 +  /* FENCE for VM_FLUSH */
20 + /* GDS switch */
4 + /* double SWITCH_BUFFER,
@@ -4344,7 +4339,9 @@ static const struct amdgpu_ring_funcs 
gfx_v9_0_ring_funcs_compute = {
7 + /* gfx_v9_0_ring_emit_hdp_flush */
5 + /* hdp invalidate */
7 + /* gfx_v9_0_ring_emit_pipeline_sync */
-   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + 9 + /* 
gfx_v9_0_ring_emit_vm_flush */
+   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
+   SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
+   2 + /* gfx_v9_0_ring_emit_vm_flush */
8 + 8 + 8, /* gfx_v9_0_ring_emit_fence x3 for user fence, vm 
fence */
.emit_ib_size = 4, /* gfx_v9_0_ring_emit_ib_compute */
.emit_ib = gfx_v9_0_ring_emit_ib_compute,
@@ -4376,7 +4373,9 @@ static const struct amdgpu_ring_funcs 
gfx_v9_0_ring_funcs_kiq = {
7 + /* gfx_v9_0_ring_emit_hdp_flush */
5 + /* hdp invalidate */
7 + /* gfx_v9_0_ring_emit_pipeline_sync */
-   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 + 9 + /* 
gfx_v9_0_ring_emit_vm_flush */
+   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
+   SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
+   2 + /* gfx_v9_0_ring_emit_vm_flush */
8 + 8 + 8, /* gfx_v9_0_ring_emit_fence_kiq x3 for user fence, 
vm fence */
.emit_ib_size = 4, /* gfx_v9_0_ring_emit_ib_compute */
.emit_ib = gfx_v9_0_ring_emit_ib_compute,
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 2b251df94684..2c60981d2eec 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -395,6 +395,10 @@ static uint64_t gmc_v9_0_emit_flush_gpu_tlb(struct 
amdgpu_ring *ring,
 
amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + eng, req);
 
+   /* wait for the invalidate to complete */
+   amdgpu_ring_emit_reg_wait(ring, hub->vm_inv_eng0_ack + eng,
+ 1 << vmid, 1 << vmid);
+
return pd_addr;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index e1ae39f86adf..ce599fd24412 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1126,21 +1126,7 @@ static void sdma_v4_0_ring_emit_vm_flush(struct 
amdgpu_ring *ring,
 unsigned vmid, unsigned pasid,
 uint64_t pd_addr)
 {
-   struct amdgpu_vmhub *hub = &ring->adev->vmhub[ring->funcs->vmhub];
-   unsigned eng = ring->vm_inv_eng;
-
amdgpu_gmc_emit_flush_gpu_tlb(ring, vmid, pasid, pd_addr);
-
-   /* wait for flush */
-   amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_POLL_REGMEM) |
- SDMA_PKT_POLL_REGMEM_HEADER_HDP_FLUSH(0) |
- SDMA_PKT_POLL_REGMEM_HEADER_FUNC(3)); /* equal 

[PATCH 3/7] drm/amdgpu: implement sdma_v4_0_ring_emit_reg_wait

2018-01-30 Thread Christian König
Add emit_reg_wait implementation for SDMA v4.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 8505458d7041..e1ae39f86adf 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1152,6 +1152,20 @@ static void sdma_v4_0_ring_emit_wreg(struct amdgpu_ring 
*ring,
amdgpu_ring_write(ring, val);
 }
 
+static void sdma_v4_0_ring_emit_reg_wait(struct amdgpu_ring *ring, uint32_t 
reg,
+uint32_t val, uint32_t mask)
+{
+   amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_POLL_REGMEM) |
+ SDMA_PKT_POLL_REGMEM_HEADER_HDP_FLUSH(0) |
+ SDMA_PKT_POLL_REGMEM_HEADER_FUNC(3)); /* equal */
+   amdgpu_ring_write(ring, reg << 2);
+   amdgpu_ring_write(ring, 0);
+   amdgpu_ring_write(ring, val); /* reference */
+   amdgpu_ring_write(ring, mask); /* mask */
+   amdgpu_ring_write(ring, SDMA_PKT_POLL_REGMEM_DW5_RETRY_COUNT(0xfff) |
+ SDMA_PKT_POLL_REGMEM_DW5_INTERVAL(10));
+}
+
 static int sdma_v4_0_early_init(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -1588,6 +1602,7 @@ static const struct amdgpu_ring_funcs 
sdma_v4_0_ring_funcs = {
.insert_nop = sdma_v4_0_ring_insert_nop,
.pad_ib = sdma_v4_0_ring_pad_ib,
.emit_wreg = sdma_v4_0_ring_emit_wreg,
+   .emit_reg_wait = sdma_v4_0_ring_emit_reg_wait,
 };
 
 static void sdma_v4_0_set_ring_funcs(struct amdgpu_device *adev)
-- 
2.14.1



[PATCH 1/7] drm/amdgpu: add new emit_reg_wait callback

2018-01-30 Thread Christian König
Allows us to wait for a register value/mask on a ring.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h  | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d7930f3ead33..787f79c80b6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1796,6 +1796,7 @@ amdgpu_get_sdma_instance(struct amdgpu_ring *ring)
 #define amdgpu_ring_emit_cntxcntl(r, d) (r)->funcs->emit_cntxcntl((r), (d))
 #define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d))
 #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
+#define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), 
(d), (v), (m))
 #define amdgpu_ring_emit_tmz(r, b) (r)->funcs->emit_tmz((r), (b))
 #define amdgpu_ring_pad_ib(r, ib) ((r)->funcs->pad_ib((r), (ib)))
 #define amdgpu_ring_init_cond_exec(r) (r)->funcs->init_cond_exec((r))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 70d05ec7bc07..867f53332305 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -150,6 +150,8 @@ struct amdgpu_ring_funcs {
void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
+   void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg,
+ uint32_t val, uint32_t mask);
void (*emit_tmz)(struct amdgpu_ring *ring, bool start);
/* priority functions */
void (*set_priority) (struct amdgpu_ring *ring,
-- 
2.14.1



[PATCH 4/7] drm/amdgpu: implement uvd_v7_0_(enc_|)ring_emit_reg_wait v2

2018-01-30 Thread Christian König
Add emit_reg_wait implementation for UVD v7.

v2: call new function directly from the existing code

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 40 +--
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
index d317c764cc91..fcec0bea101d 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
@@ -1241,17 +1241,17 @@ static void uvd_v7_0_ring_emit_wreg(struct amdgpu_ring 
*ring,
amdgpu_ring_write(ring, 8);
 }
 
-static void uvd_v7_0_vm_reg_wait(struct amdgpu_ring *ring,
-   uint32_t data0, uint32_t data1, uint32_t mask)
+static void uvd_v7_0_ring_emit_reg_wait(struct amdgpu_ring *ring, uint32_t reg,
+   uint32_t val, uint32_t mask)
 {
struct amdgpu_device *adev = ring->adev;
 
amdgpu_ring_write(ring,
PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GPCOM_VCPU_DATA0), 0));
-   amdgpu_ring_write(ring, data0);
+   amdgpu_ring_write(ring, reg << 2);
amdgpu_ring_write(ring,
PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GPCOM_VCPU_DATA1), 0));
-   amdgpu_ring_write(ring, data1);
+   amdgpu_ring_write(ring, val);
amdgpu_ring_write(ring,
PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GP_SCRATCH8), 0));
amdgpu_ring_write(ring, mask);
@@ -1271,16 +1271,16 @@ static void uvd_v7_0_ring_emit_vm_flush(struct 
amdgpu_ring *ring,
pd_addr = amdgpu_gmc_emit_flush_gpu_tlb(ring, vmid, pasid, pd_addr);
 
/* wait for reg writes */
-   data0 = (hub->ctx0_ptb_addr_lo32 + vmid * 2) << 2;
+   data0 = hub->ctx0_ptb_addr_lo32 + vmid * 2;
data1 = lower_32_bits(pd_addr);
mask = 0xffffffff;
-   uvd_v7_0_vm_reg_wait(ring, data0, data1, mask);
+   uvd_v7_0_ring_emit_reg_wait(ring, data0, data1, mask);
 
/* wait for flush */
-   data0 = (hub->vm_inv_eng0_ack + eng) << 2;
+   data0 = hub->vm_inv_eng0_ack + eng;
data1 = 1 << vmid;
mask =  1 << vmid;
-   uvd_v7_0_vm_reg_wait(ring, data0, data1, mask);
+   uvd_v7_0_ring_emit_reg_wait(ring, data0, data1, mask);
 }
 
 static void uvd_v7_0_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count)
@@ -1298,6 +1298,16 @@ static void uvd_v7_0_enc_ring_insert_end(struct 
amdgpu_ring *ring)
amdgpu_ring_write(ring, HEVC_ENC_CMD_END);
 }
 
+static void uvd_v7_0_enc_ring_emit_reg_wait(struct amdgpu_ring *ring,
+   uint32_t reg, uint32_t val,
+   uint32_t mask)
+{
+   amdgpu_ring_write(ring, HEVC_ENC_CMD_REG_WAIT);
+   amdgpu_ring_write(ring, reg << 2);
+   amdgpu_ring_write(ring, mask);
+   amdgpu_ring_write(ring, val);
+}
+
 static void uvd_v7_0_enc_ring_emit_vm_flush(struct amdgpu_ring *ring,
unsigned int vmid, unsigned pasid,
uint64_t pd_addr)
@@ -1308,16 +1318,12 @@ static void uvd_v7_0_enc_ring_emit_vm_flush(struct 
amdgpu_ring *ring,
pd_addr = amdgpu_gmc_emit_flush_gpu_tlb(ring, vmid, pasid, pd_addr);
 
/* wait for reg writes */
-   amdgpu_ring_write(ring, HEVC_ENC_CMD_REG_WAIT);
-   amdgpu_ring_write(ring, (hub->ctx0_ptb_addr_lo32 + vmid * 2) << 2);
-   amdgpu_ring_write(ring, 0xffffffff);
-   amdgpu_ring_write(ring, lower_32_bits(pd_addr));
+   uvd_v7_0_enc_ring_emit_reg_wait(ring, hub->ctx0_ptb_addr_lo32 + vmid * 2,
+   lower_32_bits(pd_addr), 0xffffffff);
 
/* wait for flush */
-   amdgpu_ring_write(ring, HEVC_ENC_CMD_REG_WAIT);
-   amdgpu_ring_write(ring, (hub->vm_inv_eng0_ack + eng) << 2);
-   amdgpu_ring_write(ring, 1 << vmid);
-   amdgpu_ring_write(ring, 1 << vmid);
+   uvd_v7_0_enc_ring_emit_reg_wait(ring, hub->vm_inv_eng0_ack + eng,
+   1 << vmid, 1 << vmid);
 }
 
 static void uvd_v7_0_enc_ring_emit_wreg(struct amdgpu_ring *ring,
@@ -1676,6 +1682,7 @@ static const struct amdgpu_ring_funcs 
uvd_v7_0_ring_vm_funcs = {
.begin_use = amdgpu_uvd_ring_begin_use,
.end_use = amdgpu_uvd_ring_end_use,
.emit_wreg = uvd_v7_0_ring_emit_wreg,
+   .emit_reg_wait = uvd_v7_0_ring_emit_reg_wait,
 };
 
 static const struct amdgpu_ring_funcs uvd_v7_0_enc_ring_vm_funcs = {
@@ -1704,6 +1711,7 @@ static const struct amdgpu_ring_funcs 
uvd_v7_0_enc_ring_vm_funcs = {
.begin_use = amdgpu_uvd_ring_begin_use,
.end_use = amdgpu_uvd_ring_end_use,
.emit_wreg = uvd_v7_0_enc_ring_emit_wreg,
+   .emit_reg_wait = uvd_v7_0_enc_ring_emit_reg_wait,
 };
 
 static void uvd_v7_0_set_ring_funcs(struct amdgpu_device *adev)
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [RFC] Per file OOM badness

2018-01-30 Thread Christian König

Am 30.01.2018 um 12:42 schrieb Michel Dänzer:

On 2018-01-30 12:36 PM, Nicolai Hähnle wrote:

On 30.01.2018 12:34, Michel Dänzer wrote:

On 2018-01-30 12:28 PM, Christian König wrote:

Am 30.01.2018 um 12:02 schrieb Michel Dänzer:

On 2018-01-30 11:40 AM, Christian König wrote:

Am 30.01.2018 um 10:43 schrieb Michel Dänzer:

[SNIP]

Would it be ok to hang onto potentially arbitrary mmget references
essentially forever? If that's ok I think we can do your process
based
account (minus a few minor inaccuracies for shared stuff perhaps,
but no
one cares about that).

Honestly, I think you and Christian are overthinking this. Let's try
charging the memory to every process which shares a buffer, and go
from
there.

My problem is that this needs to be bulletproof.

For example, imagine an application which allocates a lot of BOs, then
calls fork() and lets the parent process die. The file descriptor lives
on in the child process, but the memory is not accounted against the
child.

What exactly are you referring to by "the file descriptor" here?

The file descriptor used to identify the connection to the driver. In
other words our drm_file structure in the kernel.


What happens to BO handles in general in this case? If both parent and
child process keep the same handle for the same BO, one of them
destroying the handle will result in the other one not being able to
use
it anymore either, won't it?

Correct.

That usage is actually not useful at all, but we already had
applications which did exactly that by accident.

Not to mention that somebody could do it on purpose.

Can we just prevent child processes from using their parent's DRM file
descriptors altogether? Allowing it seems like a bad idea all around.

Existing protocols pass DRM fds between processes though, don't they?

Not child processes perhaps, but special-casing that seems like awful
design.

Fair enough.

Can we disallow passing DRM file descriptors which have any buffers
allocated? :)


Hehe good point, but I'm sorry I have to ruin that.

The root VM page table is allocated when the DRM file descriptor is 
created and we want to account those to whoever uses the file descriptor 
as well.


We could now make an exception for the root VM page table to not be 
accounted (shouldn't be that much compared to the rest of the VM tree), 
but Nicolai is right, all those exceptions are just an awful design :)


Looking into the fs layer, there actually seem to be only two functions
involved when a file descriptor is installed in or removed from a
process. So we just need to add some callbacks there.


Regards,
Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
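For reference, a rough sketch of the shape such callbacks could take. These
hooks do not exist in struct file_operations today; the structure and names
below are invented purely for illustration:

/* Hypothetical notification hooks, names made up for illustration only. */
struct example_fd_notify_ops {
        /* called when an fd referring to this file is installed in a task */
        void (*fd_installed)(struct file *file, struct task_struct *task);
        /* called when that fd is removed from the task's file table */
        void (*fd_removed)(struct file *file, struct task_struct *task);
};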


Re: [RFC] Per file OOM badness

2018-01-30 Thread Michel Dänzer
On 2018-01-30 12:36 PM, Nicolai Hähnle wrote:
> On 30.01.2018 12:34, Michel Dänzer wrote:
>> On 2018-01-30 12:28 PM, Christian König wrote:
>>> Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
 On 2018-01-30 11:40 AM, Christian König wrote:
> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>> [SNIP]
>>> Would it be ok to hang onto potentially arbitrary mmget references
>>> essentially forever? If that's ok I think we can do your process
>>> based
>>> account (minus a few minor inaccuracies for shared stuff perhaps,
>>> but no
>>> one cares about that).
>> Honestly, I think you and Christian are overthinking this. Let's try
>> charging the memory to every process which shares a buffer, and go
>> from
>> there.
> My problem is that this needs to be bulletproof.
>
> For example, imagine an application which allocates a lot of BOs, then
> calls fork() and lets the parent process die. The file descriptor lives
> on in the child process, but the memory is not accounted against the
> child.
 What exactly are you referring to by "the file descriptor" here?
>>>
>>> The file descriptor used to identify the connection to the driver. In
>>> other words our drm_file structure in the kernel.
>>>
 What happens to BO handles in general in this case? If both parent and
 child process keep the same handle for the same BO, one of them
 destroying the handle will result in the other one not being able to
 use
 it anymore either, won't it?
>>> Correct.
>>>
>>> That usage is actually not useful at all, but we already had
>>> applications which did exactly that by accident.
>>>
>>> Not to mention that somebody could do it on purpose.
>>
>> Can we just prevent child processes from using their parent's DRM file
>> descriptors altogether? Allowing it seems like a bad idea all around.
> 
> Existing protocols pass DRM fds between processes though, don't they?
> 
> Not child processes perhaps, but special-casing that seems like awful
> design.

Fair enough.

Can we disallow passing DRM file descriptors which have any buffers
allocated? :)


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] Per file OOM badness

2018-01-30 Thread Nicolai Hähnle

On 30.01.2018 12:34, Michel Dänzer wrote:

On 2018-01-30 12:28 PM, Christian König wrote:

Am 30.01.2018 um 12:02 schrieb Michel Dänzer:

On 2018-01-30 11:40 AM, Christian König wrote:

Am 30.01.2018 um 10:43 schrieb Michel Dänzer:

[SNIP]

Would it be ok to hang onto potentially arbitrary mmget references
essentially forever? If that's ok I think we can do your process based
account (minus a few minor inaccuracies for shared stuff perhaps,
but no
one cares about that).

Honestly, I think you and Christian are overthinking this. Let's try
charging the memory to every process which shares a buffer, and go from
there.

My problem is that this needs to be bulletproof.

For example, imagine an application which allocates a lot of BOs, then
calls fork() and lets the parent process die. The file descriptor lives
on in the child process, but the memory is not accounted against the
child.

What exactly are you referring to by "the file descriptor" here?


The file descriptor used to identify the connection to the driver. In
other words our drm_file structure in the kernel.


What happens to BO handles in general in this case? If both parent and
child process keep the same handle for the same BO, one of them
destroying the handle will result in the other one not being able to use
it anymore either, won't it?

Correct.

That usage is actually not useful at all, but we already had
applications which did exactly that by accident.

Not to mention that somebody could do it on purpose.


Can we just prevent child processes from using their parent's DRM file
descriptors altogether? Allowing it seems like a bad idea all around.


Existing protocols pass DRM fds between processes though, don't they?

Not child processes perhaps, but special-casing that seems like awful 
design.


Cheers,
Nicolai
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] Per file OOM badness

2018-01-30 Thread Nicolai Hähnle

On 30.01.2018 11:48, Michel Dänzer wrote:

On 2018-01-30 11:42 AM, Daniel Vetter wrote:

On Tue, Jan 30, 2018 at 10:43:10AM +0100, Michel Dänzer wrote:

On 2018-01-30 10:31 AM, Daniel Vetter wrote:


I guess a good first order approximation would be if we simply charge any
newly allocated buffers to the process that created them, but that means
hanging onto lots of mm_struct pointers since we want to make sure we then
release those pages to the right mm again (since the process that drops
the last ref might be a totally different one, depending upon how the
buffers or DRM fd have been shared).

Would it be ok to hang onto potentially arbitrary mmget references
essentially forever? If that's ok I think we can do your process based
account (minus a few minor inaccuracies for shared stuff perhaps, but no
one cares about that).


Honestly, I think you and Christian are overthinking this. Let's try
charging the memory to every process which shares a buffer, and go from
there.


I'm not concerned about wrongly accounting shared buffers (they don't
matter), but imbalanced accounting. I.e. allocate a buffer in the client,
share it, but then the compositor drops the last reference.


I don't think the order matters. The memory is "uncharged" in each
process when it drops its reference.


Daniel made a fair point about passing DRM fds between processes, though.

It's not a problem with how the fds are currently used, but somebody 
could do the following:


1. Create a DRM fd in process A, allocate lots of buffers.
2. Pass the fd to process B via some IPC mechanism.
3. Exit process A.

There needs to be some assurance that the BOs are accounted as belonging 
to process B in the end.


Cheers,
Nicolai
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] Per file OOM badness

2018-01-30 Thread Michel Dänzer
On 2018-01-30 12:28 PM, Christian König wrote:
> Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
>> On 2018-01-30 11:40 AM, Christian König wrote:
>>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
 [SNIP]
> Would it be ok to hang onto potentially arbitrary mmget references
> essentially forever? If that's ok I think we can do your process based
> account (minus a few minor inaccuracies for shared stuff perhaps,
> but no
> one cares about that).
 Honestly, I think you and Christian are overthinking this. Let's try
 charging the memory to every process which shares a buffer, and go from
 there.
>>> My problem is that this needs to be bulletproof.
>>>
>>> For example, imagine an application which allocates a lot of BOs, then
>>> calls fork() and lets the parent process die. The file descriptor lives
>>> on in the child process, but the memory is not accounted against the
>>> child.
>> What exactly are you referring to by "the file descriptor" here?
> 
> The file descriptor used to identify the connection to the driver. In
> other words our drm_file structure in the kernel.
> 
>> What happens to BO handles in general in this case? If both parent and
>> child process keep the same handle for the same BO, one of them
>> destroying the handle will result in the other one not being able to use
>> it anymore either, won't it?
> Correct.
> 
> That usage is actually not useful at all, but we already had
> applications which did exactly that by accident.
> 
> Not to mention that somebody could do it on purpose.

Can we just prevent child processes from using their parent's DRM file
descriptors altogether? Allowing it seems like a bad idea all around.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] Per file OOM badness

2018-01-30 Thread Christian König

Am 30.01.2018 um 12:02 schrieb Michel Dänzer:

On 2018-01-30 11:40 AM, Christian König wrote:

Am 30.01.2018 um 10:43 schrieb Michel Dänzer:

[SNIP]

Would it be ok to hang onto potentially arbitrary mmget references
essentially forever? If that's ok I think we can do your process based
account (minus a few minor inaccuracies for shared stuff perhaps, but no
one cares about that).

Honestly, I think you and Christian are overthinking this. Let's try
charging the memory to every process which shares a buffer, and go from
there.

My problem is that this needs to be bulletproof.

For example, imagine an application which allocates a lot of BOs, then
calls fork() and lets the parent process die. The file descriptor lives
on in the child process, but the memory is not accounted against the child.

What exactly are you referring to by "the file descriptor" here?


The file descriptor used to identify the connection to the driver. In 
other words our drm_file structure in the kernel.



What happens to BO handles in general in this case? If both parent and
child process keep the same handle for the same BO, one of them
destroying the handle will result in the other one not being able to use
it anymore either, won't it?

Correct.

That usage is actually not useful at all, but we already had 
applications which did exactly that by accident.


Not to mention that somebody could do it on purpose.

Regards,
Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] Per file OOM badness

2018-01-30 Thread Michel Dänzer
On 2018-01-30 11:40 AM, Christian König wrote:
> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>> [SNIP]
>>> Would it be ok to hang onto potentially arbitrary mmget references
>>> essentially forever? If that's ok I think we can do your process based
>>> account (minus a few minor inaccuracies for shared stuff perhaps, but no
>>> one cares about that).
>> Honestly, I think you and Christian are overthinking this. Let's try
>> charging the memory to every process which shares a buffer, and go from
>> there.
> 
> My problem is that this needs to be bulletproof.
> 
> For example, imagine an application which allocates a lot of BOs, then
> calls fork() and lets the parent process die. The file descriptor lives
> on in the child process, but the memory is not accounted against the child.

What exactly are you referring to by "the file descriptor" here?


What happens to BO handles in general in this case? If both parent and
child process keep the same handle for the same BO, one of them
destroying the handle will result in the other one not being able to use
it anymore either, won't it?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: LLVM 7.0 + Wayland = broken mouse control

2018-01-30 Thread Michel Dänzer

[ Dropping gpudriverdevsupport, please never cross-post between that and
public mailing lists ]

On 2018-01-29 08:58 PM, mikhail wrote:
> 
> On Mon, 2018-01-29 at 19:18 +0100, Marek Olšák wrote:
>> Please report this issue to the gnome-shell team. gnome-shell has a
>> bug in how they handle (or ignore) 10-bits-per-channel visuals.
> 
> I already reported here: 
> https://gitlab.gnome.org/GNOME/mutter/issues/2
> 
>> The workaround is to set the environment variable
>> "allow_rgb10_configs=false" for gnome-shell.
> 
> I added the environment variable as you said:
> # echo "allow_rgb10_configs=false" > /etc/profile.d/gnome-workaround.sh
> But it does not help work around the problem.
> Maybe we made a mistake in the diagnosis?
No, it just means there's an error in the way you're trying to set the
workaround. Maybe it needs to be "export allow_rgb10_configs=false"?

You can also try the method I described in
https://bugs.freedesktop.org/show_bug.cgi?id=104808#c2 , I verified that
to work.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] Per file OOM badness

2018-01-30 Thread Michel Dänzer
On 2018-01-30 11:42 AM, Daniel Vetter wrote:
> On Tue, Jan 30, 2018 at 10:43:10AM +0100, Michel Dänzer wrote:
>> On 2018-01-30 10:31 AM, Daniel Vetter wrote:
>>
>>> I guess a good first order approximation would be if we simply charge any
>>> newly allocated buffers to the process that created them, but that means
>>> hanging onto lots of mm_struct pointers since we want to make sure we then
>>> release those pages to the right mm again (since the process that drops
>>> the last ref might be a totally different one, depending upon how the
>>> buffers or DRM fd have been shared).
>>>
>>> Would it be ok to hang onto potentially arbitrary mmget references
>>> essentially forever? If that's ok I think we can do your process based
>>> account (minus a few minor inaccuracies for shared stuff perhaps, but no
>>> one cares about that).
>>
>> Honestly, I think you and Christian are overthinking this. Let's try
>> charging the memory to every process which shares a buffer, and go from
>> there.
> 
> I'm not concerned about wrongly accounting shared buffers (they don't
> matter), but imbalanced accounting. I.e. allocate a buffer in the client,
> share it, but then the compositor drops the last reference.

I don't think the order matters. The memory is "uncharged" in each
process when it drops its reference.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] Per file OOM badness

2018-01-30 Thread Daniel Vetter
On Tue, Jan 30, 2018 at 10:43:10AM +0100, Michel Dänzer wrote:
> On 2018-01-30 10:31 AM, Daniel Vetter wrote:
> > On Wed, Jan 24, 2018 at 01:11:09PM +0100, Christian König wrote:
> >> Am 24.01.2018 um 12:50 schrieb Michal Hocko:
> >>> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>  On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> >>> [...]
> >> 2. If the OOM killer kills a process which is sharing BOs with another
> >> process, this should result in the other process dropping its 
> >> references
> >> to the BOs as well, at which point the memory is released.
> > OK. How exactly are those BOs mapped to the userspace?
>  I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>  refer to a BO. There can also be userspace CPU mappings of the BO's
>  memory, but userspace doesn't need CPU mappings for all BOs and only
>  creates them as needed.
> >>> OK, I guess you have to bear with me some more. This whole stack is a
> >>> complete unknown. I am mostly after finding a boundary where you can
> >>> charge the allocated memory to the process so that the oom killer can
> >>> consider it. Is there anything like that? Except for the proposed file
> >>> handle hack?
> >>
> >> Not that I knew of.
> >>
> >> As I said before we need some kind of callback that a process now starts to
> >> use a file descriptor, but without anything from that file descriptor 
> >> mapped
> >> into the address space.
> > 
> > For more context: With DRI3 and wayland the compositor opens the DRM fd
> > and then passes it to the client, which then starts allocating stuff. That
> > makes book-keeping rather annoying.
> 
> Actually, what you're describing is only true for the buffers shared by
> an X server with an X11 compositor. For the actual applications, the
> buffers are created on the client side and then shared with the X server
> / Wayland compositor.
> 
> Anyway, it doesn't really matter. In all cases, the buffers are actually
> used by all parties that are sharing them, so charging the memory to all
> of them is perfectly appropriate.
> 
> 
> > I guess a good first order approximation would be if we simply charge any
> > newly allocated buffers to the process that created them, but that means
> > hanging onto lots of mm_struct pointers since we want to make sure we then
> > release those pages to the right mm again (since the process that drops
> > the last ref might be a totally different one, depending upon how the
> > buffers or DRM fd have been shared).
> > 
> > Would it be ok to hang onto potentially arbitrary mmget references
> > essentially forever? If that's ok I think we can do your process based
> > account (minus a few minor inaccuracies for shared stuff perhaps, but no
> > one cares about that).
> 
> Honestly, I think you and Christian are overthinking this. Let's try
> charging the memory to every process which shares a buffer, and go from
> there.

I'm not concerned about wrongly accounting shared buffers (they don't
matter), but imbalanced accounting. I.e. allocate a buffer in the client,
share it, but then the compositor drops the last reference.

If we store the mm_struct pointer in drm_gem_object, we don't need any
callback from the vfs when fds are shared or anything like that. We can
simply account any newly allocated buffers to the current->mm, and then
store that later for dropping the account for when the gem obj is
released. This would entirely ignore any complications with shared
buffers, which I think we can do because even when we pass the DRM fd to a
different process, the actual buffer allocations are not passed around
like that for private buffers. And private buffers are the only ones that
really matter.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
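A minimal sketch of what Daniel describes, assuming a hypothetical mm member
in struct drm_gem_object and a charge/uncharge helper pair (see the
add_mm_counter() suggestion later in the thread); this is not an actual patch:

/*
 * Sketch only: pin the mm_struct a buffer was charged to so the charge can
 * be dropped against the same mm when the object is finally released, even
 * if the last reference is dropped by a different process.
 */
static void example_gem_account_alloc(struct drm_gem_object *obj)
{
        obj->mm = current->mm;                  /* hypothetical member */
        mmgrab(obj->mm);                        /* pins the mm_struct, not the address space */
        example_charge(obj->mm, obj->size);     /* hypothetical accounting helper */
}

static void example_gem_account_release(struct drm_gem_object *obj)
{
        example_uncharge(obj->mm, obj->size);
        mmdrop(obj->mm);
        obj->mm = NULL;
}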


Re: [RFC] Per file OOM badness

2018-01-30 Thread Christian König

Am 30.01.2018 um 10:43 schrieb Michel Dänzer:

[SNIP]

Would it be ok to hang onto potentially arbitrary mmget references
essentially forever? If that's ok I think we can do your process based
account (minus a few minor inaccuracies for shared stuff perhaps, but no
one cares about that).

Honestly, I think you and Christian are overthinking this. Let's try
charging the memory to every process which shares a buffer, and go from
there.


My problem is that this needs to be bulletproof.

For example, imagine an application which allocates a lot of BOs, then
calls fork() and lets the parent process die. The file descriptor lives
on in the child process, but the memory is not accounted against the child.


Otherwise we would allow easy construction of denial-of-service problems.

To avoid that, I think we need to add something like new file_operations
callbacks which inform a file descriptor that it has started or stopped
being used in a process.


Regards,
Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] Per file OOM badness

2018-01-30 Thread Michal Hocko
On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
> On 2018-01-24 12:50 PM, Michal Hocko wrote:
> > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> >> On 2018-01-24 12:01 PM, Michal Hocko wrote:
> >>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > [...]
>  2. If the OOM killer kills a process which is sharing BOs with another
>  process, this should result in the other process dropping its references
>  to the BOs as well, at which point the memory is released.
> >>>
> >>> OK. How exactly are those BOs mapped to the userspace?
> >>
> >> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
> >> refer to a BO. There can also be userspace CPU mappings of the BO's
> >> memory, but userspace doesn't need CPU mappings for all BOs and only
> >> creates them as needed.
> > 
> > OK, I guess you have to bear with me some more. This whole stack is a
> > complete unknown. I am mostly after finding a boundary where you can
> > charge the allocated memory to the process so that the oom killer can
> > consider it. Is there anything like that? Except for the proposed file
> > handle hack?
> 
> How about the other way around: what APIs can we use to charge /
> "uncharge" memory to a process? If we have those, we can experiment with
> different places to call them.

add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES.

-- 
Michal Hocko
SUSE Labs
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
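As a sketch of what this could look like on the driver side, assuming a new
MM_KERNEL_PAGES counter existed (it does not today), the charge/uncharge
helpers would be almost trivial:

/* Illustrative only; MM_KERNEL_PAGES is a hypothetical new rss counter. */
static void example_charge(struct mm_struct *mm, size_t size)
{
        add_mm_counter(mm, MM_KERNEL_PAGES, size >> PAGE_SHIFT);
}

static void example_uncharge(struct mm_struct *mm, size_t size)
{
        add_mm_counter(mm, MM_KERNEL_PAGES, -(long)(size >> PAGE_SHIFT));
}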


Re: [RFC] Per file OOM badness

2018-01-30 Thread Michel Dänzer
On 2018-01-30 10:31 AM, Daniel Vetter wrote:
> On Wed, Jan 24, 2018 at 01:11:09PM +0100, Christian König wrote:
>> Am 24.01.2018 um 12:50 schrieb Michal Hocko:
>>> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
 On 2018-01-24 12:01 PM, Michal Hocko wrote:
> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
>>> [...]
>> 2. If the OOM killer kills a process which is sharing BOs with another
>> process, this should result in the other process dropping its references
>> to the BOs as well, at which point the memory is released.
> OK. How exactly are those BOs mapped to the userspace?
 I'm not sure what you're asking. Userspace mostly uses a GEM handle to
 refer to a BO. There can also be userspace CPU mappings of the BO's
 memory, but userspace doesn't need CPU mappings for all BOs and only
 creates them as needed.
>>> OK, I guess you have to bear with me some more. This whole stack is a
>>> complete unknown. I am mostly after finding a boundary where you can
>>> charge the allocated memory to the process so that the oom killer can
>>> consider it. Is there anything like that? Except for the proposed file
>>> handle hack?
>>
>> Not that I knew of.
>>
>> As I said before we need some kind of callback that a process now starts to
>> use a file descriptor, but without anything from that file descriptor mapped
>> into the address space.
> 
> For more context: With DRI3 and wayland the compositor opens the DRM fd
> and then passes it to the client, which then starts allocating stuff. That
> makes book-keeping rather annoying.

Actually, what you're describing is only true for the buffers shared by
an X server with an X11 compositor. For the actual applications, the
buffers are created on the client side and then shared with the X server
/ Wayland compositor.

Anyway, it doesn't really matter. In all cases, the buffers are actually
used by all parties that are sharing them, so charging the memory to all
of them is perfectly appropriate.


> I guess a good first order approximation would be if we simply charge any
> newly allocated buffers to the process that created them, but that means
> hanging onto lots of mm_struct pointers since we want to make sure we then
> release those pages to the right mm again (since the process that drops
> the last ref might be a totally different one, depending upon how the
> buffers or DRM fd have been shared).
> 
> Would it be ok to hang onto potentially arbitrary mmget references
> essentially forever? If that's ok I think we can do your process based
> account (minus a few minor inaccuracies for shared stuff perhaps, but no
> one cares about that).

Honestly, I think you and Christian are overthinking this. Let's try
charging the memory to every process which shares a buffer, and go from
there.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 1/1] drm: add kernel doc for exported gem dmabuf_ops

2018-01-30 Thread Daniel Vetter
On Fri, Jan 19, 2018 at 09:51:20AM +0100, Christian König wrote:
> Am 18.01.2018 um 22:44 schrieb Samuel Li:
> > Signed-off-by: Samuel Li 
> > Reviewed-by: Daniel Vetter 

Thanks for updating the docs.
> 
> Reviewed-by: Christian König 

I'm assuming you'll also push this one.
-Daniel

> 
> > ---
> >   drivers/gpu/drm/drm_prime.c | 88 
> > +
> >   1 file changed, 88 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> > index ca09ce7..e82a976 100644
> > --- a/drivers/gpu/drm/drm_prime.c
> > +++ b/drivers/gpu/drm/drm_prime.c
> > @@ -73,6 +73,9 @@
> >* Drivers should detect this situation and return back the gem object
> >* from the dma-buf private.  Prime will do this automatically for 
> > drivers that
> >* use the drm_gem_prime_{import,export} helpers.
> > + *
> > + * GEM struct &dma_buf_ops symbols are now exported. They can be reused by
> > + * drivers which implement the GEM interface.
> >*/
> >   struct drm_prime_member {
> > @@ -180,6 +183,18 @@ static int drm_prime_lookup_buf_handle(struct 
> > drm_prime_file_private *prime_fpri
> > return -ENOENT;
> >   }
> > +/**
> > + * drm_gem_map_attach - dma_buf attach implementation for GEM
> > + * @dma_buf: buffer to attach device to
> > + * @target_dev: not used
> > + * @attach: buffer attachment data
> > + *
> > + * Allocates &drm_prime_attachment and calls &drm_driver.gem_prime_pin for
> > + * device specific attachment. This can be used as the &dma_buf_ops.attach
> > + * callback.
> > + *
> > + * Returns 0 on success, negative error code on failure.
> > + */
> >   int drm_gem_map_attach(struct dma_buf *dma_buf, struct device *target_dev,
> >struct dma_buf_attachment *attach)
> >   {
> > @@ -201,6 +216,14 @@ int drm_gem_map_attach(struct dma_buf *dma_buf, struct 
> > device *target_dev,
> >   }
> >   EXPORT_SYMBOL(drm_gem_map_attach);
> > +/**
> > + * drm_gem_map_detach - dma_buf detach implementation for GEM
> > + * @dma_buf: buffer to detach from
> > + * @attach: attachment to be detached
> > + *
> > + * Cleans up &dma_buf_attachment. This can be used as the 
> > &dma_buf_ops.detach
> > + * callback.
> > + */
> >   void drm_gem_map_detach(struct dma_buf *dma_buf,
> > struct dma_buf_attachment *attach)
> >   {
> > @@ -255,6 +278,18 @@ void drm_prime_remove_buf_handle_locked(struct 
> > drm_prime_file_private *prime_fpr
> > }
> >   }
> > +/**
> > + * drm_gem_map_dma_buf - map_dma_buf implementation for GEM
> > + * @attach: attachment whose scatterlist is to be returned
> > + * @dir: direction of DMA transfer
> > + *
> > + * Calls &drm_driver.gem_prime_get_sg_table and then maps the scatterlist. 
> > This
> > + * can be used as the &dma_buf_ops.map_dma_buf callback.
> > + *
> > + * Returns sg_table containing the scatterlist to be returned; returns 
> > ERR_PTR
> > + * on error. May return -EINTR if it is interrupted by a signal.
> > + */
> > +
> >   struct sg_table *drm_gem_map_dma_buf(struct dma_buf_attachment *attach,
> >  enum dma_data_direction dir)
> >   {
> > @@ -294,6 +329,12 @@ struct sg_table *drm_gem_map_dma_buf(struct 
> > dma_buf_attachment *attach,
> >   }
> >   EXPORT_SYMBOL(drm_gem_map_dma_buf);
> > +/**
> > + * drm_gem_unmap_dma_buf - unmap_dma_buf implementation for GEM
> > + *
> > + * Not implemented. The unmap is done at drm_gem_map_detach().  This can be
> > + * used as the &dma_buf_ops.unmap_dma_buf callback.
> > + */
> >   void drm_gem_unmap_dma_buf(struct dma_buf_attachment *attach,
> >struct sg_table *sgt,
> >enum dma_data_direction dir)
> > @@ -351,6 +392,15 @@ void drm_gem_dmabuf_release(struct dma_buf *dma_buf)
> >   }
> >   EXPORT_SYMBOL(drm_gem_dmabuf_release);
> > +/**
> > + * drm_gem_dmabuf_vmap - dma_buf vmap implementation for GEM
> > + * @dma_buf: buffer to be mapped
> > + *
> > + * Sets up a kernel virtual mapping. This can be used as the 
> > &dma_buf_ops.vmap
> > + * callback.
> > + *
> > + * Returns the kernel virtual address.
> > + */
> >   void *drm_gem_dmabuf_vmap(struct dma_buf *dma_buf)
> >   {
> > struct drm_gem_object *obj = dma_buf->priv;
> > @@ -360,6 +410,14 @@ void *drm_gem_dmabuf_vmap(struct dma_buf *dma_buf)
> >   }
> >   EXPORT_SYMBOL(drm_gem_dmabuf_vmap);
> > +/**
> > + * drm_gem_dmabuf_vunmap - dma_buf vunmap implementation for GEM
> > + * @dma_buf: buffer to be unmapped
> > + * @vaddr: the virtual address of the buffer
> > + *
> > + * Releases a kernel virtual mapping. This can be used as the
> > + * &dma_buf_ops.vunmap callback.
> > + */
> >   void drm_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr)
> >   {
> > struct drm_gem_object *obj = dma_buf->priv;
> > @@ -369,6 +427,11 @@ void drm_gem_dmabuf_vunmap(struct dma_buf *dma_buf, 
> > void *vaddr)
> >   }
> >   EXPORT_SYMBOL(drm_gem_dmabuf_vunmap);
> > +/**
> > + * drm_gem_dmabuf_kmap_atomic - map_atomic implemen
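The reuse described by the new comment block could look like this in a GEM
based driver; a sketch only, with a made-up table name and not necessarily a
complete set of callbacks:

static const struct dma_buf_ops example_gem_dmabuf_ops = {
        .attach         = drm_gem_map_attach,
        .detach         = drm_gem_map_detach,
        .map_dma_buf    = drm_gem_map_dma_buf,
        .unmap_dma_buf  = drm_gem_unmap_dma_buf,
        .release        = drm_gem_dmabuf_release,
        .vmap           = drm_gem_dmabuf_vmap,
        .vunmap         = drm_gem_dmabuf_vunmap,
};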

Re: [RFC] Per file OOM badness

2018-01-30 Thread Daniel Vetter
On Wed, Jan 24, 2018 at 01:11:09PM +0100, Christian König wrote:
> Am 24.01.2018 um 12:50 schrieb Michal Hocko:
> > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> > > On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > [...]
> > > > > 2. If the OOM killer kills a process which is sharing BOs with another
> > > > > process, this should result in the other process dropping its 
> > > > > references
> > > > > to the BOs as well, at which point the memory is released.
> > > > OK. How exactly are those BOs mapped to the userspace?
> > > I'm not sure what you're asking. Userspace mostly uses a GEM handle to
> > > refer to a BO. There can also be userspace CPU mappings of the BO's
> > > memory, but userspace doesn't need CPU mappings for all BOs and only
> > > creates them as needed.
> > OK, I guess you have to bear with me some more. This whole stack is a
> > complete unknown. I am mostly after finding a boundary where you can
> > charge the allocated memory to the process so that the oom killer can
> > consider it. Is there anything like that? Except for the proposed file
> > handle hack?
> 
> Not that I knew of.
> 
> As I said before we need some kind of callback that a process now starts to
> use a file descriptor, but without anything from that file descriptor mapped
> into the address space.

For more context: With DRI3 and wayland the compositor opens the DRM fd
and then passes it to the client, which then starts allocating stuff. That
makes book-keeping rather annoying.

I guess a good first order approximation would be if we simply charge any
newly allocated buffers to the process that created them, but that means
hanging onto lots of mm_struct pointers since we want to make sure we then
release those pages to the right mm again (since the process that drops
the last ref might be a totally different one, depending upon how the
buffers or DRM fd have been shared).

Would it be ok to hang onto potentially arbitrary mmget references
essentially forever? If that's ok I think we can do your process based
account (minus a few minor inaccuracies for shared stuff perhaps, but no
one cares about that).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] Per file OOM badness

2018-01-30 Thread Michel Dänzer
On 2018-01-24 12:50 PM, Michal Hocko wrote:
> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> [...]
 2. If the OOM killer kills a process which is sharing BOs with another
 process, this should result in the other process dropping its references
 to the BOs as well, at which point the memory is released.
>>>
>>> OK. How exactly are those BOs mapped to the userspace?
>>
>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>> refer to a BO. There can also be userspace CPU mappings of the BO's
>> memory, but userspace doesn't need CPU mappings for all BOs and only
>> creates them as needed.
> 
> OK, I guess you have to bear with me some more. This whole stack is a
> complete unknown. I am mostly after finding a boundary where you can
> charge the allocated memory to the process so that the oom killer can
> consider it. Is there anything like that? Except for the proposed file
> handle hack?

How about the other way around: what APIs can we use to charge /
"uncharge" memory to a process? If we have those, we can experiment with
different places to call them.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 4/4] drm/amdgpu: Use drm_oom_badness for amdgpu.

2018-01-30 Thread Daniel Vetter
On Thu, Jan 18, 2018 at 11:47:52AM -0500, Andrey Grodzovsky wrote:
> Signed-off-by: Andrey Grodzovsky 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 46a0c93..6a733cdc8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -828,6 +828,7 @@ static const struct file_operations 
> amdgpu_driver_kms_fops = {
>  #ifdef CONFIG_COMPAT
>   .compat_ioctl = amdgpu_kms_compat_ioctl,
>  #endif
> + .oom_file_badness = drm_oom_badness,

Would be neat if we could roll this out for all gem drivers (once it's no
longer an RFC ofc).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
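Rolling this out to other GEM drivers, as suggested above, would amount to one
line per driver. Sketched here against an invented fops table and assuming the
RFC's drm_oom_badness helper:

/* Illustrative only: any GEM driver's fops could point at the same helper. */
static const struct file_operations example_driver_kms_fops = {
        .owner            = THIS_MODULE,
        .open             = drm_open,
        .release          = drm_release,
        .unlocked_ioctl   = drm_ioctl,
        .mmap             = drm_gem_mmap,
        .oom_file_badness = drm_oom_badness,    /* from the RFC series */
};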


Re: [PATCH 07/25] drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support

2018-01-30 Thread Christian König

Am 29.01.2018 um 21:25 schrieb Felix Kuehling:

On 2018-01-29 06:42 AM, Christian König wrote:

Am 29.01.2018 um 00:02 schrieb Felix Kuehling:

On 2018-01-27 04:19 AM, Christian König wrote:

Am 27.01.2018 um 02:09 schrieb Felix Kuehling:

Add GPUVM size and DRM render node. Also add function to query the
VMID mask to avoid hard-coding it in multiple places later.

Signed-off-by: Felix Kuehling 
---
    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c  | 19
+--
    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h  |  2 ++
    drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  6 ++
    3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index c9f204d..294c467 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -30,6 +30,8 @@
    const struct kgd2kfd_calls *kgd2kfd;
    bool (*kgd2kfd_init_p)(unsigned int, const struct kgd2kfd_calls**);
    +static const unsigned int compute_vmid_bitmap = 0xFF00;
+
    int amdgpu_amdkfd_init(void)
    {
    int ret;
@@ -137,9 +139,12 @@ void amdgpu_amdkfd_device_init(struct
amdgpu_device *adev)
    int last_valid_bit;
    if (adev->kfd) {
    struct kgd2kfd_shared_resources gpu_resources = {
-    .compute_vmid_bitmap = 0xFF00,
+    .compute_vmid_bitmap = compute_vmid_bitmap,
    .num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
-    .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe
+    .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
+    .gpuvm_size = adev->vm_manager.max_pfn
+    << AMDGPU_GPU_PAGE_SHIFT,

That most likely doesn't work as intended on Vega10. The address space
is divided into an upper and a lower range, but max_pfn includes both.

I suggest to use something like min(adev->vm_manager.max_pfn <<
AMDGPU_GPU_PAGE_SHIFT, AMDGPU_VM_HOLE_START).

I think this is fine as it is. This just tells the Thunk the size of the
virtual address space supported by the GPU. Currently the Thunk only
uses 40-bits for SVM. But eventually it will be able to use the entire
47 bits of user address space. Any excess address space will just go
unused.

I'm also wondering how universal the split 48-bit virtual address space
layout is. Even for x86_64 there seems to be a 5-level page table layout
that supports 56 bits of user mode addresses
(Documentation/x86/x86_64/mm.txt). AArch64 seems to support 48-bit user
mode addresses (Documentation/arm64/memory.txt). I haven't found similar
information for PowerPC yet.

We should avoid coding too much architecture-specific logic into this
driver that's supposed to support other architectures as well. I should
also review the aperture placement with bigger user mode address spaces
in mind.

And that is exactly the reason why I've suggested to change that.

See the split between lower and upper range is not related to the CPU
configuration, instead it is a property of our GPU setup and hardware
generation.

How exactly does the GPUVM hardware behave when it sees a virtual
address "in the hole". Does it generate a VM fault? Or does it simply
ignore the high bits?


At least on the Vega10 system I've tested you get a range fault when you 
try to access the hole.



If it ignored the high bits, it would work OK for both x86_64 (with a
split address space) and ARM64 (with a full 48-bit user mode virtual
address space).

On the other hand, if addresses in the hole generate a VM fault, then I
agree with you, and we should only report 47-bits of virtual address
space to user mode for SVM purposes.


Yes, according to my testing exactly that is the case here.

The hardware documentation is a bit ambiguous. So I initially thought 
as well that the high bits are just ignored, but that doesn't seem to be 
the case.


Regards,
Christian.



Regards,
   Felix


So we should either split it in gpuvm_size_low/gpuvm_size_high or to
just use the lower range and limit the value as suggested above.

This should avoid having any GPU generation and CPU configuration
specific logic in the common interface.

Regards,
Christian.


Regards,
    Felix

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
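For clarity, Christian's suggestion expressed against the initializer from the
patch; a sketch only, where just the gpuvm_size line differs from what was
posted:

        struct kgd2kfd_shared_resources gpu_resources = {
                .compute_vmid_bitmap = compute_vmid_bitmap,
                .num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
                .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
                /* never report addresses inside the Vega10 hole to user mode */
                .gpuvm_size = min(adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT,
                                  AMDGPU_VM_HOLE_START),
        };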


[PATCH v2] drm/amdgpu: Redefine the bit PP_OVERDRIVE_MASK for PP_FEATURES

2018-01-30 Thread Rex Zhu
For features enabled by default, define their bit masks starting from the low bits.
For features disabled by default, define their bit masks starting from the high bits.

Change-Id: Iaa9ea6306a57b73bd3481fa152f0f01794a88427
Signed-off-by: Rex Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   | 2 +-
 drivers/gpu/drm/amd/powerplay/inc/hwmgr.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index d1a6958..7f08777 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -121,7 +121,7 @@
 uint amdgpu_sdma_phase_quantum = 32;
 char *amdgpu_disable_cu = NULL;
 char *amdgpu_virtual_display = NULL;
-uint amdgpu_pp_feature_mask = 0x3fff;
+uint amdgpu_pp_feature_mask = 0x7fff;
 int amdgpu_ngg = 0;
 int amdgpu_prim_buf_per_se = 0;
 int amdgpu_pos_buf_per_se = 0;
diff --git a/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h 
b/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h
index 5512dc2..3ad45ac 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/hwmgr.h
@@ -84,7 +84,7 @@ enum PP_FEATURE_MASK {
PP_OD_FUZZY_FAN_CONTROL_MASK = 0x800,
PP_SOCCLK_DPM_MASK = 0x1000,
PP_DCEFCLK_DPM_MASK = 0x2000,
-   PP_OVERDRIVE_MASK = 0x4000,
+   PP_OVERDRIVE_MASK = 0x8000, /* disabled by default */
 };
 
 enum PHM_BackEnd_Magic {
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
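With this convention, code consuming the mask only needs a plain bit test. A
sketch with a made-up helper name; the mask value normally reaches the driver
via the amdgpu.ppfeaturemask module parameter:

/* Illustrative helper: overdrive stays off unless the user opts in. */
static bool example_overdrive_requested(uint pp_feature_mask)
{
        return (pp_feature_mask & PP_OVERDRIVE_MASK) != 0;
}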